Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.

Pages

Posts

Is ChatGPT a Generalist Algorithmic Learner?

1 minute read

Published:

Sean McLeish, Avi Schwarzschild, Tom Goldstein

All benchmark code is available here: CLRS4LM GitHub.

This is an extension of our arXiv paper, available here: arXiv. Here we also present results on the CLRS size 16 training data and provide more discussion.

We are all at NeurIPS 2023; come talk to us!

portfolio

publications

[Re] End-to-End Algorithm Synthesis with Recurrent Networks: Logical Extrapolation Without Overthinking

Sean McLeish and Long Tran-Thanh
Published in ReScience Volume 9 Issue 2, Journal to Conference Track, NeurIPS 2023

In this report, we aim to validate the claims of Bansal et al.: that the recurrent architecture presented, with skip connections and a progressive loss function, prevents the original problem from being forgotten or corrupted during processing, allowing the recurrent module to be applied indefinitely; and that this architecture avoids the overthinking trap. We use both code released by the authors and newly developed code to recreate many of the results presented in the paper. Additionally, we present an analysis of the newly introduced alpha hyperparameter and investigate interesting perturbation behaviour of prefix-sums models. Further, we conduct a hyperparameter search and provide an analysis of the Asymptotic Alignment scores of the models presented.
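The core mechanism described above — a recurrent module applied repeatedly, with a skip connection re-injecting the original input at every step so the problem is never forgotten — can be sketched minimally. This is an illustrative toy, not the authors' implementation; `f` stands in for the learned recurrent block:

```python
def run_recurrent(f, x, iters):
    """Apply a recurrent module f repeatedly, re-injecting the original
    input x at every step. The re-injection is the skip connection that
    keeps the original problem from being corrupted, which is what allows
    the module to be iterated indefinitely at test time.
    f is a placeholder for the learned recurrent block."""
    state = x  # in the real models, x is first projected into feature space
    for _ in range(iters):
        state = f(state, x)  # input injection: x is supplied at each step
    return state
```

Because `x` is available at every iteration, running for more steps than were used in training does not erase the input, which is the property the report sets out to validate.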

Citation: Sean McLeish and Long Tran-Thanh (2023). "[Re] End-to-End Algorithm Synthesis with Recurrent Networks: Logical Extrapolation Without Overthinking." ReScience Volume 9 Issue 2. https://openreview.net/pdf?id=WaZB4pUVTi

Benchmarking ChatGPT on Algorithmic Reasoning

Sean McLeish, Avi Schwarzschild and Tom Goldstein
Published in arXiv, 2024

We evaluate ChatGPT's ability to solve algorithm problems from the CLRS benchmark suite, which is designed for GNNs. The benchmark requires the use of a specified classical algorithm to solve a given problem. We find that ChatGPT outperforms specialist GNN models, using Python to successfully solve these problems. This raises new points in the discussion about learning algorithms with neural networks and about what out-of-distribution testing looks like with web-scale training data.

Citation: Sean McLeish, Avi Schwarzschild and Tom Goldstein (2024). "Benchmarking ChatGPT on Algorithmic Reasoning." arXiv preprint arXiv:2404.03441. https://arxiv.org/abs/2404.03441

Transformers Can Do Arithmetic with the Right Embeddings

Sean McLeish*, Arpit Bansal*, Alex Stein, Neel Jain, John Kirchenbauer, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Jonas Geiping, Avi Schwarzschild and Tom Goldstein
Published in NeurIPS, 2024

The poor performance of transformers on arithmetic tasks seems to stem in large part from their inability to keep track of the exact position of each digit inside a large span of digits. We mend this problem by adding an embedding to each digit that encodes its position relative to the start of the number. In addition to the boost these embeddings provide on their own, we show that this fix enables architectural modifications such as input injection and recurrent layers to improve performance even further. With positions resolved, we can study the logical extrapolation ability of transformers. Can they solve arithmetic problems that are larger and more complex than those in their training data? We find that by training on only 20-digit numbers with a single GPU for one day, we can reach state-of-the-art performance, achieving up to 99% accuracy on 100-digit addition problems.

Citation: Sean McLeish, Arpit Bansal, Alex Stein, Neel Jain, John Kirchenbauer, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Jonas Geiping, Avi Schwarzschild and Tom Goldstein (2024). "Transformers Can Do Arithmetic with the Right Embeddings." arXiv preprint arXiv:2405.17399. https://arxiv.org/abs/2405.17399

The CLRS-Text Algorithmic Reasoning Language Benchmark

Larisa Markeeva*, Sean McLeish*, Borja Ibarz*, Wilfried Bounsi, Olga Kozlova, Alex Vitvitskyi, Charles Blundell, Tom Goldstein, Avi Schwarzschild and Petar Veličković
Published in arXiv, 2024

Eliciting reasoning capabilities from language models (LMs) is a critical direction on the path towards building intelligent systems. Most recent studies dedicated to reasoning focus on out-of-distribution performance on procedurally-generated synthetic benchmarks, bespoke-built to evaluate specific skills only. This trend makes results hard to transfer across publications, slowing down progress. Three years ago, a similar issue was identified and rectified in the field of neural algorithmic reasoning, with the advent of the CLRS benchmark. CLRS is a dataset generator comprising graph execution traces of classical algorithms from the Introduction to Algorithms textbook. Inspired by this, we propose CLRS-Text – a textual version of these algorithmic traces. Out of the box, CLRS-Text is capable of procedurally generating trace data for thirty diverse, challenging algorithmic tasks across any desirable input distribution, while offering a standard pipeline in which any additional algorithmic tasks may be created in the benchmark. We fine-tune and evaluate various LMs as generalist executors on this benchmark, validating prior work and revealing a novel, interesting challenge for the LM reasoning community. Our code is available at https://github.com/google-deepmind/clrs/tree/master/clrs/_src/clrs_text.
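To make "textual algorithmic traces" concrete, here is a toy example of rendering an algorithm's execution trace as text. This follows the spirit of the benchmark, not the actual CLRS-Text format or API (see the linked repository for those):

```python
def insertion_sort_trace(xs):
    """Run insertion sort and record the array state after each outer
    iteration as a line of text, producing a textual execution trace.
    A toy in the spirit of, but not the format of, CLRS-Text."""
    xs = list(xs)
    trace = [" ".join(map(str, xs))]  # initial state
    for i in range(1, len(xs)):
        j = i
        while j > 0 and xs[j - 1] > xs[j]:
            xs[j - 1], xs[j] = xs[j], xs[j - 1]
            j -= 1
        trace.append(" ".join(map(str, xs)))  # state after inserting xs[i]
    return trace
```

A language model evaluated as a "generalist executor" would be asked to produce such intermediate states, not just the sorted output, which is what makes trace benchmarks a sharper probe of algorithmic reasoning.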

Citation: Larisa Markeeva, Sean McLeish, Borja Ibarz, Wilfried Bounsi, Olga Kozlova, Alex Vitvitskyi, Charles Blundell, Tom Goldstein, Avi Schwarzschild and Petar Veličković (2024). "The CLRS-Text Algorithmic Reasoning Language Benchmark." arXiv preprint arXiv:2406.04229. https://arxiv.org/abs/2406.04229

talks

teaching

CMSC 250 Discrete Structures TA

Undergraduate Class, University of Maryland, Computer Science, 2023

Led discussion sections, held office hours and completed grading for 38 undergraduate students during the semester.