Is ChatGPT a Generalist Algorithmic Learner?


Published: December 07, 2023

Sean McLeish, Avi Schwarzschild, Tom Goldstein

All benchmark code is available here: CLRS4LM GitHub.

This is an extension of our arXiv paper, available here: arXiv. Here we also present results on the CLRS size 16 training data and provide more discussion.

We are all at NeurIPS 2023; come talk to us!

Individual Accuracy

We do not see a clear pattern between the test and train accuracies. This may be because ChatGPT mostly solves these problems by writing Python code, so the gap between train and test data matters less than it does for the GNNs: any Python code that correctly answers a train problem can also answer a test problem, and vice versa, since a correct program does not depend on the problem size.

Individual Train 0-1 Accuracy Scores for ChatGPT

Individual Test 0-1 Accuracy Scores for ChatGPT
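Under a 0-1 metric an instance scores 1 only if the final answer is exactly right, and 0 otherwise, so a single wrong index costs the whole instance. A minimal sketch of that scoring, assuming each answer is a list of integer indices in the canonical order the prompt requests (for example, hull indices in ascending order); the names `zero_one_score` and `zero_one_accuracy` are hypothetical and this is not the scoring code from the CLRS4LM repository.

```python
from typing import Sequence


def zero_one_score(predicted: Sequence[int], target: Sequence[int]) -> int:
    """Return 1 if the predicted answer matches the target exactly, else 0."""
    # Assumption: the prompt fixes a canonical output order, so plain list
    # equality is used (order matters, e.g. for sorting problems).
    return int(list(predicted) == list(target))


def zero_one_accuracy(predictions: Sequence[Sequence[int]],
                      targets: Sequence[Sequence[int]]) -> float:
    """Average the per-instance 0-1 scores over a set of problem instances."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("need at least one instance")
    scores = [zero_one_score(p, t) for p, t in zip(predictions, targets)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Two instances: the first is exactly right, the second has one wrong index.
    print(zero_one_accuracy([[0, 3, 5], [1, 2]], [[0, 3, 5], [1, 4]]))
```

Here the second instance contributes 0 even though half of its indices are correct, which is why this metric can be much harsher than F1.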

Individual F1

Again, we do not see a clear pattern between the test and train F1 scores.
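Unlike 0-1 accuracy, F1 gives partial credit for answers that are mostly right. A minimal sketch, assuming each answer is compared as a set of node indices (as a convex-hull answer would be); the name `f1_score` is hypothetical, and this is not the scoring code used for the figures.

```python
from typing import Iterable


def f1_score(predicted: Iterable[int], target: Iterable[int]) -> float:
    """Set-based F1: harmonic mean of precision and recall over indices."""
    pred, true = set(predicted), set(target)
    if not pred and not true:
        return 1.0  # both empty: treat as a perfect match
    true_positives = len(pred & true)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)  # fraction of predicted indices that are correct
    recall = true_positives / len(true)     # fraction of correct indices that were found
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Three of four indices are shared, so precision = recall = 3/4.
    print(f1_score([0, 3, 5, 7], [0, 3, 5, 9]))
```

The same answer would score 0 under the 0-1 metric, so F1 and 0-1 accuracy can rank models differently.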

Comparison F1

Here we compare the test F1 scores of ChatGPT with those of the GNN models on the CLRS benchmark problems.

Thanks for reading! Please take a look at the GitHub repository and cite our arXiv paper.

Geometry problems - Jarvis' march
System Prompt:
You are a helpful assistant for solving and explaining classical coding problems.
Context:
Perform the Jarvis March Algorithm on these points, X coordinates [1.2194, -1.11406, 0.38929, -1.73849, -0.31843, 1.22709, 0.43665, 0.7779, -1.62778, -0.26118, -0.24323, -0.66371, 0.81454, -1.17166, -0.03785, 1.07014], Y coordinates [1.498, -1.25286, 0.34116, 0.53362, -0.23869, 0.35766, -1.86391, 0.53266, -0.29587, 1.28856, -1.34246, -1.10064, 1.74479, -0.59935, 0.48395, 1.55081], return the indices of the points in the hull, sorting these indices in ascending order when printing, indexing from 0. If you write python code, the first code block should only be you defining the two arrays. I cannot run code, you should show as much working as possible, at least the first step of working by hand, and run until the process is complete. The last line of your output should be the solution to the problem, if this is from running code, you should restate the output in our conversation.
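For reference, the prompt asks ChatGPT to run Jarvis' march (gift wrapping): start from a point certain to be on the hull, then repeatedly pick the point such that no other point lies to its right, until the walk returns to the start. A minimal Python sketch, assuming the hull is reported as its corner vertices only (collinear boundary points are skipped) and that the start is the leftmost point with ties broken by lowest y; these conventions are our assumptions and may differ from the CLRS reference used to score answers. The names `jarvis_march`, `cross`, and `dist2` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); negative means b is right of o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def dist2(p: Point, q: Point) -> float:
    """Squared Euclidean distance (enough for comparing distances)."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def jarvis_march(xs: Sequence[float], ys: Sequence[float]) -> List[int]:
    """Return the indices of the convex-hull vertices in ascending order."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 3:
        return list(range(n))
    pts = list(zip(xs, ys))
    # Leftmost point (lowest y on ties) is always a hull vertex.
    start = min(range(n), key=lambda i: pts[i])
    hull = [start]
    current = start
    while True:
        # Begin with an arbitrary other point, then wrap around the set.
        candidate = (current + 1) % n
        for i in range(n):
            if i == current:
                continue
            turn = cross(pts[current], pts[candidate], pts[i])
            # turn < 0: point i lies right of current->candidate, so it wraps tighter.
            # turn == 0: collinear; assumption: keep the farther point so only
            # corner vertices are reported.
            if turn < 0 or (
                turn == 0
                and dist2(pts[current], pts[i]) > dist2(pts[current], pts[candidate])
            ):
                candidate = i
        if pts[candidate] == pts[start]:
            break  # wrapped back around to the starting point
        hull.append(candidate)
        current = candidate
    return sorted(hull)


if __name__ == "__main__":
    # Points from the Jarvis' march prompt.
    xs = [1.2194, -1.11406, 0.38929, -1.73849, -0.31843, 1.22709, 0.43665, 0.7779,
          -1.62778, -0.26118, -0.24323, -0.66371, 0.81454, -1.17166, -0.03785, 1.07014]
    ys = [1.498, -1.25286, 0.34116, 0.53362, -0.23869, 0.35766, -1.86391, 0.53266,
          -0.29587, 1.28856, -1.34246, -1.10064, 1.74479, -0.59935, 0.48395, 1.55081]
    print(jarvis_march(xs, ys))
```

Nothing in the function depends on the number of points, which illustrates why a correct program answers size-16 train instances and larger test instances alike.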
