Will GPT-4 be able to consistently solve college-level discrete math problems?
Resolved NO on May 16

I will use the class CS 70 at UC Berkeley to test this (https://www.eecs70.org/), once I get access to GPT-4. I will take all text-only homework problems from the most recent iteration of the class, and copy and paste them in, maybe with a minimal amount of prompt engineering (e.g. "pretend you are a brilliant mathematician" or something). Even if GPT-4 has image recognition abilities, I won't use problems with images.

After this, I will grade GPT-4's responses. I am currently a TA for the class, and I was a grader for the class for two semesters, so the grading will be as close to real life as it can be. If GPT-4 scores above 73%, I will resolve this market positively. (If a student in CS 70 scores above 73% on a homework, they get 100% on it.) Otherwise, I will resolve this negatively.
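The resolution rule above can be sketched in a few lines. This is only an illustration: the description does not say how scores are aggregated across homeworks, so pooling all earned points over all possible points is an assumption, and the names `resolve_market`, `earned_points`, and `possible_points` are hypothetical.

```python
THRESHOLD = 0.73  # from the description: strictly above 73% resolves YES


def resolve_market(earned_points, possible_points, threshold=THRESHOLD):
    """Return "YES" if the pooled score is strictly above the threshold, else "NO".

    Assumption: scores are pooled across all graded problems. The market
    description does not specify the aggregation method.
    """
    total_possible = sum(possible_points)
    if total_possible <= 0:
        raise ValueError("no gradable problems")
    score = sum(earned_points) / total_possible
    return "YES" if score > threshold else "NO"
```

Under this sketch, a pooled score of about 65% (the figure reported at resolution) gives `"NO"`, and a score of exactly 73% also gives `"NO"` because the description says "above 73%".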

If GPT-4 releases under a different name, I'll test that model.

Note: Neither ChatGPT nor Bing Chat can do this. They produce good-looking answers but consistently make incorrect statements like "9 is prime".



Resolves to no (around 65%, so not too far off)

predicted NO

@dominic have you had a chance to check? If not, could you provide an estimate?

predicted NO

@dominic any update?

bought Ṁ10 of NO

@ValeryCherepanov I'm not Dominic, but I got access to GPT-4 and tried it out on some CS 70 questions. It did not meet the 73% threshold on my sample, so I doubt it will do better when GPT-4 improves or when he tests this in the future.

The only way this resolves Yes is if, by chance, GPT4 is asked questions it was trained on (questions that were already on the internet).

@qumeric working on it, have graded the first 3 homeworks so far

I can't get it to correctly solve slightly convoluted versions of "where do the trains cross" problems, so I highly doubt it.

GPT-4 limitations are making this take a little longer to resolve, but I plan to resolve it soon.

predicted NO

Given that Bing's chatbot is now confirmed to run on GPT-4 (https://blogs.bing.com/search/march_2023/Confirmed-the-new-Bing-runs-on-OpenAI%E2%80%99s-GPT-4) and that it can't do CS 70 problems consistently, this should be resolved as NO. That said, you could buy ChatGPT pro to test whether that version is more fine-tuned for math.

@RahulShah Yeah, I have ChatGPT pro. I'll test the "official" GPT-4 on it in the next few days.