
I will use the class CS 70 at UC Berkeley to test this (https://www.eecs70.org/), once I get access to GPT-4. I will take all text-only homework problems from the most recent iteration of the class and paste them in, perhaps with a minimal amount of prompt engineering (e.g., "pretend you are a brilliant mathematician"). Even if GPT-4 has image-recognition abilities, I won't use problems with images.
After this, I will grade GPT-4's responses. I am currently a TA for the class, and I was a grader for it for two semesters, so the grading will be as close to the real thing as possible. If GPT-4 scores above 73%, I will resolve this market YES. (In CS 70, a student who scores above 73% on a homework receives full credit for it.) Otherwise, I will resolve it NO.
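The resolution rule above can be sketched as a small check. This is a hypothetical illustration, not the actual grading script; the per-problem scores and maxima are made-up example values:

```python
# Sketch of the market's resolution rule: YES iff GPT-4's total
# homework score exceeds 73% of the available points.
def resolves_yes(scores, max_scores, threshold=0.73):
    # scores: points GPT-4 earned per problem (hypothetical values)
    # max_scores: maximum points available per problem
    return sum(scores) / sum(max_scores) > threshold

print(resolves_yes([8, 7, 9], [10, 10, 10]))  # 24/30 = 80% -> True
print(resolves_yes([7, 7, 7], [10, 10, 10]))  # 21/30 = 70% -> False
```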
If GPT-4 releases under a different name, I'll test that model.
Note: neither ChatGPT nor Bing Chat can currently do this; they produce good-looking answers but consistently make incorrect statements like "9 is prime."
@ValeryCherepanov I'm not Dominic, but I got GPT-4 and tried it out on some CS 70 questions; it did not meet the 73% threshold on my sample, so I doubt it will do better whenever GPT-4 improves or he tests this in the future.
The only way this resolves YES is if, by chance, GPT-4 is asked questions it was trained on (questions that were already on the internet).
Given that Bing Chat is now confirmed to run on GPT-4 (https://blogs.bing.com/search/march_2023/Confirmed-the-new-Bing-runs-on-OpenAI%E2%80%99s-GPT-4) and that it can't do CS 70 problems consistently, this should resolve NO. Though you could buy ChatGPT pro to test whether that version is fine-tuned better for math.