Convince me this paper isn't sketchy? Exploring the MIT Mathematics and EECS Curriculum with LLMs [200 LIM]
Resolved NO on Jun 23

YES = convinced not sketchy, otherwise resolves NO in a week

https://arxiv.org/abs/2306.08997

afaik GPT-4 is not smart enough to solve every MIT undergrad math and CS problem, not even close, so I was initially skeptical

then I saw the sketchiness described in this tweet: https://twitter.com/yoavgo/status/1669760558436872193



Can be resolved NO @jacksonpolack

~~seeing as we all hold NO, anyone mind an early resolution?~~ actually, the description says i'll wait a week, so i'll just wait it out


@firstuserhere "That's not all. In our analysis of the few-shot prompts, we found significant leakage and duplication in the uploaded dataset, such that full answers were being provided directly to GPT 4 within the prompt for it to parrot out as its own."


I read through it - they really do claim that GPT-4 achieves a perfect score on all their undergrad math and CS problems (with prompt engineering).

Their prompt engineering section is quite fishy: they seem to use GPT to generate answers, grade those answers with GPT, and then iterate by having GPT rewrite the prompt so that its output gets a better grade from GPT.

They are sparse on specific details. I'd say it is unlikely they are lying that GPT-4 gets a perfect score under their grading system, but it's likely their method of determining the score is different from what you or I would typically assume.

Yeah, I don't think they're lying, just that their system gives GPT a second, third, and further chance whenever it gets a problem wrong according to the ground-truth answers. I could probably get a perfect score with that too!
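To put rough numbers on the "second and third chances" point, here's a minimal sketch. It assumes, purely for illustration, that each attempt is independently correct with a fixed probability p and that the grader can check every attempt against the ground-truth answer and allow a retry. Under that assumption the chance of eventually being marked correct within k attempts is 1 - (1 - p)^k. The function names `retry_pass_rate` and `simulate_score` are hypothetical and not taken from the paper; this is a toy model, not their actual pipeline.

```python
import random


def retry_pass_rate(p: float, k: int) -> float:
    """Probability that at least one of k independent attempts is correct,
    when each attempt is correct with probability p (toy assumption)."""
    return 1.0 - (1.0 - p) ** k


def simulate_score(p: float, k: int, n_problems: int = 10_000, seed: int = 0) -> float:
    """Fraction of problems marked 'solved' when a grader that can see the
    ground-truth answer keeps letting the model retry, up to k attempts."""
    rng = random.Random(seed)
    solved = 0
    for _ in range(n_problems):
        for _attempt in range(k):
            if rng.random() < p:  # this attempt happens to match the ground truth
                solved += 1
                break  # grader accepts it, so stop retrying this problem
    return solved / n_problems


if __name__ == "__main__":
    # Compare the closed-form rate with a seeded simulation for a few retry budgets.
    for k in (1, 2, 5, 10):
        print(k, round(retry_pass_rate(0.4, k), 3), round(simulate_score(0.4, k), 3))
```

With p = 0.4, a single attempt scores 0.4, but ten graded retries already push the closed-form score to about 0.994, so a near-perfect number can come from the retry loop rather than from the model.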

© Manifold Markets, Inc.