YES = convinced not sketchy, otherwise resolves NO in a week
https://arxiv.org/abs/2306.08997
afaik GPT-4 is not smart enough to solve every MIT undergrad math and CS problem, not even close, so I was initially skeptical
then saw the sketchiness described in this tweet: https://twitter.com/yoavgo/status/1669760558436872193
~~seeing as we all hold NO, anyone mind an early resolution?~~ actually, the description says i'll wait a week, so i'll just wait it out
@firstuserhere "That's not all. In our analysis of the few-shot prompts, we found significant leakage and duplication in the uploaded dataset, such that full answers were being provided directly to GPT 4 within the prompt for it to parrot out as its own."
I read through it - they really do claim that GPT-4 achieves a perfect score on all their undergrad math and CS problems (with prompt engineering).
Their prompt engineering section is quite fishy - they seem to be using GPT to generate answers, grading those answers with GPT, then iterating after having GPT modify the prompt to be better rated by GPT.
They are sparse on specific details. I'd say it's unlikely they're lying that GPT-4 gets a perfect score per their grading system, but it's likely their method of determining that score is different from what you or I would typically assume.
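The circular loop described above can be sketched roughly as follows. This is a hypothetical illustration of the criticism, not the authors' actual code; the function names, scoring scale, and stub model are all invented for clarity.

```python
# Hypothetical sketch of the "GPT grades GPT, then GPT rewrites the
# prompt to score higher" loop being criticized. All names and the
# 0-5 grading scale are illustrative, not from the paper.

def solve_grade_iterate(problem, gpt, max_rounds=3, target=5):
    """Ask the model to answer, have the *same* model grade the answer,
    and keep rewriting the prompt until the self-assigned grade hits
    the target. Note the grader is never independent of the solver."""
    prompt = problem
    answer, grade = None, 0
    for _ in range(max_rounds):
        answer = gpt(f"Solve: {prompt}")
        grade = int(gpt(f"Grade this answer from 0 to 5: {answer}"))
        if grade >= target:
            break
        # The model rewrites its own prompt to please its own grader,
        # so the loop optimizes for the grader's approval, not correctness.
        prompt = gpt(f"Rewrite this prompt so the answer scores higher: {prompt}")
    return answer, grade
```

Because the solver, grader, and prompt-rewriter are the same model, a "perfect score" out of this loop measures self-agreement, which is the fishiness being pointed out.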