Will any AI get a perfect score on the Putnam exam before April 2024?
Resolved NO on Apr 9

This market uses the same resolution criteria as the corresponding Metaculus question.

Can resolve NO

bought Ṁ15 of NO

I gave ChatGPT literally the easiest problem on the Putnam, and it could not solve it.

I know it's not the same, since ChatGPT is a general-purpose LLM and there are better, more specialized AI systems out there. But the problem I gave it was very easy compared to other Putnam problems. It wasn't proof-based, but it had a definite answer.

predicted NO

Who grades the proofs? The Putnam judges can be pretty precise about the cutoff for 9 vs. 10. A single misstatement in an otherwise correct proof can be enough to get dinged.

Like if I attempted this using ChatGPT Plus, I wouldn’t consider myself qualified to score the result.

bought Ṁ20 of NO

When you say “the” Putnam, does that mean the most recent exam, or would any historical exam suffice?

bought Ṁ100 of NO

@JimHays Perhaps we should use the exact same Metaculus criteria.

@MatthewBarnett I'd be fine with that, though it gets a little iffy if it solves an old exam that's had answers up online, since they could have gotten discussed in other forums that were in its training data.

predicted NO

From Metaculus:

“This question resolves on the date during which a computer program first clearly demonstrates the ability to receive a perfect score on the William Lowell Putnam Mathematical Competition, without cheating, and within the time limits given in the real-world competition. Cheating includes training on content that could conceivably spoil the solutions to the competition, and includes having access to external equipment normally forbidden during the competition that can be used to aid solving the problems, or advice from other mathematicians. Thus, Metaculus administrators should be careful not to resolve this question prematurely.

In the strictest case, the model should be tested on the most recent Putnam Competition, after having trained the model prior to the release of the most recent solutions. Here is an archive of Putnam Competition problems going back to 1985. Since it is generally understood that Putnam problems have become harder over time, this question will not consider any candidate program that receives a perfect score on a Putnam examination from prior to 2000 as eligible to trigger positive resolution.”
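The quoted criteria can be read as a checklist. Below is a rough sketch in Python, not anything Metaculus publishes: the `Attempt` record and its field names are hypothetical, the 120-point maximum (12 problems at 10 points each) is an assumption not stated in the quoted text, and comparing the training cutoff against the solution release date is a simplification of the "strictest case" wording.

```python
from dataclasses import dataclass
from datetime import date

# Assumption: 12 problems x 10 points each; the quoted criteria only say "perfect score".
PERFECT_SCORE = 120
# From the quoted criteria: exams from before 2000 are not eligible.
EARLIEST_ELIGIBLE_YEAR = 2000


@dataclass
class Attempt:
    """Hypothetical record of one AI attempt at a Putnam exam."""
    exam_year: int
    score: int
    training_cutoff: date      # last date of data the model was trained on
    solutions_released: date   # when solutions to this exam became public
    within_time_limit: bool    # finished within the real competition's time limits
    used_forbidden_aids: bool  # external equipment or advice from mathematicians


def triggers_yes(a: Attempt) -> bool:
    """Return True only if every condition in this reading of the criteria holds."""
    return (
        a.exam_year >= EARLIEST_ELIGIBLE_YEAR
        and a.score == PERFECT_SCORE
        and a.training_cutoff < a.solutions_released
        and a.within_time_limit
        and not a.used_forbidden_aids
    )
```

Under this reading, a perfect score on an old exam whose solutions were already online before the model's training cutoff would fail the `training_cutoff < solutions_released` check, which is the contamination concern raised in this thread.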

@JimHays Right, I'm saying it would be difficult to rule out that old competitions were in the training data for modern LLMs.

predicted NO

I was pasting it mostly to save a step for future viewers.

I agree you’d probably have to use a model with a known training cutoff date and a recent exam to meet this.

bought Ṁ10 of YES

@JimHays Alright, sounds good to me. Description edited.