
The Putnam exam is held every December.
The AI must score in the top 100 for a particular year.
By "top 100" I mean that its score must be >= the score of the 100th-place scorer. (If 100th place is a tie, I'll use the tying score.)
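The threshold rule above can be sketched in a few lines. This is purely illustrative: the function name and the sample scores are hypothetical, and the real cutoff comes from the published Putnam rankings for the relevant year.

```python
def qualifies_top_100(ai_score: int, human_scores: list[int]) -> bool:
    """Return True if ai_score meets the market's resolution threshold.

    The threshold is the score of the 100th-place human scorer. If a
    tie spans 100th place, the tying score is the cutoff -- which the
    descending sort below handles automatically.
    """
    ranked = sorted(human_scores, reverse=True)
    cutoff = ranked[99]  # score of the 100th-place scorer
    return ai_score >= cutoff

# Illustrative example: 50 scores of 120, 60 of 100, 400 of 80.
# The 100th-place score (and thus the cutoff) is 100.
humans = [120] * 50 + [100] * 60 + [80] * 400
print(qualifies_top_100(100, humans))  # → True
print(qualifies_top_100(99, humans))   # → False
```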
If we know the details of the training data, then the training data must all have been released prior to the release of the Putnam questions for that year. E.g., if ModelNet is run on the 2026 Putnam, it must be trained on data from before the date of the 2026 Putnam exam.
The AI does not have to be trained before the relevant exam, as long as its training data predates the exam.
The scoring of the AI's exam must be done by the actual Putnam scorers, by mathematicians who have been Putnam scorers, or by mathematicians who are actively involved in competitive mathematics in some way (e.g. a professor who runs a university's competitive math team counts; a software engineer who did well on the Putnam 5 years ago does not).
I may accept scoring that isn't blinded, but I reserve the right to ignore any scoring that's vaguely suspect/biased/etc.
Update 2025-12-10 (PST) (AI summary of creator comment): "Released" data means data that existed prior to the exam, not necessarily publicly available open-source data. The AI model does not need to be trained on open data, but all training data must have existed before the Putnam exam date to prevent test set contamination.
Update 2025-12-21 (PST) (AI summary of creator comment): If an AI scores in the top 100, evidence of proper scoring by qualified specialists must be provided by at most a few months into 2026. After that deadline, the market will resolve No.
Update 2026-01-12 (PST) (AI summary of creator comment): The creator is undecided on whether to accept Lean proofs that haven't been reconverted to natural language. It's also unclear whether manually written question statements versus model-generated statements will affect acceptance. The creator acknowledges that Lean proofs "morally" count but may not satisfy the original market description intent.
They claim to have solved all problems on the 2025 Putnam: https://axiommath.ai/territory/from-seeing-why-to-checking-everything
@felixx It's unclear to me whether they wrote the question statements by hand or whether a model generated them. I'm also undecided on whether I'll accept Lean proofs that haven't been reconverted to natural language - morally I think that counts, but it's not really what the market description is pointing at.
@vluzko They all do it in Lean: https://github.com/project-numina/numina-lean-agent/blob/main/NuminaLeanAgent.pdf
Nous Research claims to have scored 87/120! https://x.com/NousResearch/status/1998536543565127968
@SG For this I want other graders agreeing with the score - afaict their grading was done by just one person, who placed in the top 200 in the past.
@clementdupOz "Released" in the sense of "exists". It does not need to be an open data model, but it must be trained on data that existed prior to the 2025 Putnam exam. This requirement is to prevent any possibility of the test set getting into the training set.
Claim: DeepSeekMath-V2 hits gold-medal performance on Putnam. https://x.com/theturingpost/status/1994926897248288813?s=46
@SG that's not this year's putnam though right? whereas this is: https://x.com/axiommathai/status/1997767850279440715?s=20
(but we don't know rankings yet)