After looking into it, will I believe that LLMs perform badly on USAMO 2025?

This paper was recently posted to the arXiv: https://arxiv.org/abs/2503.21934

It claims that SOTA LLMs scored surprisingly low on this year's USAMO, averaging less than 5%.

In some discussions of this paper I've seen AI defenders claim that the paper is fake.

This weekend I will look into the paper's methodology and, if things are still unclear, try to recreate its results with the models I have access to (DeepSeek, o3-mini, Claude 3.7 thinking). I will then determine whether I think the paper's results are substantially true.

Possible resolutions are:

  • 100%, the paper's results seem basically correct.

  • 80%, the main thrust is correct, but it seems like the models performed particularly badly in the authors' tests or the authors graded unnecessarily harshly.

  • 50%, I am more confused than I am now and don't form an internal consensus.

  • 20%, the paper's results are substantially, but not wholly, incorrect in my view.

  • 0%, this seems like a fake paper to me or is completely wrong.

My credentials and current epistemic status:

  • Former USAMO competitor and current PhD student in math.

  • Significant AI skeptic compared to most of Manifold, but probably not compared to the general population.

  • The results of this paper were surprising to me; I would have expected much better performance.

Since the resolution criteria are subjective, I will not trade in this market.


How do you plan to avoid contamination when attempting to recreate the results?

@zsig This is a bit tricky and I'm not very confident in my ability to do it. My current plan is just to compare my results to the results from the paper and note the major discrepancies, then look through published solutions to see whether those discrepancies are plausibly a result of recent training data. In principle, unless one of these models updates its knowledge cutoff, this shouldn't be an issue, but who knows what's really going on behind the scenes?
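
A minimal Python sketch of how that comparison could be organized, assuming per-problem scores on the usual 7-point USAMO scale. The function names, the dictionary layout, and the 3-point threshold are hypothetical choices for illustration, not anything taken from the paper, and the scores in the usage lines are placeholders rather than real results.

```python
MAX_POINTS_PER_PROBLEM = 7  # USAMO problems are graded out of 7 points each


def flag_discrepancies(paper_scores, my_scores, threshold=3):
    """List problems where my score and the paper's differ by at least `threshold`.

    Both arguments map a problem label (e.g. "P1") to a score out of 7.
    The threshold of 3 points is an arbitrary assumption, not from the paper.
    """
    flagged = []
    # Only compare problems that appear in both score sets.
    for problem in sorted(paper_scores.keys() & my_scores.keys()):
        diff = my_scores[problem] - paper_scores[problem]
        if abs(diff) >= threshold:
            flagged.append((problem, paper_scores[problem], my_scores[problem], diff))
    return flagged


def percent_of_max(scores):
    """Total score as a percentage of the maximum possible on the graded problems."""
    return 100 * sum(scores.values()) / (MAX_POINTS_PER_PROBLEM * len(scores))


# Placeholder scores for illustration only; these are NOT results from the paper.
paper = {"P1": 1, "P2": 0, "P3": 0}
mine = {"P1": 6, "P2": 0, "P3": 1}
for problem, theirs, ours, diff in flag_discrepancies(paper, mine):
    print(f"{problem}: paper={theirs}, mine={ours}, diff={diff:+d}")
```

Any problem flagged this way would then be the one to check against published solutions, since a large jump over the paper's score is where contamination would most plausibly show up.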
