After looking into it, will I believe that LLMs perform badly on USAMO 2025?

This paper was recently posted to the arXiv: https://arxiv.org/abs/2503.21934

It claims that SOTA LLMs scored surprisingly poorly on this year's USAMO, averaging less than 5%.

In some discussions of this paper I've seen AI defenders claim that the paper is fake.

This weekend I will look into the paper's methodology and, if things are still unclear, try to recreate their results with the models I have access to (DeepSeek, o3-mini, Claude 3.7 thinking). I will then determine whether I think the paper's results are substantially true.

Possible resolutions are:

  • 100%, the paper's results seem basically correct.

  • 80%, the main thrust is correct but it seems like models performed particularly badly in their tests or they graded unnecessarily harshly.

  • 50%, I am more confused than I am now and don't form an internal consensus.

  • 20%, the paper's results are substantially, but not wholly, incorrect in my view.

  • 0%, this seems like a fake paper to me or is completely wrong.

My credentials and current epistemic status:

  • Former USAMO competitor and current PhD student in math.

  • Significant AI skeptic compared to most of Manifold, but probably not compared to the general population.

  • The results of this paper were surprising to me; I would have expected much better performance.

Since the resolution criteria are subjective, I will not trade in this market.


How do you plan to avoid contamination when attempting to recreate the results?
