Will any AI reach 20%+ performance on FrontierMath by December 31st 2026?
Resolved YES on Jan 1

Resolves YES if the best performance by an AI system on FrontierMath as of December 31st 2026 is 20% or higher.

Which AI systems count?

Any AI system counts if it operates within realistic deployment constraints and doesn't have unfair advantages over human baseliners.

Tool assistance, scaffolding, and any other inference-time elicitation techniques are permitted as long as:

  • There is no systematic unfair advantage over the humans described in the Human Performance section (examples of unfair advantages: the AI system has multiple outputs autograded while humans don't, or the AI system has internet access when humans don't).

  • Having the AI system complete the task does not use more compute than could be purchased with the wages needed to pay a human to complete the same task to the same level.
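
The compute condition above can be read as a simple cost comparison. Below is a minimal sketch, assuming compute and wages are both expressed in US dollars; the function name, parameters, and example figures are hypothetical and are not part of the resolution criteria.

```python
def within_compute_budget(ai_compute_cost_usd: float,
                          human_hours: float,
                          human_hourly_wage_usd: float) -> bool:
    """Return True if the AI run costs no more than paying a human for the same task.

    All inputs are illustrative assumptions: the market description does not
    specify a wage rate, a time estimate, or a currency.
    """
    # Wages needed to pay a human to complete the task to the same level.
    human_cost_usd = human_hours * human_hourly_wage_usd
    return ai_compute_cost_usd <= human_cost_usd


# Hypothetical example: a task that would take a human 2 hours at $100/hour.
print(within_compute_budget(50.0, 2.0, 100.0))   # AI run costing $50
print(within_compute_budget(250.0, 2.0, 100.0))  # AI run costing $250
```

Under these made-up numbers, the first run would satisfy the condition and the second would not.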

The PASS@k elicitation technique (which automatically grades k outputs from a model and chooses the best) is a common example that we do not accept on this benchmark, because mathematicians are generally evaluated on their ability to generate a single correct answer, not multiple answers to be automatically graded. PASS@k would therefore constitute an unfair advantage.
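
To illustrate why PASS@k inflates scores relative to single-answer grading, here is a small sketch. It assumes each problem has a list of graded attempts (True for a correct answer), and it uses the simple empirical definition "any of the first k attempts is correct" rather than the unbiased estimator sometimes used in the literature. The data and function names are hypothetical.

```python
def single_attempt_accuracy(results: list[list[bool]]) -> float:
    """Fraction of problems solved when only the first attempt counts."""
    return sum(attempts[0] for attempts in results) / len(results)


def pass_at_k(results: list[list[bool]], k: int) -> float:
    """Fraction of problems where at least one of the first k attempts is correct."""
    return sum(any(attempts[:k]) for attempts in results) / len(results)


# Hypothetical graded attempts for four problems, three attempts each.
results = [
    [False, True, False],
    [True, True, True],
    [False, False, False],
    [False, False, True],
]

print(single_attempt_accuracy(results))  # 0.25
print(pass_at_k(results, k=3))           # 0.75
```

In this toy data the same model outputs score 25% when graded on a single answer and 75% under PASS@3, which is the kind of gap the rule above is meant to exclude.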

If there is evidence of training contamination leading to substantially increased performance, scores will be adjusted accordingly or disqualified.

(Much of the resolution criteria is adapted from AI Digest's excellent /Manifold/what-will-be-the-best-performance-o-A58Ld8LZZL)

