What will be true of the SOTA AI on the FrontierMath benchmark, before 2026?
resolved Jan 1
Resolved YES: Transformer-based architecture
Resolved YES: Developed by OpenAI
Resolved YES: Part of the GPT-N family of models (GPT-5, GPT-6, and variations)
Resolved N/A: Over 1T parameters
Resolved NO: Developed by Google DeepMind
Resolved NO: Part of the AlphaProof family of models (AlphaProof N and variations)
Resolved NO: Part of the o1 family of models (o1, o2, etc. and variations)
Resolved NO: Narrow domain of knowledge, i.e., does not know random facts such as when Google was founded, or who won the 1960 presidential election.
Resolved NO: Based on Symbolic AI (https://en.wikipedia.org/wiki/Symbolic_artificial_intelligence)
Resolved NO: Energy-based Model (https://en.wikipedia.org/wiki/Energy-based_model)
Resolved NO: Developed by a non-British and non-American company
Resolved NO: It is #1 in Elo according to Chatbot Arena Leaderboard at any time

An option resolves YES if it is true about the AI model, or program, known to be State of the Art in terms of the FrontierMath benchmark, at the end of the year 2025. It resolves NO otherwise.

You're welcome to add any interesting facts that might or might not be true about the state of the art in math problems, as defined by achieving the highest score on the FrontierMath benchmarks.

I reserve the right to cancel any option that is too vague, too improbable, etc.

See also:
/Bayesian/what-will-true-of-the-sota-ai-on-th-y0LE5uE9n9 (This market)
/Bayesian/what-will-true-of-the-sota-ai-on-th-ROldIhZZgt
/Bayesian/what-will-true-of-the-sota-ai-on-th-RQptyR5uO8

/Bayesian/will-an-ai-achieve-85-performance-o-hyPtIE98qZ
/MatthewBarnett/will-an-ai-achieve-85-performance-o

/Bayesian/will-an-ai-achieve-30-performance-o

  • Update 2025-11-06 (PST) (AI summary of creator comment): The creator is uncertain whether to use Epoch's independent evaluation (which has a weaker scaffold) or the highest reported score to determine the SOTA AI. This ambiguity in the resolution source has not yet been resolved.

  • Update 2025-11-06 (PST) (AI summary of creator comment): If there is ambiguity about which model is truly SOTA (e.g., due to different evaluation methods or scaffolds), the creator will have someone uninvolved take a guess as to which model is truly state of the art, even if apples-to-apples comparisons are not possible.


🏅 Top traders

#  Total profit
1  Ṁ269
2  Ṁ229
3  Ṁ101
4  Ṁ61
5  Ṁ53

oh no, are we basing this off Epoch's independent evaluation, which has a weaker scaffold, or the highest reported score?

@Bayesian I assumed for all of your markets we would use the highest reported score.

@TimothyJohnson5c16 thank you for sharing. Hmm, my intuition is to just assume there won't be ambiguity, but if there ends up being ambiguity, to have someone uninvolved take a guess as to which model is truly SOTA, even if apples-to-apples comparisons are not possible.

© Manifold Markets, Inc.