Background
The FrontierMath benchmark, created by Epoch AI, is designed to test AI models' mathematical reasoning capabilities. As of December 2024, OpenAI's o3 reasoning model held the record with a reported score of 25.2%, while most other models scored around 2% or lower. The benchmark remains a significant challenge for current AI systems.
Resolution Criteria
This market will resolve YES if any AI model achieves a score greater than 80% on Epoch's FrontierMath benchmark at any point during the 2025 calendar year (January 1, 2025 - December 31, 2025). The score must be:
- Officially reported or acknowledged by Epoch AI
- Achieved on the full benchmark, not a subset
- Achieved in a single run without human assistance
The market will resolve NO if no AI model achieves a score greater than 80% on the benchmark during 2025.