
Background
The FrontierMath benchmark, created by Epoch AI, is designed to test AI models' mathematical reasoning capabilities. As of December 2024, the highest reported score is 25.2%, set by OpenAI's o3 reasoning model, while most other models score around 2% or less. Reaching 80% would therefore require more than tripling the best reported result.
Resolution Criteria
This market will resolve YES if any AI model achieves a score greater than 80% on Epoch's FrontierMath benchmark at any point during the 2025 calendar year (January 1, 2025 - December 31, 2025). The score must be:
- Officially reported or acknowledged by Epoch AI
- Achieved on the full benchmark test, not a subset
- Achieved in a single run without human assistance

The market will resolve NO if no AI model achieves a qualifying score greater than 80% during 2025.
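The criteria above can be read as a simple rule check. The following is a minimal Python sketch of that reading, not an official resolution procedure: the `ReportedScore` type, its field names, and the `resolve` helper are hypothetical names introduced here for illustration. It assumes "greater than 80%" is a strict inequality, so a score of exactly 80% would not count.

```python
from dataclasses import dataclass
from datetime import date

# Values taken from the resolution criteria above.
THRESHOLD_PCT = 80.0
WINDOW_START = date(2025, 1, 1)
WINDOW_END = date(2025, 12, 31)


@dataclass
class ReportedScore:
    """Hypothetical record of one reported FrontierMath result."""
    score_pct: float             # score on the benchmark, in percent
    achieved_on: date            # date the score was achieved
    acknowledged_by_epoch: bool  # officially reported or acknowledged by Epoch AI
    full_benchmark: bool         # run on the full benchmark, not a subset
    single_run_unassisted: bool  # single run without human assistance


def qualifies(report: ReportedScore) -> bool:
    """Return True only if the report meets every stated criterion."""
    return (
        report.score_pct > THRESHOLD_PCT  # strictly greater than 80%
        and WINDOW_START <= report.achieved_on <= WINDOW_END
        and report.acknowledged_by_epoch
        and report.full_benchmark
        and report.single_run_unassisted
    )


def resolve(reports: list[ReportedScore]) -> str:
    """Resolve YES if any single report qualifies, otherwise NO."""
    return "YES" if any(qualifies(r) for r in reports) else "NO"
```

For example, the 25.2% result mentioned in the Background fails both the threshold and the date window, so on its own it would leave the market at NO.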