Background
FrontierMath is a challenging mathematics benchmark created by Epoch AI to evaluate the mathematical reasoning capabilities of AI models. It consists of original, previously unpublished problems written by expert mathematicians, ranging from demanding competition-style questions to research-level mathematics, and is designed to resist memorization. These problems require advanced problem-solving skills, creativity, and mathematical intuition.
As of December 2024, the highest score on the FrontierMath benchmark is 25.2%, achieved by OpenAI's o3 reasoning model. The benchmark is considered particularly difficult for AI systems, as it tests deep mathematical reasoning rather than pattern recognition or memorization.
Resolution Criteria
This market will resolve to the highest publicly reported score (as a percentage) achieved by any AI model on the FrontierMath benchmark during the 2025 calendar year (January 1, 2025 to December 31, 2025).
The score must be accepted by a rough consensus of AI experts.
The AI system must operate under realistic deployment constraints (nothing AI experts would consider "cheating" or contrary to the spirit of the benchmark).
AI systems that run for a very long time (e.g. a week or more) are fair game.
The score must be on the standard FrontierMath benchmark as defined by Epoch AI (not on a modified or partial version).
If multiple models achieve different scores during 2025, the highest score will be used for resolution, as sketched below.
If no new scores are reported during 2025, the market will resolve to the last known score before 2025 (currently 25.2%).
If the FrontierMath benchmark is significantly modified during 2025, resolution will be based on the most comparable version to the current benchmark.
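For concreteness, the following is a minimal Python sketch of the resolution logic above. The list name `reported_scores_2025` and the function `resolve_market` are hypothetical names introduced here for illustration; the sketch assumes the list contains only scores (as percentages) that satisfy the criteria listed above.

```python
# Illustrative sketch of the resolution logic, not an official tool.
# Assumes `reported_scores_2025` holds publicly reported, expert-accepted
# FrontierMath scores (as percentages) achieved during 2025.

BASELINE_SCORE = 25.2  # last known score before 2025 (o3, December 2024)

def resolve_market(reported_scores_2025):
    """Return the value (in percent) this market would resolve to."""
    if not reported_scores_2025:
        # No new qualifying scores in 2025: fall back to the pre-2025 score.
        return BASELINE_SCORE
    # Otherwise resolve to the highest qualifying score reported during 2025.
    return max(reported_scores_2025)

# Example: two qualifying reports during 2025 resolve to the higher one.
print(resolve_market([31.0, 40.5]))  # -> 40.5
print(resolve_market([]))            # -> 25.2
```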