
"Today we're launching FrontierMath, a benchmark for evaluating advanced mathematical reasoning in AI. We collaborated with 60+ leading mathematicians to create hundreds of original, exceptionally challenging math problems, of which current AI systems solve less than 2%.
Existing math benchmarks like GSM8K and MATH are approaching saturation, with AI models scoring over 90%—partly due to data contamination. FrontierMath significantly raises the bar. Our problems often require hours or even days of effort from expert mathematicians.
We evaluated six leading models, including Claude 3.5 Sonnet, GPT-4o, and Gemini 1.5 Pro. Even with extended thinking time (10,000 tokens), Python access, and the ability to run experiments, success rates remained below 2%—compared to over 90% on traditional benchmarks."
🏅 Top traders
# | Total profit
---|---
1 | Ṁ1,060
2 | Ṁ460
3 | Ṁ390
4 | Ṁ195
5 | Ṁ139
@sponge They reached 32% with o3-mini using a Python tool, so this can resolve YES.
I'm confused by OpenAI's claims vs. the original FrontierMath paper. Their o3-mini announcement (https://openai.com/index/openai-o3-mini/) reports 5.8% pass@1 for o1-mini, while the FrontierMath paper had it at under 2%. Pass@8 is nearly 13%, implying this test is much "easier" than the original paper claims.
Are these evaluations being done consistently?
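For intuition on that pass@1 vs. pass@8 gap: if each of the 8 attempts were an independent draw at the 5.8% rate, pass@8 would land far above 13%. Here's a minimal back-of-the-envelope sketch, using only the figures quoted in this thread (the 5.8% and ~13% numbers are the commenter's, not independently verified):

```python
# Back-of-the-envelope check, assuming each of k attempts is an
# independent Bernoulli trial at the quoted pass@1 rate. The 5.8%
# (pass@1) and ~13% (pass@8) figures come from the comment above and
# are not verified against OpenAI's post.

def pass_at_k_independent(pass_at_1: float, k: int) -> float:
    """Probability of at least one success in k independent attempts."""
    return 1.0 - (1.0 - pass_at_1) ** k

p1 = 0.058  # quoted o1-mini pass@1 on FrontierMath
print(f"pass@8 under independence: {pass_at_k_independent(p1, 8):.1%}")
# Prints ~38.0%, far above the reported ~13%: successes across attempts
# are strongly correlated, i.e. repeated sampling mostly re-solves the
# same subset of problems.
```

The shortfall relative to the ~38% independence baseline suggests the model reliably solves a fixed slice of the problems rather than getting lucky on different ones, so the jump from the paper's <2% to OpenAI's 5.8% more plausibly reflects evaluation-setup differences (tooling, sampling, problem subset) than sampling luck.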