Will an AI score over 10% on the FrontierMath benchmark in 2025?

Question

"Today we're launching FrontierMath, a benchmark for evaluating advanced mathematical reasoning in AI. We collaborated with 60+ leading mathematicians to create hundreds of original, exceptionally challenging math problems, of which current AI systems solve less than 2%. 
Existing math benchmarks like GSM8K and MATH are approaching saturation, with AI models scoring over 90%—partly due to data contamination. FrontierMath significantly raises the bar. Our problems often require hours or even days of effort from expert mathematicians.
We evaluated six leading models, including Claude 3.5 Sonnet, GPT-4o, and Gemini 1.5 Pro. Even with extended thinking time (10,000 tokens), Python access, and the ability to run experiments, success rates remained below 2%—compared to over 90% on traditional benchmarks."

Manifold Markets · Accepted Answer

Yes. Resolved on Dec 20, 2024 by the Manifold Markets prediction market.
