Which lab's AI will be the first to score over 10% on the FrontierMath benchmark?
Resolution Criteria: Official announcement from Epoch AI or the achieving lab.
@TimothyJohnson5c16 The problem with AlphaProof is that it needs problems to be formalized. The concepts required for high-school-level math have already been formalized in proof systems like Lean, and those are enough for the IMO, but many advanced concepts that FrontierMath problems might require are still missing.
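To make that gap concrete, here's a toy Lean 4 sketch (assuming Mathlib is available; the lemma name is one that exists in Mathlib, but the example is mine, not anything from AlphaProof):

```lean
import Mathlib

-- High-school-level objects (ℕ, parity, divisibility) are already in
-- Mathlib, so an IMO-flavoured fact is easy to state and prove:
example (n : ℕ) : Even (n * (n + 1)) := Nat.even_mul_succ_self n

-- A FrontierMath-style problem can lean on research-level definitions that
-- Mathlib doesn't have yet; until someone formalizes those concepts, the
-- statement can't even be written down for a prover to attempt.
```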
Another aspect of AlphaProof that I don't see people mentioning is that it's extremely slow, which makes sense: the state space of mathematical proofs is far larger than the state spaces of Go or chess. It took AlphaProof three days to solve its IMO problems, in a competition that allots nine hours. Mathematicians say a single FrontierMath problem might take them a week. You can see the issue here.
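A rough back-of-envelope makes the point (my own illustrative numbers, and a straight-line extrapolation; the search cost probably grows worse than linearly with difficulty):

```python
# Back-of-envelope on search time (illustrative assumptions, not measured data).
HOURS_PER_DAY = 24

imo_human_budget_h = 9                  # two 4.5-hour IMO sessions
alphaproof_time_h = 3 * HOURS_PER_DAY   # reported ~3 days on the IMO problems
slowdown = alphaproof_time_h / imo_human_budget_h  # ~8x the human time budget

frontiermath_human_h = 7 * HOURS_PER_DAY  # "a week per problem", taken literally
naive_estimate_h = frontiermath_human_h * slowdown

print(f"AlphaProof slowdown vs human budget: {slowdown:.0f}x")
print(f"Naive per-problem extrapolation: {naive_estimate_h / HOURS_PER_DAY:.0f} days")
# -> 8x and ~56 days per problem, before accounting for the harder search space.
```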
Personally, I suspect explicit tree search is not the way to scale test-time compute; I'm more optimistic about the o1-style approach of CoT + RL.
@NeuralBets I see your point about speed, but o1 also needed exponentially increasing amounts of compute to solve harder AIME problems.
OpenAI didn't label the x-axis in that chart, but I suspect part of the reason they still haven't released the full version of o1 is that it's currently too expensive for practical use cases.
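For intuition on what a log-scale x-axis implies for cost, here's a toy model (hypothetical constants, not OpenAI's data): if accuracy rises linearly in log(compute), each fixed gain in accuracy multiplies the compute bill.

```python
# Toy model (hypothetical constants, not OpenAI's data):
# accuracy = a + b * log10(compute), i.e. linear on a log-scale x-axis.
a, b = 20.0, 15.0  # assumed: 20% accuracy at 1 compute unit, +15 pts per 10x compute

def compute_needed(accuracy_pct: float) -> float:
    """Invert the toy model: compute units required to hit a target accuracy."""
    return 10 ** ((accuracy_pct - a) / b)

for target in (50, 65, 80):
    print(f"{target}% accuracy -> {compute_needed(target):,.0f} compute units")
# 50% -> 100, 65% -> 1,000, 80% -> 10,000: each +15 points costs 10x more,
# which is why a smooth curve on a log axis can still be brutally expensive.
```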
That's part of the reason I'm betting NO here: Will an AI score over 10% on FrontierMath Benchmark in 2025 | Manifold.