Which lab's AI will be the first to score over 10% on FrontierMath benchmark?
Resolution Criteria: Official announcement from Epoch AI or the achieving lab.
@NeuralBets I think this can resolve. OpenAI reached 25%: OpenAI announces new o3 models | TechCrunch.
@TimothyJohnson5c16 The problem with AlphaProof is the need for formalization. The concepts required for high-school-level math have been formalized in proof systems like Lean; those are enough for the IMO, but many advanced concepts that FrontierMath problems might require are still missing (a small Lean sketch of the distinction is below).
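To make "formalized" concrete, here's a minimal Lean/Mathlib sketch (the theorem name is mine; the point is that olympiad-level vocabulary already exists in Mathlib, while research-level vocabulary often doesn't):

```lean
import Mathlib

-- A high-school-level fact: squares of reals are nonnegative.
-- Mathlib already has everything needed to state *and* prove it.
theorem hs_example (x : ℝ) : 0 ≤ x ^ 2 := sq_nonneg x

-- By contrast, a FrontierMath problem that depends on a research-level
-- construction may be impossible to even *state* in Lean until someone
-- formalizes the underlying definitions.
```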
Another aspect of AlphaProof that I don't see people mentioning is that it's extremely slow, which makes sense because the state space of math proofs is much larger than that of Go or chess. It took three days to solve IMO problems, in a competition that lasts nine hours. Mathematicians say it might take them a week to solve a single FrontierMath problem. You can see the issue here.
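A rough back-of-envelope (my assumption, not anyone's measurement: that the same slowdown factor carries over from the IMO):

$$\frac{3\ \text{days}}{9\ \text{hours}} = \frac{72\ \text{h}}{9\ \text{h}} = 8, \qquad 8 \times 1\ \text{week} \approx 2\ \text{months per problem}.$$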
Personally, I think explicit tree search might not be the way to scale test-time compute; I'm more optimistic about the o1-style approach of CoT + RL. A toy sketch of the contrast is below.
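Here's a toy Python sketch of why the cost shapes differ (my illustration, not AlphaProof's or o1's actual algorithm; `score` and `expand` are made-up stand-ins):

```python
import random

def score(state: str) -> float:
    """Deterministic pseudo-random 'value estimate' for a proof state."""
    return random.Random(state).random()

def expand(state: str) -> list[str]:
    """Explicit branching: each expansion spawns several child states."""
    return [state + step for step in "abcd"]

def tree_search(budget: int) -> str:
    """AlphaProof-style best-first search: the frontier keeps growing, and
    reaching depth d can cost on the order of branching**d expansions."""
    frontier = [""]
    for _ in range(budget):
        best = max(frontier, key=score)
        frontier.remove(best)
        frontier.extend(expand(best))
    return max(frontier, key=score)

def chain_of_thought(budget: int) -> str:
    """o1-style: one long sequential chain; cost is linear in tokens, and
    the RL-trained policy handles 'branching' implicitly inside the chain."""
    chain = ""
    for _ in range(budget):
        chain += random.choice("abcd")
    return chain

# With the same budget, search spends it widening a tree (shallow depth),
# while the chain spends it all on one deep path.
print(len(tree_search(100)))       # depth reached is much less than 100
print(len(chain_of_thought(100)))  # exactly 100 steps deep
```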
@NeuralBets I see your point about speed, but o1 also needed exponentially increasing amounts of compute to solve harder AIME problems.
OpenAI didn't label the x-axis in the chart they published, but I suspect part of the reason they still haven't released the full version of o1 is that it's currently too expensive for practical use cases.
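The usual reading of that plot (my assumption, since the axis is unlabeled) is that accuracy is roughly linear in the log of test-time compute, which is exactly "exponential compute for linear gains":

$$\text{acc}(C) \approx a + b \log C \;\;\Longrightarrow\;\; C(\text{acc}) \approx e^{(\text{acc} - a)/b},$$

so every fixed accuracy increment multiplies the compute bill by a constant factor.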
That's part of the reason that I'm betting NO here: Will an AI score over 10% on FrontierMath Benchmark in 2025 | Manifold.