Will an AI score over 80% on the FrontierMath benchmark in 2025?
62
1kṀ80k
Dec 31
35% chance

"Today we're launching FrontierMath, a benchmark for evaluating advanced mathematical reasoning in AI. We collaborated with 60+ leading mathematicians to create hundreds of original, exceptionally challenging math problems, of which current AI systems solve less than 2%.
Existing math benchmarks like GSM8K and MATH are approaching saturation, with AI models scoring over 90%—partly due to data contamination. FrontierMath significantly raises the bar. Our problems often require hours or even days of effort from expert mathematicians.
We evaluated six leading models, including Claude 3.5 Sonnet, GPT-4o, and Gemini 1.5 Pro. Even with extended thinking time (10,000 tokens), Python access, and the ability to run experiments, success rates remained below 2%—compared to over 90% on traditional benchmarks."


fk it, I'll buy more at 35%

@Bayesian Did something change your mind? It seems like Grok 3 isn't quite at the level of o3 on math and coding benchmarks, even with reasoning enabled.

@TimothyJohnson5c16 Polymarket has a market that has given me confidence that I'm right.

(90% is much harder than 80% because there's around a 10% error rate, i.e. ~10% of problems can't be solved correctly according to the benchmark's answers.)
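A quick sketch of the arithmetic behind that point, under a simplifying assumption that is not from the benchmark itself: every erroneous problem is always scored as wrong, so the best possible score is 1 minus the error rate. The function name `required_valid_accuracy` is hypothetical, and the 6% and 10% figures are just the estimates mentioned in this thread.

```python
def required_valid_accuracy(target: float, error_rate: float) -> float:
    """Fraction of correctly posed problems a model must solve to reach `target`.

    Hypothetical assumption: every erroneous problem is always scored as wrong,
    so the maximum achievable score is (1 - error_rate).
    """
    ceiling = 1.0 - error_rate  # best possible overall score under the assumption
    if target > ceiling:
        return float("inf")  # target is unreachable
    return target / ceiling


# Compare the two thresholds at the error rates discussed in this thread.
for target in (0.80, 0.90):
    for err in (0.06, 0.10):
        need = required_valid_accuracy(target, err)
        print(f"target {target:.0%}, error rate {err:.0%}: "
              f"need {need:.1%} of valid problems")
```

With a 10% error rate, 80% overall means solving roughly 89% of the valid problems, while 90% overall means solving every single one of them, which is why the higher threshold is so much harder.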

@Bayesian Yeah, if I were on Polymarket, I would just call that free money.

@Bayesian Though of course, 10% error rate is also an estimate, right? In the world where an AI model reaches 90%, the error rate is probably a lot lower.

@TimothyJohnson5c16 Yeah, 10% error is an estimate: the errors they found were about 6% IIRC, and they estimated the number of errors they would have missed.

@Bayesian Isn't that a big deal if there's a 10% error rate, though?

@NebulaByte I don't understand your comment; could you rephrase?

opened a Ṁ2,000 NO at 35% order

@Bayesian want to buy more? I put a limit order at 35%.

opened a Ṁ1,000 NO at 35% order

Lol, Acceleration took most of it, so I added some more.

opened a Ṁ5,000 YES at 36% order

@TimothyJohnson5c16 I was scared, but I changed my mind and I'll buy more. YES order at 36%.

opened a Ṁ10,000 NO at 40% order

@Bayesian If you still want more, I put a limit order at 40%.

opened a Ṁ5,000 NO at 35% order

@Bayesian I think yours was taken already? I put more NO at 35%.

opened a Ṁ1,000 NO at 20% order
© Manifold Markets, Inc.