Will any AI model achieve > 40% on Frontier Math before 2026?
resolved Dec 13
Resolved
YES

The model need not be released

  • Update 2025-09-19 (PST) (AI summary of creator comment): Resolution will be based on Epoch's reported Frontier Math scores. Other sources (e.g., AI Digest or lab-only reports) will not determine resolution.


🏅 Top traders

#   Name   Total profit
1          Ṁ5,907
2          Ṁ3,631
3          Ṁ2,005
4          Ṁ1,614
5          Ṁ1,042
sold Ṁ207 NO

Not yet verified or posted on the official page, but GPT-5.2 high is showing 40.3 here: https://epoch.ai/benchmarks/use-this-data

@TimDuffy Yeah, I see it on the official dashboard now.

bought Ṁ2,839 YES

@JaundicedBaboon Resolves YES.

bought Ṁ500 NO

https://epoch.ai/frontiermath Epoch just posted evals and 5.2 only got 26.6%. Will leave unresolved for now in case that was the non-thinking version or the results are amended. It seems shockingly low.

bought Ṁ150 NO

@JaundicedBaboon I'd wait to resolve, since there's some small chance Epoch will evaluate Gemini 3 Deep Think. They haven't yet, and I bet it would exceed 40 if they did. I'm also surprised at the low score!

bought Ṁ100 NO

The 26% is for 5.2 low, high could be much higher actually!

sold Ṁ106 NO

5.1 scored: 17.3% low, 26.9% med, 31.0% high.
If 5.2 has the same low/high gap, it will be right at 40.
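
The extrapolation in this comment can be written out as a quick sketch. It uses only the scores quoted in the thread and assumes, as the comment does, that 5.2 has the same absolute low-to-high gap as 5.1; the variable names are made up for illustration.

```python
# Scores quoted in the thread (percent, FrontierMath).
gpt_5_1 = {"low": 17.3, "med": 26.9, "high": 31.0}
gpt_5_2_low = 26.6  # the score Epoch first posted, attributed above to 5.2 low

# Assumption: 5.2 has the same absolute low-to-high gap as 5.1.
gap = gpt_5_1["high"] - gpt_5_1["low"]
projected_high = gpt_5_2_low + gap

print(f"5.1 low-to-high gap: {gap:.1f} points")
print(f"projected 5.2 high: {projected_high:.1f}%")
```

Under that assumption the projection lands just above the 40% threshold, which is why the comment calls it "right at 40".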

@TimDuffy Plus, they will test it at extra high, not high, since that's new for 5.2-thinking.

bought Ṁ25 NO

This will likely resolve YES, but note that this market is based on Epoch's evaluation. I think the 40.3 we've seen is OpenAI's.

bought Ṁ100 NO

Previously OpenAI evaluated o3 and scored 25.3, Epoch evaluated it and scored it 18.7.
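
To show why the source of a score matters here, the sketch below scales the 40.3 figure by the o3 ratio quoted in this comment. It is a rough illustration only: it assumes (hypothetically) that the gap between a lab-reported score and Epoch's score carries over from o3 to GPT-5.2, which a single data point cannot establish.

```python
# Figures quoted in the thread (percent, FrontierMath).
o3_openai = 25.3     # OpenAI's own evaluation of o3
o3_epoch = 18.7      # Epoch's evaluation of o3
gpt_5_2_high = 40.3  # figure suspected in the thread to be OpenAI's number

# Assumption (hypothetical): Epoch's score is the same fraction of the
# lab-reported score for GPT-5.2 as it was for o3.
ratio = o3_epoch / o3_openai
implied_epoch_score = gpt_5_2_high * ratio

print(f"Epoch / OpenAI ratio for o3: {ratio:.2f}")
print(f"implied Epoch score for 5.2 high: {implied_epoch_score:.1f}%")
```

If the 40.3 were a lab-only number and the o3 ratio held, the Epoch-evaluated score would fall well short of 40%, which is the concern behind the NO bets here.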

bought Ṁ75 NO

IIRC Epoch hasn't evaluated Gemini 3 Deep Think though, if they do before EOY I think that model is likely to exceed 40%.

bought Ṁ949 YES

Well fuck me 😅

bought Ṁ200 YES

Epoch reported long ago that Agent 1 scored 49% on the original FrontierMath (now Tiers 1-3) with pass@16.

https://x.com/EpochAIResearch/status/1945905802998423867

Does this count?

@qumeric Pass@16 should definitely not count... If it did, why not pass@32 or pass@64? It's clear that this market is about pass@1.
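
A small sketch of why pass@k keeps rising with k. It assumes each attempt at a problem succeeds independently with the same probability, which is a simplification, and the 5% per-attempt rate is a made-up number for illustration, not a measured FrontierMath figure.

```python
def pass_at_k(p: float, k: int) -> float:
    """Chance that at least one of k independent attempts succeeds,
    given that each attempt succeeds with probability p."""
    return 1.0 - (1.0 - p) ** k


# Hypothetical per-attempt success rate on a single problem.
p = 0.05

for k in (1, 16, 32, 64):
    print(f"pass@{k}: {pass_at_k(p, k):.3f}")
```

Under these assumptions a problem solved only 5% of the time per attempt is solved at least once in 16 attempts more than half the time, so a pass@16 score is not comparable to a pass@1 score.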

Why is this so different from this market? Are both based on FrontierMath Tiers 1-3? https://manifold.markets/SG/top-frontiermath-score-in-2025

Resolution will be based on Epoch's reported Frontier Math scores.

Historically, OpenAI reported 32% for o3-mini with Python (which counts for the purpose of that other market, afaict), but Epoch, testing it with the general / minimal scaffold, got 11.03%. That likely isn't because OpenAI is making up numbers or whatever, but they demonstrably have a different setup.

@JaundicedBaboon does this resolve according to AI Digest (which includes e.g. lab-reported scores) or according to Epoch’s evaluation?

@bh I’ll go by what Epoch reports

opened a Ṁ500 NO at 45% order

@Bayesian Limit up at 45% ;)

@BrunoJ i can uh... get a better price if i wait... 😭

opened a Ṁ3,000 YES at 51% order

All it would take is running the IMO model on Frontier Math.

bought Ṁ500 NO

@VinceVatter FrontierMath is orders of magnitude harder than IMO.

@traders 116 days until 2026! Is a breakthrough expected over the next 4 months? Given the size of the jump from GPT-4 to GPT-5, I'm not sure why this is at 55%. I'm going to keep buying a little bit more NO every day.

bought Ṁ250 NO

@BrunoJ Limit up at 50%

@Bayesian I'm a little bit overexposed on this one 😅

© Manifold Markets, Inc.