Will a Chinese-made AI beat o3's December score on Frontier Math by the end of 2025?
54
Ṁ16k
2026
75% chance

On December 20, 2024, OpenAI reported that their o3 reasoning model scored 25.2% on EpochAI's Frontier Math benchmark. For context, AI models like GPT-4 and Gemini score around 2%. Will a Chinese-made AI model surpass that score in 2025?

Resolution Criteria

This market will resolve YES if:

  • A Chinese company, university, or government entity reports an AI model (e.g. DeepSeek or Qwen) scoring higher than 25.2% on Frontier Math in 2025

  • The score is publicly announced and independently verified by EpochAI

The market will resolve NO if:

  • No Chinese-developed AI model surpasses 25.2% on Frontier Math in 2025

  • It eventually comes out that a Chinese model created in 2025 surpasses 25.2% on Frontier Math, but this wasn't widely known as of the end of 2025

Other Notes

  • This market is based on o3's December score of 25.2%. If o3 later surpasses that (for instance, by re-running with more inference compute), the new score won't supersede this one

  • If there's any uncertainty as to whether a model is "Chinese-made," I'll add clarifications as I see fit. Generally, I'll consider any model whose development was primarily conducted by a Chinese entity to be "Chinese-made"

  • Models may use any architecture and any amount of compute. I'm also including models that are specifically designed for math or research, not just general LLMs

  • If Frontier Math changes their benchmark (for instance, by adding a fourth tier of problems), I'll use my best judgement for doing an apples-to-apples comparison. If it doesn't seem possible to fairly compare results, I'll resolve the market at the current price

  • The model doesn't need to be publicly available, but the score needs to be publicly announced + verified
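
For anyone who wants the criteria above in a checkable form, here is a rough sketch of the resolution rule in Python. It is only an illustration of how the criteria read, not an official resolution procedure. The names `ReportedScore`, `counts_toward_yes`, and `resolve_market` are hypothetical, and treating "publicly announced in 2025" as a single announcement date is an assumption. The benchmark-change case (resolving at the current price) and any judgement calls about what counts as "Chinese-made" are not modeled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional

# o3's December 2024 score from the market description, in percent.
O3_DECEMBER_SCORE = 25.2


@dataclass
class ReportedScore:
    """Hypothetical record of one reported Frontier Math result."""
    score_percent: float
    chinese_made: bool              # judged by the market creator
    announced_on: Optional[date]    # None if never publicly announced
    verified_by_epoch: bool         # independently verified by EpochAI


def counts_toward_yes(report: ReportedScore) -> bool:
    """True if this single report satisfies every YES condition."""
    if report.announced_on is None:
        return False
    return (
        report.chinese_made
        and report.score_percent > O3_DECEMBER_SCORE  # must be strictly higher
        and report.announced_on.year == 2025          # assumption: announcement date decides
        and report.verified_by_epoch
    )


def resolve_market(reports: Iterable[ReportedScore]) -> str:
    """YES if any report qualifies, otherwise NO."""
    return "YES" if any(counts_toward_yes(r) for r in reports) else "NO"
```

Under this reading, a model that beats 25.2% in 2025 but is only announced in 2026 still resolves NO, which matches the second NO condition.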

opened a Ṁ500 YES at 60% order

@Fay42 Would you like to take a larger NO position? I set a limit order at 55%

@AdamK man, R1's AIME performance was good...

@Fay42 Curious whether you were taking a no position bc you thought the math models wouldn’t improve fast enough outside OAI, or because you thought they wouldn’t be open-sourced

Combination of both - though less about OpenAI specifically and more about American vs Chinese speeds on frontier benchmarks. I still think DeepSeek is in a plausibly bad spot with the new export restrictions, but there's a substantial lag between when export restrictions are imposed and when they start to impact models (since it takes time to get, install, and use GPUs).

@Fay42 I think it's very likely that the difference in compute requirements between o1 and o3 was small enough that DeepSeek could beat o3 on FrontierMath this year with literally no additional compute. (In principle by capabilities, but if the model is open-sourced, I see no reason why Epoch shouldn't test it)

@AdamK It's plausible doing the o3 eval cost hundreds of thousands of dollars, in which case Epoch would need to be willing to spend a lot on doing the FrontierMath eval themselves. I agree that it's plausible DeepSeek has enough compute to make an o3 equivalent already.

@Fay42 Sure, but the o-series RL paradigm is nowhere close to being scaled. I'm willing to bet that both OAI and DeepSeek will be spending 1-2 OOMs more compute than o3 on RL for individual models by the end of the year. The next reasoning model DeepSeek makes might be comparable to o3 with heavy inference, but the one after won't need nearly as much.

@AdamK I'd bet against DeepSeek doing 1-2 OoMs more than o3 within a year, but idk how to resolve such a bet. And note that they have to spend the compute, train the model, and then have its inference be possibly an OoM cheaper for the same o3-level results. Though, there are a bunch of other possible paths to a YES resolution on this market so idk.

@Fay42 I'm also not sure how to resolve. I do think you're underestimating how much compute DeepSeek has or will have, how little RL compute it likely took to make o3, or both

@TamayBesiroglu @ElliotGlazer Would be curious to hear if you have a policy (in mind or publicly stated somewhere) for which models will be evaluated on Frontier Math? It might be nice to commit to evaluating e.g. the apparent SotA open-source LLM on a quarterly basis.

bought Ṁ650 NO

Sorry, who says that EpochAI will even share their problems with Chinese AI companies? Trump is about to be President. China-US relations are probably not good and will likely get worse. People are concerned about fraud and such. Epoch might not trust Chinese companies not to leak the problems.

opened a Ṁ250 YES at 46% order

@nathanwei I think I agree that this is the most plausible path to a NO resolution. I do think there is a very high chance that a Chinese AI will exist before 2026 that is in principle capable of beating o3's score; the main question is how they would interface with Epoch
