On December 20, 2024, OpenAI reported that its o3 reasoning model scored 25.2% on EpochAI's Frontier Math benchmark. For context, other AI models at the time, such as GPT-4 and Gemini, scored around 2%. Will a Chinese-made AI model surpass o3's score in 2025?
Resolution Criteria
This market will resolve YES if:
A Chinese company, university, or government entity reports an AI model (e.g. DeepSeek or Qwen) scoring higher than 25.2% on Frontier Math in 2025
The score is publicly announced and independently verified by EpochAI
The market will resolve NO if:
No Chinese-developed AI model surpasses 25.2% on Frontier Math in 2025
It eventually comes out that a Chinese model created in 2025 surpasses 25.2% on Frontier Math, but this wasn't widely known as of the end of 2025
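As a rough illustration of how the criteria above combine, here is a minimal Python sketch. The names (`Claim`, `qualifies`, `resolve`) are hypothetical and exist only for this example. It assumes a claim counts only if it was announced during 2025, and it treats verification as a simple yes/no flag. It does not capture the judgement calls described in the notes below (what counts as "Chinese-made," or how benchmark changes are handled).

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# o3's December 2024 score, in percent. Per the notes, this baseline is fixed.
O3_BASELINE_PCT = 25.2


@dataclass
class Claim:
    """One reported Frontier Math result (hypothetical structure)."""
    score_pct: float               # reported score, in percent
    chinese_made: bool             # judged "Chinese-made" by the market creator
    announced_on: Optional[date]   # None if never publicly announced
    verified_by_epoch: bool        # independently verified by EpochAI


def qualifies(claim: Claim) -> bool:
    """True if a single claim meets every YES condition."""
    return (
        claim.chinese_made
        and claim.score_pct > O3_BASELINE_PCT   # must be strictly higher
        and claim.announced_on is not None
        and claim.announced_on.year == 2025     # assumption: announced in 2025
        and claim.verified_by_epoch
    )


def resolve(claims: List[Claim]) -> str:
    """YES if any claim qualifies, otherwise NO."""
    return "YES" if any(qualifies(c) for c in claims) else "NO"
```

For example, a verified 30% score announced in June 2025 would resolve YES under this sketch, while the same score announced only in January 2026 would resolve NO.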
Other Notes
This market is based on o3's December score of 25.2%. If o3 later surpasses that (for instance, by re-running with more inference compute), the new score won't supersede this one
If there's any uncertainty as to whether a model is "Chinese-made," I'll add clarifications as I see fit. Generally, I'll consider any model whose development was primarily conducted by a Chinese entity to be "Chinese-made"
Models may use any architecture and any amount of compute. I'm also including models that are specifically designed for math or research, not just general LLMs
If EpochAI changes the Frontier Math benchmark (for instance, by adding a fourth tier of problems), I'll use my best judgement to make an apples-to-apples comparison. If it doesn't seem possible to fairly compare results, I'll resolve the market at the current price
The model doesn't need to be publicly available, but the score needs to be publicly announced and verified by EpochAI