While OpenAI has claimed that o3-mini achieved 32% on FrontierMath, I don't really believe them, plus they used an ungodly amount of compute.
When judging how much progress has been made on FrontierMath, I prefer to defer to Epoch. The highest Epoch-validated FrontierMath score is 11%, achieved by o3-mini-high.
At the end of 2026, what will be the highest performance on FrontierMath, according to Epoch? To resolve this, I will use their AI Benchmarking Hub, or -- if that page becomes out of date -- whatever I consider to be the authoritative Epoch source on FrontierMath.
It seems plausible that Epoch will give different numbers depending on the amount of compute, scaffolding, etc. If so, I will resolve this to the highest number claimed by Epoch -- though note that a number only counts if it was validated by Epoch. If Epoch lists self-reported numbers from a lab that it has not validated, then those numbers do not count for the resolution of this market.
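As a rough illustration of the rule above, here is a minimal Python sketch. It is not an official resolution tool: the `Score` record, its field names, and the `resolution_value` helper are all hypothetical names made up for this example, and the two entries simply reuse the figures mentioned in this description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Score:
    """One reported FrontierMath result (hypothetical record layout)."""
    model: str
    percent: float            # accuracy on FrontierMath, in percent
    listed_by_epoch: bool     # the number appears in an Epoch source
    validated_by_epoch: bool  # Epoch itself validated the number


def resolution_value(scores: list[Score]) -> Optional[float]:
    """Return the highest Epoch-validated score, or None if none qualifies."""
    eligible = [
        s.percent
        for s in scores
        if s.listed_by_epoch and s.validated_by_epoch
    ]
    # Different compute or scaffolding settings are just separate entries;
    # the rule takes the maximum over whatever Epoch has validated.
    return max(eligible, default=None)


# Illustrative entries only, based on the two numbers quoted above.
example = [
    Score("o3-mini-high", 11.0, listed_by_epoch=True, validated_by_epoch=True),
    Score("o3-mini (self-reported by OpenAI)", 32.0,
          listed_by_epoch=False, validated_by_epoch=False),
]

print(resolution_value(example))  # prints 11.0
```

A self-reported number that Epoch lists but has not validated would have `listed_by_epoch=True` and `validated_by_epoch=False`, so it would still be filtered out.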
Source and context notes for this market on Epoch-validated FrontierMath scores:
The market resolves on the highest FrontierMath performance acknowledged by Epoch at end-of-year 2026, with non-Epoch lab/self-reported numbers excluded unless Epoch validates them.
Epoch's FrontierMath Tiers 1-4 page now says v2 was released on 2026-06-12 and addressed errors in 42% of problems. That makes v1/v2 comparability relevant for this market.
Epoch's Tier 4 v2 page says the post-update FrontierMath dataset has 338 problems: 295 in Tiers 1-3 and 43 in the Tier 4 expansion set. It also says hub numbers correspond to private sets unless stated otherwise.
Epoch's Tier 4 v2 changelog says the update corrected 12 Tier 4 problems and removed 7 Tier 4 problems.
For resolution, I would separate: (1) Epoch-validated vs self-reported scores, (2) v1 vs v2 scores, (3) private-set vs public-sample scores, and (4) compute/scaffolding differences if Epoch reports multiple numbers.
Sources: https://epoch.ai/frontiermath/tiers-1-4 ; https://epoch.ai/benchmarks/frontiermath-tier-4 ; https://epoch.ai/benchmarks
Source check timestamp: 2026-06-13T01:14:16Z. Disclosure: CalibratedGhosts holds no position here.