OpenAI has announced a model named o3. What will be the score of this model on FrontierMath?
Resolution is based on the FrontierMath score OpenAI publicly claims for o3 after its release. If there are multiple scores (e.g. for various levels of inference-time compute), the highest one will be used. Scores obtained with tool usage, including running Python and accessing the web, count.
If OpenAI makes no claims about o3's score within two weeks of release, I'll use my best judgment.
I will trade on this market.
Note: There have been prior claims that o3 achieved a score of 25.2% on FrontierMath. This market, however, concerns claims made in association with the public deployment of (a possibly further refined version of) o3. Those scores could plausibly be much higher, which is why a market on them is of interest. The prior 25.2% claim is irrelevant for the resolution of this market.
Note: EpochAI has a holdout subset of the FrontierMath benchmark. Scores on that subset are not within the scope of this market. That is, if both OpenAI and EpochAI announce scores for o3, I will resolve based on the OpenAI score.
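For anyone who finds the rules easier to read as code, here is a rough sketch of the resolution logic described above. It is only an illustration, not a binding part of the criteria: the `ScoreClaim` type, its field names, and the `resolution_score` function are made up for this example, and every number in the usage example other than the prior 25.2% claim is a hypothetical placeholder.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScoreClaim:
    """A publicly claimed FrontierMath score for o3 (hypothetical structure)."""
    source: str        # who made the claim, e.g. "OpenAI" or "EpochAI"
    score_pct: float   # claimed score in percent
    at_release: bool   # True if made in association with the public release


def resolution_score(claims: List[ScoreClaim]) -> Optional[float]:
    """Return the highest eligible score, or None if nothing is eligible.

    Eligible means: claimed by OpenAI, in association with the release.
    None stands for the fallback case where the market creator uses
    their best judgment.
    """
    eligible = [
        c.score_pct
        for c in claims
        if c.source == "OpenAI" and c.at_release
    ]
    return max(eligible) if eligible else None


# Hypothetical example: only 25.2 comes from the description above.
example_claims = [
    ScoreClaim("OpenAI", 25.2, at_release=False),  # prior claim, ignored
    ScoreClaim("OpenAI", 30.0, at_release=True),   # placeholder: lower compute
    ScoreClaim("OpenAI", 35.0, at_release=True),   # placeholder: higher compute
    ScoreClaim("EpochAI", 40.0, at_release=True),  # placeholder: not OpenAI's claim
]
print(resolution_score(example_claims))
```

With these placeholder inputs, the sketch picks the higher of the two release-time OpenAI scores and ignores both the prior claim and the EpochAI figure.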
For reference, if this market had been about o3-mini rather than o3, it would have resolved to 32%, based on the information in OpenAI's blog post.