Will there be a reasoning model more powerful than o1-preview, and cheaper and >10x faster than o1-mini, by Nov 12 2025?
35
103Ṁ3177
resolved Nov 19
Resolved
NO

By Nov 12 2025, will there be a model that meets all of these criteria:

Note:

  • does not need to be an OpenAI model

  • open weights or free models will count as cheaper

  • quantized/distilled versions count, as long as they also beat the same accuracy thresholds

  • Update 2025-11-05 (PST) (AI summary of creator comment): Creator has indicated they are ready to resolve YES based on Gemini 2.5 Flash-Lite (September 2025 Preview) meeting the criteria, unless there are strong objections. See linked comment for details.

  • Update 2025-11-14 (PST) (AI summary of creator comment): Creator has stated they are uncertain whether the criteria are met and has delegated the resolution decision to moderators.

Get
Ṁ1,000
to start trading!

🏅 Top traders

#NameTotal profit
1Ṁ281
2Ṁ238
3Ṁ148
4Ṁ67
5Ṁ56
Sort by:

Looked a bit more closely, it’s not clear to me this is a yes. too busy atm to dive into this properly - @mods can you decide?

@nosuch this is very clearly a NO

We're definitely getting closer to the range that's >10x faster than o1-mini and more powerful than o1-preview. But that threshold is very clearly not met yet. Therefore this question resolves NO.

I think if you'd put the resolution date 18 months into the future instead of 12 then there'd be a model to point to meeting this critera. But that model does not exist today.

@nosuch I can't read these charts this is all gibberish to me, but from what little I can parse and the evidence presented by Dulaman I'm resolving NO.

bought Ṁ30 NO

gpt-5-pro says NO: https://imgur.com/a/l9n0JqO

bought Ṁ30 NO

asked it to do a broader search and still says NO: https://imgur.com/a/3jJ7xVU

bought Ṁ30 NO

updated analysis with @moozooh suggestion about MMLU vs Global MMLU Lite: https://imgur.com/a/stG68ew

bought Ṁ28 NO

additional notes about "GPQA" vs "GPQA Diamond": https://imgur.com/a/oZ2IaGe

note to @traders: this question resolves NO but the creator has a very large YES position. So beware we may have to call in the mods to adjudicate if there are shenanigans

although the creator's track record indicates authentic question resolutions 🤣

https://manifold.markets/nosuch/a-major-ml-paper-demonstrates-symbo#6qfratfb7in

@Dulaman I think there is enough direct and/or circumstantial evidence that GPT-OSS-120B sufficiently clears the criteria (I'd initially overlooked it because I didn't know there were providers that were sufficiently fast); it's not the creator's fault the question was asked when the landscape was completely different so that models released closer to the market's deadline aren't tested on the exact same benchmarks. This probably needs more discussion, though. (I sold my positions before the market was locked so I don't have a stake in this no matter how it resolves anymore.)

@moozooh GPT-OSS-120B does not qualify: it gets 90.0 on MMLU per OpenAI's own evals

https://cdn.openai.com/pdf/419b6906-9da6-406c-a19d-1bb078ac7637/oai_gpt-oss_model_card.pdf

O1-preview gets 90.8 on MMLU: https://github.com/openai/simple-evals

@Dulaman That's not a meaningful distinction. Besides, MMLU contains a non-insignificant percentage of wrong answers so it's impossible to max it out in principle or even tell which model is better if it's too close to another.

@moozooh you are wrong in the context of this question's criteria

bought Ṁ20 YES

GPT OSS 120b served by cerebras : https://www.cerebras.ai/pricing

@Valchap Can they actually get that speed in practice though?

@Jasonb Not sure, never tried there Api. But it is certain that they are fast, I have used their inference through mistral.ai chat and it is amazingly fast.
Also, they say that hey are able to infer at 3000 token per second. But 1000 would be largely enough to resolve this market, so I am confident it will be a yes

@Valchap According to https://artificialanalysis.ai/models/gpt-oss-120b, this fits the criteria (there are at least three API providers listed with >720 tok/s inference, and the price and intelligence index thresholds are also hit).

@moozooh But they rate it as 354 tok/s?

@nosuch I'm not sure what should count for the market

@Jasonb I think their rating is some sort of an aggregate figure but they do show exact measurements for each API provider which is certainly very helpful.

@Dulaman Then, glm4.6 on cerebras will probably do the trick

@Valchap GLM-4.6 on Cerebras does not qualify: https://imgur.com/a/RTfik1I

bought Ṁ75 YES

There's a model that fits your criteria: Gemini 2.5 Flash-Lite (September 2025 Preview).

https://artificialanalysis.ai/models/gemini-2-5-flash-lite-preview-09-2025-reasoning/

1. 756 tok/s per ArtificialAnalysis.

2. $0.1/Mtok input, $0.4/Mtok output.

3. As of the AAII v3 suite, its score is 48 vs o1-preview's 45 (old index used benchmarks that have been saturated since, so aggregate scores have lowered across the board). Individually, as far as I can tell, there are no statistically significant regressions; 71% vs. 73% on GPQA Diamond probably shouldn't be considered one in the spirit of the question.

@moozooh nice catch - I’m ok with resolving with yes at this point unless folks have strong objections they want to discuss

@nosuch @moozooh It says median output speed of 547 token/s on the page at the moment?

sold Ṁ75 YES

@Jasonb It's true, they seem to have revised the numbers; it was 756 tok/s when I checked a week ago. :( Oh well.

@moozooh Now it says 678, why does it keep changing so much

@Jasonb No idea what's going on but it's too risky to rely on the numbers stabilizing on >720 when the time comes.

bought Ṁ10 NO

gpt-5-pro says NO for this model: https://imgur.com/a/l9n0JqO

@Dulaman Reminder to never go full GPT: the research-grade intelligence™ happily conflated MMLU (raw knowledge of scientific facts) and Global MMLU Lite (cultural/linguistic biases applied to scientific facts), two separate benchmarks.

https://www.kaggle.com/benchmarks/cohere-labs/global-mmlu-lite

sold Ṁ69 NO

@moozooh are you questioning the oracle??

@moozooh updated analysis for this model:

@Dulaman $200 for this is pretty rough, ngl. :D

© Manifold Markets, Inc.TermsPrivacy