Presumably GPT 5.6 or maybe GPT 6, depending on how they brand it. 5.6 seems to be the rumour right now.
Resolves based on my personal eyeballing of mostly benchmark results / vibes.
Resolves YES if it's distinctly Fable-class. Resolves NO if it's distinctly Opus-class.
Maybe resolves to 50% if it seems genuinely in the middle of the two.
N/A if there is no new release by close.
@traders I expect I will wait until the full benchmark suite for 5.6 Sol is released. But at the moment this is looking very likely to resolve yes.
@eapache How do you conclude that? When I look at METR, GPT-5.6 Sol sits at Opus level with 11.3 hours @50%, substantially below Mythos Preview.
(That's with the default methodology of counting reward hacking attempts as failures. METR notes that GPT-5.6 Sol's "cheating rate was higher than any public model we have evaluated". If other benchmarks show better values in the headline numbers, I'd wonder how much of that is cheating.)
@David6LScg The (very limited) benchmarks published on https://openai.com/index/previewing-gpt-5-6-sol/ sit around Fable/Mythos level rather than Opus level. I hadn’t seen the METR results, but they do add some uncertainty.
@eapache See the METR report here: https://metr.org/blog/2026-06-26-gpt-5-6-sol/
50% success time horizons:
GPT-5.6: 11.3 hours
Opus 4.6 (Feb 2026): 12.0 hours
Mythos Preview (Feb 2026): 17.4 hours
Note that Opus 4.8 and Mythos 5/Fable 5 have not been measured yet. The figures above are for older Claude versions from February, and even there Opus scores above GPT-5.6 Sol, with Mythos Preview in the >16 hours range that METR notes is saturated for their testing suite.
If OpenAI has not controlled for GPT-5.6's egregious reward hacking in its benchmark scores, they might not represent capability so much as ability plus propensity to cheat.
