MANIFOLD
Claude Sonnet 4.5 outperforms GPT-5 on METR 50% time horizon?
Resolved NO on Oct 9


🏅 Top traders

#   Total profit
1   Ṁ2,029
2   Ṁ745
3   Ṁ687
4   Ṁ242
5   Ṁ197

bought Ṁ35 YES

In log space the gap between GPT-5 (137 minutes) and Opus 4.1 (105 minutes) is almost exactly the gap between Opus 4.1 and Opus 4 (80 minutes), a 30-31% jump in both cases.
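For anyone who wants to check the arithmetic, here is a minimal Python sketch. The three time horizons are the figures quoted in the comment above; the helper name `jump` and the dictionary layout are only illustrative.

```python
import math

# 50% time horizons in minutes, as quoted in the comment above
horizons = {"Opus 4": 80, "Opus 4.1": 105, "GPT-5": 137}

def jump(old, new):
    """Return (ratio, percent increase, log-space gap) going from old to new."""
    ratio = new / old
    return ratio, (ratio - 1) * 100, math.log(ratio)

opus_step = jump(horizons["Opus 4"], horizons["Opus 4.1"])
gpt5_step = jump(horizons["Opus 4.1"], horizons["GPT-5"])

for label, (ratio, pct, log_gap) in [
    ("Opus 4 -> Opus 4.1", opus_step),
    ("Opus 4.1 -> GPT-5", gpt5_step),
]:
    print(f"{label}: x{ratio:.3f} (+{pct:.1f}%), log gap {log_gap:.3f}")
```

Both ratios come out between 1.30 and 1.32, so the two steps are nearly the same size on a log scale, which is the point the comment is making.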

Good argument, but it seems to me that Sonnet 4.5 is pretty even with Opus 4.1. If we were forecasting Opus 4.2's METR 50% time horizon it would be one thing, but this model is much smaller than Opus and has probably not been RL-trained as heavily as GPT-5. People are far from universally convinced that Sonnet 4.5 is as good as Codex at coding tasks. On top of that, it gets less of a boost from reasoning, and it lacks the boost from OpenAI historically doing better at this task. Going from Sonnet 4 to Sonnet 4.5 requires a 100% jump for this market to resolve YES, and 4.5 is built from Sonnet 4, not from Opus 4.1, so that seems like the more relevant frame to me. We will see.
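To put the 100% figure next to the earlier 30-31% steps, here is a rough Python sketch. It takes the comment's "100% jump" at face value; the implied Sonnet 4 baseline is an inference from that claim and the 137-minute GPT-5 figure, not a number stated in this thread, and the variable names are only illustrative.

```python
import math

gpt5_minutes = 137      # GPT-5 horizon quoted earlier in the thread
required_jump = 2.0     # "100% jump", taken at face value from the comment

# Implied Sonnet 4 baseline: an inference, not a figure stated in the thread
implied_sonnet4_minutes = gpt5_minutes / required_jump

# Average size of one earlier step in log space
# (Opus 4 -> Opus 4.1 and Opus 4.1 -> GPT-5, using the quoted 80/105/137)
step = (math.log(105 / 80) + math.log(137 / 105)) / 2

# How many such steps a doubling corresponds to
steps_needed = math.log(required_jump) / step

print(f"Implied Sonnet 4 horizon: {implied_sonnet4_minutes:.1f} min")
print(f"A 100% jump is about {steps_needed:.2f} steps of the earlier size")
```

Under these assumptions, a doubling is roughly two and a half of the steps seen between successive frontier models, which is why the choice of baseline (Sonnet 4 vs. Opus 4.1) matters so much here.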

sold Ṁ39 YES

@Bayesian Sonnet 4 does somewhat worse on METR than you'd guess from its general SWE performance; e.g., most people think Sonnet 4 vs. GPT-5 is a close call in real-world SWE. The question is whether this "residual underperformance" carries over to Sonnet 4.5. That said, Sonnets (3.5 to 3.7) have historically led in METR time horizon.

Ping me if you want to bet larger YES volume on this.

bought Ṁ50 NO

cowards /s
