This market will resolve to the highest 50% time horizon, as reported by METR, for the first Claude 4.5 Opus thinking model to appear on METR's graph.
The 50% time horizon is a measure of AI autonomy based on the length of tasks an AI can do: roughly, it is how long humans take to complete the tasks that an AI system can successfully do 50% of the time. See METR's "Measuring AI Ability to Complete Long Tasks" for the technical definition. Claude 3.7 Sonnet, released in February 2025, was the leading model at the time, with a 50% horizon of 59 minutes.
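For intuition, METR's approach is roughly to fit a logistic curve of task success against the (log) time the tasks take humans, then read off the task length at which predicted success crosses 50%. Below is a minimal sketch of that idea using synthetic data and scikit-learn; it is not METR's actual task suite or code.

```python
# Minimal sketch of the 50% time-horizon idea (synthetic data, not METR's code).
# Fit a logistic curve of success probability vs. log2(human task length),
# then solve for the length where predicted success equals 50%.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical (task length in human-minutes, success 0/1) observations.
lengths_min = np.array([2, 4, 8, 15, 30, 60, 120, 240, 480, 960], dtype=float)
successes   = np.array([1, 1, 1, 1,  1,  1,  0,   1,   0,   0])

X = np.log2(lengths_min).reshape(-1, 1)
clf = LogisticRegression().fit(X, successes)

# P(success) = sigmoid(w * log2(t) + b) = 0.5  =>  log2(t) = -b / w
w, b = clf.coef_[0][0], clf.intercept_[0]
horizon_min = 2 ** (-b / w)
print(f"estimated 50% time horizon: {horizon_min:.0f} human-minutes")
```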

Left bounds inclusive, right bounds exclusive.
See also:
/jim/claude-45-opuss-metr50-horizon (jim's original market)
/Bayesian/claude-opus-45s-metr50-time-horizon (this market)
/Bayesian/gemini-3s-50-time-horizon-per-metr
/Bayesian/grok-420s-metr-50-time-horizon
/Bayesian/claude-sonnet-46s-metr-50-time-hori
/Bayesian/grok-5s-50-time-horizon-per-metr
🏅 Top traders
| # | Total profit |
|---|---|
| 1 | Ṁ965 |
| 2 | Ṁ595 |
| 3 | Ṁ294 |
| 4 | Ṁ178 |
| 5 | Ṁ177 |
@Yakushi12345 Above trend! You can tell by looking at the market prices before resolution; the average was around 3h15min, iirc.
@creator Results are out! I believe the time is 4 hours 49 minutes, although I'm not 100% sure I'm reading the graph correctly.
@Bayesian So now that this is resolved, I can explain: the terms state "Claude 4.5 Opus thinking model." How do I tell if the version they tested had thinking enabled?
@MRME Good question. I guess I should not have included "thinking" there; the intent was definitely to count this model, whether or not they specify thinking.
@1bets It wasn't a prediction, it was a statement about the trendline, and it had a lot of uncertainty built in; it totally allowed for variations like 2x every 4 months or every 5 months, which is what it looks like is going on (see the sketch below).
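For illustration, a rough extrapolation under those two doubling rates, taking the 59-minute Claude 3.7 Sonnet baseline (February 2025) from the description above; the roughly nine-month gap until this model appeared on the graph is an assumption.

```python
# Rough trendline extrapolation (illustrative only; the baseline comes from the
# market description above, and the 9-month gap is an assumption).
baseline_min = 59          # Claude 3.7 Sonnet 50% horizon, Feb 2025
months_elapsed = 9         # assumed gap to this model's appearance on the graph

for doubling_months in (4, 5):
    horizon = baseline_min * 2 ** (months_elapsed / doubling_months)
    print(f"2x every {doubling_months} months -> ~{horizon / 60:.1f} hours")

# ~4.7 hours at a 4-month doubling and ~3.4 hours at 5 months, so the ~4h49m
# figure mentioned above is consistent with roughly a 4-month doubling under
# these assumptions.
```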
@Bayesian Fair enough. If this was somehow not the thinking model, that's not just scary but downright terrifying.
My hesitancy was mostly about leaving a little theta on the table in case I had misunderstood how the models are classified.
@Bayesian The 80% success-rate horizon looks more realistic. I think the Opus model was trained to work with the web, so it's not actually as capable as it looks here.
Basically, I'd say they fine-tuned it specifically for this test.
@1bets The 50% is way above the trendline, but I think in aggregate it's an overestimate of capability.
The confidence interval for the 50% is really wide, as is the one for the 80%. The 80% is pretty much an "on trend" estimate from Sonnet 4 to Sonnet 4.5 with some "Opus boost"; ECI is a bit above trend (150 instead of the 149 estimated from a trendline), and SWE-bench with a minimal agent is also on trend.
GPT-5.2 thinks 190 minutes is probably the more reasonable estimate for the 50% success rate, factoring in all the data, and that the 80% is maybe around 28.5 minutes (meaning 27 minutes undershoots). That is, the model is slightly above trend; it's what you'd expect in late December rather than late November.
Personally, this moves me toward believing the METR 50% horizon won't exist as a meaningful benchmark (without revision) next year.