Claude Opus 4.5 METR 50% time horizon
resolved Dec 20
>= 4h: 100% (resolved; market at 99.0%)
< 2h: 0.1%
2h00 - 2h15: 0.1%
2h15 - 2h30: 0.1%
2h30 - 2h45: 0.1%
2h45 - 3h00: 0.1%
3h00 - 3h15: 0.1%
3h15 - 3h30: 0.1%
3h30 - 3h45: 0.1%
3h45 - 4h: 0.1%

This market will resolve to the highest 50% time horizon, as reported by METR, for the first Claude 4.5 Opus thinking model to appear on METR's graph.

50% time horizon is a measure of AI autonomy based on the length of tasks that AI can do: roughly, it is the time that humans take to complete tasks that an AI system can successfully do 50% of the time. See METR's "Measuring AI Ability to Complete Long Tasks" for the technical definition. Claude 3.7 Sonnet, released in February 2025, was the leading model with a 50% horizon of 59 minutes.
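
To make the definition concrete, here is a minimal sketch, assuming a METR-style approach of fitting a logistic curve of success probability against log2 of human task time and reading off where that curve crosses 50%. The task lengths and success rates below are made up for illustration (they are not METR data), and `fit_logistic` and `time_horizon` are hypothetical helpers, not METR's code.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, iters=100):
    """Maximum-likelihood fit of p = sigmoid(a + b*x) via Newton's method.

    ys may be 0/1 outcomes or per-task success fractions in [0, 1].
    """
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)  # Bernoulli variance = Hessian weight
            g0 += p - y  # gradient of the negative log-likelihood
            g1 += (p - y) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det  # 2x2 solve of H @ [da, db] = g
        db = (h00 * g1 - h01 * g0) / det
        a, b = a - da, b - db
        if abs(da) + abs(db) < 1e-12:
            break
    return a, b


def time_horizon(a, b, success=0.5):
    """Human task length (minutes) at which predicted success equals `success`."""
    logit = math.log(success / (1.0 - success))
    # Solve sigmoid(a + b*log2(t)) = success for t.
    return 2.0 ** ((logit - a) / b)


# Hypothetical tasks of 1, 2, 4, ..., 1024 human-minutes with made-up success
# rates taken from a curve whose true 50% horizon is 60 minutes (not METR data).
TRUE_B = -0.8
TRUE_A = -TRUE_B * math.log2(60)
task_minutes = [2 ** k for k in range(11)]
xs = [math.log2(t) for t in task_minutes]
success_rates = [sigmoid(TRUE_A + TRUE_B * x) for x in xs]

a, b = fit_logistic(xs, success_rates)
h50 = time_horizon(a, b, 0.5)
h80 = time_horizon(a, b, 0.8)
print(f"50% horizon: {h50:.1f} min, 80% horizon: {h80:.1f} min")
```

Because the fitted slope is negative, any 80% horizon sits below the 50% one, and a shallower slope widens the gap between the two figures.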

Left bounds inclusive, right bounds exclusive.
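
A sketch of how that rule maps a reported horizon onto the answer options. The bucket list mirrors the answers listed above; `answer_for` is a hypothetical helper for illustration, not part of Manifold.

```python
import math

# (lower bound inclusive, upper bound exclusive, answer label), in minutes.
BUCKETS = [
    (0, 120, "< 2h"),
    (120, 135, "2h00 - 2h15"),
    (135, 150, "2h15 - 2h30"),
    (150, 165, "2h30 - 2h45"),
    (165, 180, "2h45 - 3h00"),
    (180, 195, "3h00 - 3h15"),
    (195, 210, "3h15 - 3h30"),
    (210, 225, "3h30 - 3h45"),
    (225, 240, "3h45 - 4h"),
    (240, math.inf, ">= 4h"),
]


def answer_for(minutes: float) -> str:
    """Return the answer whose [lower, upper) interval contains `minutes`."""
    for lower, upper, label in BUCKETS:
        if lower <= minutes < upper:
            return label
    raise ValueError(f"no bucket for {minutes!r} minutes")


print(answer_for(4 * 60 + 49))  # the 4h49m reading reported in the comments
```

A value of exactly 240 minutes lands in ">= 4h", and exactly 135 minutes lands in "2h15 - 2h30", because each interval includes its left edge and excludes its right edge.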

See also:

/jim/gpt-52-metr

/jim/claude-45-opuss-metr50-horizon (jim's original market)

/Bayesian/claude-opus-45s-metr50-time-horizon (this market)

/Bayesian/gemini-3s-50-time-horizon-per-metr

/Bayesian/grok-420s-metr-50-time-horizon

/Bayesian/claude-sonnet-46s-metr-50-time-hori

/Bayesian/grok-5s-50-time-horizon-per-metr

/Bayesian/r2s-50-time-horizon-per-metr

/Bayesian/kimi-k3-thinkings-metr-50-time-hori


🏅 Top traders (by total profit)

1. Ṁ965
2. Ṁ595
3. Ṁ294
4. Ṁ178
5. Ṁ177

total jim victory

@draaglom broken clock right twice a day (jkjkjk)

Very roughly speaking, is this result more on, above, or below trend?

@Yakushi12345 above trend! You can tell by looking at the market prices before resolution; the average was around 3h15m, iirc.

@creator results are out! I believe the time is 4 hours 49 minutes, although I'm not 100% sure I understand the graph.

@MRME what? is it R E A L?

@1bets yup. Or at least real unless you think METR is lying, hacked, etc...

It is on the website

@1bets it's real brah

@MRME i can answer any questions you have about the graph

@Bayesian so now that this is resolved I can explain: the terms state "Claude 4.5 Opus thinking model." How do I tell if the version they tested had thinking enabled?

@Bayesian their own prediction of 2x per 7 months was wrong

@MRME good question. I guess I should not have included "thinking" there; the intent was definitely to count this model whether or not they specify thinking.

sorry for that ambiguity

@1bets it wasn't a prediction, it was a statement about the trendline? It also had a lot of uncertainty built in and totally allowed for variations like 2x every 4 months or every 5 months, which is what it looks like is going on.
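
The doubling-time arithmetic in this exchange can be sketched as below. This is a naive two-point calculation assuming simple exponential growth from the Claude 3.7 Sonnet baseline of 59 minutes (February 2025) and roughly 9 months elapsed until Claude Opus 4.5's late-November release; it is not METR's fitted trend, and the 289-minute input is the 4h49m figure a commenter read off METR's graph. The helper names are hypothetical.

```python
import math

BASELINE_MIN = 59.0   # Claude 3.7 Sonnet, February 2025 (from the description)
MONTHS_ELAPSED = 9.0  # assumption: Feb 2025 -> late Nov 2025


def projected_horizon(baseline, months, doubling_months):
    """Horizon after `months` if it doubles every `doubling_months`."""
    return baseline * 2.0 ** (months / doubling_months)


def implied_doubling_time(baseline, observed, months):
    """Doubling time that would carry `baseline` to `observed` in `months`."""
    return months / math.log2(observed / baseline)


for d in (7, 5, 4):
    h = projected_horizon(BASELINE_MIN, MONTHS_ELAPSED, d)
    print(f"2x every {d} months -> {h:.0f} min")

observed = 4 * 60 + 49  # 4h49m, as read off METR's graph in the comments
d_hat = implied_doubling_time(BASELINE_MIN, observed, MONTHS_ELAPSED)
print(f"implied doubling time: {d_hat:.1f} months")
```

Under these assumptions the implied doubling time comes out at roughly 4 months, in line with the "2x every 4 months" reading; a two-point estimate like this ignores the wide confidence intervals discussed further down.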

@Bayesian fair enough - if this was somehow not the thinking model that’s not just scary but downright terrifying.

My hesitancy was mostly leaving a little theta on the table in case I had misunderstood how models are classified.

@Bayesian An 80% success rate looks more realistic. I think the Opus model was trained to work with the web, so it's not actually as capable as it looks here.

Basically, I'm saying they fine-tuned it specifically for this test.

@MRME thinking model might be more capable

@1bets The 50% is way above trendline, but I think in aggregate it's an overestimate of capability.

The confidence interval for the 50% horizon is really wide, as is the one for 80%. The 80% is pretty much an "on trend" estimate from Sonnet 4 to Sonnet 4.5 with some "Opus boost"; ECI is a bit above trend (150 instead of the 149 estimated from a trendline), and SWE-bench with a minimal agent is also on trend.

GPT-5.2 thinks 190 minutes is probably the more reasonable estimate for the 50% success rate after factoring in all the data, and the 80% horizon is maybe around 28.5 minutes (meaning 27 minutes undershoots). That is, the model is slightly above trend: it's what you'd expect in late December rather than late November.

Personally, this moves me toward believing METR 50% won't exist as a meaningful benchmark (without revision) next year.

@Jolliest they updated the website

Can this market get extended?

@MaxLennartson done, thanks

© Manifold Markets, Inc.