This market will resolve to the highest 50% time horizon, as reported by METR, for the first Claude 4.5 Opus thinking model to appear on METR's graph.
The 50% time horizon is a measure of AI autonomy based on the length of tasks an AI can do: roughly, it is how long humans take to complete the tasks that an AI system can successfully do 50% of the time. See METR's "Measuring AI Ability to Complete Long Tasks" for the technical definition. Claude 3.7 Sonnet, released in February 2025, was the leading model at the time, with a 50% horizon of 59 minutes.
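For intuition, METR's approach is roughly to fit a logistic curve of task success against the (log) time the tasks take humans, then read off the task length at which predicted success crosses 50%. Below is a minimal sketch of that idea using synthetic data and scikit-learn; it is not METR's actual task suite or code.

```python
# Minimal sketch of the 50% time-horizon idea (synthetic data, not METR's code).
# Fit a logistic curve of success probability vs. log2(human task length),
# then solve for the length where predicted success equals 50%.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical (task length in human-minutes, success 0/1) observations.
lengths_min = np.array([2, 4, 8, 15, 30, 60, 120, 240, 480, 960], dtype=float)
successes   = np.array([1, 1, 1, 1,  1,  1,  0,   1,   0,   0])

X = np.log2(lengths_min).reshape(-1, 1)
clf = LogisticRegression().fit(X, successes)

# P(success) = sigmoid(w * log2(t) + b) = 0.5  =>  log2(t) = -b / w
w, b = clf.coef_[0][0], clf.intercept_[0]
horizon_min = 2 ** (-b / w)
print(f"estimated 50% time horizon: {horizon_min:.0f} human-minutes")
```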

Left bounds inclusive, right bounds exclusive.
See also:
/jim/claude-45-opuss-metr50-horizon (jim's original market)
/Bayesian/claude-opus-45s-metr50-time-horizon (this market)
/Bayesian/gemini-3s-50-time-horizon-per-metr
/Bayesian/grok-420s-metr-50-time-horizon
/Bayesian/claude-sonnet-46s-metr-50-time-hori
/Bayesian/grok-5s-50-time-horizon-per-metr
🏅 Top traders
| # | Total profit |
|---|---|
| 1 | Ṁ965 |
| 2 | Ṁ595 |
| 3 | Ṁ294 |
| 4 | Ṁ178 |
| 5 | Ṁ177 |
@Yakushi12345 Above trend! You can tell by looking at the market prices before resolution; the average was around 3h15min, iirc.
@creator Results are out! I believe the time is 4 hours 49 minutes, although I'm not 100% sure I'm reading the graph correctly.
@Bayesian So now that this is resolved, I can explain: the terms state "Claude 4.5 Opus thinking model." How do I tell if the version they tested had thinking enabled?
@MRME Good question. I guess I should not have included "thinking" there; the intent was definitely to count this model, whether or not they specify thinking.
@1bets It wasn't a prediction, it was a statement about the trendline, and it had a lot of uncertainty built in; it totally allowed for variations like 2x every 4 months or every 5 months, which is what it looks like is going on (see the sketch below).
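For illustration, a rough extrapolation under those two doubling rates, taking the 59-minute Claude 3.7 Sonnet baseline (February 2025) from the description above; the roughly nine-month gap until this model appeared on the graph is an assumption.

```python
# Rough trendline extrapolation (illustrative only; the baseline comes from the
# market description above, and the 9-month gap is an assumption).
baseline_min = 59          # Claude 3.7 Sonnet 50% horizon, Feb 2025
months_elapsed = 9         # assumed gap to this model's appearance on the graph

for doubling_months in (4, 5):
    horizon = baseline_min * 2 ** (months_elapsed / doubling_months)
    print(f"2x every {doubling_months} months -> ~{horizon / 60:.1f} hours")

# ~4.7 hours at a 4-month doubling and ~3.4 hours at 5 months, so the ~4h49m
# figure mentioned above is consistent with roughly a 4-month doubling under
# these assumptions.
```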
@Bayesian Fair enough. If this was somehow not the thinking model, that's not just scary but downright terrifying.
My hesitancy was mostly about leaving a little theta on the table in case I had misunderstood how the models are classified.
@Bayesian The 80% success-rate horizon looks more realistic. I think the Opus model was trained to work with the web, so it's not actually as capable as it looks here.
Basically, I'd say they fine-tuned it specifically for this test.
@1bets The 50% is way above the trendline, but I think in aggregate it's an overestimate of capability.
The confidence interval for the 50% is really wide, as is the one for the 80%. The 80% is pretty much an "on trend" estimate from Sonnet 4 to Sonnet 4.5 with some "Opus boost"; ECI is a bit above trend (150 instead of the 149 estimated from a trendline), and SWE-bench with a minimal agent is also on trend.
GPT-5.2 thinks 190 minutes is probably the more reasonable estimate for the 50% success rate, factoring in all the data, and that the 80% is maybe around 28.5 minutes (meaning 27 minutes undershoots). That is, the model is slightly above trend; it's what you'd expect in late December rather than late November.
Personally, this moves me toward believing the METR 50% horizon won't exist as a meaningful benchmark (without revision) next year.