R2's 50% time horizon, per METR
Dec 31
<1.5h: 28%
1.5h - 2h: 16%
2h - 2.5h: 18%
2.5h - 3h: 12%
3h - 3.5h: 8%
3.5h - 4h: 6%
4h - 5h: 4%
5h - 6h: 2%
6h - 7h: 1.2%
7h - 8h: 1.2%
8h - 9h: 0.9%
9h - 10h: 0.9%
10h - 11h: 0.7%
11h - 12h: 0.7%
>=12h: 0.7%

This market will resolve to the highest 50% time horizon, as reported by METR, for any R2 model released within a month of the first R2 announcement.
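The resolution rule above can be sketched in a few lines. This is only an illustration: the function name, the sample dates and horizons, and the reading of "within a month" as 31 days are all assumptions, not part of the market's terms.

```python
from datetime import date, timedelta

def resolving_horizon(first_announcement, models, window_days=31):
    """Return the highest METR 50% horizon (in hours) among eligible models.

    `models` is a list of (release_date, horizon_hours) pairs.
    `window_days=31` is an assumed reading of "within a month".
    """
    cutoff = first_announcement + timedelta(days=window_days)
    eligible = [hours for released, hours in models if released <= cutoff]
    return max(eligible) if eligible else None

# Hypothetical data, purely for illustration.
announcement = date(2025, 9, 1)
sample_models = [
    (date(2025, 9, 1), 1.8),    # released on announcement day
    (date(2025, 9, 20), 2.2),   # released inside the window
    (date(2025, 11, 15), 3.0),  # released too late to count
]
print(resolving_horizon(announcement, sample_models))
```

With this sample data the late release is ignored, so the market would resolve on the 2.2-hour figure.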

50% time horizon is a measure of AI autonomy based on the length of tasks that AI can do: roughly, it is the time that humans take to complete tasks that an AI system can successfully do 50% of the time. See METR's "Measuring AI Ability to Complete Long Tasks" for the technical definition. Claude 3.7 Sonnet, released in February 2025, was the leading model with a 50% horizon of 59 minutes.

Left bounds inclusive, right bounds exclusive.
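As a rough sketch of how that boundary rule maps a reported horizon onto the answer buckets (the function name `bucket_for` is hypothetical; the edges and labels are copied from the answer list):

```python
from bisect import bisect_right

# Lower edges of every bucket except the first, in hours.
EDGES = [1.5, 2, 2.5, 3, 3.5, 4, 5, 6, 7, 8, 9, 10, 11, 12]
LABELS = [
    "<1.5h", "1.5h - 2h", "2h - 2.5h", "2.5h - 3h", "3h - 3.5h",
    "3.5h - 4h", "4h - 5h", "5h - 6h", "6h - 7h", "7h - 8h",
    "8h - 9h", "9h - 10h", "10h - 11h", "11h - 12h", ">=12h",
]

def bucket_for(hours):
    """Left bound inclusive, right bound exclusive."""
    # bisect_right counts the edges that are <= hours, so a value sitting
    # exactly on an edge falls into the bucket that starts at that edge.
    return LABELS[bisect_right(EDGES, hours)]

print(bucket_for(2.0))   # exactly 2h lands in "2h - 2.5h"
print(bucket_for(1.49))  # just under 1.5h lands in "<1.5h"
```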

Time horizon could vary based on the set of tasks used to measure it, so this market will be based on the time horizon for the most comprehensive set of tasks reported by METR (as of 2025, largely software and engineering tasks). This will be ambiguous if METR stops publishing time horizons across all of their autonomy tasks and only publishes separate results for different subsets; I might N/A in that scenario.

See also:
/Bayesian/gemini-3s-50-time-horizon-per-metr

/Bayesian/gpt5s-50-time-horizon-per-metr

/Bayesian/grok-5s-50-time-horizon-per-metr

/Bayesian/r2s-50-time-horizon-per-metr (this market)


I just don't think they'll release R2 anymore. They'll probably just release V4 with both a thinking and a non-thinking version, like most labs are doing these days.

If that happens, @traders do you agree it's fair to make it about V4 instead? I.e., if V4 is a reasoning model, R2 would refer to V4-thinking for the purpose of this market?

DeepSeek-R1: 27 mins, released 01-20-25 (SOTA since December was 39 mins)

DeepSeek-R1-0528: 31 mins, released 4 months later (SOTA since April was 1.5 hours)

Quadrupling from 31 mins to > 2 hours in another 4 months seems (very) unlikely. Not betting more because of uncertainty over when (if ever) it'll be released.
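A back-of-the-envelope check of that argument, using only the figures quoted in the comments above. Extrapolating from two data points with constant exponential growth is a loud assumption, and the variable names are just illustrative.

```python
import math

r1_minutes = 27        # DeepSeek-R1, per the comment above
r1_0528_minutes = 31   # DeepSeek-R1-0528, per the comment above
months_between = 4     # "released 4 months later"
target_minutes = 120   # the 2-hour mark

# Growth actually observed between the two R1 releases.
observed_ratio = r1_0528_minutes / r1_minutes
# Growth needed to get from R1-0528 to the 2-hour mark.
required_ratio = target_minutes / r1_0528_minutes

# Assuming constant exponential growth at the observed rate:
doubling_months = months_between * math.log(2) / math.log(observed_ratio)
months_to_target = months_between * math.log(required_ratio) / math.log(observed_ratio)

print(f"observed growth over 4 months: {observed_ratio:.2f}x")
print(f"growth needed to reach 2 hours: {required_ratio:.2f}x")
print(f"implied doubling time: {doubling_months:.0f} months")
print(f"months to reach 2 hours at that pace: {months_to_target:.0f}")
```

The needed jump is a bit under 4x, while the R1 line itself grew by only about 1.15x over the previous four months, which is the gap the comment is pointing at.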

© Manifold Markets, Inc.