DeepSeek V4 Pro METR 50% time horizon

Jul 31

<1.5h
1.5h - 2h
2h - 2.5h
2.5h - 3h
3h - 3.5h
3.5h - 4h: 20%
4h - 5h: 15%
5h - 6h: 11%
6h - 7h
7h - 8h
8h - 9h
9h - 10h
10h - 11h: 1.9%
11h - 12h: 1.9%
>=12h

This market will resolve to the highest 50% time horizon, as reported by METR, for the first DeepSeek V4 Pro model to appear on METR's graph. V4 Pro variants (Preview, GA) count for the purpose of this market. V3 / R1 or earlier would not count. Later DeepSeek models would not count.

50% time horizon is a measure of AI autonomy based on the length of tasks that AI can do: roughly, it is the time that humans take to complete tasks that an AI system can successfully do 50% of the time. See METR's Time Horizon 1.1 update for the technical definition. As of May 2026, frontier time horizons are around 12 hours, with a doubling time of roughly 4 months.
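To make the definition concrete, here is a minimal sketch of how a time horizon can be read off a fitted success curve. It assumes a logistic model of success probability against the log of human task length, which is how the time horizon is usually described; the coefficient values and the function names are hypothetical and are not METR data or METR code.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def horizon_minutes(intercept: float, slope: float, success_rate: float = 0.5) -> float:
    """Task length (human minutes) at which the modelled success rate equals success_rate.

    Assumed model (illustrative): P(success) = sigmoid(intercept - slope * log2(minutes)).
    Solving for minutes gives 2 ** ((intercept - logit(success_rate)) / slope).
    """
    if slope <= 0:
        raise ValueError("slope must be positive: longer tasks should be harder")
    return 2.0 ** ((intercept - logit(success_rate)) / slope)


# Hypothetical coefficients, chosen only to make the arithmetic easy to follow.
INTERCEPT = 4.0
SLOPE = 0.5

h50 = horizon_minutes(INTERCEPT, SLOPE, 0.5)  # logit(0.5) = 0, so this is 2 ** 8 = 256 minutes
h80 = horizon_minutes(INTERCEPT, SLOPE, 0.8)  # a stricter success rate gives a shorter horizon

print(f"50% horizon: {h50:.0f} min, 80% horizon: {h80:.1f} min")
```

Under this kind of model the 80% horizon is always shorter than the 50% horizon for the same system, since it asks for a higher success rate on the same curve.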

Left bounds inclusive, right bounds exclusive.
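As an illustration of the bound convention, the sketch below maps a reported horizon in hours to one of the answer buckets listed above. The function name `bucket_for` is hypothetical; the edges and labels are copied from the answer list, and this is only a reading of the stated rule, not the market's actual resolution logic.

```python
from bisect import bisect_right

# Bucket edges in hours, taken from the answer list.
EDGES = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
LABELS = [
    "<1.5h", "1.5h - 2h", "2h - 2.5h", "2.5h - 3h", "3h - 3.5h", "3.5h - 4h",
    "4h - 5h", "5h - 6h", "6h - 7h", "7h - 8h", "8h - 9h", "9h - 10h",
    "10h - 11h", "11h - 12h", ">=12h",
]


def bucket_for(hours: float) -> str:
    """Return the answer bucket for a horizon, with left bounds inclusive."""
    if hours < 0:
        raise ValueError("a time horizon cannot be negative")
    # bisect_right counts the edges that are <= hours, so a value sitting
    # exactly on an edge lands in the bucket that starts at that edge.
    return LABELS[bisect_right(EDGES, hours)]


print(bucket_for(4.0))   # exactly 4 hours falls in "4h - 5h", not "3.5h - 4h"
print(bucket_for(3.99))  # just under 4 hours stays in "3.5h - 4h"
```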

People are also trading

Best AI time horizon by August 2026, per METR?

Kimi K3 Thinking METR 50% time horizon

Grok 5 METR 50% time horizon

Claude Sonnet 4.6 METR 50% time horizon

Grok 4.20 METR 50% time horizon

Best METR 80% Time Horizon before August 2026

Claude Opus 4.7 METR 50% time horizon

Claude Opus 4.8 METR 50% time horizon

Claude Sonnet 5 METR 50% time horizon

GPT 5.5 METR 50% time horizon

The members of the AI Futures Project have posted an update, and they now appear to rely on METR's 80% time horizon graph for their predictions rather than the 50% time horizon graph. This suggests that a 50% time horizon alone is not enough. While I think markets for 50% time horizons are useful, I now think more attention needs to be paid to 80% time horizons. I am planning to create markets for 80% time horizons as soon as possible, unless someone beats me to it.

@MaxLennartson Source: https://www.aifuturesmodel.com/#section-timehorizonandtheautomatedcodermilestone

I just don't think they'll release R2 anymore. I expect they'll just release V4 with both a thinking and a non-thinking version, like most labs are doing these days.

If that happens, @traders do you agree it's fair to make it about V4 instead? I.e., if V4 is a reasoning model, R2 would refer to V4-thinking for the purpose of this market?

DeepSeek-R1: 27 mins, released 01-20-25 (SOTA since December was 39 mins)

DeepSeek-R1-0528: 31 mins, released 4 months later (SOTA since April was 1.5 hours)

Quadrupling from 31 mins to > 2 hours in another 4 months seems (very) unlikely. I'm not betting more because of uncertainty over when (if ever) it'll be released.
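The arithmetic behind this comment can be sketched as follows, using only the figures quoted above (27 mins, 31 mins, 2 hours, 4-month gaps). It assumes steady exponential growth between the two points; the helper name `doubling_time_months` is made up for the illustration.

```python
import math


def doubling_time_months(start_minutes: float, end_minutes: float, elapsed_months: float) -> float:
    """Doubling time implied by growing from start to end over elapsed_months.

    Assumes steady exponential growth between the two points.
    """
    if start_minutes <= 0 or end_minutes <= start_minutes:
        raise ValueError("expected 0 < start_minutes < end_minutes")
    doublings = math.log2(end_minutes / start_minutes)
    return elapsed_months / doublings


# DeepSeek-R1 (27 mins) to DeepSeek-R1-0528 (31 mins), about 4 months apart.
observed = doubling_time_months(27, 31, 4)

# What going from 31 mins to 2 hours (120 mins) in another 4 months would need.
required = doubling_time_months(31, 120, 4)

print(f"observed doubling time: about {observed:.0f} months")
print(f"required doubling time: about {required:.1f} months")
```

On these numbers the observed pace works out to a doubling time of roughly 20 months, while reaching 2 hours within 4 months would require a doubling roughly every 2 months.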
