Will GPT-5.1 have a longer METR time horizon than Gemini 3?


2026

48% chance


This market will resolve to Yes if the METR time horizon of GPT-5.1 is longer than Gemini 3's. If there is never a model released called "GPT-5.1", this market will resolve to No. If there is an exact tie, this market will resolve to No.

This will be based on the 50% time horizon, not the 80% time horizon.

If METR doesn't release 50% time horizons for both models by the end of 2026, despite both models existing, I will resolve the market to N/A if it appears that they are not going to measure the time horizon of one or both of the models. I will extend the close date if it seems like they are going to eventually measure the time horizon.

Update 2025-11-19 (PST) (AI summary of creator comment): GPT-5.1 baseline: The creator will use GPT 5.1 Codex Max's time horizon of 2 hours 42 minutes for resolution purposes, unless a time horizon for GPT 5.1 Thinking is also released (in which case the longer of the two will be used).

Gemini 3 comparison: The creator will compare against any Gemini 3 series model (not just Gemini 3.0 Pro). If Gemini 3 Pro's time horizon is shorter than 2:42, the creator will wait for potential METR results from other Gemini 3 models (like Gemini 3 DeepThink) before resolving.

Resolution timing: The market will not resolve immediately when Gemini 3 Pro's time horizon is released if it's shorter than 2:42. A resolution date will be set to allow time for additional Gemini 3 model measurements.
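The resolution rules above can be sketched as a small decision function. This is only an illustrative reading of the stated criteria, not the creator's actual procedure: the function name `resolve`, its parameters, and the "PENDING" status are hypothetical, and the sketch assumes a tie can be settled as No as soon as it is observed.

```python
from typing import List, Optional

# Stated in the update above: GPT 5.1 Codex Max measured at 2 hours 42 minutes.
CODEX_MAX_MINUTES = 2 * 60 + 42  # 162 minutes


def resolve(
    gemini3_minutes: List[int],
    waiting_period_over: bool,
    gpt51_thinking_minutes: Optional[int] = None,
) -> str:
    """Return "YES", "NO", "N/A", or "PENDING" (hypothetical labels).

    gemini3_minutes: 50% time horizons, in minutes, for any Gemini 3
        series models that METR has measured so far.
    waiting_period_over: True once the creator's resolution date has passed.
    gpt51_thinking_minutes: optional horizon for GPT 5.1 Thinking.
    """
    # GPT-5.1 side: the longer of Codex Max and (if measured) Thinking.
    gpt_side = CODEX_MAX_MINUTES
    if gpt51_thinking_minutes is not None:
        gpt_side = max(gpt_side, gpt51_thinking_minutes)

    # No Gemini 3 measurement at all: wait, then N/A if none ever arrives.
    if not gemini3_minutes:
        return "N/A" if waiting_period_over else "PENDING"

    # Gemini 3 side: the best result from any Gemini 3 series model.
    gemini_side = max(gemini3_minutes)

    # GPT-5.1 must be strictly longer; an exact tie resolves to No.
    if gemini_side >= gpt_side:
        return "NO"

    # Gemini 3 is shorter so far: wait for other Gemini 3 models first.
    return "YES" if waiting_period_over else "PENDING"
```

For example, under these assumptions a Gemini 3 Pro result of 120 minutes would leave the market pending until the waiting period ends, while a result of 200 minutes would resolve it to No immediately.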


3 Comments

18 Holders

43 Trades


GPT 5.1 Codex Max got 2 hours 42 minutes https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/

I think that I would count this as GPT 5.1's task horizon for the purposes of resolution unless that is super objectionable to others. I have doubts we'll get another METR time horizon measurement for the GPT 5.1 generation of models and it seems within the spirit of the resolution criteria. If there are serious objections to that I'll resolve to N/A.

If they do a time horizon for GPT 5.1 Thinking as well, I'll use whichever is longer. The spirit of the resolution criteria is to compare the 5.1 series of models to the Gemini 3 series of models. That means that I probably will not resolve this right away when the METR task horizon for Gemini 3 Pro comes out (unless it's longer than 2 hours 42 minutes), because I'd want to see if we get a result for Gemini 3 DeepThink or similar, and see if that could exceed that of GPT 5.1 Codex Max.

When Gemini 3 Pro's METR task horizon comes out, if its time horizon is shorter than 2 hours and 42 minutes, I will set a resolution date until which I will wait for more METR time horizon results from Gemini 3 models. If there are no such results by that date, I will resolve to Yes.

People are also trading

Gemini 3's 50% time horizon, per METR

GPT-5 Pro's 50% time horizon, per METR

Gemini 3.0 Pro outperforms GPT-5 on METR 50% time horizon?

78% chance

Before 2026, will Gemini 3.0 exceed GPT-5 in Metr estimated time horizon?

76% chance
