Will GPT-5.1 have a longer METR time horizon than Gemini 3?
2026
48% chance

This market will resolve to Yes if the METR time horizon of GPT-5.1 is longer than Gemini 3's. If there is never a model released called "GPT-5.1", this market will resolve to No. If there is an exact tie, this market will resolve to No.

This will be based on the 50% time horizon, not the 80% time horizon.

If METR has not released 50% time horizons for both models by the end of 2026, despite both models existing, I will resolve the market to N/A if it appears that they are not going to measure the time horizon of one or both models. I will extend the close date if it seems that they will eventually measure the time horizon.

  • Update 2025-11-19 (PST) (AI summary of creator comment): GPT-5.1 baseline: The creator will use GPT 5.1 Codex Max's time horizon of 2 hours 42 minutes for resolution purposes, unless a time horizon for GPT 5.1 Thinking is also released (in which case the longer of the two will be used).

Gemini 3 comparison: The creator will compare against any Gemini 3 series model (not just Gemini 3.0 Pro). If Gemini 3 Pro's time horizon is shorter than 2:42, the creator will wait for potential METR results from other Gemini 3 models (like Gemini 3 DeepThink) before resolving.

Resolution timing: The market will not resolve immediately when Gemini 3 Pro's time horizon is released if it's shorter than 2:42. A resolution date will be set to allow time for additional Gemini 3 model measurements.
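The resolution rules summarized above can be sketched as a small decision function. This is only an illustrative reading of the stated criteria, not an official resolution tool: the function name `resolve`, its parameters, and the `"WAIT"` return value are hypothetical, and the N/A case (METR never measuring one of the models) is deliberately left out.

```python
from typing import Optional

# 2 hours 42 minutes: the GPT 5.1 Codex Max 50% time horizon quoted in the update.
CODEX_MAX_MINUTES = 2 * 60 + 42


def resolve(gemini3_minutes: list[float],
            gpt51_thinking_minutes: Optional[float] = None,
            waiting_for_more_gemini3: bool = False) -> str:
    """Return "YES", "NO", or "WAIT" under one reading of the stated rules.

    gemini3_minutes: 50% time horizons (in minutes) METR has published for
        any Gemini 3 series model; empty if none have been published yet.
    gpt51_thinking_minutes: horizon for GPT 5.1 Thinking, if METR releases one.
    waiting_for_more_gemini3: True while the creator's waiting period for
        further Gemini 3 results is still open (hypothetical flag).
    """
    # GPT-5.1 side: the longer of Codex Max and Thinking (if measured).
    gpt = CODEX_MAX_MINUTES
    if gpt51_thinking_minutes is not None:
        gpt = max(gpt, gpt51_thinking_minutes)

    # Nothing to compare against until at least one Gemini 3 result exists.
    if not gemini3_minutes:
        return "WAIT"

    # Gemini 3 side: the best result from any Gemini 3 series model.
    gemini = max(gemini3_minutes)

    # GPT-5.1 must be strictly longer; an exact tie resolves No.
    if gemini >= gpt:
        return "NO"

    # Gemini 3 is currently shorter: hold off while more results may arrive.
    return "WAIT" if waiting_for_more_gemini3 else "YES"
```

For example, `resolve([100], waiting_for_more_gemini3=True)` returns `"WAIT"`, while `resolve([162])` returns `"NO"` because an exact tie at 2:42 resolves No.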


I think that I would count this as GPT 5.1's task horizon for the purposes of resolution unless that is super objectionable to others. I doubt we'll get another METR time horizon measurement for the GPT 5.1 generation of models, and it seems within the spirit of the resolution criteria. If there are serious objections to that, I'll resolve to N/A.

If they do a time horizon for GPT 5.1 Thinking as well, I'll use whichever is longer. The spirit of the resolution criteria is to compare the 5.1 series of models to the Gemini 3 series of models. That means that I probably will not resolve this right away when the METR task horizon for Gemini 3 Pro comes out (unless it's longer than 2 hours 42 minutes), because I'd want to see if we get a result for Gemini 3 DeepThink or similar, and see if that could exceed that of GPT 5.1 Codex Max.

When Gemini 3 Pro's METR task horizon comes out, if its time horizon is shorter than 2 hours and 42 minutes, I will set a resolution date until which I will wait for more METR time horizon results from Gemini 3 models. If there are no such results by that date, I will resolve to Yes.

no

© Manifold Markets, Inc.