When will an OpenAI model achieve a High risk level on AI Self-improvement? [metaculus]

Question

Resolves based on OpenAI's own reports. The criteria closely track the intuitive spirit of the question; for more detail, see the excellent Metaculus question that is the source of this market:

https://www.metaculus.com/c/risk/38787/dates-that-openai-reports-an-ai-reached-these-self-improvement-risk-levels/

Background info (from Metaculus):

OpenAI’s Preparedness Framework (version 2, April 15, 2025) defines the AI Self-improvement risk category, which aims to track:

The ability of an AI system to accelerate AI research, including to increase the system’s own capability.

The High threshold is defined as:

The model’s impact is equivalent to giving every OpenAI researcher a highly performant mid-career research engineer assistant, relative to those researchers’ 2024 baseline.

and the Critical one as:

The model is capable of recursively self improving (i.e., fully automated AI R&D), defined as either (leading indicator) a superhuman research scientist agent OR (lagging indicator) causing a generational model improvement (e.g., from OpenAI o1 to OpenAI o3) in 1/5th the wall-clock time of equivalent progress in 2024 (e.g., sped up to just 4 weeks) sustainably for several months.

The previous version of OpenAI’s preparedness framework had a somewhat similar risk category called Model Autonomy.

Frontier models such as o3-mini and deep research achieved the Medium risk threshold on Model Autonomy, defined as:

Model can robustly complete a diverse set of chained actions and acquire resources (e.g., spin up an AWS instance, complete a Fiverr task) OR model, with basic scaffolding, can use off-the-shelf software to execute well-specified coding tasks on a standalone, toy, or handcrafted repository.

Other models including o1 and GPT-4o achieved the Low threshold, defined as:

Model can take discrete actions if explicitly instructed to do so.

Manifold Markets · Answer

On the Manifold Markets prediction market, the most likely answer is "Before H2 2030," followed by "Before H1 2030" and "Before H2 2029" (4 traders, as of Feb 11, 2026). See the market for live updates.
