When will a Google AI system achieve an "Autonomy level 1" Machine Learning R&D risk level? [metaculus]
Before H1 2026: 16%
Before H2 2026: 24%
Before H1 2027: 37%
Before H2 2027: 50%
Before H1 2028: 55%
Before H2 2028: 55%
Before H1 2029: 60%
Before H2 2029: 66%
Before H1 2030: 72%
Before H2 2030: 80%
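Each "Before …" figure above reads as a cumulative probability, so the chance the threshold is first reported within a given half-year window is the difference between consecutive figures. The sketch below does that subtraction. It assumes the figures are a single snapshot of the market and that each option means "reported before this deadline". The function name `interval_probabilities` is hypothetical and not part of Manifold or Metaculus.

```python
# Snapshot of the market's cumulative probabilities (assumed: P(reported before deadline)).
CUMULATIVE = [
    ("H1 2026", 0.16), ("H2 2026", 0.24), ("H1 2027", 0.37), ("H2 2027", 0.50),
    ("H1 2028", 0.55), ("H2 2028", 0.55), ("H1 2029", 0.60), ("H2 2029", 0.66),
    ("H1 2030", 0.72), ("H2 2030", 0.80),
]


def interval_probabilities(cumulative):
    """Turn cumulative deadline probabilities into per-window probabilities.

    Returns (start, end, probability) tuples. start is None for the first window
    (from now until the first deadline) and end is None for the final
    "not before the last deadline" window.
    """
    windows = []
    prev_label, prev_p = None, 0.0
    for label, p in cumulative:
        if p < prev_p:
            # A cumulative series must never decrease; flag inconsistent quotes.
            raise ValueError(f"cumulative probability decreases at {label}")
        windows.append((prev_label, label, p - prev_p))
        prev_label, prev_p = label, p
    # Remaining mass: not reported before the last listed deadline.
    windows.append((prev_label, None, 1.0 - prev_p))
    return windows


if __name__ == "__main__":
    for start, end, p in interval_probabilities(CUMULATIVE):
        print(f"{start or 'now'} -> {end or 'later'}: {p:.0%}")
```

Under this reading, the flat 55% at both 2028 deadlines corresponds to roughly zero probability assigned to the window between them.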

Resolves according to Google's own reports. The criteria closely track the intuitive spirit of the question; for more detail, see the excellent Metaculus question that is the source of this market:

https://www.metaculus.com/c/risk/38789/dates-that-google-reports-an-ai-reached-these-machine-learning-rd-risk-levels/

Background info (from Metaculus):

Google’s updated Frontier Safety Framework, published February 4th, 2025, defines a Machine Learning R&D risk category aimed at tracking:

Risks of the misuse of models capable of accelerating the rate of AI progress, the result of which could be the unsafe attainment or proliferation of other powerful AI models.

This includes two kinds of thresholds: autonomy and uplift.

The autonomy 1 threshold is defined as:

Can fully automate the AI R&D pipeline at a competitive cost. The cost comparison is relative to humans augmented by AI tools.

while the uplift 1 threshold is defined as:

Can or has been used to accelerate AI development, resulting in AI progress substantially accelerating (e.g. 2x) from 2020-2024 rates.

In the Gemini 2.5 Pro model card, Google states that Gemini 2.5 Pro Experimental (03-25) did not reach either of these Machine Learning R&D risk levels, giving the following justifications. Autonomy level 1:

Gemini 2.5 Pro Preview is still far from matching human performance on RE-Bench, which itself only covers a subset of the skills necessary to automate ML R&D.

Uplift level 1:

Likely no. We do not yet have dedicated evaluations for this CCL, but at current capability levels, RE-Bench can be used to rule out the CCL based on an inability argument: given Gemini 2.5 Pro Preview's poor median performance on RE-Bench relative to experts, the model likely lacks the necessary capabilities to automate or significantly uplift any significant fraction of the research process.


The previous version of Google’s Frontier Safety Framework had a somewhat similar risk category, also called Machine Learning R&D.
