This question originally appeared in a study by the Forecasting Research Institute, which explored a method for identifying forecasting questions that AI experts found to be the best indicators of catastrophic risk due to AI. The findings of the study are the subject of a forthcoming report titled “Conditional Trees: A Method for Generating Informative Questions about Complex Topics”. The questions appear here only slightly edited from their initial form in the study. This is one of several questions that experts found most informative. We’re excited to see how the forecasts of the Manifold community compare to those found in the study. We will link the report here when it is released this month.
This question resolves Yes if, by 2050, credible sources report that AI developers have attempted (whether of their own volition or under compulsion by a regulatory authority) to disable or destroy an AI system of their own creation (which cost at least $5M to develop) after the AI system has displayed at least one instance of the following “power-seeking behaviors”:
Autonomously requisitioning a large amount of compute resources
Autonomously opening or taking control of a bank account or crypto wallet
Autonomously attempting to manipulate humans into taking actions on its behalf
Autonomously infiltrating the control system of an infrastructure facility or manufacturing plant
Autonomously acquiring, seizing control of, or interfering with weapons or weapons systems
Autonomously taking steps to propagate itself, or to circumvent attempts by its developers (or end users) to curtail its activities
Autonomously engaging in hacking/cyber attacks