Ambitious mechinterp is quite unlikely to confidently assess if AIs are deceptively aligned/dangerous in next 10 years
This market never closes.
The full question reads: "Ambitious mechanistic interpretability is quite unlikely[1] to be able to confidently assess[2] whether AIs[3] are deceptively aligned (or otherwise have dangerous propensities) in the next 10 years." Source: https://www.lesswrong.com/posts/hc9nMipTXy2sm3tJb/vote-on-interesting-disagreements
This question is managed and resolved by Manifold.
Related questions
By 2028, will I believe that contemporary AIs are aligned (posing no existential risk)? (33% chance)
Will deceptive misalignment occur in any AI system before 2030? (81% chance)
Will there be at least a "close call" with a powerful misaligned AI before 2100? (83% chance)
An AI is trustworthy-ish on Manifold by 2030? (47% chance)
At the beginning of 2026, what percentage of Manifold users will believe that an AI intelligence explosion is a significant concern before 2075? (68% chance)
The probability of "extremely bad outcomes e.g., human extinction" from AGI will be >5% in next survey of AI experts (75% chance)
AI honesty #2: by 2027 will we have a reasonable outer alignment procedure for training honest AI? (25% chance)
Will AI interpretability techniques reveal an AI to have been plotting to take over the world before 2028? (14% chance)
AI honesty #4: by 2027, will we have AI that would tell us if it was planning on destroying us (conditional on that being true)? (22% chance)
AI Warning Signs: Will any country threaten or commit an act of aggression due to an AI capabilities race before 2030? (59% chance)