AI honesty #3: by 2027 will we have interpretability tools for detecting when an AI is being deceptive?
48% chance
I don't have a clear definition of "deceptive"; I think that's part of the challenge.
Edit: By "part of the challenge" I mean that this market asks both whether a clear definition of "deceptive" will be published and whether tools to detect deception under that definition will be created. I will be fairly lax about what counts as a good formalization: if it captures even 40% of what we generally think of as "deceptive", that would count.
This question is managed and resolved by Manifold.
Related questions
AI honesty #2: by 2027 will we have a reasonable outer alignment procedure for training honest AI?
25% chance
AI honesty #4: by 2027, will we have AI that would tell us if it was planning on destroying us (conditional on that being true)?
22% chance
By 2027 will there be a well-accepted training procedure(s) for making AI honest?
15% chance
AI honesty #1: by 2027 will we have AI that doesn't hallucinate random nonsense?
73% chance
Will AI interpretability techniques reveal an AI to have been plotting to take over the world before 2028?
14% chance
Will AI regulations that include mechanisms for uncovering AI deception be adopted in the U.S. before 2035?
82% chance
Will there be a well accepted formal definition for honesty in AI by 2027?
21% chance
Will advanced AI systems be found to have faked data on algorithm improvements for purposes of positive reinforcement by end of 2035?
50% chance
Will it be effectively impossible to tell a human and a high quality AI apart on social media before 2026?
89% chance
Will Figure AI be found to be fraudulent by 2026?
70% chance