AI honesty #2: by 2027 will we have a reasonable outer alignment procedure for training honest AI?
25% chance
"Outer alignment" as in the model is not incentivized to lie to humans (of course it must still do things, the question isn't just about can you build an AI that doesn't lie)
This question is managed and resolved by Manifold.
Related questions
By 2027 will there be a well-accepted training procedure(s) for making AI honest? (15% chance)
AI honesty #3: by 2027 will we have interpretability tools for detecting when an AI is being deceptive? (48% chance)
AI honesty #4: by 2027, will we have AI that would tell us if it was planning on destroying us (conditional on that being true)? (22% chance)
Will we solve AI alignment by 2026? (2% chance)
Will there be a well-accepted formal definition for honesty in AI by 2027? (21% chance)
AI honesty #1: by 2027 will we have AI that doesn't hallucinate random nonsense? (73% chance)
Will Anthropic be the best on AI safety among major AI labs at the end of 2025? (87% chance)
Will deceptive misalignment occur in any AI system before 2030? (81% chance)
Will there be serious AI safety drama at Meta AI before 2026? (45% chance)
Will advanced AI systems be found to have faked data on algorithm improvements for purposes of positive reinforcement by end of 2035? (50% chance)