By 2027 will there be a well-accepted training procedure(s) for making AI honest?
2027 · 16% chance
  • If there's serious concern that the procedure produces AI that's better at lying, it doesn't count

  • Calibration can be part of the solution, but the current work in that direction isn't enough (one common calibration metric is sketched below)

  • A procedure that seems promising but hasn't been subjected to much scrutiny doesn't count

  • A formal proof that it's honest according to a well-accepted definition counts

  • No requirement that it be a "single" procedure in the sense of a single training loop

The idea here is to capture scenarios where we can't prove the procedure produces honest AI (perhaps because we haven't formalized honesty), but there's been extensive investigation and no one has found a way in which the procedure obviously breaks or gets Goodharted (or perhaps it does, but only on some odd edge cases).
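
For concreteness, calibration in this context is often measured with something like expected calibration error (ECE): bin the model's stated confidences and average the gap between confidence and actual accuracy within each bin. Below is a minimal Python sketch of that metric; the function name and example numbers are illustrative assumptions, not anything specified by this market:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin-size-weighted average gap between stated confidence
    and observed accuracy. Lower is better-calibrated."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            # Gap between average confidence and accuracy in this bin,
            # weighted by the fraction of samples landing in the bin.
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

# Hypothetical data: stated confidences vs. whether each answer was right.
print(expected_calibration_error([0.9, 0.8, 0.95, 0.6], [1, 1, 0, 1]))
```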


Love to see the number go down on all my safety markets while it goes up on all the capabilities ones. Good luck, everyone.
