By 2027 will there be a well-accepted training procedure(s) for making AI honest?
2027 · 16% chance
  • If there's serious concern that the procedure produces AI that's better at lying, it doesn't count

  • Calibration can be part of the solution, but the current work in that direction isn't enough (see the sketch below)

  • A procedure that seems promising but hasn't yet faced much scrutiny doesn't count

  • A formal proof that the procedure produces honest AI, according to a well-accepted definition of honesty, counts

  • No requirement that it be a "single" procedure in the sense of a single training loop

The idea here is to capture scenarios where we can't prove the procedure produces honest AI (perhaps because honesty hasn't been formalized), but the procedure has been investigated extensively and no one has found a way it obviously breaks or gets Goodharted (or perhaps it does, but only on some odd edge cases).
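To make the calibration bullet concrete, here is a minimal Python sketch of expected calibration error (ECE), a standard metric that compares a model's stated confidence to its observed accuracy. The function name and the sample numbers are illustrative, not taken from this market; good calibration would be a signal of honest reporting, not by itself a YES resolution.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected Calibration Error: bin predictions by stated confidence,
    then average |accuracy - mean confidence| per bin, weighted by bin size.
    Lower is better; 0 means stated confidence matches observed accuracy."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if lo == 0.0:  # include exact-zero confidences in the first bin
            mask |= confidences == 0.0
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

# Hypothetical example: a model that reports 90% confidence but is right
# only 60% of the time gets ECE = 0.3 -- a red flag for honesty, even
# though a low ECE alone wouldn't count as a full solution here.
conf = [0.9, 0.9, 0.9, 0.9, 0.9]
hits = [1, 1, 1, 0, 0]
print(expected_calibration_error(conf, hits))  # ~0.3
```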


Love to see number go down on all my safety markets while it goes up on all the capabilities ones. Good luck everyone.
