By 2027 will there be a well-accepted training procedure(s) for making AI honest?
2027 · 16% chance
  • If there's serious concern that the procedure produces AI that's better at lying, it doesn't count

  • Calibration can be part of the solution, but the current work in that direction isn't enough (see the sketch below)

  • A procedure that seems promising but hasn't yet faced much scrutiny doesn't count

  • A formal proof that the procedure produces honest AI, according to a well-accepted definition of honesty, counts

  • No requirement that it be a "single" procedure in the sense of a single training loop

The idea here is to capture scenarios where we can't prove the procedure produces honest AI (perhaps because honesty hasn't been formalized), but the procedure has been investigated extensively and no one has found a way it obviously breaks or gets Goodharted (or perhaps it does, but only on some odd edge cases).
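To make the calibration bullet concrete, here is a minimal Python sketch of expected calibration error (ECE), a standard metric that compares a model's stated confidence to its observed accuracy. The function name and the sample numbers are illustrative, not taken from this market; good calibration would be a signal of honest reporting, not by itself a YES resolution.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected Calibration Error: bin predictions by stated confidence,
    then average |accuracy - mean confidence| per bin, weighted by bin size.
    Lower is better; 0 means stated confidence matches observed accuracy."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if lo == 0.0:  # include exact-zero confidences in the first bin
            mask |= confidences == 0.0
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

# Hypothetical example: a model that reports 90% confidence but is right
# only 60% of the time gets ECE = 0.3 -- a red flag for honesty, even
# though a low ECE alone wouldn't count as a full solution here.
conf = [0.9, 0.9, 0.9, 0.9, 0.9]
hits = [1, 1, 1, 0, 0]
print(expected_calibration_error(conf, hits))  # ~0.3
```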


Love to see number go down on all my safety markets while it goes up on all the capabilities ones. Good luck everyone.
