AI honesty #3: by 2027 will we have interpretability tools for detecting when an AI is being deceptive?
2027
48% chance

I don't have a clear definition of "deceptive"; I think that's part of the challenge.

Edit: By "part of the challenge" I mean that this market asks both whether a clear definition of "deceptive" will be published and whether tools to detect deception under that definition will be created. I will be fairly lax about what counts as a good formalization: if it captures even 40% of what we generally think of as "deceptive", that would count.
