Will any public eval show >1 % intentional deception rate before 2026-12-31?
46% chance

This market is inspired by recent papers in which any bug gets cited as evidence of a model engaging in deception. That is great marketing for proprietary models and great buzz for research publishers, but these papers treat any bug as deception. We will see more evocative words such as 'deception', 'obfuscate', and 'sabotage' when the behavior could instead be interpreted as a bug in the system, which is usually the simplest explanation. So the challenge for evaluators is: will we see any public eval or research that can prove intentional deception at a rate above 1%? The key is that intentionality has to be proven, not assumed. Why a model would intentionally engage in deception is a good question. But before asking it, it would be prudent to figure out how to prove intent in the first place.
