This is clearly inspired by the recent wave of papers in which any bug is cited as a model engaging in deception. Great marketing for proprietary models, great buzz for the research publishers, but they all assume the bug is deception rather than demonstrating it. We will keep seeing evocative words such as 'deception', 'obfuscation', and 'sabotage' where the simplest explanation is usually just a bug in the system. So the challenge for evaluators is: will we see any public eval or research that can prove intentional deception at a >1% rate? The key is that intentionality has to be proven, not assumed. Why a model would intentionally engage in deception is a good question, but before that it would be prudent to figure out how to prove intent at all.
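Even setting intent aside, "showing a >1% rate" is itself a statistical claim, not just a point estimate. A minimal sketch of what that would require, assuming a hypothetical eval that labels each of n independent trials as deceptive or not (the 18/1,000 example numbers are illustrative, not from any real eval):

```python
# Sketch: does an observed flagged-deception count support the claim that the
# true rate exceeds 1%? Uses the standard Wilson score interval for a binomial
# proportion; the eval itself and the counts here are hypothetical.
from math import sqrt

def wilson_lower_bound(successes: int, trials: int, z: float = 1.96) -> float:
    """Lower edge of the ~95% Wilson score interval for a binomial proportion."""
    if trials == 0:
        return 0.0
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = p + z**2 / (2 * trials)
    margin = z * sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return (centre - margin) / denom

# Example: 18 flagged runs out of 1,000 trials. The point estimate is 1.8%,
# but the ">1% rate" claim only holds if the lower confidence bound clears 1%.
# Here it is ~1.14%, so it barely does.
lb = wilson_lower_bound(18, 1000)
print(f"lower bound: {lb:.4f}, exceeds 1%: {lb > 0.01}")
```

And that is only the easy half: the bound tells you the flag rate exceeds 1%, not that the flagged behaviour was intentional rather than a bug.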
Will any public eval show >1% intentional deception rate before 2026-12-31?
46% chance
This question is managed and resolved by Manifold.
Related questions
What will be the best performance on EnigmaEval by December 31st 2026?
AI honesty #3: by 2027 will we have interpretability tools for detecting when an AI is being deceptive? (56% chance)
Will AI regulations that include mechanisms for uncovering AI deception be adopted in the U.S. before 2035? (82% chance)
Will deceptive misalignment occur in any AI system before 2030? (81% chance)
Top score on Humanity's Last Exam > 50% by 2029? (99% chance)
Top score on Humanity's Last Exam > 50% by 2028? (98% chance)
Top score on Humanity's Last Exam > 50% by 2027? (98% chance)
Will AI achieve 85% or higher on the Humanity's Last Exam benchmark before 2030? (87% chance)
Will advanced AI systems be found to have faked data on algorithm improvements for purposes of positive reinforcement by end of 2035? (53% chance)
By 2027 will there be a language model that passes a redteam test for honesty? (27% chance)