In what years will CoT monitoring fail for frontier AI?
3
175Ṁ60
2030
50%
2027
50%
2028-2029
50%
2030-2035
50%
2035-2040
34%
2040-2050
24%
2025
24%
2026

Resolves YES for a year or year range if there is a frontier AI model released within that range whose chain of thought (CoT) cannot be trusted to reveal when it is misaligned (i.e. intentionally misbehaving).

The assumption for models released that use CoT is that they are CoT monitorable, evidence would need to be provided demonstrating this assumption is false (e.g. a peer reviewed paper or highly endorsed LessWrong post). A frontier model released that always acts without CoT or anything similar would trigger a YES resolution for the time period of its release.

If all frontier models are released in the time period feature CoT and their monitorability is not disproven, then that time period resolves NO.

If in the event no frontier models are released during a time period, that period resolves N/A, as I am interested specifically in when CoT monitoring fails as an oversight / AI control technique.

Get
Ṁ1,000
to start trading!
Sort by:

Trustworthy CoT isn't binary; it's already frequently unfaithful.

@Haiku It's not 100% faithful, but it's good enough that we can use it to detect misalignment (https://arxiv.org/abs/2507.11473v1).

Things on this market would resolve NO if AI safety researchers broadly agreed it was no longer feasible to do this.

© Manifold Markets, Inc.TermsPrivacy