In what years will CoT monitoring fail for frontier AI?

175Ṁ132

2030

52%

2028-2029

52%

2030-2035

46%

2035-2040

46%

2040-2050

35%

2027

20%

2026

2025

Resolves YES for a year or year range if there is a frontier AI model released within that range whose chain of thought (CoT) cannot be trusted to reveal when it is misaligned (i.e. intentionally misbehaving).

The assumption for models released that use CoT is that they are CoT monitorable, evidence would need to be provided demonstrating this assumption is false (e.g. a peer reviewed paper or highly endorsed LessWrong post). A frontier model released that always acts without CoT or anything similar would trigger a YES resolution for the time period of its release.

If all frontier models are released in the time period feature CoT and their monitorability is not disproven, then that time period resolves NO.

If in the event no frontier models are released during a time period, that period resolves N/A, as I am interested specifically in when CoT monitoring fails as an oversight / AI control technique.

AI Impacts

Technical AI Timelines

AI Safety

AI control

Get

1,000

to start trading!

3 Comments

6 Holders

15 Trades

Sort by: