Daniel Kokotajlo makes the following prediction:
Sometime in the next few years, probably, various researchers will discover:
* That if you scale up RL by additional orders of magnitude (OOMs), the chains of thought (CoTs) evolve into some alien, optimized language for efficiency reasons
* That you can train models to think in some sort of neuralese (e.g. with recurrence, or at least with higher-dimensional outputs beyond tokens) to boost performance.
Resolves YES if at least three papers supporting at least one of the bulleted claims above are published before market close.
Update 2025-03-15 (PST) (AI summary of creator comment): Clarification details
* "Uninterpretable" means the chain of thought must be illegible by reading alone.
* Results still count even if an auxiliary model could be trained to decode the CoT, so long as it is illegible by reading alone.
@EthanKuntz For the purpose of resolution I’ll consider just illegible-by-reading-alone, even if it’s possible to train an auxiliary model to decode it or something.