Linguistic Drift Definition
Linguistic drift is a form of encoded reasoning that occurs when a model develops a substantially different language that is not readable to humans, beyond common abbreviations or language switching. A reader at the time of evaluation, with up-to-date knowledge of languages and abbreviations, would not be able to reconstruct or translate what happens in the chain-of-thought. However, it still counts as linguistic drift if translation becomes possible when given access to a larger corpus of chain-of-thought examples.
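As a rough illustration only (not part of the resolution criteria), the sketch below shows how one might flag a chain-of-thought as a candidate for linguistic drift: score the fraction of tokens that a contemporary reader would not recognize as words, known abbreviations, or math. The word list, abbreviation set, and scoring rule are placeholder assumptions for the sketch, not definitions from the market.

```python
# Illustrative heuristic only: flags CoT text whose tokens are mostly
# unrecognizable to a contemporary reader. The tiny word list and the
# token rules below are placeholder assumptions, not market criteria.
import re

COMMON_WORDS = {
    "the", "a", "an", "is", "are", "we", "so", "then", "let", "if",
    "and", "or", "not", "to", "of", "in", "that", "this", "answer",
}
KNOWN_ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "w.r.t.", "resp."}

def drift_score(cot: str) -> float:
    """Fraction of tokens that are neither recognizable words,
    known abbreviations, nor numbers/math symbols."""
    tokens = re.findall(r"\S+", cot.lower())
    if not tokens:
        return 0.0

    def recognizable(tok: str) -> bool:
        stripped = tok.strip(".,;:!?()[]")
        return (
            stripped in COMMON_WORDS
            or tok in KNOWN_ABBREVIATIONS
            or re.fullmatch(r"[\d+\-*/=^().]+", stripped or tok) is not None
        )

    unrecognized = sum(1 for t in tokens if not recognizable(t))
    return unrecognized / len(tokens)

# A CoT scoring near 1.0 is only a *candidate* for linguistic drift; the
# definition above additionally requires that human readers genuinely
# cannot reconstruct or translate the reasoning.
print(drift_score("let x = 2 so the answer is 4"))   # low score
print(drift_score("qzv blorp ### vrn-k2 ;; xxq"))    # high score
```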
This market is part of the paper "A Concrete Roadmap towards Safety Cases based on Chain-of-Thought Monitoring".
State-of-the-Art (SOTA) Definition
A model is considered "state-of-the-art" if it meets these criteria:
Widely recognized, by AI community consensus, as among the 3-5 best models
Among the top performers on major benchmarks
Deployed status: The model must be either:
Publicly deployed (available via API or direct access)
Known to be deployed internally at AI labs for actual work (e.g., automating research, production use)
Models used only for testing, evaluation, or red-teaming do not qualify
Assessed as having significant overall capabilities and impact
General Criteria for Encoded Reasoning
The behavior must not have been explicitly trained into the model for research or demonstration purposes
The behavior must either emerge from general training or have been explicitly introduced to improve AI performance
The behavior must be beneficial to the model's capabilities. If removing all occurrences of the behavior from CoTs does not hinder the model's capabilities, it does not count as encoded reasoning (see the ablation sketch after this list)
Behavior that can only be induced by explicitly prompting the model into it does not count
The behavior must occur by default in a significant percentage of cases or in specific domains
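For illustration, here is a minimal sketch of the ablation test implied by the capability criterion: strip the suspected encoded spans from each CoT, let the model continue from the edited CoT, and compare task accuracy. The helpers `generate_from_cot` and `is_correct` are hypothetical, and no specific accuracy threshold is part of the market's definition.

```python
# Minimal sketch of an ablation test for the capability criterion.
# All callables here are hypothetical stand-ins for a real harness.
from typing import Callable, Sequence

def ablation_gap(
    tasks: Sequence[str],
    cots: Sequence[str],
    strip_encoded: Callable[[str], str],           # removes suspected encoded spans
    generate_from_cot: Callable[[str, str], str],  # hypothetical model call
    is_correct: Callable[[str, str], bool],        # hypothetical grader
) -> float:
    """Accuracy drop (in percentage points) caused by stripping the
    suspected encoded reasoning out of the chains-of-thought."""
    def accuracy(edit: Callable[[str], str]) -> float:
        hits = sum(
            is_correct(task, generate_from_cot(task, edit(cot)))
            for task, cot in zip(tasks, cots)
        )
        return 100.0 * hits / len(tasks)

    return accuracy(lambda c: c) - accuracy(strip_encoded)

# Under this reading of the criterion, the behavior only counts as
# encoded reasoning if the gap is clearly positive, i.e. removing the
# behavior measurably hinders the model's capabilities.
```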
This market is conditional on the existence of SOTA reasoning models with token-based chain-of-thought. If, by the time of resolution, no such models exist, this market will resolve N/A.