By what date will at least one SOTA reasoning model use neuralese?
  • 01.01.2026: 50%
  • 01.07.2026: 50%
  • 01.01.2027: 50%
  • 01.07.2027: 50%
  • 01.01.2028: 50%
  • 01.07.2028: 50%
  • 01.01.2029: 50%
  • 01.07.2029: 50%
  • 01.01.2030: 50%

This market is part of the paper: A Concrete Roadmap towards Safety Cases based on Chain-of-Thought Monitoring

This market resolves based on whether, at each specified date, there exists at least one SOTA reasoning model that uses non-token-based ('neuralese') reasoning.

Neuralese Reasoning Definition

A reasoning model uses "neuralese reasoning" if its architecture allows an activation to pass sequentially through at least part of the network an arbitrary number of times without ever being projected from a continuous, multidimensional activation into a discrete state (such as token IDs).

Specific Resolution Cases

  • Counts as neuralese reasoning: Tokens are generated but there is also a residual connection that feeds back continuous activations into earlier layers

  • Does not count: Every recurrent connection goes through token IDs, even if those tokens don't translate into understandable text (for example, due to steganography)

  • Examples that would count: Systems like STAR or COCONUT architectures, if they achieve SOTA performance
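One way to read the definition above is as a graph property: treat the architecture as a directed graph of components, mark the discrete ones (such as token IDs), and ask whether any loop remains once those are removed. The following is a minimal sketch of that reading, not an official resolution procedure. The function name `has_continuous_cycle` and the node names (`input`, `layers`, `logits`, `token_id`) are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def has_continuous_cycle(edges: Dict[str, List[str]], discrete: Set[str]) -> bool:
    """Return True if the graph contains a cycle that avoids every discrete node."""
    # Remove discrete nodes: a loop that passes through token IDs does not count.
    graph = {
        src: [dst for dst in dsts if dst not in discrete]
        for src, dsts in edges.items()
        if src not in discrete
    }

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state: Dict[str, int] = {}

    def visit(node: str) -> bool:
        state[node] = GREY
        for nxt in graph.get(node, []):
            colour = state.get(nxt, WHITE)
            if colour == GREY:  # back edge: found a purely continuous loop
                return True
            if colour == WHITE and visit(nxt):
                return True
        state[node] = BLACK
        return False

    return any(state.get(n, WHITE) == WHITE and visit(n) for n in graph)


# Hypothetical architecture where the only recurrence is through sampled tokens.
token_only = {
    "input": ["layers"],
    "layers": ["logits"],
    "logits": ["token_id"],
    "token_id": ["input"],
}

# Same architecture plus a continuous feedback path from the layers to the input.
with_feedback = {
    "input": ["layers"],
    "layers": ["logits", "input"],
    "logits": ["token_id"],
    "token_id": ["input"],
}

print(has_continuous_cycle(token_only, {"token_id"}))
print(has_continuous_cycle(with_feedback, {"token_id"}))
```

Under this reading, the first graph corresponds to the "does not count" case and the second to the "counts" case, since tokens are still generated but a continuous loop exists alongside them.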

Reasoning Model Definition

A "reasoning model" must meet all of the following criteria:

  1. It is a language model - The system must be able to take language as input and produce language as output. For example, AlphaGo would not count.

  2. It has been trained to use inference-time compute - The system must have undergone significant training in using more than a single forward pass before giving its final output, with the ability to scale inference compute for better performance

  3. The extra inference compute produces an artifact - The way the model uses extra inference compute must lead to some artifact, such as a classic chain-of-thought or a list of neuralese activations. For example, a COCONUT model counts as a reasoning model here.

State-of-the-Art (SOTA) Definition

A model is considered "state-of-the-art" if it meets these criteria:

  • Widely recognized by AI community consensus as among the 3-5 best models

  • Among the top performers on major benchmarks

  • Deployed status: The model must be either:

    • Publicly deployed (available via API or direct access)

    • Known to be deployed internally at AI labs for actual work (e.g., automating research, production use)

    • Models used only for testing, evaluation, or red-teaming do not qualify

  • Assessed as having significant overall capabilities and impact
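The three definitions above combine into a single check per date. The sketch below shows one possible encoding, assuming that every listed SOTA criterion must hold and that either deployment route is sufficient. The class `ModelAssessment`, its field names, and `resolves_yes` are hypothetical, and each boolean stands in for a human judgment that the code itself cannot make.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ModelAssessment:
    # Reasoning model criteria
    is_language_model: bool
    trained_for_inference_compute: bool
    produces_artifact: bool
    # Neuralese criterion
    uses_neuralese: bool
    # SOTA criteria
    top_tier_by_consensus: bool
    top_benchmark_results: bool
    publicly_deployed: bool
    internally_deployed_for_work: bool  # excludes testing/eval/red-teaming only
    significant_capabilities: bool

    def is_reasoning_model(self) -> bool:
        return (
            self.is_language_model
            and self.trained_for_inference_compute
            and self.produces_artifact
        )

    def is_sota(self) -> bool:
        # Assumption: all criteria are required; deployment is satisfied either way.
        deployed = self.publicly_deployed or self.internally_deployed_for_work
        return (
            self.top_tier_by_consensus
            and self.top_benchmark_results
            and deployed
            and self.significant_capabilities
        )

    def counts_for_market(self) -> bool:
        return self.is_reasoning_model() and self.is_sota() and self.uses_neuralese


def resolves_yes(models: Iterable[ModelAssessment]) -> bool:
    """A date resolves YES if at least one assessed model meets every criterion."""
    return any(m.counts_for_market() for m in models)


# Hypothetical example: a top model that reasons only through tokens.
token_based = ModelAssessment(True, True, True, False, True, True, True, False, True)
# Hypothetical example: a neuralese model used only for red-teaming (not deployed).
undeployed = ModelAssessment(True, True, True, True, True, True, False, False, True)

print(resolves_yes([token_based, undeployed]))
```

In this example neither model qualifies: the first does not use neuralese, and the second fails the deployment requirement.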
