Is superposition true?

For the purpose of this bet, the superposition hypothesis is:

The activations of a trained GPT-like model can be understood as a compressed representation of sparse, interpretable features, where the mapping from the sparse features to the observed activations is a linear function.
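To make the hypothesis concrete, here is a minimal toy sketch of what "linear superposition" means: many sparse features are projected into a lower-dimensional activation space via a fixed linear map. All specifics below (dimensions, sparsity level, random unit-norm feature directions) are illustrative assumptions, not taken from any of the papers.

```python
# Toy illustration of the superposition hypothesis.
# Assumed setup: 100 hypothetical sparse features compressed into
# 20 activation dimensions via random unit-norm directions.
import numpy as np

rng = np.random.default_rng(0)

n_features = 100   # many sparse, interpretable features...
n_dims = 20        # ...compressed into far fewer activation dimensions

# Each feature gets a direction in activation space (columns of W).
W = rng.normal(size=(n_dims, n_features))
W /= np.linalg.norm(W, axis=0)  # normalize columns to unit norm

# A sparse feature vector: only a few features are active at once.
f = np.zeros(n_features)
f[rng.choice(n_features, size=3, replace=False)] = 1.0

# The hypothesis: observed activations are a *linear* function of the features.
activations = W @ f

# Because the directions are not orthogonal, features interfere; recovery
# methods (e.g. the sparse autoencoders in the papers above) try to invert
# this map. A crude proxy: score each feature by its dot product.
scores = W.T @ activations
```

If the hypothesis holds, interpretability work reduces to finding the right (approximately) linear decoder; if functionally relevant features need a non-linear encoding, this picture breaks down.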

Relevant papers include, among others:

https://transformer-circuits.pub/2022/toy_model/index.html

https://transformer-circuits.pub/2023/monosemantic-features/index.html

https://www.alignmentforum.org/posts/z6QQJbtpkEAX3Aojj/interim-research-report-taking-features-out-of-superposition

I will resolve this market using a poll, but some indications that the market should resolve YES include:

  • the superposition assumption is frequently and successfully used to understand GPT-like models

  • no example has been found that clearly demonstrates the existence of features that are interpretable and functionally relevant but require more complex encoding/decoding functions

Indicators that the market should resolve NO include:

  • this area of research has been abandoned in favor of different approaches

  • counterexamples exist, i.e. interpretable, functionally relevant features that are encoded non-linearly

Let me know if you have suggestions for better resolution criteria!
