For the purpose of this bet, the superposition hypothesis is:
The activations of a trained GPT-like model can be understood as a compressed representation of sparse, interpretable features. The mapping from the sparse features to the observed activations is a linear function.
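To make the definition concrete, here is a minimal sketch under illustrative assumptions that are not part of the market text. It uses 3 sparse features stored in a 2-dimensional activation space. Each feature gets a unit-length direction, and the directions are spaced 120 degrees apart. Encoding is the linear map x = W f. Decoding loosely follows the ReLU readout used in the first linked paper, ReLU(W^T x + b). The names `FEATURE_DIRECTIONS`, `encode` and `decode` are hypothetical. When only one feature is active, the readout recovers it exactly. When two features are active at once, they interfere. This is why the hypothesis depends on the features being sparse.

```python
import math

# Hypothetical toy setup (illustrative assumption, not from the market text):
# n = 3 sparse features stored in a d = 2 dimensional activation space.
# Each feature direction (a column of W) is a unit vector; they are 120 degrees apart.
FEATURE_DIRECTIONS = [
    (math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3)) for k in range(3)
]


def encode(features):
    """Linear map from sparse features (length n) to activations (length d): x = W f."""
    x = [0.0, 0.0]
    for value, (wx, wy) in zip(features, FEATURE_DIRECTIONS):
        x[0] += value * wx
        x[1] += value * wy
    return x


def decode(activation, bias=0.0):
    """Toy-model style readout ReLU(W^T x + b): one dot product per feature, clipped at 0."""
    return [
        max(0.0, wx * activation[0] + wy * activation[1] + bias)
        for wx, wy in FEATURE_DIRECTIONS
    ]


if __name__ == "__main__":
    one_active = [0.0, 1.0, 0.0]  # sparse: a single feature is on
    two_active = [1.0, 1.0, 0.0]  # less sparse: two features overlap
    print([round(v, 3) for v in decode(encode(one_active))])  # [0.0, 1.0, 0.0]
    print([round(v, 3) for v in decode(encode(two_active))])  # [0.5, 0.5, 0.0]
```

With one active feature, the other two directions project to -0.5, and the ReLU clips them to zero. With two active features, each feature's readout is contaminated by the other one, so neither is recovered at its true value of 1.0.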
Relevant papers include, among others:
- https://transformer-circuits.pub/2022/toy_model/index.html
- https://transformer-circuits.pub/2023/monosemantic-features/index.html
I will resolve this market using a poll, but some indications that the market should resolve YES include:
- the superposition assumption is frequently and successfully used to understand GPT-like models
- no example has been found that clearly demonstrates the existence of features that are interpretable and functionally relevant but require more complex (non-linear) encoding/decoding functions
Indicators that the market should resolve NO include:
- this area of research has been abandoned in favor of different approaches
- counterexamples of interpretable non-linear features exist
Let me know if you have suggestions for better resolution criteria!
I resolved the question based on this poll: https://manifold.markets/NielsW/is-the-superposition-hypothesis-tru (at the time of resolution, it had 2 NO votes vs. 1 YES vote). It is not great that the poll had so few votes, but I think resolving to 33% true is not too far from my inside view: the hypothesis is neither clearly false nor clearly correct, and looking through the individual YES/NO indicators also points toward "a bit closer to false".