
Superposition, the hypothesis that a network represents more features than it has dimensions by packing them into overlapping, near-orthogonal directions, is a proposed mechanism for polysemanticity and a major bottleneck for interpretability. Several groups are working on reducing it, most notably Chris Olah's group at Anthropic. However, it is possible that reducing superposition is hard, or that superposition is not an accurate model of polysemanticity.
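To make the hypothesis concrete, here is a minimal toy sketch of superposition: n sparse features are squeezed into d < n hidden dimensions, and the trained feature directions end up overlapping. All names and hyperparameters here are illustrative, not taken from any particular paper.

```python
# Toy superposition sketch (all hyperparameters are illustrative).
# n sparse features are compressed into d < n hidden dimensions by a
# linear map W, then decoded with a ReLU. After training, the feature
# directions (columns of W) overlap: that overlap is superposition,
# and it is what makes individual dimensions look polysemantic.
import torch

torch.manual_seed(0)
n_features, d_hidden = 20, 5

W = torch.nn.Parameter(0.1 * torch.randn(d_hidden, n_features))
b = torch.nn.Parameter(torch.zeros(n_features))
opt = torch.optim.Adam([W, b], lr=1e-2)

for step in range(2000):
    # Sparse inputs: each feature is active (uniform in [0, 1]) with
    # probability 0.05, independently of the others.
    x = torch.rand(1024, n_features) * (torch.rand(1024, n_features) < 0.05)
    x_hat = torch.relu(x @ W.T @ W + b)  # encode to d dims, decode back
    loss = ((x - x_hat) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Off-diagonal cosine similarity between feature directions: nonzero
# values mean features share dimensions, i.e. they are in superposition.
cols = W.detach() / W.detach().norm(dim=0, keepdim=True)
print((cols.T @ cols - torch.eye(n_features)).abs().max())
```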
The following would qualify for a YES resolution:
- A modified transformer architecture that, when trained, exhibits at most 50% as much superposition as an iso-performance regular transformer
- A method for reading out features in superposition from a regular or modified transformer that recovers at least 50% of the features in superposition (one candidate approach is sketched after this list)
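One candidate family of read-out methods, named here as an assumption rather than as the method the question has in mind, is sparse dictionary learning: train a sparse autoencoder on model activations so that each dictionary direction captures one feature. A minimal sketch, with random stand-in data in place of real activations:

```python
# Minimal sparse-autoencoder sketch for reading out features from
# superposition. `activations` is a random stand-in; in practice it
# would hold residual-stream or MLP activations from a real model.
import torch

torch.manual_seed(0)
d_model, d_dict = 64, 512                   # overcomplete dictionary
activations = torch.randn(4096, d_model)    # placeholder data

enc = torch.nn.Linear(d_model, d_dict)
dec = torch.nn.Linear(d_dict, d_model, bias=False)
opt = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=1e-3)

for step in range(2000):
    codes = torch.relu(enc(activations))    # sparse feature activations
    recon = dec(codes)
    # The L1 penalty pushes the codes toward sparsity; each column of
    # dec.weight is then a candidate feature direction.
    loss = ((activations - recon) ** 2).mean() + 1e-3 * codes.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Whether such a method "recovers at least 50% of features" would still have to be judged against some ground truth or proxy, which is the difficulty the caveat at the end of this section addresses.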
The following would qualify for a (pre-2026) NO resolution:
- Only a small fraction (<50%) of features can be recovered
- Superposition is shown conclusively to be an invalid model of polysemanticity
In the event that it is unclear how many features are actually in superposition (there could hypothetically be an absurd number of near-orthogonal vectors; see the sketch below), preliminary, and not necessarily conclusive, evidence that the remaining possible directions are not relevant is sufficient to rule them out from consideration.
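To make the parenthetical concrete: in high dimensions, the number of pairwise near-orthogonal unit vectors grows exponentially with the dimension, so the space of candidate feature directions can vastly exceed the dimension itself. A quick empirical check (the dimension and count here are arbitrary):

```python
# Random unit vectors in high dimensions are nearly orthogonal, so a
# d-dimensional space can hold far more than d near-orthogonal
# directions. Here 2,000 random directions in 512 dimensions still
# have small worst-case pairwise overlap.
import numpy as np

rng = np.random.default_rng(0)
d, n = 512, 2000
v = rng.standard_normal((n, d))
v /= np.linalg.norm(v, axis=1, keepdims=True)

dots = v @ v.T
np.fill_diagonal(dots, 0.0)
print(f"max |cos| over all pairs = {np.abs(dots).max():.3f}")  # roughly 0.2
```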