Will superposition in transformers be mostly solved by 2026?

73% chance

Superposition is a hypothesized mechanism for polysemanticity, the tendency of individual neurons to respond to multiple unrelated features. It is a major bottleneck for interpretability. Several groups are working on reducing it, most notably Chris Olah's group at Anthropic. However, reducing superposition may turn out to be hard, or superposition may not be an accurate model of polysemanticity.
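To make the mechanism concrete, here is a toy sketch in pure Python. It stores more features than the space has dimensions by giving each feature a random direction, activates a sparse subset, and reads every feature back with a dot product. The sizes, the random directions, and the linear readout are illustrative assumptions, not any group's actual setup. The active feature reads out cleanly, but inactive features also pick up nonzero "interference." That is one way a single direction (or neuron) ends up responding to several features.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Sample a direction uniformly on the unit sphere in `dim` dimensions."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def toy_superposition(n_features=8, dim=4, active=(0,), seed=0):
    """Store n_features > dim features as directions in a dim-dimensional space,
    activate a sparse subset, and read every feature back with a dot product."""
    rng = random.Random(seed)
    directions = [random_unit_vector(dim, rng) for _ in range(n_features)]
    # The hidden activation is the sum of the active features' directions.
    hidden = [0.0] * dim
    for i in active:
        hidden = [h + d for h, d in zip(hidden, directions[i])]
    # Linear readout: an active feature reads out near 1, but inactive
    # features also get nonzero values because directions are not orthogonal.
    return [dot(hidden, d) for d in directions]


readout = toy_superposition()
for i, value in enumerate(readout):
    print(f"feature {i}: {value:+.3f}")
```

With 8 features in 4 dimensions, the directions cannot all be orthogonal, so interference cannot be avoided. Sparsity (few features active at once) is what keeps the readout usable.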

The following would qualify for a YES resolution:

A modified transformer architecture that, when trained, has at most 50% of the superposition of an iso-performance regular transformer
A method for reading out features in superposition from a regular or modified transformer that recovers at least 50% of the features in superposition
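The market does not define what it means for a feature to be "recovered." One possible operationalization, applicable when ground-truth feature directions are known (for example on synthetic data), counts a true feature as recovered if some learned direction has cosine similarity of at least a threshold with it. The sketch below assumes that criterion. The function names and the 0.9 threshold are hypothetical choices.

```python
import math


def cosine(a, b):
    """Cosine similarity between two nonzero vectors."""
    d = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return d / (na * nb)


def fraction_recovered(true_features, learned_features, threshold=0.9):
    """Fraction of ground-truth directions matched by at least one learned
    direction with cosine similarity >= threshold (hypothetical criterion)."""
    recovered = 0
    for f in true_features:
        best = max(cosine(f, g) for g in learned_features)
        if best >= threshold:
            recovered += 1
    return recovered / len(true_features)


def meets_yes_threshold(true_features, learned_features, threshold=0.9):
    # Hypothetical reading of the YES criterion: at least 50% recovered.
    return fraction_recovered(true_features, learned_features, threshold) >= 0.5


# Toy example: 4 ground-truth features, 2 learned directions.
true_features = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
learned_features = [[1, 0.05, 0], [0, 1, 0]]
print(fraction_recovered(true_features, learned_features))
print(meets_yes_threshold(true_features, learned_features))
```

Here the first two true features are matched closely. `[0, 0, 1]` has no match, and `[1, 1, 0]` only reaches a similarity of about 0.74, so exactly half are counted as recovered. In practice the result is sensitive to the chosen threshold.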

The following would qualify for a (pre-2026) NO resolution:

Less than 50% of the features in superposition can be recovered
Superposition is shown conclusively to be an invalid model of polysemanticity

If it is unclear how many features are actually in superposition (hypothetically, there could be an absurd number of near-orthogonal vectors), preliminary, not necessarily conclusive, evidence that the remaining possible directions are irrelevant is sufficient to exclude them from consideration.
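The concern about an "absurd number of near-orthogonal vectors" comes from a general property of high-dimensional spaces. Random unit vectors in d dimensions have a cosine similarity whose typical magnitude is on the order of 1/sqrt(d), so many more than d directions can coexist with small pairwise overlap. The pure-Python sketch below illustrates this by measuring the worst pairwise overlap among a fixed set of random directions as the dimension grows. The vector count, the dimensions, and the use of Gaussian-sampled directions are illustrative choices.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Sample a direction uniformly on the unit sphere in `dim` dimensions."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def max_abs_cosine(n_vectors, dim, seed=0):
    """Largest |cosine similarity| among n_vectors random unit vectors in dim dimensions."""
    rng = random.Random(seed)
    vecs = [random_unit_vector(dim, rng) for _ in range(n_vectors)]
    worst = 0.0
    for i in range(n_vectors):
        for j in range(i + 1, n_vectors):
            c = abs(sum(a * b for a, b in zip(vecs[i], vecs[j])))
            worst = max(worst, c)
    return worst


# Same number of directions (150) in spaces of increasing dimension:
# the worst-case overlap shrinks as the dimension grows.
for dim in (10, 100, 500):
    print(dim, round(max_abs_cosine(n_vectors=150, dim=dim), 3))
```

At 10 and 100 dimensions the 150 directions outnumber the axes, yet the overlap is already noticeably smaller at 100. This is why it can be hard to say exactly how many features a space could be holding.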

AI Alignment

Mechanistic interpretability
