Will superposition in transformers be mostly solved by 2026?

2.2kṀ17k

2026

73%

chance

ALL

Superposition is a hypothesized mechanism for polysemanticity. It is a major bottleneck for interpretability. There are groups working on reducing it, most notably Chris Olah's group at Anthropic. However, it is possible that reducing superposition is hard, or that superposition is not an accurate model of polysemanticity.

The following would qualify for a YES resolution:

A modified transformer architecture that, when trained, has at most 50% of the superposition than an iso-performance regular transformer
A method for reading out features in superposition from a regular/modified transformer that is able to recover at least 50% of features in superposition

The following would qualify for a (pre-2026) NO resolution:

Only a small fraction of features can be recovered (<50%)
Superposition is shown conclusively to be an invalid model of polysemanticity

In the event that it is unclear how many features are actually in superposition (there could hypothetically be an absurd number of near-orthogonal vectors), only preliminary (and not necessarily conclusive) evidence that the remaining possible directions are not relevant is sufficient to rule them out from consideration.

AI Alignment

Mechanistic interpretability

Get

1,000

to start trading!

People are also trading

Will transformers still be the dominant DL architecture in 2026?

81% chance

Will Transformer based architectures still be SOTA for language modelling by 2026?

91% chance

Will a transformer based model be SOTA for video generation by the end of 2025?

82% chance

Is superposition true?

46% chance

By the start of 2026, will I still think that transformers are the main architecture for tasks related to natural language processing?

90% chance

Will we solve AI alignment by 2026?

2% chance

My probability in 2026 that training transformer LMs will eventually lead to inner misalignment issues

59% chance

On January 1, 2027, a Transformer-like model will continue to hold the state-of-the-art position in most benchmark

84% chance

Will the most capable, public multimodal model at the end of 2027 in my judgement use a transformer-like architecture?

63% chance

Will the transformer architecture be replaced in SOTA LLMs by 2028?

Sort by:

🤨🤨🤨🤨🤨🤨🤨🤨🤨🤨🤨

As a clarification: the method should have to also demonstrably meet the 50% criterion for at least transformers of nontrivial size (GPT-2 as a lower bound), and it should appear plausible that it will scale to frontier transformers (for example, a scaling law demonstrating continued improvement would satisfy this condition). So a one layer transformer will not qualify. I think this is the most natural interpretation of the title--"superposition in transformers" implies transformers in some degree of generality.

@LeoGao Also, additional clarification: >50% variance explained by an autoencoder will not qualify for the >50% of features requirement

I feel like I am missing something important here

@firstuserhere It was linked in https://twitter.com/daniel_271828/status/1710437181234540925

Why... is this spiking? Is this Chris Olah's comments about being excited about interp, and it being more of an engineering problem, in his opinion?

predictedYES

Is this for any transformer? How does this resolve if we have an expensive technique that has been validated on small transformers but hasn't been successfully applied to very large transformers?

Cases I'm interested in:
- It satisfies the market criteria for at least one small transformer and it's reasonable to think the best technique in 2026 would work on large transformers if we had really good hardware we currently don't have
- It satisfies the criteria with a small transformer and it's reasonable to think it would work on large transformers but it would be expensive and noone's tried it yet.
- It satisfies the criteria with a small transformer and preliminary results for larger transformers are mixed/don't satisfy criteria of market.

where small is something between ~8M-1B parameters

Noa Nabeshima boughtṀ200YES

@NoaNabeshima elaborate the excitement?

predictedYES

@BartholomewHughes I didn't think carefully about the actual probability, I think I'm not trying to be a very good predictor on this market fwiw. I've been doing some superposition stuff with some promising early results and attending to public stuff. My main story for this resolving Yes is that Anthropic succeeds. I think trading against me isn't unreasonable. Part what's going on here for me is: just enjoying the feeling of being bullish and (?) incentive to do a good job (seems silly but that's what it's actually like for me)

predictedYES

@BartholomewHughes also 3.5 years is a long time

50% of features in what sized model?

Interesting question! It honestly wouldn't surprise me if SoLU has at most 50% of the superposition of a normal model, though it's really hard to quantify. My guess is that removing superposition is impossible, but that being able to recover many features is doable-ish, though 50% is a high bar. My best guess for this breaking is just that we never figure out how to quantify the number of features.