The current state-of-the-art Transformer architecture includes self-attention layers, whose computational complexity is quadratic in sequence length. This makes training and inference at longer sequence lengths increasingly expensive and eventually infeasible, and it limits the architecture's capabilities.
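As a rough illustration of where the quadratic cost comes from, here is a minimal NumPy sketch of scaled dot-product attention; the n × n score matrix is what grows quadratically with sequence length. The function name and shapes are illustrative only, not tied to any particular implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal scaled dot-product attention sketch.

    Q, K, V: arrays of shape (n, d), where n is the sequence length
    and d is the head dimension.
    """
    d = Q.shape[-1]
    # The score matrix has shape (n, n): this is the source of the
    # quadratic time and memory cost in sequence length.
    scores = Q @ K.T / np.sqrt(d)
    # Numerically stable softmax over the last axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # shape (n, d)

# Doubling n roughly quadruples the work spent on the (n, n) score matrix.
n, d = 1024, 64
Q, K, V = (np.random.randn(n, d) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
```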
There has been substantial research into sub-quadratic attention operators ever since the Transformer model was introduced, but so far none has proven to be a full replacement for self-attention, usually because of reduced practical performance or even theoretical limits on capacity.
For this market, my definition of "satisfactory" is a sub-quadratic attention operator that matches full self-attention's performance closely enough that it gains widespread traction and starts being used in research papers not specifically focused on that operator. For example, I would consider RoPE and ALiBi (two positional embedding schemes, not attention operators) to have reached this stage.
Will a satisfactory sub-quadratic attention operator be found before 2026?