Will Transformer based architectures still be SOTA for language modelling by 2026?

Jan 2

97% chance

The intent is to capture whether there is a major paradigm shift, on the same order of magnitude as the shift from RNNs to Transformers. The market resolves based on the architecture of the SOTA language model. If the architecture is still recognizable as a Transformer with modifications, this market resolves Yes. If the architecture is at least as different from Transformers as Transformers are from RNNs, it resolves No. If the evaluation numbers leave it ambiguous which architecture is SOTA, because of incomparable evaluations or because neither is a Pareto improvement across all evaluations, but one is obviously better overall, I will use my judgement. If I deem it too close to call, I will resolve this market as Yes, because that indicates that no architecture has clearly surpassed Transformers.
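To make the Pareto condition concrete, here is a minimal sketch of a dominance check between two models' benchmark scores. The function name, the benchmark names, and the scores are all hypothetical, and it assumes higher is better on every benchmark; it only illustrates the ambiguous case described above and is not the resolution procedure itself.

```python
from typing import Mapping


def pareto_dominates(a: Mapping[str, float], b: Mapping[str, float]) -> bool:
    """Return True if `a` is at least as good as `b` on every shared
    benchmark and strictly better on at least one (higher is better)."""
    shared = a.keys() & b.keys()
    if not shared:
        # No benchmark in common: the evaluations are incomparable.
        return False
    no_worse = all(a[k] >= b[k] for k in shared)
    strictly_better = any(a[k] > b[k] for k in shared)
    return no_worse and strictly_better


# Hypothetical scores, for illustration only.
transformer = {"bench_a": 0.80, "bench_b": 0.70}
challenger = {"bench_a": 0.85, "bench_b": 0.65}

# Neither model dominates the other, so the numbers alone are ambiguous.
ambiguous = not pareto_dominates(challenger, transformer) and not pareto_dominates(
    transformer, challenger
)
```

When `ambiguous` is true, the numbers alone do not settle which architecture is SOTA, which is the situation where the description falls back on judgement.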

Technical AI Timelines

People are also trading

By the start of 2026, will I still think that transformers are the main architecture for tasks related to natural language processing?

90% chance

Will transformers still be the dominant DL architecture in 2026?

91% chance

Will a transformer based model be SOTA for video generation by the end of 2025?

82% chance

13% chance

Will any 10 trillion+ parameter language model that follows instructions be released to the public before 2026?

10% chance

Best available language model from an OpenAI competitor by 2026

76% chance

By EOY 2025, will the model with the lowest perplexity on Common Crawl not be based on transformers?

5% chance

Will the most capable, public multimodal model at the end of 2027 in my judgement use a transformer-like architecture?

63% chance

Will AI (large language models) collapse by May 2026?

11% chance

Will the transformer architecture be replaced in SOTA LLMs by 2028?

I bet yes because I measure SOTA by frontier models, and I would be surprised if the labs invested the resources required for a large-scale training run in a new architecture.

I like this question, added some liquidity

Is Mamba a transformer?

Would you count a model like Stanford's Monarch Mixer as a Transformer?

Is an ARDM (auto-regressive diffusion model) a transformer?

predicted YES

@ampdot Probably not; for a more definite answer you would have to elaborate more on what you have in mind.

Yes.

Plus memory and retrieval.

And hierarchical/nested/capsules (already used in video and high-end competition for long prompts)

And mixture of experts, and flash-attention and other sparsity techniques.
