Will Transformer based architectures still be SOTA for language modelling by 2026?


80% chance


The intent is to capture whether a major paradigm shift occurs, comparable in magnitude to the shift from RNNs to Transformers. The market resolves based on the architecture of the SOTA language model. If the architecture is still recognizable as a Transformer with modifications, this market resolves Yes. If the architecture is at least as different from Transformers as Transformers are from RNNs, it resolves No. If the evaluation numbers leave it ambiguous which architecture is SOTA, because the evaluations are not comparable or because neither architecture is a Pareto improvement across all evaluations, but one is obviously better overall, I will use my judgement. If I deem it too close to call, I will resolve this market Yes, because that indicates no architecture has clearly surpassed Transformers.
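As a rough illustration only, the resolution rule above can be read as a small decision procedure. The sketch below is not the market creator's actual method: the function names, the score dictionaries (higher is better), the `is_paradigm_shift` flag, and the `judgement` argument are all hypothetical stand-ins for calls the creator would make by hand.

```python
from typing import Dict, Optional


def pareto_dominates(a: Dict[str, float], b: Dict[str, float]) -> bool:
    """True if `a` is at least as good as `b` on every shared benchmark
    and strictly better on at least one (assumes higher is better)."""
    shared = a.keys() & b.keys()
    if not shared:
        # No common benchmarks: the evaluations are incomparable.
        return False
    return all(a[k] >= b[k] for k in shared) and any(a[k] > b[k] for k in shared)


def resolve(
    challenger: Dict[str, float],
    transformer: Dict[str, float],
    is_paradigm_shift: bool,
    judgement: Optional[str] = None,
) -> str:
    """Hypothetical reading of the resolution criteria.

    is_paradigm_shift: the challenger is at least as different from
        Transformers as Transformers are from RNNs (a human call).
    judgement: "challenger" if the creator judges it obviously better
        overall, anything else (including None) otherwise.
    """
    if not is_paradigm_shift:
        # Still recognizable as a Transformer with modifications.
        return "YES"
    if pareto_dominates(challenger, transformer):
        return "NO"
    if pareto_dominates(transformer, challenger):
        return "YES"
    # Ambiguous numbers: fall back to judgement; too close to call means YES.
    return "NO" if judgement == "challenger" else "YES"
```

For example, a challenger that wins one benchmark and loses another does not Pareto-dominate, so under this reading the outcome depends entirely on the `judgement` argument and defaults to Yes.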

Technical AI Timelines




I bet yes because I measure SOTA by frontier models, and I would be surprised if the labs invested the resources required for a large-scale training run in a new architecture.

I like this question, added some liquidity

Is Mamba a transformer?

Would you count a model like Stanford's Monarch Mixer as a Transformer?

Is an ARDM (auto-regressive diffusion model) a transformer?


@ampdot Probably not; for a more definite answer you would have to elaborate more on what you have in mind.

Yes.

Plus memory and retrieval.

And hierarchical/nested/capsules (already used in video and high-end competition for long prompts)

And mixture of experts, and flash-attention and other sparsity techniques.
