Will Transformer-based architectures still be SOTA for language modelling by 2026?
67% chance

The intent is to capture whether there is a major paradigm shift on the same order of magnitude as the shift from RNNs to Transformers. The market resolves based on the architecture of the SOTA language model. If the architecture is still recognizable as a Transformer with modifications, this market resolves Yes. If the architecture is at least as different from Transformers as Transformers are from RNNs, it resolves No. If the evaluation numbers leave it ambiguous which architecture is SOTA (because the evaluations are incomparable, or because neither model is a Pareto improvement across all evaluations) but one is obviously better overall, I will use my judgement. If I deem it too close to call, I will resolve this market as Yes, because that indicates no architecture has clearly surpassed Transformers.


Would you count a model like Stanford's Monarch Mixer as a Transformer?

bought Ṁ10 of NO

Is an ARDM (autoregressive diffusion model) a Transformer?

predicts YES

@ampdot Probably not; for a more definite answer you would need to elaborate on what you have in mind.

Yes.

Plus memory and retrieval.

And hierarchical/nested/capsule architectures (already used in video and in high-end competition for long prompts).

And mixture of experts, FlashAttention, and other sparsity techniques.
