Is attention all you need? (transformers SOTA in 2027)

This market simulates the wager between Jonathan Frankle (@jefrankle) and Sasha Rush (@srush_nlp):

On January 1, 2027, a Transformer-like model will continue to hold the state-of-the-art position in most benchmarked tasks in natural language processing.

An architecture still qualifies for Yes if it leverages a combination of transformer models and supporting infrastructure components that wouldn’t be considered breakthrough technologies on their own (e.g. RAG).

So do mixtures of experts count? The linked page does not contain any actual details.

@EchoNolan I talked to Sasha, and his response is basically that as long as the E in the MoE is a Transformer, it's a Transformer.

