Will Mamba be the de-facto paradigm for LLMs over transformers by 2025?
Resolved NO (Jan 1)

https://arxiv.org/abs/2312.00752

The question will resolve YES if at least 50% of the major AI labs (OpenAI, Google DeepMind, Anthropic, Meta, and EleutherAI) use Mamba in their flagship SOTA model.


🏅 Top traders (by total profit)

1. Ṁ180
2. Ṁ160
3. Ṁ129
4. Ṁ49
5. Ṁ39


3mo

How does this resolve if the major AI labs' flagship models use a transformer/Mamba hybrid?

@brunoedwards Is there any evidence that a major lab is using this architecture? Also, I think @Benthamite gives good criteria in his comment for what should count as a "Mamba" model.

1y

Happy to be wrong on this (and to help Neel Nanda develop MambaLens), but I think Mamba is GPU-specific, so it would be outperformed by e.g. custom hardware that runs transformer blocks quickly.

1y

How do you define "Mamba"? Is it "H3 + MLP blocks"? What if it's some other SSM block + MLP? What if it doesn't have the special hardware-aware algorithm?

To me, the more interesting question is whether some sort of SSM will become dominant, rather than the specific Mamba algorithm in particular, but maybe that's a different question.

1y

@Benthamite All state space models or other candidates are good for the following question:

@Benthamite

This is a fair point.

I think an SSM block + MLP + the hardware-aware algorithm are the minimum conditions for calling a model "Mamba-like." But if anyone disagrees with this view, I would love to hear it before the market resolves.

1y

@AndrewImpellitteri My tentative guess is that requiring the hardware-aware algorithm as part of the resolution criteria will lead to a bunch of annoying judgment calls about whether some future algorithm is close enough to what's in the Mamba paper to count as "the same".

Maybe a key hallmark is something like "key operations are designed to fit in SRAM"?

I would propose the following criteria for considering a block to be mamba-like:

1. There are 1+ SSM and 1+ MLP blocks

2. The SSM is somehow selective (i.e. input-dependent)

3. Some key operation (e.g. recurrence) is designed to fit in GPU SRAM
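To make criterion 2 concrete, here is a toy NumPy sketch of a selective (input-dependent) SSM recurrence. All shapes and weights here are made up for illustration; it deliberately uses a plain sequential loop rather than the hardware-aware, SRAM-resident parallel scan of criterion 3, and it omits the MLP blocks of criterion 1.

```python
import numpy as np

def selective_scan(x, A_log, W_delta, W_B, W_C):
    """Toy selective SSM recurrence: the step size delta and the B/C
    matrices are functions of the input x (criterion 2).
    x: (L, d) input sequence; state is (d, n)."""
    L, d = x.shape
    n = A_log.shape[1]                     # state dimension per channel
    h = np.zeros((d, n))                   # hidden state
    ys = []
    for t in range(L):
        # Input-dependent ("selective") parameters
        delta = np.log1p(np.exp(x[t] @ W_delta))   # softplus -> positive step size, (d,)
        B = x[t] @ W_B                             # input matrix, (n,)
        C = x[t] @ W_C                             # output matrix, (n,)
        # Euler-style discretization of dh/dt = A h + B x
        A_bar = np.exp(delta[:, None] * -np.exp(A_log))   # decay factors in (0, 1), (d, n)
        h = A_bar * h + (delta[:, None] * B[None, :]) * x[t][:, None]
        ys.append(h @ C)                           # per-channel output, (d,)
    return np.stack(ys)                            # (L, d)

# Hypothetical shapes, just to exercise the scan.
rng = np.random.default_rng(0)
L, d, n = 6, 4, 3
x = rng.standard_normal((L, d))
y = selective_scan(x,
                   A_log=rng.standard_normal((d, n)),
                   W_delta=0.1 * rng.standard_normal((d, d)),
                   W_B=0.1 * rng.standard_normal((d, n)),
                   W_C=0.1 * rng.standard_normal((d, n)))
print(y.shape)  # (6, 4)
```

The input dependence of delta, B, and C is what separates this from a classical (LTI) SSM such as S4; criterion 3 is then about *how* this recurrence is computed on the GPU, which a pedagogical loop like this does not capture.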
