Will Mamba be the de-facto paradigm for LLMs over transformers by 2025?


This question resolves YES if at least 50% of the major AI labs (OpenAI, Google DeepMind, Anthropic, Meta, and EleutherAI) use Mamba in their flagship SOTA model.


Happy to be wrong on this (and to help Neel Nanda develop MambaLens), but I think Mamba's speed advantage is GPU-specific, so it could be outperformed by, e.g., custom hardware that runs transformer blocks quickly.

How do you define "Mamba"? Is it "H3 + MLP blocks"? What if it's some other SSM block + MLP? What if it doesn't have the special hardware-aware algorithm?

To me, a more interesting question is whether some sort of SSM will become dominant, rather than the specific Mamba algorithm, but maybe that's a different question.

@Benthamite All state-space models, or other candidates, are covered by the following question:


This is a fair point.

I think any SSM block + MLP + the hardware-aware algorithm are the minimum conditions to call a model "Mamba-like." But if anyone disagrees with this view, I would love to hear it before the market resolves.

@AndrewImpellitteri My tentative guess is that requiring the hardware-aware algorithm as part of the resolution criteria is going to lead to a bunch of annoying judgment calls about whether some future algorithm is close enough to what's in the Mamba paper to count as "the same".

Maybe a key hallmark is something like "key operations are designed to fit in SRAM"?

I would propose the following criteria for considering a block to be Mamba-like:

1. There are 1+ SSM and 1+ MLP blocks

2. The SSM is somehow selective (i.e. input-dependent)

3. Some key operation (e.g. recurrence) is designed to fit in GPU SRAM
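To make criterion 2 concrete, here is a toy numpy sketch of a "selective" SSM recurrence, where the discretization step size and the B/C projections depend on the current input. All names (`selective_ssm_scan`, the scalar-input simplification, the softplus step-size parameterization) are illustrative assumptions, not the Mamba paper's actual implementation, and this deliberately ignores criterion 3's hardware-aware scan:

```python
import numpy as np

def selective_ssm_scan(x, A, B_proj, C_proj, dt_proj):
    """Toy selective SSM over a scalar input sequence x.

    'Selective' means the step size dt and the effective B/C
    projections are functions of the input at each timestep,
    unlike a time-invariant SSM where they are fixed.
    A is a diagonal (negative) state matrix stored as a vector.
    """
    seq_len, d_state = len(x), A.shape[0]
    h = np.zeros(d_state)
    ys = []
    for t in range(seq_len):
        # Input-dependent step size via softplus (kept positive).
        dt = np.log1p(np.exp(dt_proj * x[t]))
        A_bar = np.exp(dt * A)        # discretized (diagonal) state matrix
        B_bar = dt * (B_proj * x[t])  # input-dependent input projection
        h = A_bar * h + B_bar * x[t]  # recurrent state update
        ys.append((C_proj * x[t]) @ h)  # input-dependent readout
    return np.array(ys)

rng = np.random.default_rng(0)
x = rng.standard_normal(10)
A = -np.abs(rng.standard_normal(4))  # negative diagonal keeps the scan stable
y = selective_ssm_scan(x, A, rng.standard_normal(4), rng.standard_normal(4), 0.5)
```

A time-invariant SSM would compute `A_bar`, `B_bar`, and the readout once, outside the loop; the input-dependence above is what makes the recurrence unable to be rewritten as a fixed convolution, which is why Mamba needs the custom scan of criterion 3.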
