Will Mamba be the de-facto paradigm for LLMs over transformers by 2025?


This question resolves YES if at least 50% of the major AI labs (OpenAI, Google DeepMind, Anthropic, Meta, and EleutherAI) use Mamba in their flagship SOTA model.


Happy to be wrong on this (and to help Neel Nanda develop MambaLens), but I think Mamba's speed advantage is GPU-specific, so it could be outperformed by, e.g., custom hardware that runs transformer blocks quickly.

How do you define "Mamba"? Is it "H3 + MLP blocks"? What if it's some other SSM block + MLP? What if it doesn't have the special hardware-aware algorithm?

To me, a more interesting question is whether some sort of SSM will become dominant, rather than the specific Mamba algorithm, but maybe that's a different question.

@Benthamite All state-space models, or other candidates, are covered by the following question:


This is a fair point.

I think any SSM block + MLP + the hardware-aware algorithm are the minimum conditions to call a model "Mamba-like." But if anyone disagrees with this view, I would love to hear it before the market resolves.

@AndrewImpellitteri My tentative guess is that requiring the hardware-aware algorithm as part of the resolution criteria is going to lead to a bunch of annoying judgment calls about whether some future algorithm is close enough to what's in the Mamba paper to count as "the same".

Maybe a key hallmark is something like "key operations are designed to fit in SRAM"?

I would propose the following criteria for considering a block to be Mamba-like:

1. There are 1+ SSM and 1+ MLP blocks

2. The SSM is somehow selective (i.e. input-dependent)

3. Some key operation (e.g. recurrence) is designed to fit in GPU SRAM
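To make criterion 2 concrete, here is a toy numpy sketch of a "selective" SSM recurrence, where the discretization step size and the B/C projections depend on the current input. All names (`selective_ssm_scan`, the scalar-input simplification, the softplus step-size parameterization) are illustrative assumptions, not the Mamba paper's actual implementation, and this deliberately ignores criterion 3's hardware-aware scan:

```python
import numpy as np

def selective_ssm_scan(x, A, B_proj, C_proj, dt_proj):
    """Toy selective SSM over a scalar input sequence x.

    'Selective' means the step size dt and the effective B/C
    projections are functions of the input at each timestep,
    unlike a time-invariant SSM where they are fixed.
    A is a diagonal (negative) state matrix stored as a vector.
    """
    seq_len, d_state = len(x), A.shape[0]
    h = np.zeros(d_state)
    ys = []
    for t in range(seq_len):
        # Input-dependent step size via softplus (kept positive).
        dt = np.log1p(np.exp(dt_proj * x[t]))
        A_bar = np.exp(dt * A)        # discretized (diagonal) state matrix
        B_bar = dt * (B_proj * x[t])  # input-dependent input projection
        h = A_bar * h + B_bar * x[t]  # recurrent state update
        ys.append((C_proj * x[t]) @ h)  # input-dependent readout
    return np.array(ys)

rng = np.random.default_rng(0)
x = rng.standard_normal(10)
A = -np.abs(rng.standard_normal(4))  # negative diagonal keeps the scan stable
y = selective_ssm_scan(x, A, rng.standard_normal(4), rng.standard_normal(4), 0.5)
```

A time-invariant SSM would compute `A_bar`, `B_bar`, and the readout once, outside the loop; the input-dependence above is what makes the recurrence unable to be rewritten as a fixed convolution, which is why Mamba needs the custom scan of criterion 3.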
