Will Mamba be the de-facto paradigm for LLMs over transformers by 2025?
Resolved NO (Jan 1)

https://arxiv.org/abs/2312.00752

The question will resolve YES if at least 50% of the major AI labs (OpenAI, Google DeepMind, Anthropic, Meta, and EleutherAI) use Mamba in their flagship SOTA model.


🏅 Top traders (by total profit)

1. Ṁ180
2. Ṁ160
3. Ṁ129
4. Ṁ49
5. Ṁ39


3mo

How does this resolve if the major AI labs' flagship models use a transformer/Mamba hybrid?

@brunoedwards Is there any evidence that a major lab is using this architecture? Also, I think @Benthamite gives good criteria in his comment for what should count as a "Mamba" model.

1y

Happy to be wrong on this (and to help Neel Nanda develop MambaLens), but I think Mamba is GPU-specific, so it would be outperformed by e.g. custom hardware that runs transformer blocks quickly.

1y

How do you define "Mamba"? Is it "H3 + MLP blocks"? What if it's some other SSM block + MLP? What if it doesn't have the special hardware-aware algorithm?

To me, the more interesting question is whether some sort of SSM will become dominant, rather than the specific Mamba algorithm in particular, but maybe that's a different question.

1y

@Benthamite All state space models or other candidates are good for the following question:

@Benthamite

This is a fair point.

I think an SSM block + MLP + the hardware-aware algorithm are the minimum conditions for calling a model "Mamba-like." But if anyone disagrees with this view, I would love to hear it before the market resolves.

1y

@AndrewImpellitteri My tentative guess is that requiring the hardware-aware algorithm as part of the resolution criteria will lead to a bunch of annoying judgment calls about whether some future algorithm is close enough to what's in the Mamba paper to count as "the same".

Maybe a key hallmark is something like "key operations are designed to fit in SRAM"?

I would propose the following criteria for considering a block to be mamba-like:

1. There are 1+ SSM and 1+ MLP blocks

2. The SSM is somehow selective (i.e. input-dependent)

3. Some key operation (e.g. recurrence) is designed to fit in GPU SRAM
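To make criterion 2 concrete, here is a toy NumPy sketch of a selective (input-dependent) SSM recurrence. All shapes and weights here are made up for illustration; it deliberately uses a plain sequential loop rather than the hardware-aware, SRAM-resident parallel scan of criterion 3, and it omits the MLP blocks of criterion 1.

```python
import numpy as np

def selective_scan(x, A_log, W_delta, W_B, W_C):
    """Toy selective SSM recurrence: the step size delta and the B/C
    matrices are functions of the input x (criterion 2).
    x: (L, d) input sequence; state is (d, n)."""
    L, d = x.shape
    n = A_log.shape[1]                     # state dimension per channel
    h = np.zeros((d, n))                   # hidden state
    ys = []
    for t in range(L):
        # Input-dependent ("selective") parameters
        delta = np.log1p(np.exp(x[t] @ W_delta))   # softplus -> positive step size, (d,)
        B = x[t] @ W_B                             # input matrix, (n,)
        C = x[t] @ W_C                             # output matrix, (n,)
        # Euler-style discretization of dh/dt = A h + B x
        A_bar = np.exp(delta[:, None] * -np.exp(A_log))   # decay factors in (0, 1), (d, n)
        h = A_bar * h + (delta[:, None] * B[None, :]) * x[t][:, None]
        ys.append(h @ C)                           # per-channel output, (d,)
    return np.stack(ys)                            # (L, d)

# Hypothetical shapes, just to exercise the scan.
rng = np.random.default_rng(0)
L, d, n = 6, 4, 3
x = rng.standard_normal((L, d))
y = selective_scan(x,
                   A_log=rng.standard_normal((d, n)),
                   W_delta=0.1 * rng.standard_normal((d, d)),
                   W_B=0.1 * rng.standard_normal((d, n)),
                   W_C=0.1 * rng.standard_normal((d, n)))
print(y.shape)  # (6, 4)
```

The input dependence of delta, B, and C is what separates this from a classical (LTI) SSM such as S4; criterion 3 is then about *how* this recurrence is computed on the GPU, which a pedagogical loop like this does not capture.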
