Will a Mamba-based LLM of GPT 3.5 quality or greater be open sourced in 2024?

Mamba is a next-generation architecture that seeks to address the shortcomings of transformers, mainly limited context size and quadratic memory consumption during inference. https://arxiv.org/abs/2312.00752

YES resolution requires the Mamba LLM to match or beat GPT 3.5 on at least 5 popular benchmarks.

Here's a Mamba-Transformer-MoE hybrid that's about as good as GPT 3.5: ai21.com/jamba

bought Ṁ10 NO

Looks like Gemini 1.5 uses standard transformers rather than Mamba, yet still seems to get around these shortcomings (1M context size). I expect this will cause interest in Mamba to wane, which lowers the chance that someone will bother training and testing a Mamba LLM to GPT 3.5 level.

@adele I remain unconvinced that the transformer architecture will be the long-term winner, given its compute- and memory-hungry nature. These are great improvements though.