Will a Mamba 7B model trained on 2 trillion tokens outperform Llama2-13B? | Manifold

21

1kṀ738

Jul 1

66% chance

This question resolves positive if someone trains a Mamba (https://twitter.com/tri_dao/status/1731728602230890895) language model with <=7.5 billion parameters on <=2 trillion tokens that outperforms Llama2-13B on the Hugging Face Open LLM Leaderboard (https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
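The resolution criteria above can be read as a simple conjunction of conditions. Here is a minimal sketch in Python, under stated assumptions: the `Candidate` type and `resolves_positive` function are hypothetical names invented for illustration, "outperforms" is assumed to mean a strictly higher single aggregate leaderboard score (the question text does not say which metric is used), and whether a model counts as "a Mamba model" is left as a flag for the market creator to judge.

```python
from dataclasses import dataclass

# Caps taken directly from the question text.
MAX_PARAMS = 7.5e9   # "<=7.5 billion parameters"
MAX_TOKENS = 2e12    # "<=2 trillion tokens"


@dataclass
class Candidate:
    """Hypothetical record describing a model proposed for resolution."""
    name: str
    is_mamba: bool            # judgment call by the market creator
    n_params: float           # total parameter count
    n_train_tokens: float     # number of training tokens
    leaderboard_score: float  # ASSUMPTION: one aggregate leaderboard score


def resolves_positive(c: Candidate, llama2_13b_score: float) -> bool:
    """Return True only if every condition in the question text holds.

    ASSUMPTION: "outperforms" means strictly greater on the same score.
    """
    return (
        c.is_mamba
        and c.n_params <= MAX_PARAMS
        and c.n_train_tokens <= MAX_TOKENS
        and c.leaderboard_score > llama2_13b_score
    )


if __name__ == "__main__":
    # All scores below are placeholders, not real leaderboard results.
    baseline = 55.0
    within_caps = Candidate("hypothetical-mamba-7b", True, 7.0e9, 2e12, 60.0)
    over_caps = Candidate("hypothetical-8b-3t", True, 8e9, 3e12, 99.0)
    print(resolves_positive(within_caps, baseline))
    print(resolves_positive(over_caps, baseline))
```

Under this reading, a model advertised as "8b" and "3t" would exceed both caps if those figures are taken at face value, whatever its score. How the creator actually treats size, token count, and hybrid architectures is not stated in the question.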

https://huggingface.co/nvidia/mamba2-hybrid-8b-3t-4k

People are also trading

Will anyone train a TokenFormer model at scale before 2026?

Will Llama 4 be the best LLM in the chatbot arena?

Will a flagship (>60T training bytes) open-weights LLM from Meta which doesn't use a tokenizer be released in 2025?

Will the next major LLM by OpenAI use a new tokenizer?

Meta trains model 2x larger than Behemoth in llama 4 series?

Will a single model running on a single consumer GPU (<1.5k 2020 USD) outperform GPT-3 175B on all benchmarks in the original paper by 2025?

Before 2028, will any AI model achieve the same or greater benchmarks as o3 high with <= 1 million tokens per question?

How many active parameters will the largest Llama 3 have?

Will Llama 3-multimodal be natively mixed-multimodal? (VQ-VAE+next token prediction)

Will the next LLM released by OpenAI be worse than GPT-4 at MMLU?

© Manifold Markets, Inc.