Will a Mamba 7B model trained on 2 trillion tokens outperform Llama2-13B?
69% chance

This question resolves YES if someone trains a Mamba (https://twitter.com/tri_dao/status/1731728602230890895) language model with <=7.5 billion parameters on <=2 trillion tokens that outperforms Llama2-13B on the Hugging Face Open LLM Leaderboard (https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
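
For reference, here is a minimal sketch of how the resolution check could be expressed, assuming the leaderboard average scores are read off manually; the model name, parameter count, token count, and scores below are placeholders, not actual results.

```python
# Minimal sketch of the resolution check; all numeric values are placeholders
# to be filled in from the Open LLM Leaderboard and the model's training report.
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    params_billion: float     # total parameter count, in billions
    tokens_trillion: float    # training tokens, in trillions
    leaderboard_avg: float    # Open LLM Leaderboard average score


def resolves_yes(model: Candidate, llama2_13b_avg: float) -> bool:
    """True if the candidate meets this market's resolution criteria."""
    return (
        model.params_billion <= 7.5
        and model.tokens_trillion <= 2.0
        and model.leaderboard_avg > llama2_13b_avg
    )


# Example usage with made-up numbers (hypothetical model and scores):
candidate = Candidate("mamba-7b-example", 7.0, 2.0, 0.0)
print(resolves_yes(candidate, llama2_13b_avg=0.0))
```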
