Will Llama 3 use Mixture of Experts?
10% chance


Let's wait for this.

If both dense and MoE models are released under the name Llama 3, I am leaning towards resolving based on the architecture of the best model (per the LMSYS arena).

@Sss19971997 If both dense and MoE are released, I think it should resolve YES.

@ErikBjareholt If they release a 640B dense model and a 16B MoE, it seems wrong to resolve to MoE.

predicts YES

@Sss19971997 Perhaps, but more likely we'll see an 8x7B MoE (like Mixtral) and also a 70B dense model.

In that case, do you think this should resolve no?

@ErikBjareholt That depends on the performance. Very likely an 8x7B MoE will be worse than a 70B dense model.
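
For context, a back-of-the-envelope comparison of stored vs. per-token-active parameters, sketched in Python. The expert count, top-2 routing, and the idea that only feed-forward layers are replicated per expert are assumptions borrowed from Mixtral-style designs, not details of any announced Llama 3 configuration:

```python
# Rough stored-vs-active parameter comparison for an "8x7B" top-2 MoE
# versus a 70B dense model. Numbers are illustrative, not official.
n_experts, top_k = 8, 2
per_expert = 7e9      # the "7B" in "8x7B" (per-expert scale)
dense = 70e9

total_moe = n_experts * per_expert   # ~56B parameters stored
active_moe = top_k * per_expert      # ~14B parameters used per token

print(f"MoE:   ~{total_moe/1e9:.0f}B stored, ~{active_moe/1e9:.0f}B active per token")
print(f"Dense: {dense/1e9:.0f}B stored and active per token")

# Actual Mixtral figures are somewhat lower (~47B total, ~13B active),
# since attention layers are shared rather than replicated per expert.
```

So the comparison at inference time is roughly ~14B active parameters per token against 70B, which is why the 8x7B is expected to lose to the 70B dense model on quality.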

Foundation models aren't MoE, right? MoE is a technique to increase the throughput of foundation models.

@quantizor Wrong. MoE gains most of its benefit from pretraining.

@quantizor MoE is applied during pretraining, like a very smart dropout.
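
To illustrate what "applied during pretraining" means mechanically, here is a minimal sketch of a top-k gated MoE feed-forward block in PyTorch. The layer sizes, expert count, and routing scheme are illustrative assumptions, not Llama 3 details:

```python
# Minimal top-k gated MoE feed-forward block (illustrative sketch, PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router: scores every token against every expert.
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is an ordinary feed-forward network.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                        # x: (tokens, d_model)
        scores = self.gate(x)                    # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e         # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

# Example: y = MoEFeedForward()(torch.randn(16, 512))  # -> shape (16, 512)
```

Only top_k of the n_experts run for each token, and the router is trained jointly with the experts during pretraining, so it learns which parameters each token "sees". That is the sense in which it acts like a very smart dropout rather than a bolt-on inference trick.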

@HanchiSun Arbitrage opportunity