Will Llama 3 use Mixture of Experts?
10% chance


Let's wait for this.

If both dense and MoE models are released under the name Llama 3, I am leaning towards resolving based on the architecture of the best model (per the LMSYS arena).

@Sss19971997 If both dense and MoE are released, I think it should resolve YES.

@ErikBjareholt If they release a 640B dense model and a 16B MoE, it seems wrong to resolve to MoE.

predicts YES

@Sss19971997 Perhaps, but more likely we'll see an 8x7B MoE (like Mixtral) and also a 70B dense model.

In that case, do you think this should resolve no?

@ErikBjareholt That depends on the performance. Very likely an 8x7B MoE will be worse than a 70B dense model.
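
For context, a back-of-the-envelope comparison of stored vs. per-token-active parameters, sketched in Python. The expert count, top-2 routing, and the idea that only feed-forward layers are replicated per expert are assumptions borrowed from Mixtral-style designs, not details of any announced Llama 3 configuration:

```python
# Rough stored-vs-active parameter comparison for an "8x7B" top-2 MoE
# versus a 70B dense model. Numbers are illustrative, not official.
n_experts, top_k = 8, 2
per_expert = 7e9      # the "7B" in "8x7B" (per-expert scale)
dense = 70e9

total_moe = n_experts * per_expert   # ~56B parameters stored
active_moe = top_k * per_expert      # ~14B parameters used per token

print(f"MoE:   ~{total_moe/1e9:.0f}B stored, ~{active_moe/1e9:.0f}B active per token")
print(f"Dense: {dense/1e9:.0f}B stored and active per token")

# Actual Mixtral figures are somewhat lower (~47B total, ~13B active),
# since attention layers are shared rather than replicated per expert.
```

So the comparison at inference time is roughly ~14B active parameters per token against 70B, which is why the 8x7B is expected to lose to the 70B dense model on quality.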

Foundation models aren't MoE, right? MoE is a technique to increase the throughput of foundation models.

@quantizor Wrong. MoE gains most of its benefit from pretraining.

@quantizor MoE is applied during pretraining, like a very smart dropout.
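
To illustrate what "applied during pretraining" means mechanically, here is a minimal sketch of a top-k gated MoE feed-forward block in PyTorch. The layer sizes, expert count, and routing scheme are illustrative assumptions, not Llama 3 details:

```python
# Minimal top-k gated MoE feed-forward block (illustrative sketch, PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router: scores every token against every expert.
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is an ordinary feed-forward network.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                        # x: (tokens, d_model)
        scores = self.gate(x)                    # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e         # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

# Example: y = MoEFeedForward()(torch.randn(16, 512))  # -> shape (16, 512)
```

Only top_k of the n_experts run for each token, and the router is trained jointly with the experts during pretraining, so it learns which parameters each token "sees". That is the sense in which it acts like a very smart dropout rather than a bolt-on inference trick.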

@HanchiSun Arbitrage opportunity