Will every param size of llama 3 admit quantization letting it run on 64gb? (>= 2bit quantized)
resolved Jun 12

bought Ṁ20 YES

For a 2-bit quant not to be runnable on 64 GB, the largest Llama 3 needs to be ~256B parameters (64 GB at 0.25 bytes per weight). According to Manifolders, Llama 3 is highly likely to come out next month. An implied >60% chance of 256B+ seems off.
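The ~256B figure above can be checked with a quick back-of-the-envelope sketch. It counts weight storage only and assumes 1 GB = 10^9 bytes; KV cache, activations, and quantization metadata (scales, zero points) are ignored, so real limits are somewhat lower. The function name is just illustrative.

```python
def max_params_billions(memory_gb: float, bits_per_weight: float) -> float:
    """Largest parameter count (in billions) whose weights alone fit in memory_gb.

    Assumption: weights only, 1 GB = 1e9 bytes, no overhead for KV cache,
    activations, or per-group quantization metadata.
    """
    bytes_per_weight = bits_per_weight / 8
    return memory_gb / bytes_per_weight


# Compare the precisions that come up in this thread.
for bits in (2, 4, 8, 16):
    limit = max_params_billions(64, bits)
    print(f"{bits:>2}-bit: up to ~{limit:.0f}B parameters in 64 GB")
```

At 2 bits per weight the ceiling is 256B parameters, which is where the threshold in the comment comes from.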

@JacobPfau the title says "all variants runnable on 64GB", not "any variant runnable on 64GB" right?

bought Ṁ30 NO

@RemNi All it takes for this question to resolve as NO is one Llama 3 variant that doesn't fit on 64 GB. That seems highly plausible.

@RemNi In this question, “any” quantifies over parameter counts, not over quantized versions. Per Daniel’s comment below, if the largest-param-count model has a quantized version that runs on 64 GB, then it counts as running on 64 GB.

One could phrase the question “will every param size of llama 3 admit quantization letting it run on 64gb?”

@RemNi Correct. If there is even one model that cannot be quantized (>= 2-bit) to fit in 64 GB, this question resolves to NO.

@DanielScott How about adding (>= 2bit quantized) to the end of the title?

"Will all Llama 3 variants be runnable with less than 64 gb of vram? (>= 2bit quantized)"

That might be clearer

@JacobPfau excellent wording. I've updated the question. Thank you.

I don't like spreading misconceptions, so to point out what I missed in my initial comment: I think this market is roughly correctly priced, because Llama 3 will likely, according to Manifold, be MoE and therefore memory-inefficient (all experts have to be held in memory even though only a few are used per token).
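A rough sketch of why MoE hurts here: memory is set by the total parameter count, while per-token compute is set by the active count. The expert sizes and counts below are made-up illustrative numbers, not a claim about any real Llama 3 configuration.

```python
def moe_params_billions(shared_b, expert_b, n_experts, active_experts):
    """Return (total, active) parameter counts in billions for a toy MoE model.

    All inputs are hypothetical: shared_b is the non-expert part,
    expert_b is the size of one expert, active_experts is how many
    experts are routed to per token.
    """
    total = shared_b + expert_b * n_experts
    active = shared_b + expert_b * active_experts
    return total, active


# Hypothetical configuration, chosen only for illustration.
total, active = moe_params_billions(shared_b=10, expert_b=30, n_experts=8, active_experts=2)
weights_gb_2bit = total * 2 / 8  # weights only, 2 bits per weight, 1 GB = 1e9 bytes
print(f"total={total}B, active={active}B, 2-bit weights ~ {weights_gb_2bit} GB")
```

In this toy case the model computes like a 70B model per token but needs memory for all 250B parameters, which is what pushes a MoE release toward the 64 GB limit.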

Any of the variants? Also, fp16, int8, or int4?

predicted NO

@HanchiSun >= 2bit quants