@JacobPfau the title says "all variants runnable on 64GB", not "any variant runnable on 64GB" right?
@RemNi All it takes for this question to resolve as NO is one Llama 3 variant that doesn't fit in 64 GB. That seems highly plausible.
@RemNi This question is about “any” model, quantifying over parameter count. Per Daniel’s comment below, if the largest-param-count model has a quantized version that runs in 64 GB, then it counts as running in 64 GB.
One could phrase the question as “will every param size of Llama 3 admit a quantization letting it run in 64 GB?”
@RemNi Correct. If there is one model that cannot be quantized (>= 2-bit) to fit in 64 GB, this question resolves NO.
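For intuition on the 2-bit cutoff, here is a rough back-of-envelope sketch: weight memory is approximately params × bits-per-weight / 8 bytes, ignoring activation and KV-cache overhead. The parameter counts below are purely illustrative, not confirmed Llama 3 sizes.

```python
def quantized_size_gb(params_billions: float, bits: int) -> float:
    """Approximate weights-only memory in GB at `bits` per weight.

    Assumption: memory ~= params * bits / 8 bytes; runtime overhead
    (activations, KV cache) is ignored for this rough estimate.
    """
    return params_billions * 1e9 * bits / 8 / 1e9

# Illustrative parameter counts (hypothetical, not announced sizes)
for params in [8, 70, 400]:
    for bits in [2, 4]:
        size = quantized_size_gb(params, bits)
        verdict = "fits" if size < 64 else "does not fit"
        print(f"{params}B @ {bits}-bit: ~{size:.1f} GB -> {verdict} in 64 GB")
```

Under this estimate a hypothetical ~400B model would need ~100 GB even at 2-bit, which is the kind of variant that would force a NO resolution.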
@DanielScott How about adding (>= 2bit quantized) to the end of the title?
"Will all Llama 3 variants be runnable with less than 64 gb of vram? (>= 2bit quantized)"
That might be clearer