Size of smallest open-source LLM matching GPT-3.5's performance in 2025? (GB)
Closes Dec 31
1.83 GB expected
Less than 0.49 GB: 14%
0.5 - 0.99 GB: 19%
1 - 1.99 GB: 50%
2 - 3.99 GB: 8%
4 - 5.99 GB: 3%
6 - 8 GB: 3%
Above 8 GB: 3%

The criterion for matching GPT-3.5 is either ≥ 70% performance on MMLU (a 5-shot prompt is acceptable) or ≥ 35% performance on GPQA Diamond.

Resolves to the amount of memory the open-source LLM takes up when run on an ordinary GPU. Only models that aren't fine-tuned directly on the task count. Quantizations are allowed. Chain-of-thought prompting is allowed. Reasoning is allowed. For GPQA, giving examples is not allowed; for MMLU, a maximum of 5 examples is allowed. Something like "a chatbot finetuned on math/coding reasoning problems" would be acceptable. I hold discretion in what counts; ask me if you have any concerns.
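For reference, a minimal sketch of how such a score could be produced, assuming the lm-evaluation-harness ("pip install lm-eval") Python API and an illustrative placeholder model name; any equivalent harness or reported headline figure works just as well:

```python
# Rough sketch only: assumes lm-evaluation-harness (lm_eval) and a CUDA GPU.
# The model name is an illustrative placeholder, not a prediction about
# which model will qualify.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                   # Hugging Face backend
    model_args="pretrained=Qwen/Qwen2.5-3B-Instruct,dtype=bfloat16",
    tasks=["mmlu"],
    num_fewshot=5,                                # 5-shot, as allowed above
    batch_size=8,
)
print(results["results"]["mmlu"])                 # aggregate MMLU accuracy
```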

Global-MMLU and Global-MMLU Lite are considered acceptable substitutes for MMLU for the purposes of evaluation.

I will ignore statistical uncertainty and just use the headline figure, unless the error is extremely large (e.g. >5%) and the uncertainty might actually matter.

Absent any specific measurement, I will take the model size in GB as the "amount of memory the open-source LLM takes up when run on an ordinary GPU", but if you can show that it takes up less in memory, I'll use that figure. If needed, I might run the model on my own GPU and measure the memory usage there. Quantizations count.
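As a concrete illustration of what I mean by measuring it myself, here is a minimal sketch assuming PyTorch and Hugging Face Transformers; the model name is just a placeholder and the rule of thumb in the comments is approximate:

```python
# Minimal sketch, assuming PyTorch + Transformers and a CUDA GPU.
# Rule of thumb: size_gb ≈ n_params * bits_per_weight / 8 / 1e9
# (e.g. a 3B-parameter model at 4 bits per weight is roughly 1.5 GB).
import torch
from transformers import AutoModelForCausalLM

model_name = "Qwen/Qwen2.5-3B-Instruct"  # illustrative placeholder

torch.cuda.reset_peak_memory_stats()
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # or point at a pre-quantized checkpoint
).to("cuda")

peak_gb = torch.cuda.max_memory_allocated() / 1e9
print(f"Peak GPU memory after loading: {peak_gb:.2f} GB")
```

Note that loading alone slightly understates inference memory (KV cache, activations), so in a borderline case I'd look at the peak during a short generation rather than just after loading.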

My definition of evidence is reasonably inclusive. For example, I would be happy to accept this Reddit post as evidence for the capabilities of quantized Gemma models.

Open-source is defined as "you can download the weights and run it on your GPU." For example, Llama models count as open-source.
