Will a single model running on a single consumer GPU (<1.5k 2020 USD) outperform GPT-3 175B on all benchmarks in the original paper by 2025?
2026 · 86% chance
There are no restrictions on the amount or kind of compute used to *train* the model. Question is about whether it will actually be done, not whether it will be possible in theory. If I judge the model to really be many specific models stuck together to look like one general model it will not count.

Llamas on Pixel 7s: https://github.com/rupeshs/alpaca.cpp/tree/linux-android-build-support (I know, I know, it's not over 13B yet; just sharing progress.)

predicts YES

There are people who successfully run 30B LLaMA on a consumer PC, and even 65B (but it is extremely slow).
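The "extremely slow" part follows from simple memory arithmetic: the weights alone can exceed consumer VRAM, forcing offloading to system RAM or disk. A rough back-of-the-envelope sketch (the function name and the simplifications, such as ignoring activations and KV-cache overhead, are my own):

```python
def model_vram_gb(n_params_billions: float, bytes_per_param: float) -> float:
    """Rough weight-only memory estimate in GiB (ignores activations, KV cache)."""
    return n_params_billions * 1e9 * bytes_per_param / 1024**3

# fp16 weights: 2 bytes per parameter
print(round(model_vram_gb(13, 2), 1))   # ~24.2 GiB -- borderline on a 24 GiB card
print(round(model_vram_gb(30, 2), 1))   # ~55.9 GiB -- does not fit; must offload
print(round(model_vram_gb(65, 2), 1))   # ~121.1 GiB

# 4-bit quantization (0.5 bytes/param) shrinks 30B to roughly 14 GiB
print(round(model_vram_gb(30, 0.5), 1))
```

This is why the 30B and 65B runs people report are either heavily quantized or paged through slower memory.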

@ValeryCherepanov By "run on a single GPU" I mean the weights + one full input vector can fit on a consumer GPU at once. Otherwise the question would be meaningless - you can always split up matrices into smaller blocks and run the computation sequentially.
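The point about splitting matrices is easy to demonstrate: matrix-vector products decompose trivially into row blocks, so without a residency requirement any model "runs" on any GPU. A minimal NumPy sketch (toy sizes and names are hypothetical) showing that processing weight blocks sequentially gives the same result as holding everything at once:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))  # pretend these weights are too big to fit at once
x = rng.standard_normal(6)       # one full input vector

# Full product, as if weights and input both fit in GPU memory simultaneously.
full = W @ x

# Split W into row blocks and process one block at a time, as a device
# that can only hold a single block would.
block_rows = 4
parts = [W[i:i + block_rows] @ x for i in range(0, W.shape[0], block_rows)]
sequential = np.concatenate(parts)

assert np.allclose(full, sequential)
```

Hence the resolution criterion pins down residency (weights plus one full input vector on the GPU at once) rather than mere computability.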

This is now extremely close to being resolved by LLaMA (LLaMA 13B does not actually beat GPT-3 on every measured benchmark, but it comes very close). 72% is way too low, though, so I guess whoever reads this comment first can collect some free mana in expectation.

@vluzko

Are the benchmarks text-generation only, or do they work with chat models too?

FLAN-T5 3B very likely can resolve this now, but I suspect it will be a while before anyone actually bothers to run it on all of the benchmarks.

lol yeah this one's gonna happen: https://arxiv.org/abs/2205.05131

Tautologically this is possible today. Whether it's possible to do "with the entire model in GPU memory at once" is just not that interesting to calculate.
I think it's a tight call. I think I'd go the other way if this was for 2027.