Will it be possible to fine-tune a 65B parameter model with 30GB of GPU memory (average) by the end of 2023?
Resolved: N/A (Mar 10)

QLoRA reduced the average GPU memory required to fine-tune a 65B model from 750+ GB to under 48 GB.

The authors validated this by training over 1,000 models across several instruction datasets, multiple architectures, and model sizes ranging from 80M to 65B parameters.

Will it be possible to reduce this further, to an average of 30 GB? Not just on one model but reliably; resolution requires consistent and compelling evidence.
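For context, a minimal sketch of a QLoRA-style setup using Hugging Face transformers, peft, and bitsandbytes (the checkpoint name and hyperparameters below are illustrative assumptions, not taken from the market description or the paper):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization of the frozen base weights (the main source of memory savings)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # dequantize to bf16 for matmuls
)

model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-65b",                 # illustrative 65B checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)

# Only small LoRA adapter matrices are trained; base weights stay frozen in 4-bit
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # adapters are a tiny fraction of the 65B params
```

With this setup, the trainable parameters are only the adapters, so optimizer state is small; most of the remaining memory goes to the quantized base weights and activations.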


This market needs clarification on the fine-tuning time and the fine-tuned model's performance required for something to count as "fine-tuning".

Otherwise, I can trivially finetune even a 1T parameter model with zero gpus, because finetuning a model is a computational operation and regular non-gpu computers are Turing-complete.

what is “48 GB of GPU time”?

@NiciusB they fixed the description
