Will it be possible to fine-tune a 65B parameter model with 30GB of GPU memory (average) by the end of 2023? | Manifold

Resolved N/A on Mar 10

QLoRA reduced the average GPU memory required to fine-tune a 65B-parameter model from more than 780 GB to less than 48 GB.

The authors validated this by fine-tuning more than 1,000 models across several instruction datasets, architectures, and model sizes ranging from 80M to 65B parameters.

Will it be possible to reduce this further, to 30 GB? It must work reliably, not just on one model; resolution requires consistent and compelling evidence.
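To put the 30 GB target in perspective, here is a rough back-of-envelope sketch of the memory arithmetic. The per-parameter byte counts are assumptions for illustration (16-bit weights and gradients plus two 32-bit Adam moment estimates), not figures taken from this market or a statement of how QLoRA itself was measured. Activations, adapter weights, and quantization constants are ignored, and the function names are hypothetical.

```python
GB = 1e9  # decimal gigabytes, good enough for a rough estimate


def full_finetune_gb(n_params, weight_bytes=2, grad_bytes=2, optimizer_bytes=8):
    """Weights + gradients + Adam moments per parameter (assumed sizes); no activations."""
    return n_params * (weight_bytes + grad_bytes + optimizer_bytes) / GB


def quantized_weights_gb(n_params, bits_per_param):
    """Frozen base weights only; ignores quantization constants and adapters."""
    return n_params * bits_per_param / 8 / GB


def max_bits_for_budget(n_params, budget_gb):
    """Largest average bits per parameter whose weights alone fit in the budget."""
    return budget_gb * GB * 8 / n_params


N = 65e9  # 65B parameters

print(full_finetune_gb(N))         # 780.0
print(quantized_weights_gb(N, 4))  # 32.5
print(max_bits_for_budget(N, 30))
```

Under these assumptions, 4-bit base weights for a 65B model already take about 32.5 GB, so fitting in 30 GB would need an average of roughly 3.69 bits per parameter or fewer before counting anything else, or some form of offloading.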

People are also trading

Will GigaChat release an open-weights model with ≥100B parameters by the end of 2026?

1GW AI training run before 2027?

Before 2028, will a GPU the same or smaller die size as b100 achieve 2x or better max throughput on GPT-oss-120b?

Will Aidan McLau's claim that very large models are "refusing instruction tuning" be validated by 2030?

Will a GPT-4 quality model be trained for under $10.000 by 2030?

How fast will you be able to train a GPT-2-level AI on a consumer GPU in 2030?

Open "Nano Banana Pro"‑Level Model on a Gaming GPU by 2028?

Will a major cosmological simulation be AI-accelerated by the end of 2027?

100GW AI training run before 2031?

AI model training time decreases fourfold by mid-2027?

This market needs clarification on how long finetuning may take, and how well the finetuned model must perform, for something to count as "finetuning".

Otherwise, I can trivially finetune even a 1T-parameter model with zero GPUs, because finetuning is just computation and ordinary non-GPU computers are Turing-complete.

Does https://arxiv.org/abs/2305.17333 count?

What is “48 GB of GPU time”?

@NiciusB They fixed the description.
