Will there exist a service for full-parameter fine-tuning of Llama 3.1 405B?
2026 · 80% chance

Resolves YES if a service or API exists before 2026 which does the following:

  1. You upload text document(s) for fine-tuning

  2. The service does full parameter fine-tuning on Llama 3.1 405B, without you having to rent GPUs

  3. You can download the fine-tuned model
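To make requirements 1–3 concrete, here is a minimal sketch of what a qualifying workflow might look like. Everything here is hypothetical: the base URL, endpoint paths, and field names are invented for illustration and don't belong to any real provider.

```python
import requests

BASE = "https://finetune.example.com/v1"  # hypothetical provider

# 1. Upload the training corpus (>= 10B tokens in total).
with open("corpus.jsonl", "rb") as f:
    upload = requests.post(f"{BASE}/files", files={"file": f}).json()

# 2. Launch a full-parameter fine-tune of the 405B base model;
#    the provider supplies the GPUs, you never rent hardware yourself.
job = requests.post(f"{BASE}/fine-tunes", json={
    "model": "llama-3.1-405b",
    "training_file": upload["id"],
    "method": "full",  # i.e. not LoRA/QLoRA
}).json()

# 3. Once the job finishes, download the fine-tuned weights.
weights = requests.get(f"{BASE}/fine-tunes/{job['id']}/weights")
with open("llama-3.1-405b-finetuned.bin", "wb") as out:
    out.write(weights.content)
```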

Extra Details:

  • I’ll accept a service which fine-tunes a model other than Llama 3.1 405B, so long as it is more capable (as judged by e.g. the LMSYS leaderboard) and has at least 400B parameters. E.g. if Meta releases a Llama 4 405B, that counts.

  • The service must accept a large corpus of documents, >= 10B tokens in total

  • By full-parameter fine-tuning, I’m excluding methods like LoRA/QLoRA which do low-rank updates. Similarly, I’m excluding memory-efficient optimizers (e.g. Adafactor). The service should use AdamW or a similar optimizer without excessive quantization; all optimizer states, gradients, parameters, buffers, etc. should be in at least 8-bit precision. (A rough memory estimate follows this list.)

  • It should be generally available, e.g. like the OpenAI fine-tuning API. You shouldn’t have to consult with people for your specific fine-tuning job before using it.
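The rough memory estimate referenced above (my own back-of-the-envelope arithmetic, not part of the resolution criteria): AdamW keeps two moment estimates per parameter, so even at the 8-bit floor allowed here, a full fine-tune of a 405B model needs terabytes of accelerator memory before counting activations, which is why "without renting GPUs" is doing real work in this question.

```python
params = 405e9  # Llama 3.1 405B

# Bytes per parameter: weights + gradients + AdamW first/second moments.
floor_8bit = 1 + 1 + 1 + 1         # everything at the 8-bit minimum
typical_mixed = 2 + 2 + 4 + 4 + 4  # bf16 weights/grads, fp32 master copy and moments

print(f"8-bit floor:   {params * floor_8bit / 1e12:.1f} TB")     # ~1.6 TB
print(f"typical setup: {params * typical_mixed / 1e12:.1f} TB")  # ~6.5 TB
```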

If you think such a service exists, please post in the comments below. I’ll make a reasonable effort to confirm that it meets the requirements and resolve if it does.

bought Ṁ30 YES

This is a great question

Possibly, looking into it now

Hm, possibly they aren't offering fine-tuning quite yet, or at least not outside of a preview? Other pages mention only Llama 3.1 70B and smaller, and they don't give pricing for fine-tuning that model as distinct from inference.

I think that’s right. According to this source, fine-tuning isn’t available yet:

“Can I fine-tune the Llama 3.1 405B model? What about other models?

Not yet for 405B Instruct – stay tuned!

Models available to fine-tune today:

  • Deployment as serverless API (MaaS): 8B Instruct and 70B Instruct.

  • Deployment as managed compute: 8B Instruct, 70B Instruct, 8B, 70B.”

But if they do offer a full fine-tuning service sometime in the future, and you could download the resulting model, that would be sufficient to resolve YES.

Does what they offer for smaller models now qualify as full fine-tuning?

Good question, I don’t think so. From the Azure documentation, “We use LoRA, or low rank approximation, to fine-tune models in a way that reduces their complexity without significantly affecting their performance.”

https://techcommunity.microsoft.com/t5/ai-machine-learning-blog/introducing-llama-2-on-azure/ba-p/3881233

This indicates that, at least for Llama 2, you could turn LoRA off for fine-tuning (the animation of them going through the process included a LoRA checkbox in the settings), though I suppose that might not be the case for OpenAI models. Also, if they provide LoRA but let you choose its parameters, you could maybe set the rank equal to the actual model dimension, which would make it equivalent to no-LoRA fine-tuning, if I understand it correctly.

That’s a good point, I didn’t notice that. The Llama 2 example certainly counts as full fine-tuning since you can disable LoRA. And after rereading the LoRA paper, I’m inclined to say that full-rank LoRA counts as well.
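A quick sanity check of the full-rank claim, as a toy PyTorch sketch (the dimension is made up; real weight matrices are thousands wide): when the LoRA rank equals the weight dimension, the factored update B @ A can represent any dense update exactly, so it is as expressive as unconstrained fine-tuning, though the optimizer trajectory can still differ since gradients flow through B and A separately.

```python
import torch

d = 8                       # toy hidden size
target = torch.randn(d, d)  # some arbitrary full-parameter update delta_W

# Full-rank LoRA: rank r = d. One exact factorization: B = delta_W, A = I.
B, A = target.clone(), torch.eye(d)
assert torch.allclose(B @ A, target)  # B @ A reproduces any d x d update
```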

@mr_mino I think Databricks now offers continued pre-training of Llama 405B (which I think is the same as full-parameter fine-tuning?), with an option to run it on serverless compute. However, the documentation is a bit confusing, and it's not offered as pay-per-token yet, just pay-per-"DBU", which I gather is related to how much compute you use.

@Fay42 I think there’s also a data requirement which would disqualify this one:

  • For continuous pre-training, workloads are limited to 60-256MB files.

  • Large datasets (10B+ tokens) are not supported due to compute availability

Contra the requirement:

  • The service must accept a large corpus of documents, >= 10B tokens in total
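Rough arithmetic on the size gap (my own numbers; ~4 bytes per token is a common rule of thumb for English text, not a figure from either document):

```python
tokens = 10e9        # the market's minimum corpus size
bytes_per_token = 4  # rough average for English text (assumption)

corpus_gb = tokens * bytes_per_token / 1e9
print(f"~{corpus_gb:.0f} GB of raw text")                      # ~40 GB
print(f"~{corpus_gb * 1e3 / 256:.0f} files at the 256MB cap")  # ~156 files
```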

@mr_mino whoops, didn't notice that limit - sorry

bought Ṁ50 NO

Do you think this makes sense? You can use the LoRA technique and get 90% of the results at 3% of the cost.

There are some domains (e.g. healthcare, finance, automating various high-paying jobs) where the extra 10% of performance/reliability is worth the cost. But I’d be interested in an unconditional market as well.

What do you mean, "without renting GPUs"? You don't mean free, do you?

I guess it means you pay them a one-time fee and they use GPUs that they own or rent themselves.

I mean that you shouldn't have to rent or buy hardware in order to use the fine-tuning service; it does so on your behalf, e.g. like the OpenAI/Mistral APIs. I expect it to cost money.