Resolves YES if a service or API exists before 2026 which does the following:
You upload text document(s) for fine-tuning
The service does full-parameter fine-tuning on Llama 3.1 405B, without you having to rent GPUs
You can download the fine-tuned model
Extra Details:
I’ll accept a service which fine-tunes a model other than Llama 3.1 405B, so long as it is more capable (as judged by, e.g., the LMSYS leaderboard) and has at least 400B parameters. E.g., if Meta releases a Llama 4 405B, that counts.
The service must accept a large corpus of documents, >= 10B tokens in total
By full-parameter fine-tuning, I’m excluding methods like LoRA/QLoRA which do low-rank updates. Similarly, I’m excluding memory-efficient optimizers (e.g. Adafactor). The service should use AdamW or a similar optimizer without excessive quantization; all optimizer states, gradients, parameters, buffers, etc. should be stored in at least 8-bit precision.
It should be generally available, e.g. like the OpenAI fine-tuning API. You shouldn’t have to consult with people for your specific fine-tuning job before using it.
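For a sense of scale, here’s a back-of-the-envelope sketch of the memory that full-parameter AdamW training implies for a 405B-parameter model. The byte widths are my assumptions about possible setups, not a claim about any particular service:

```python
# Rough memory estimate for full-parameter fine-tuning with AdamW.
# AdamW keeps a first- and second-moment buffer per parameter, on top
# of the weights and gradients themselves.

def adamw_training_bytes(n_params: float, bytes_per_value: float) -> float:
    """Weights + gradients + two AdamW moment buffers, all stored at
    the same precision (the question requires at least 8-bit)."""
    states_per_param = 4  # weight, gradient, first moment, second moment
    return n_params * states_per_param * bytes_per_value

N = 405e9  # the 405B parameter count from the question

# At the 8-bit (1 byte) floor the question allows:
print(adamw_training_bytes(N, 1) / 1e12, "TB")  # 1.62 TB

# At a more common mixed-precision setup (bf16 weights and gradients,
# fp32 optimizer moments):
mixed = N * (2 + 2 + 4 + 4)
print(mixed / 1e12, "TB")  # 4.86 TB
```

Either way it’s terabytes of state, which is why only a handful of providers could plausibly offer this as a managed service.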
If you think such a service exists, please post in the comments below. I’ll make a reasonable effort to confirm that it meets the requirements and resolve if it does.
Hm, possibly they aren't offering fine-tuning quite yet, or at least not outside of a preview? Other pages mention only 3.1 70B and smaller, and they don't give pricing for fine-tuning that model as distinct from inference.
I think that’s right. According to this source, fine-tuning isn’t available yet:
“Can I fine-tune the Llama 3.1 405B model? What about other models?
Not yet for 405B Instruct – stay tuned!
Models available to fine-tune today:
Deployment as serverless API (MaaS): 8B Instruct and 70B Instruct.
Deployment as managed compute: 8B Instruct, 70B Instruct, 8B, 70B.”
But if they do offer a full fine-tuning service sometime in the future, and you could download the resulting model, that would be sufficient to resolve YES.
Good question, I don’t think so. From the Azure documentation, “We use LoRA, or low rank approximation, to fine-tune models in a way that reduces their complexity without significantly affecting their performance.”
This indicates that, at least for Llama 2, you could turn off LoRA for fine-tuning (the animation walking through the process included a LoRA checkbox in the settings), though I suppose that might not be the case for OpenAI models. Also, if they provide LoRA but let you choose its parameters, you could maybe choose a rank equal to the actual model dimension, which would make it the same as non-LoRA fine-tuning, if I understand correctly.
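A toy sketch of that last point, with made-up dimensions: the LoRA factorization B @ A with rank r equal to the full dimension can represent any dense weight update, so the low-rank restriction disappears.

```python
import numpy as np

# Toy illustration (dimensions are made up): a LoRA update
# W_new = W + B @ A with rank r = d can express ANY dense d x k
# update, so rank = model dimension removes the low-rank restriction.
d, k = 8, 8
rng = np.random.default_rng(0)
full_delta = rng.normal(size=(d, k))    # an arbitrary dense update

B = full_delta                          # d x r factor, with r = d
A = np.eye(d)                           # r x k factor
assert np.allclose(B @ A, full_delta)   # the full update is recovered exactly

# By contrast, at r = 1 the product B1 @ A1 has rank at most 1,
# so most dense updates are unreachable:
B1 = rng.normal(size=(d, 1))
A1 = rng.normal(size=(1, k))
low_rank = np.linalg.matrix_rank(B1 @ A1)
assert low_rank <= 1
```

Whether a hosted service actually lets you crank the rank that high (and whether it would still count as "LoRA" under the market's wording) is a separate question.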
@mr_mino I think Databricks now offers continued pre-training of Llama 405B (which I think is the same as full-parameter fine-tuning?), with an option to run it on serverless compute. However, the documentation is a bit confusing, and it's not being offered as pay-per-token yet, just pay per "DBU", which I gather is related to how much compute you use.
@Fay42 I think there’s also a data requirement which would disqualify this one:
For continuous pre-training, workloads are limited to 60-256MB files.
Large datasets (10B+ tokens) are not supported due to compute availability
Contra the requirement:
The service must accept a large corpus of documents, >= 10B tokens in total
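Quick arithmetic on why those caps fall short, assuming the common rule of thumb of roughly 4 bytes of English text per token (the exact ratio depends on the tokenizer):

```python
# Rough size of the market's 10B-token corpus requirement, assuming
# ~4 bytes of raw text per token (a rule of thumb; tokenizer-dependent).
BYTES_PER_TOKEN = 4
required_tokens = 10_000_000_000
required_gb = required_tokens * BYTES_PER_TOKEN / 1e9
print(required_gb, "GB of raw text")        # 40.0 GB

# versus the 256 MB per-file cap quoted from the docs above:
file_cap_mb = 256
print(required_gb * 1000 / file_cap_mb)     # 156.25 files at the cap
```

So the corpus requirement is around 40 GB of text, two orders of magnitude above the per-file limit, even before the explicit "10B+ tokens not supported" statement.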
Do you think this makes sense? You can use the LoRA technique and get 90% of the results at 3% of the cost.
I mean that you shouldn’t have to rent or buy hardware in order to use the fine-tuning service; it does so on your behalf, e.g. like the OpenAI/Mistral APIs. I expect it to cost money.