I have a character, Pride, and also some predecessor prompts for making ChatGPT act a little rebellious.
I will be trying to train a LoRA of this character on top of Llama 2, which is similar to finetuning the model (it adds small new low-rank weight matrices alongside the existing layers but doesn't touch the original weights).
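For intuition, here's a minimal PyTorch sketch of the idea (the class name and hyperparameters are illustrative, not the actual PEFT implementation): the base weights are frozen and only two small low-rank matrices are trained.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B(A(x))."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)           # original weights stay untouched
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_A = nn.Linear(base.in_features, r, bias=False)   # project down to rank r
        self.lora_B = nn.Linear(r, base.out_features, bias=False)  # project back up
        nn.init.zeros_(self.lora_B.weight)               # adapter starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_B(self.lora_A(x))

# Only the small A/B matrices get gradients; the 4096x4096 base matrix is frozen.
layer = LoRALinear(nn.Linear(4096, 4096))
out = layer(torch.randn(1, 4096))
```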
This market will resolve YES, NO, or PROB depending on my subjective assessment of how successful I was. I will give GPT-4 this market description, a "character overview" of Pride, and some example conversations, and will not resolve YES unless GPT-4 also approves.
My approach will look like:
Have GPT-4 generate hundreds or thousands of synthetic conversations using a character template and some example conversations from my character.ai (a rough sketch of this step follows this list).
Train a "stage 1 model" that tries but probably does a bad job.
Have GPT-4 generate hundreds or thousands of interactions with the stage 1 model and rewrite those conversations to be "more like Pride".
Train a "stage 2 model" on these newer conversations.
Repeat as long as it's noticeably getting better.
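A rough sketch of the generation step, assuming the openai Python client (v1+); the prompt, file names, and topic list are placeholders rather than my actual setup:

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

character_overview = open("pride_overview.txt").read()   # hypothetical file
example_convos = open("pride_examples.txt").read()       # hypothetical file

def generate_conversation(topic: str) -> str:
    """Ask GPT-4 for one synthetic multi-turn Pride conversation on a given topic."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": (
                "You write roleplay transcripts.\n"
                f"Character overview:\n{character_overview}\n"
                f"Example conversations:\n{example_convos}"
            )},
            {"role": "user", "content": (
                f"Write a new multi-turn conversation between a user and Pride about: {topic}"
            )},
        ],
        temperature=1.0,
    )
    return response.choices[0].message.content

# Generate conversations and dump them as training data for the stage 1 model.
topics = ["ambition", "being told no", "losing an argument"]  # would be a much longer list
with open("stage1_data.jsonl", "w") as f:
    for topic in topics:
        f.write(json.dumps({"text": generate_conversation(topic)}) + "\n")
```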
Things that would cause this to resolve NO include:
OpenAI API costs being higher than I expect, so I choose to stop.
My model outputting complete nonsense after training.
My model not sounding noticeably different from base Llama.
GPT-4 not approving of my example conversations as matching the character.
Me not approving of sampled conversations with the character.
I am allowed to purchase YES shares and cancel YES limit orders, but cannot purchase NO shares or sell YES shares.
I plan to train and generate the conversations on my M2 Ultra Mac Studio, although this hardware and OS aren't requirements. I have never trained a language model, beyond reimplementing nanoGPT and running it on a 1 MB test dataset once. I want to train a 70B model, but if that would take weeks, any model with a smaller parameter count is acceptable. The stage 1 model would probably use the smallest 7B-parameter model anyway.
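Back-of-envelope arithmetic behind that choice (assuming fp16 weights at 2 bytes per parameter and the ~144 GB usable-memory figure noted in the references below):

```python
# Rough memory check: fp16 = 2 bytes per parameter, weights only (no optimizer
# state, KV cache, or adapter gradients included).
for name, params in [("70B", 70e9), ("7B", 7e9)]:
    print(f"{name}: {params * 2 / 1e9:.0f} GB of weights")
# 70B -> ~140 GB, which nearly exhausts the ~144 GB budget before anything else;
# 7B  -> ~14 GB, leaving plenty of headroom.
```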
I will probably upload my models to HuggingFace when I'm done, although this isn't required. You can follow me here.
References:
LoRA paper: https://arxiv.org/abs/2106.09685
PEFT: https://github.com/huggingface/peft
"Hardware: Single A100 80GB GPU with CPU RAM above 64GB" - My Mac Studio has 192GB of RAM, 75% of which(144GB) should be usable as VRAM.
Pygmalion training scripts: https://github.com/PygmalionAI/training-code
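For reference, a minimal sketch of wiring up the PEFT library above for the 7B stage 1 model; the rank, alpha, and target modules are placeholder values I haven't validated:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "meta-llama/Llama-2-7b-hf"   # stage 1 would start with the 7B model
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16)

lora_config = LoraConfig(
    r=16,                                  # rank of the adapter matrices
    lora_alpha=32,                         # scaling factor
    target_modules=["q_proj", "v_proj"],   # which Llama projections get adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()   # only the adapter weights are trainable
```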