
The obvious YES resolution is if some form of RLHF (https://arxiv.org/abs/1706.03741) is used, but other forms of RL would also count.
The RL loop must directly affect the model's weights. RL used only in an outer loop (say, for architecture search or hyperparameter optimization) does not count.
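To make the criterion concrete, here is a minimal toy sketch (not GPT-4's actual setup) of a REINFORCE-style loop on a two-armed bandit, where the reward signal directly updates the policy weights, which is exactly what the resolution requires; the learning rate, reward probabilities, and step count are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = np.zeros(2)           # policy logits over two actions
reward_probs = np.array([0.2, 0.8])  # action 1 pays off more often

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

lr = 0.5
for _ in range(500):
    probs = softmax(weights)
    a = rng.choice(2, p=probs)
    r = float(rng.random() < reward_probs[a])  # stochastic binary reward
    grad = -probs
    grad[a] += 1.0               # gradient of log pi(a | weights)
    weights += lr * r * grad     # reward directly moves the weights

# The learned policy should now prefer the higher-reward action.
print(softmax(weights)[1])
```

An outer-loop use of RL (e.g. RL picking hyperparameters for an otherwise supervised run) would never touch `weights` this way, which is the distinction the market draws.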
Nov 25, 11:10pm: Will GPT-4 be at least partially trained with RL? → GPT-4 #2: Will GPT-4 be at least partially trained with RL?
🏅 Top traders
# | Name | Total profit
---|---|---
1 | | Ṁ145
2 | | Ṁ132
3 | | Ṁ23
4 | | Ṁ21
5 | | Ṁ20
Training with human feedback
We incorporated more human feedback, including feedback submitted by ChatGPT users, to improve GPT-4’s behavior. We also worked with over 50 experts for early feedback in domains including AI safety and security.
"GPT-4 will be multimodal":
https://www.heise.de/news/GPT-4-is-coming-next-week-and-it-will-be-multimodal-says-Microsoft-Germany-7540972.html
May not be reliable information, or might be poorly phrased, but makes this market a bit murky, so I've sold my shares.