[Interesting content] InstructGPT, RLHF and SFT
Arize invited Long Ouyang and Ryan Lowe to their podcast to talk about InstructGPT, the model ChatGPT is based on, and the whole content is 🔥.
They cover, among other things:
- The concept of alignment (a term popularised by Stuart Russell from Berkeley; I link an interview with him in the comments).
- How InstructGPT differs from GPT-3: it is aware that it is receiving instructions, while the older model was only "tricked" into performing them.
- What a reward model is, and how it is incorporated into the training process.
- The difference between RLHF (reinforcement learning from human feedback) and SFT (supervised fine-tuning): RLHF is for fine-grained tuning, while SFT causes a more significant shift in model behaviour.
- Prompting: the major improvement is that older models had to be prompted in a specific, "almost coded" language, while the new ones can be prompted more intuitively. They are less sensitive to prompt wording but still steerable and, most importantly, "naturally" steerable.
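To make the reward-model idea concrete: in the InstructGPT setup, human labellers rank model responses, and the reward model is trained so that preferred responses get higher scores. A minimal sketch of the pairwise (Bradley–Terry style) preference loss, with made-up reward values for illustration:

```python
import math

def reward_model_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise preference loss: -log sigmoid(r_chosen - r_rejected).

    Pushes the reward model to score the human-preferred response
    higher than the rejected one.
    """
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# Hypothetical scores: the labeller preferred response A over response B.
loss_correct = reward_model_loss(2.0, 0.5)   # model already ranks A higher
loss_wrong = reward_model_loss(0.5, 2.0)     # model ranks them the wrong way

# The loss is small when the ranking agrees with the human preference,
# and large when it disagrees.
print(loss_correct < loss_wrong)
```

The trained reward model then scores new responses during the RL phase, providing the signal that PPO optimises against.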
(Full disclosure: I am an advisor of Arize AI.)
InstructGPT paper on Arxiv: https://arxiv.org/abs/2203.02155
GPT-3 paper on Arxiv: https://arxiv.org/abs/2005.14165
Interview with Stuart Russell (on alignment): [Youtube]
One interesting takeaway from the paper is the training cost of fine-tuning: it is only a small fraction of the cost of pre-training GPT-3 itself.
Due to the huge interest in ChatGPT, I plan to post regularly about it, so:
Or follow me on LinkedIn: