Deliberate Machine Learning

[Interesting content] InstructGPT, RLHF and SFT

Laszlo Sragner
2023-01-24

Arize invited Long Ouyang and Ryan Lowe to their podcast to talk about InstructGPT, the model ChatGPT is based on, and the whole conversation is 🔥.

Key takeaways:

  • The concept of alignment (a term popularised by Stuart Russell from Berkeley; an interview with him is linked under Further material below).

  • InstructGPT is based on GPT-3, but it is trained to recognise that it is being given instructions, while the older model was only "tricked" into performing them.

  • What a Reward Model is, and how it is incorporated into the training process (see the sketch after this list).

  • The difference between RLHF (reinforcement learning from human feedback) and SFT (supervised fine-tuning): RLHF is for fine-grained tuning, while SFT causes a more significant shift in model behaviour (a sketch of the two objectives also follows the list).

  • Regarding prompts, the major improvement is that older models had to be prompted in a specific, "almost coded" language, whereas newer models can be prompted more intuitively. They are less sensitive to the exact prompt wording but still steerable and, most importantly, "naturally" steerable (an illustrative before/after prompt follows the list).

    (Full disclosure: I am an advisor to Arize AI.)
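To make the Reward Model point more concrete, here is a minimal sketch of the pairwise-comparison loss the InstructGPT paper describes. The `reward_model` interface and the PyTorch usage are my own assumptions for illustration; in the paper, the reward model is a GPT-style network whose final layer outputs a scalar score.

```python
# Minimal sketch of the reward-model training step on human comparisons.
# Assumption: reward_model(prompt, response) returns a scalar score tensor.
import torch.nn.functional as F

def reward_model_loss(reward_model, prompt, chosen, rejected):
    """Pairwise ranking loss: the labeller-preferred response should score higher."""
    r_chosen = reward_model(prompt, chosen)      # score for the preferred answer
    r_rejected = reward_model(prompt, rejected)  # score for the rejected answer
    # Maximise the log-probability of reproducing the labeller's ranking.
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```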
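And to sharpen the RLHF vs SFT contrast, a sketch of the two training signals. The tensor-based interface, the helper names and the KL coefficient `beta` are illustrative assumptions, not values taken from the paper.

```python
import torch

def sft_loss(demo_logprobs: torch.Tensor) -> torch.Tensor:
    """SFT: plain maximum likelihood on labeller demonstrations.
    demo_logprobs are the policy's log-probs of the demonstrated tokens,
    so the policy is pushed directly towards the demonstrations --
    a comparatively large shift in behaviour."""
    return -demo_logprobs.mean()

def rlhf_objective(reward: torch.Tensor,
                   policy_logprob: torch.Tensor,
                   sft_logprob: torch.Tensor,
                   beta: float = 0.02) -> torch.Tensor:
    """RLHF (PPO-style) per-sample objective to maximise: the reward-model
    score minus a KL penalty that keeps the tuned policy close to the SFT
    model -- a finer-grained nudge rather than a wholesale shift."""
    kl = policy_logprob - sft_logprob
    return reward - beta * kl
```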
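Finally, an illustrative before/after pair for the prompting point; the wording is mine, not taken from the paper or the podcast.

```python
# GPT-3 style: coax the behaviour out with a few-shot, "almost coded" pattern.
few_shot_prompt = (
    "English: cheese\nFrench: fromage\n"
    "English: bread\nFrench: pain\n"
    "English: apple\nFrench:"
)

# InstructGPT style: state the instruction naturally.
instruction_prompt = "Translate 'apple' into French."
```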

Further material

  • InstructGPT paper on Arxiv: https://arxiv.org/abs/2203.02155

  • GPT-3 paper on Arxiv: https://arxiv.org/abs/2005.14165

  • Interview with Stuart Russell (on alignment): [Youtube]

One interesting takeaway from the paper is the training cost of fine-tuning: it is only a small fraction of the compute spent on pretraining GPT-3 in the first place.

Due to the huge interest in ChatGPT, I plan to post regularly about it, so subscribe to this newsletter or follow me on LinkedIn.
