AI From Scratch/Lesson 09/~45 minutes
Reward Modeling & RLHF
Humans cannot write down a reward function for a "good assistant response," but they can compare two responses and pick the better one. Fit a reward model to those comparisons, then optimize the language model against it with reinforcement learning. Christiano 2017. InstructGPT 2...
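To make "fit a reward model to those comparisons" concrete, here is a minimal sketch in plain Python. It assumes a Bradley-Terry style pairwise loss, `-log sigmoid(r(chosen) - r(rejected))`, which is one common choice and not necessarily the exact formulation this lesson builds. The linear reward model, the two-dimensional feature vectors, and the names `reward`, `pairwise_loss`, and `train` are all hypothetical and exist only for illustration. A real reward model would score text with a neural network, and the reinforcement learning step is not shown.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def reward(w, features):
    # Hypothetical linear reward model: a dot product of weights and features.
    return sum(wi * fi for wi, fi in zip(w, features))


def pairwise_loss(w, chosen, rejected):
    # Bradley-Terry loss (assumed here): -log sigmoid(margin),
    # written as a stable softplus of the negative margin.
    margin = reward(w, chosen) - reward(w, rejected)
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def train(pairs, dim, lr=0.1, epochs=200):
    # Plain stochastic gradient descent on the pairwise loss.
    w = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(w, chosen) - reward(w, rejected)
            # d(loss)/dw = -(1 - sigmoid(margin)) * (chosen - rejected)
            scale = 1.0 - sigmoid(margin)
            for i in range(dim):
                w[i] += lr * scale * (chosen[i] - rejected[i])
    return w


# Toy comparisons (made-up data). Each response is described by two
# hypothetical features: [helpfulness, rudeness]. The first item in each
# pair is the response the human labeler preferred.
pairs = [
    ([0.9, 0.1], [0.2, 0.1]),
    ([0.7, 0.0], [0.7, 0.8]),
    ([0.8, 0.2], [0.3, 0.9]),
]

w = train(pairs, dim=2)
avg_loss = sum(pairwise_loss(w, c, r) for c, r in pairs) / len(pairs)
print("weights:", w)
print("average pairwise loss:", avg_loss)
```

The model never sees an absolute score. It only learns that the preferred response in each pair should receive a higher reward than the rejected one, which is why comparisons are enough to train it.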