Phase 18: Ethics, Safety & Alignment
AI From Scratch / Lesson 01 / ~45 minutes

Instruction-Following as Alignment Signal

Every later critique of RLHF argues against this pipeline. Before you study how optimization pressure distorts a proxy, you have to see the proxy. InstructGPT (Ouyang et al., 2022) defined the reference architecture: supervised fine-tuning...
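
To make "the proxy" concrete: in the InstructGPT paper, the reward model is trained on human comparisons between pairs of responses, using a pairwise logistic loss that pushes the preferred response's score above the rejected one's. The sketch below is illustrative only and assumes the two scalar rewards have already been computed by some model. The function and parameter names are hypothetical, and it handles a single comparison, whereas the paper averages over all pairs ranked for a prompt.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Loss for one human comparison: -log(sigmoid(r_chosen - r_rejected)).

    Hypothetical helper for illustration; the inputs are scalar scores that a
    reward model assigned to the preferred and the rejected response.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of the
    # margin so that exp() never receives a large positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Agreeing with the human label gives a small loss, disagreeing a large one.
    print(pairwise_preference_loss(2.0, 0.0))
    print(pairwise_preference_loss(0.0, 2.0))
```

The loss depends only on the difference between the two scores, so the reward model learns a relative ranking of responses, not an absolute measure of quality. That learned ranking is the proxy that later optimization is run against.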
