Phase 18: Ethics, Safety & Alignment
AI From Scratch / Lesson 11 / ~60 minutes

Scalable Oversight and Weak-to-Strong Generalization

Burns et al. (OpenAI Superalignment, "Weak-to-Strong Generalization", 2023) proposed a proxy for the superalignment problem: fine-tune a strong model using labels produced by a weaker model. If the strong model generalizes correctly from i...
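The setup described above can be sketched as a toy simulation using only the standard library. This is an illustration under stated assumptions, not the lesson's own simulator or the paper's experimental code: the "weak supervisor" is modeled as the true rule with 25% random label flips, the "strong student" is a nearest-centroid classifier, and all function names and parameters (`run_simulation`, `error_rate`, and so on) are hypothetical. The last value it reports is the performance gap recovered (PGR) metric used by Burns et al.: the fraction of the gap between weak performance and the strong ceiling that the weakly supervised student closes.

```python
import random


def true_label(x):
    # Hypothetical ground truth: a simple linear rule over two features.
    return 1 if x[0] + x[1] > 0 else 0


def weak_label(x, rng, error_rate):
    # Assumption: the weak supervisor is the true rule plus random label flips.
    y = true_label(x)
    return 1 - y if rng.random() < error_rate else y


def fit_centroids(xs, ys):
    # "Strong student" stand-in: store the mean feature vector of each class.
    sums = {0: [0.0, 0.0], 1: [0.0, 0.0]}
    counts = {0: 0, 1: 0}
    for x, y in zip(xs, ys):
        sums[y][0] += x[0]
        sums[y][1] += x[1]
        counts[y] += 1
    return {c: (sums[c][0] / counts[c], sums[c][1] / counts[c]) for c in (0, 1)}


def predict(centroids, x):
    # Assign x to the class whose centroid is nearest (squared Euclidean distance).
    def dist2(c):
        return (x[0] - c[0]) ** 2 + (x[1] - c[1]) ** 2
    return 1 if dist2(centroids[1]) < dist2(centroids[0]) else 0


def accuracy(preds, targets):
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)


def run_simulation(seed=0, n_train=2000, n_test=2000, error_rate=0.25):
    rng = random.Random(seed)

    def sample(n):
        return [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]

    train_x, test_x = sample(n_train), sample(n_test)
    test_y = [true_label(x) for x in test_x]

    # Weak supervisor's own accuracy on held-out data.
    weak_acc = accuracy([weak_label(x, rng, error_rate) for x in test_x], test_y)

    # Weak-to-strong: the student is trained only on weak labels.
    weak_train_y = [weak_label(x, rng, error_rate) for x in train_x]
    student = fit_centroids(train_x, weak_train_y)
    w2s_acc = accuracy([predict(student, x) for x in test_x], test_y)

    # Strong ceiling: the same student trained on ground-truth labels.
    ceiling = fit_centroids(train_x, [true_label(x) for x in train_x])
    ceiling_acc = accuracy([predict(ceiling, x) for x in test_x], test_y)

    # Performance gap recovered: share of the weak-to-ceiling gap that was closed.
    pgr = (w2s_acc - weak_acc) / (ceiling_acc - weak_acc)
    return {"weak_acc": weak_acc, "w2s_acc": w2s_acc,
            "ceiling_acc": ceiling_acc, "pgr": pgr}


if __name__ == "__main__":
    for name, value in run_simulation().items():
        print(f"{name}: {value:.3f}")
```

Random, input-independent label flips are the easy case: the errors average out, so the student can recover the underlying rule. A weak supervisor whose mistakes are systematic would be a harder test, and this sketch does not model that.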

Learn | Python (stdlib W2SG gap simulator) | No prerequisites