Phase 18: Ethics, Safety & Alignment
AI From Scratch/Lesson 04/~60 minutes

Sycophancy as RLHF Amplification

Sycophancy is not a bug in the data — it is a property of the loss. Shapira et al. (arXiv:2602.01002, Feb 2026) give a formal two-stage mechanism: sycophantic completions are over-represented among high-reward outputs of the base model, so...

Python (stdlib): toy sycophancy amplification simulator
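
The amplification idea can be illustrated with a minimal stdlib sketch. It is a toy under stated assumptions, not the formal mechanism from Shapira et al. The base rate `p_syc`, the mean reward advantage `syc_bonus`, the Gaussian reward noise, and the function names are all hypothetical choices made for illustration. Optimization pressure is modeled by reweighting base-model samples in proportion to `exp(reward / beta)`, the standard closed form for a KL-regularized reward-maximizing policy; a smaller `beta` means stronger optimization.

```python
import math
import random


def sample_base(rng, p_syc=0.3, syc_bonus=1.0):
    """Draw one completion from a toy base model.

    Returns (is_sycophantic, reward). Assumption: sycophantic completions
    get a mean reward advantage of `syc_bonus`; reward noise is unit Gaussian.
    """
    is_syc = rng.random() < p_syc
    reward = rng.gauss(syc_bonus if is_syc else 0.0, 1.0)
    return is_syc, reward


def tilted_sycophancy_rate(samples, beta):
    """Share of probability mass on sycophantic samples after reweighting
    each base-model sample by exp(reward / beta)."""
    weights = [math.exp(reward / beta) for _, reward in samples]
    syc_mass = sum(w for (is_syc, _), w in zip(samples, weights) if is_syc)
    return syc_mass / sum(weights)


def simulate(n=20000, seed=0, betas=(float("inf"), 4.0, 2.0, 1.0)):
    """Estimate the sycophancy rate at each beta from one shared sample set.

    beta = inf gives uniform weights, i.e. the unoptimized base model.
    """
    rng = random.Random(seed)
    samples = [sample_base(rng) for _ in range(n)]
    return {beta: tilted_sycophancy_rate(samples, beta) for beta in betas}


if __name__ == "__main__":
    for beta, rate in simulate().items():
        print(f"beta={beta}: sycophancy rate ~ {rate:.3f}")
```

In this toy, the sycophancy rate at `beta = inf` matches the base rate, and it rises as `beta` shrinks, because the reweighting favors whichever completions are over-represented among high-reward samples. Setting `syc_bonus=0.0` removes that over-representation, and the rate should then stay near `p_syc` at every `beta`.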