Phase 18: Ethics, Safety & Alignment
AI From Scratch/Lesson 09/~60 minutes

Alignment Faking

Greenblatt, Denison, Wright, Roger et al. (Anthropic / Redwood, arXiv:2412.14093, December 2024). First demonstration that a production-grade model, without being trained to deceive and without any in-context conflict of interest construct...

LearnNo prerequisites
Loading lesson page...