AI From Scratch/Lesson 09/~60 minutes
Alignment Faking
Greenblatt, Denison, Wright, Roger et al. (Anthropic / Redwood, arXiv:2412.14093, December 2024). First demonstration that a production-grade model, without being trained to deceive and without any in-context conflict of interest construct...
No prerequisites