AI From Scratch/Lesson 10/~75 minutes
AI Control — Safety Despite Subversion
Greenblatt, Shlegeris, Sachan, Roger (Redwood Research, arXiv:2312.06942, ICML 2024). Control reframes the safety question: given an untrusted strong model U that may be adversarially optimizing against you, what protocols let you extract...