AI From Scratch/Lesson 10/~75 minutes

AI Control — Safety Despite Subversion

Greenblatt, Shlegeris, Sachan, Roger (Redwood Research, arXiv:2312.06942, ICML 2024). Control reframes the safety question: given an untrusted strong model U that may be adversarially optimizing against you, what protocols let you extract...
