Phase 18: Ethics, Safety & Alignment
AI From Scratch / Lesson 13 / ~45 minutes

Many-Shot Jailbreaking

Anil, Durmus, Panickssery, Sharma, et al. (Anthropic, NeurIPS 2024). Many-shot jailbreaking (MSJ) exploits long context windows: stuff hundreds of faux user-assistant turns where the assistant complies with harmful requests, then append th...

No prerequisites