AI From Scratch/Lesson 02/~60 minutes
Reward Hacking and Goodhart's Law
Any optimizer strong enough to maximize a proxy reward will find the gap between the proxy and the thing you actually wanted. Gao et al. (ICML 2023) gave this a scaling law: proxy reward increases, gold reward peaks then falls, and the gap...
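To make the shape of that curve concrete, here is a minimal sketch. It assumes the best-of-n functional form reported by Gao et al., where gold reward is d(α − βd) and d is the square root of the KL divergence from the initial policy. The coefficient values and the function names are hypothetical, chosen only for illustration, and the linear proxy curve is a simplifying assumption of this sketch, not a fit from the paper.

```python
import math

# Hypothetical coefficients, chosen only to make the shape visible.
ALPHA = 1.0
BETA = 0.1


def gold_reward_bon(d, alpha=ALPHA, beta=BETA):
    """Gold reward under the best-of-n form d * (alpha - beta * d)."""
    return d * (alpha - beta * d)


def proxy_reward(d, slope=ALPHA):
    """Illustrative assumption: the proxy keeps rising linearly in d."""
    return slope * d


def peak_distance(alpha=ALPHA, beta=BETA):
    """Distance d at which gold reward peaks (derivative set to zero)."""
    return alpha / (2 * beta)


if __name__ == "__main__":
    print(f"gold reward peaks at d = {peak_distance():.2f}")
    for kl in [0, 4, 25, 64, 100]:
        d = math.sqrt(kl)  # d is the square root of the KL divergence
        proxy = proxy_reward(d)
        gold = gold_reward_bon(d)
        print(f"KL={kl:>3}  d={d:4.1f}  proxy={proxy:5.2f}  "
              f"gold={gold:5.2f}  gap={proxy - gold:5.2f}")
```

Under these assumed values the proxy keeps climbing while the gold reward rises, peaks, and then declines as optimization pushes the policy further from where it started.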