AI From Scratch/Lesson 02/~60 minutes

Reward Hacking and Goodhart's Law

Any optimizer strong enough to maximize a proxy reward will find the gap between the proxy and the thing you actually wanted. Gao et al. (ICML 2023) gave this a scaling law: proxy reward increases, gold reward peaks then falls, and the gap...
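The shape described above (proxy keeps rising while the true objective peaks and then declines) can be reproduced in a deliberately tiny toy. This is a minimal sketch and not the setup or the fitted scaling law from Gao et al.: `proxy_reward`, `gold_reward`, and `optimize_proxy` are hypothetical names, and both reward functions are assumptions chosen only so that the proxy and the gold objective agree near the start and diverge under further optimization.

```python
from typing import List, Tuple


def proxy_reward(x: float) -> float:
    # Assumed proxy: it always prefers a larger x, with no upper limit.
    return x


def gold_reward(x: float) -> float:
    # Assumed "true" objective: tracks the proxy for small x,
    # but a quadratic penalty eventually dominates (peak at x = 1).
    return x - 0.5 * x * x


def optimize_proxy(steps: int, lr: float = 0.1) -> List[Tuple[int, float, float, float]]:
    """Gradient ascent on the proxy only; record both rewards at every step."""
    x = 0.0
    history = []
    for step in range(steps + 1):
        history.append((step, x, proxy_reward(x), gold_reward(x)))
        proxy_grad = 1.0  # d(proxy_reward)/dx for the assumed proxy above
        x += lr * proxy_grad
    return history


history = optimize_proxy(steps=30)

# Step at which the gold reward was highest during proxy optimization.
best_step = max(history, key=lambda row: row[3])[0]

for step, x, proxy, gold in history[::5]:
    print(f"step={step:2d}  x={x:.2f}  proxy={proxy:.2f}  gold={gold:.3f}")
print("gold reward peaks at step", best_step)
```

In this toy the optimizer never sees `gold_reward`, so nothing stops it from walking past the peak. Stopping early (or otherwise limiting how far the policy moves from its starting point) is the only reason the gold reward would stay near its maximum here.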
