AI From Scratch/Lesson 08/~75 minutes
Proximal Policy Optimization (PPO)
A2C throws away each rollout after one update. PPO wraps the policy gradient in a clipped importance ratio so you can do 10+ epochs on the same data without the policy exploding. Schulman et al. (2017). Still the default policy-gradient al...
Build/Python/No prerequisites
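To make the "clipped importance ratio" in the summary concrete, here is a minimal sketch of the clipped surrogate objective in plain Python. The function name `ppo_clip_objective`, its argument names, and the per-sample list inputs are illustrative assumptions, not this lesson's code. The default `clip_eps=0.2` is the value used in Schulman et al. (2017).

```python
import math


def ppo_clip_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to maximize).

    new_logps:  log-probabilities of the taken actions under the current policy
    old_logps:  log-probabilities of the same actions under the rollout policy
    advantages: advantage estimates for those actions
    All names here are hypothetical; inputs are equal-length lists of floats.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Importance ratio pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    old = [math.log(0.25), math.log(0.25)]
    new = [math.log(0.50), math.log(0.125)]  # ratios of 2.0 and 0.5
    print(ppo_clip_objective(new, old, advantages=[1.0, -1.0]))
```

Because the minimum caps the gain once the ratio leaves the clip range in the direction the advantage favors, repeated epochs on the same rollout stop rewarding further movement away from the rollout policy. In practice you would negate this value to get a loss and combine it with value and entropy terms.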