Phase 09: Reinforcement Learning
AI From Scratch/Lesson 07/~75 minutes

Actor-Critic — A2C and A3C

REINFORCE is noisy. Add a critic that learns V̂(s), subtract it from the return, and you get an advantage that has the same expectation but far lower variance. That is actor-critic. A2C runs it synchronously; A3C runs it across threads. Bo...

BuildPython
Loading lesson page...