Phase 09: Reinforcement Learning
AI From Scratch / Lesson 12 / ~120 minutes

RL for Games — AlphaZero, MuZero, and the LLM-Reasoning Era

1992: TD-Gammon reached near-champion level at backgammon with pure TD self-play. 2016: AlphaGo beat Lee Sedol. 2017: AlphaZero dominated chess, shogi, and Go from scratch. 2025: DeepSeek-R1 showed the same recipe, with GRPO replacing PPO, works on reasoni...
