Phase 09: Reinforcement Learning
AI From Scratch/Lesson 04/~75 minutes

Temporal Difference — Q-Learning & SARSA

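Monte Carlo waits until the episode ends before it updates. TD updates after every step by bootstrapping from the current estimate of the next state's value. Q-learning is off-policy and optimistic: its target uses the best next action, whatever the agent actually does next. SARSA is on-policy and cautious: its target uses the action the agent actually takes next. Each update is one line of code. Both underpin every deep-R...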
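To make the "one line of code" claim concrete, here is a minimal tabular sketch of the two updates. The function names, the dictionary-based Q-table keyed by `(state, action)`, and the default `alpha` and `gamma` values are illustrative assumptions, not taken from the lesson itself.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9, done=False):
    """Off-policy TD update: bootstrap from the best next action (hypothetical helper)."""
    # The target uses max over next actions, regardless of what the agent does next.
    best_next = 0.0 if done else max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9, done=False):
    """On-policy TD update: bootstrap from the action actually taken next (hypothetical helper)."""
    next_value = 0.0 if done else Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (r + gamma * next_value - Q[(s, a)])


def make_table():
    """Toy Q-table where 'right' looks better than 'left' in state s1."""
    Q = defaultdict(float)
    Q[("s1", "left")] = 1.0
    Q[("s1", "right")] = 3.0
    return Q


actions = ["left", "right"]

# Same transition (s0, right, reward 1.0, s1) under both rules.
Q_ql = make_table()
q_learning_update(Q_ql, "s0", "right", 1.0, "s1", actions)

Q_sarsa = make_table()
# Assume the behaviour policy explored and picked 'left' in s1.
sarsa_update(Q_sarsa, "s0", "right", 1.0, "s1", "left")

print(Q_ql[("s0", "right")])     # approx 1.85: target = 1.0 + 0.9 * 3.0
print(Q_sarsa[("s0", "right")])  # approx 0.95: target = 1.0 + 0.9 * 1.0
```

The only difference between the two functions is the bootstrapped term: Q-learning assumes the greedy action will follow, while SARSA pays for whatever the exploring policy actually chose, which is why it learns more cautious values.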

Build/Python