AI From Scratch/Lesson 04/~75 minutes

Temporal Difference — Q-Learning & SARSA

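Monte Carlo methods wait until an episode ends before updating. TD methods update after every step by bootstrapping from the value estimate of the next state. Q-learning is off-policy and optimistic: it bootstraps from the best next action, whatever the agent actually does next. SARSA is on-policy and cautious: it bootstraps from the action the agent actually takes. Each update rule is one line of code. Both underpin every deep-R...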
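To make the contrast concrete, here is a minimal tabular sketch of the two update rules. It is an illustration under stated assumptions, not the lesson's own code: the function names, the `(state, action)` dictionary layout for `Q`, and the default `alpha` and `gamma` values are all hypothetical choices made for this example.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99, done=False):
    """Off-policy TD update: bootstrap from the best action available in s_next."""
    # On a terminal transition there is no next state to bootstrap from.
    target = r if done else r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99, done=False):
    """On-policy TD update: bootstrap from the action actually taken in s_next."""
    target = r if done else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])


# Hypothetical two-action example: in state "s1", "right" looks much better than "left".
actions = ["left", "right"]


def fresh_table():
    # Unseen (state, action) pairs default to a value of 0.0.
    Q = defaultdict(float)
    Q[("s1", "left")] = 1.0
    Q[("s1", "right")] = 5.0
    return Q


q_table = fresh_table()
q_learning_update(q_table, "s0", "right", 0.0, "s1", actions, alpha=0.5, gamma=1.0)

sarsa_table = fresh_table()
# Assume the behaviour policy explored and picked "left" in s1.
sarsa_update(sarsa_table, "s0", "right", 0.0, "s1", "left", alpha=0.5, gamma=1.0)

print(q_table[("s0", "right")])      # 2.5: bootstrapped from max(1.0, 5.0)
print(sarsa_table[("s0", "right")])  # 0.5: bootstrapped from Q(s1, left) = 1.0
```

The only difference between the two functions is the bootstrap term: a `max` over next actions for Q-learning, and the value of the sampled next action for SARSA. That single change is what makes one off-policy and the other on-policy.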

Build / Python
