AI From Scratch/Lesson 01/~45 minutes

MDPs, States, Actions & Rewards

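A Markov Decision Process (MDP) is five things: a set of states, a set of actions, transition probabilities, a reward function, and a discount factor. The algorithms you will meet in RL (Q-learning, PPO, DPO, GRPO) are all defined over this shape or derived from it. Learn it once, and the rest of reinforcement learning becomes far easier to read.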
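To make the five pieces concrete, here is a minimal sketch of an MDP as a plain Python data structure. The `MDP` class, the two-state "cool/hot" example, and every number in it are illustrative assumptions, not part of the lesson's own code. The helper `discounted_return` shows what the discount does to a sequence of rewards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MDP:
    """Hypothetical container for the five components of an MDP."""
    states: tuple
    actions: tuple
    transitions: dict  # (state, action) -> {next_state: probability}
    rewards: dict      # (state, action) -> expected immediate reward
    gamma: float       # discount factor, typically in [0, 1)


def is_valid(mdp: MDP) -> bool:
    """Check that every (state, action) pair has a proper distribution."""
    for s in mdp.states:
        for a in mdp.actions:
            dist = mdp.transitions.get((s, a))
            if dist is None or (s, a) not in mdp.rewards:
                return False
            if any(nxt not in mdp.states for nxt in dist):
                return False
            if abs(sum(dist.values()) - 1.0) > 1e-9:
                return False
    return True


def discounted_return(rewards, gamma):
    """Compute sum over t of gamma**t * rewards[t] for a finite sequence."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Illustrative toy problem (assumed values): an engine that can run
# slow or fast, and overheats if pushed too hard.
toy = MDP(
    states=("cool", "hot"),
    actions=("slow", "fast"),
    transitions={
        ("cool", "slow"): {"cool": 1.0},
        ("cool", "fast"): {"cool": 0.5, "hot": 0.5},
        ("hot", "slow"): {"cool": 0.5, "hot": 0.5},
        ("hot", "fast"): {"hot": 1.0},
    },
    rewards={
        ("cool", "slow"): 1.0,
        ("cool", "fast"): 2.0,
        ("hot", "slow"): 1.0,
        ("hot", "fast"): -10.0,
    },
    gamma=0.9,
)
```

With these definitions, `is_valid(toy)` returns `True`, and `discounted_return([1, 1, 1], 0.5)` evaluates to `1.75` (that is, 1 + 0.5 + 0.25). Dictionaries keyed by `(state, action)` are only one convenient encoding for small, finite problems; larger problems usually replace them with functions or arrays.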
