AI From Scratch/Lesson 01/~45 minutes
MDPs, States, Actions & Rewards
A Markov Decision Process is five things: states, actions, transitions, rewards, and a discount factor. Everything in RL (Q-learning, PPO, DPO, GRPO) optimizes over this shape. Learn it once, read the rest of reinforcement learning for free.
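To make the five pieces concrete, here is a minimal sketch of an MDP as a plain Python container. The class name `MDP`, the two-state "cool/hot" toy problem, and all of its probabilities and reward values are made up for illustration and do not come from the lesson. It also assumes rewards depend only on the state and action, which is one common convention among several.

```python
import random
from dataclasses import dataclass


@dataclass
class MDP:
    """The five components of a Markov Decision Process (illustrative layout)."""
    states: tuple        # S: every state the environment can be in
    actions: tuple       # A: every action the agent can take
    transitions: dict    # P: (state, action) -> list of (next_state, probability)
    rewards: dict        # R: (state, action) -> float (assumed form)
    gamma: float         # discount factor, between 0 and 1

    def step(self, state, action, rng):
        """Sample a next state from P and look up the reward from R."""
        next_states, probs = zip(*self.transitions[(state, action)])
        next_state = rng.choices(next_states, weights=probs, k=1)[0]
        return next_state, self.rewards[(state, action)]


def discounted_return(rewards, gamma):
    """Compute r_0 + gamma * r_1 + gamma^2 * r_2 + ... by folding from the end."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical toy problem: running earns more but may overheat the machine.
toy = MDP(
    states=("cool", "hot"),
    actions=("wait", "run"),
    transitions={
        ("cool", "wait"): [("cool", 1.0)],
        ("cool", "run"): [("cool", 0.5), ("hot", 0.5)],
        ("hot", "wait"): [("cool", 1.0)],
        ("hot", "run"): [("hot", 1.0)],
    },
    rewards={
        ("cool", "wait"): 1.0,
        ("cool", "run"): 2.0,
        ("hot", "wait"): 0.0,
        ("hot", "run"): -1.0,
    },
    gamma=0.5,
)

if __name__ == "__main__":
    rng = random.Random(0)
    state, collected = "cool", []
    for _ in range(5):
        action = rng.choice(toy.actions)  # a random policy, just to generate data
        state, reward = toy.step(state, action, rng)
        collected.append(reward)
    print(collected, discounted_return(collected, toy.gamma))
```

The discount factor only enters when rewards are summed: with `gamma=0.5`, three rewards of 1.0 are worth 1 + 0.5 + 0.25 = 1.75, so later rewards count for less than earlier ones.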