Phase 19: Capstone Projects
AI From Scratch/Lesson 44/~90 minutes

Cosine LR with Linear Warmup

The learning-rate schedule is the second most important decision after the loss function. AdamW with a cosine decay and a linear warmup is the modern default for language-model training because it lets the model see a small effective step...

BuildPythonNo prerequisites
Loading lesson page...