AI From Scratch/Lesson 44/~90 minutes
Cosine LR with Linear Warmup
After the loss function, the learning-rate schedule is arguably the most important training decision. AdamW with a linear warmup followed by cosine decay is a common modern default for language-model training because it lets the model see a small effective step...
Build/Python/No prerequisites
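As a rough illustration of the schedule named in the title, here is a minimal sketch in plain Python. The function name `lr_at_step` and every default value (`max_lr`, `min_lr`, `warmup_steps`, `total_steps`) are assumptions chosen for the example, not values taken from the lesson. The warmup ramps linearly up to `max_lr`, after which a half cosine decays the rate down to `min_lr`.

```python
import math


def lr_at_step(step, max_lr=3e-4, min_lr=3e-5, warmup_steps=100, total_steps=1000):
    """Return the learning rate for a 0-indexed optimizer step.

    All defaults are illustrative assumptions, not recommended settings.
    """
    if step < warmup_steps:
        # Linear warmup: ramp from max_lr / warmup_steps up to max_lr.
        return max_lr * (step + 1) / warmup_steps
    if step >= total_steps:
        # Past the end of the schedule, hold at the floor.
        return min_lr
    # Fraction of the decay phase completed, in [0, 1).
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    # Half-cosine multiplier that falls smoothly from 1 to 0.
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (max_lr - min_lr) * cosine


if __name__ == "__main__":
    # Print a few points along the schedule to see its shape.
    for s in (0, 50, 99, 100, 550, 999, 1000):
        print(s, lr_at_step(s))
```

In a training loop, one plausible use is to compute `lr_at_step(step)` before each optimizer update and write the value into the optimizer's learning-rate setting; how that is done depends on the framework.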