AI From Scratch, Lesson 45 (~90 minutes)
Gradient Clipping and Mixed Precision
The optimizer and schedule from the previous lesson assume gradients are sane. They usually are not. A single bad batch can spike the gradient norm by three orders of magnitude. Mixed-precision training amplifies this by introducing FP16 o...
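A common defense against these spikes is clipping by global norm. The idea is to measure the L2 norm of all gradients together as one flat vector and, if it exceeds a threshold, rescale every gradient by the same factor. The sketch below is a minimal version in plain Python, so it runs without a framework. It assumes gradients are lists of floats, one list per parameter tensor. `global_norm` and `clip_by_global_norm` are hypothetical helper names, not an API defined elsewhere in this course, and the lesson's own implementation may differ.

```python
import math
from typing import List, Tuple

Grads = List[List[float]]  # one list of floats per parameter tensor (assumed layout)


def global_norm(grads: Grads) -> float:
    """L2 norm of all gradients, treated as a single flat vector."""
    return math.sqrt(sum(g * g for tensor in grads for g in tensor))


def clip_by_global_norm(grads: Grads, max_norm: float) -> Tuple[Grads, float]:
    """Rescale all gradients by one shared factor so their global norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the pre-clip norm. The
    pre-clip norm is worth logging, because a sudden jump in it is how a
    bad batch shows up.
    """
    norm = global_norm(grads)
    # The small epsilon avoids division by zero when every gradient is zero.
    # min(1.0, ...) means gradients that are already small are left untouched.
    scale = min(1.0, max_norm / (norm + 1e-6))
    clipped = [[g * scale for g in tensor] for tensor in grads]
    return clipped, norm


# Usage: a healthy batch versus a spike three orders of magnitude larger.
healthy = [[0.3, 0.4]]           # global norm 0.5
spike = [[300.0, 400.0], [0.0]]  # global norm 500
for grads in (healthy, spike):
    clipped, pre_norm = clip_by_global_norm(grads, max_norm=1.0)
    print(f"pre-clip norm {pre_norm:.1f} -> post-clip norm {global_norm(clipped):.3f}")
```

Because every gradient is multiplied by the same factor, the update keeps its direction and only its length shrinks. Clipping each element independently would distort the direction. PyTorch exposes the same global-norm idea as `torch.nn.utils.clip_grad_norm_`. When gradients come from a scaled loss, as in mixed-precision training with a loss scaler, they need to be unscaled before the norm is measured. Otherwise `max_norm` is compared against scaled values.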
Build | Python | No prerequisites