AI From Scratch/Lesson 45/~90 minutes

Gradient Clipping and Mixed Precision

The optimizer and schedule from the previous lesson assume gradients are sane. They usually are not. A single bad batch can spike the gradient norm by three orders of magnitude. Mixed-precision training amplifies this by introducing FP16 o...
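To make the gradient-norm spike concrete, here is a minimal sketch of clipping by global norm in plain Python. The function name `clip_by_global_norm`, the nested-list gradient layout, and the `max_norm` threshold are illustrative assumptions, not the lesson's own code. It rescales every gradient by the same factor, so the update keeps its direction and only its length changes.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale gradients so their combined L2 norm is at most max_norm.

    grads is a list of per-parameter gradient lists (a hypothetical layout
    chosen for this sketch). Returns the clipped gradients and the norm
    measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    # Leave gradients untouched when they are already within the budget.
    if total_norm <= max_norm:
        return [list(vec) for vec in grads], total_norm
    # One shared scale factor preserves the direction of the update.
    scale = max_norm / total_norm
    return [[g * scale for g in vec] for vec in grads], total_norm

# A spiky batch: global norm is sqrt(9 + 16 + 144) = 13.
grads = [[3.0, 4.0], [0.0, 12.0]]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))
```

Returning the pre-clip norm is a deliberate choice in this sketch: logging it per step is a cheap way to see how often bad batches actually occur.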

Build/Python/No prerequisites
