AI From Scratch/Lesson 46/~90 minutes

Gradient Accumulation

Train at an effective batch you cannot afford, one micro-batch at a time. Scale the loss, hold the optimizer step, and let the gradients pile up.
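As a minimal sketch of that recipe (not the lesson's own code), the pure-Python example below uses a toy one-parameter model `y = w * x` with mean-squared-error loss. The helper names `grad_mse` and `accumulated_step`, the toy data, and the learning rate are all hypothetical choices made for illustration. It assumes equal-sized micro-batches, which is the case where dividing each micro-batch loss by the number of accumulation steps reproduces the full-batch gradient.

```python
# Minimal sketch: gradient accumulation for a 1-D linear model y = w * x
# with mean-squared-error loss. Pure Python, no framework assumed.
# All names and data here are hypothetical, chosen for illustration.

def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_step(w, xs, ys, micro_batch_size, lr):
    """One optimizer step assembled from several equal-sized micro-batches."""
    assert len(xs) % micro_batch_size == 0, "sketch assumes equal micro-batches"
    accum_steps = len(xs) // micro_batch_size
    grad_sum = 0.0  # gradients pile up here between optimizer steps
    for i in range(accum_steps):
        lo, hi = i * micro_batch_size, (i + 1) * micro_batch_size
        # Scaling the loss by 1/accum_steps scales its gradient by the same factor.
        grad_sum += grad_mse(w, xs[lo:hi], ys[lo:hi]) / accum_steps
    # The optimizer step is held until every micro-batch has contributed.
    return w - lr * grad_sum


# Toy data: the true relationship is y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]
w0, lr = 0.0, 0.01

# One plain SGD step on the full batch of 8 examples.
w_full = w0 - lr * grad_mse(w0, xs, ys)

# The same step built from 4 micro-batches of 2 examples each.
w_accum = accumulated_step(w0, xs, ys, micro_batch_size=2, lr=lr)
```

Under these assumptions `w_full` and `w_accum` agree up to floating-point rounding, so the update is the same while only one micro-batch needs to be held in memory at a time. If the micro-batches were unequal in size, a plain `1/accum_steps` scaling would no longer match the full-batch mean and the weights would need to reflect each micro-batch's share of the examples.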

Build · Python
