Loading lesson page...
AI From Scratch/Lesson 46/~90 minutes
Gradient Accumulation
Train at an effective batch you cannot afford, one micro-batch at a time. Scale the loss, hold the optimizer step, and let the gradients pile up.
BuildPython
Train at an effective batch you cannot afford, one micro-batch at a time. Scale the loss, hold the optimizer step, and let the gradients pile up.