AI From Scratch/Lesson 77/~90 min
Data Parallel DDP From Scratch
DistributedDataParallel is a hook on top of allreduce. Wrap a model, broadcast the initial parameters from rank 0 so every rank starts identical, install a backward hook on every parameter that issues an allreduce of the gradient, and the...
Build · Python · No prerequisites
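
The steps named in the summary above (broadcast from rank 0, then an allreduce of each parameter's gradient) can be sketched without any distributed runtime. The block below is a single-process simulation, not the lesson's implementation: the "ranks" are plain dictionaries, the allreduce is a Python loop rather than a real collective, and the loop over parameter names stands in for the per-parameter backward hook. The helper names (`local_grads`, `broadcast_from_rank0`, `allreduce_mean`, `ddp_step`, `run_demo`) and the toy linear model are hypothetical, and averaging (rather than summing) the gradients is an assumption chosen so the result matches full-batch training.

```python
import random

def local_grads(params, shard):
    """Gradient of mean squared error for y = w*x + b on one rank's shard."""
    w, b = params["w"], params["b"]
    n = len(shard)
    gw = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    gb = sum(2 * (w * x + b - y) for x, y in shard) / n
    return {"w": gw, "b": gb}

def broadcast_from_rank0(replicas):
    """Copy rank 0's parameters to every other rank so all start identical."""
    for replica in replicas[1:]:
        replica.update(replicas[0])

def allreduce_mean(per_rank_grads, name):
    """Simulated allreduce: every rank ends up with the mean gradient for `name`."""
    mean = sum(g[name] for g in per_rank_grads) / len(per_rank_grads)
    for g in per_rank_grads:
        g[name] = mean

def ddp_step(replicas, shards, lr):
    """One data-parallel SGD step across all simulated ranks."""
    # Each rank runs backward on its own shard only.
    grads = [local_grads(p, s) for p, s in zip(replicas, shards)]
    # Stand-in for the per-parameter hook: one allreduce per parameter.
    for name in replicas[0]:
        allreduce_mean(grads, name)
    # Every rank applies the same averaged gradient, so replicas stay in sync.
    for params, g in zip(replicas, grads):
        for name in params:
            params[name] -= lr * g[name]

def run_demo(world_size=4, steps=50, lr=0.05, seed=0):
    """Train simulated replicas and a single-process reference side by side."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(32)]
    data = [(x, 3.0 * x + 1.0) for x in xs]
    # Equal-sized shards, so the mean of shard means equals the full-batch mean.
    shards = [data[r::world_size] for r in range(world_size)]
    replicas = [{"w": rng.gauss(0, 1), "b": rng.gauss(0, 1)}
                for _ in range(world_size)]
    broadcast_from_rank0(replicas)
    reference = dict(replicas[0])  # ordinary full-batch training for comparison
    for _ in range(steps):
        ddp_step(replicas, shards, lr)
        g = local_grads(reference, data)
        for name in reference:
            reference[name] -= lr * g[name]
    return replicas, reference

if __name__ == "__main__":
    replicas, reference = run_demo()
    print("rank 0:   ", replicas[0])
    print("reference:", reference)
```

Because every rank starts from the same parameters and applies the same averaged gradient, the replicas never drift apart, and with equal-sized shards they track the full-batch reference up to floating-point rounding. In a real multi-process setup the loop inside `allreduce_mean` would be replaced by a collective call across processes.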