Phase 19: Capstone Projects
AI From Scratch/Lesson 78/~90 min

ZeRO Optimizer State Sharding

Adam stores two moment estimates per parameter, both in float32. A 7B-parameter model carries 56 GB of optimiser state. ZeRO stage 1 shards that across N ranks; each rank owns 1/N of the optimiser. After the local step the updated paramete...
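The 56 GB figure and the 1/N split can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names `adam_state_bytes` and `shard_bounds` are hypothetical, GB is taken as 10^9 bytes, and contiguous, near-equal slices per rank are an assumption about the sharding layout rather than something the lesson specifies.

```python
def adam_state_bytes(n_params, bytes_per_value=4, n_moments=2):
    """Bytes needed for Adam's moment estimates (two float32 values per parameter)."""
    return n_params * n_moments * bytes_per_value


def shard_bounds(n_params, world_size):
    """Return the [start, end) slice each rank would own.

    Assumes contiguous slices whose sizes differ by at most one element.
    """
    base, extra = divmod(n_params, world_size)
    bounds = []
    start = 0
    for rank in range(world_size):
        size = base + (1 if rank < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


N_PARAMS = 7_000_000_000  # the 7B-parameter model from the text

total_gb = adam_state_bytes(N_PARAMS) / 1e9
print(f"unsharded optimiser state: {total_gb:.0f} GB")

# Per-rank share of the moment estimates under stage-1 sharding.
for world_size in (2, 4, 8):
    print(f"N={world_size}: {total_gb / world_size:.0f} GB per rank")

# Tiny example of slice ownership: 10 parameters over 4 ranks.
print(shard_bounds(10, 4))
```

This counts only the two moment buffers. Parameters, gradients, and any other per-parameter copies kept by a given training setup are outside the calculation.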

Build/Python/No prerequisites