AI From Scratch / Lesson 12 / ~75 minutes
KV Cache, Flash Attention & Inference Optimization
Training is parallel and FLOP-bound. Inference is serial and memory-bound. Different bottleneck, different tricks.
Build / Python / No prerequisites
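One rough way to see the "FLOP-bound vs. memory-bound" contrast in the tagline is to compare arithmetic intensity (FLOPs per byte moved) for a single weight matmul. The sketch below is illustrative only: the helper name `matmul_arithmetic_intensity`, the 4096-wide layer, and the 2-byte (fp16) values are assumptions, and it ignores caches, kernel fusion, and attention itself.

```python
def matmul_arithmetic_intensity(n_tokens, d_in, d_out, bytes_per_value=2):
    """Rough FLOPs-per-byte estimate for Y = X @ W.

    X has shape (n_tokens, d_in), W has shape (d_in, d_out).
    Hypothetical helper: assumes each tensor is read or written exactly once.
    """
    # One multiply and one add per (token, input, output) triple.
    flops = 2 * n_tokens * d_in * d_out
    # Read X and W, write Y.
    values_moved = n_tokens * d_in + d_in * d_out + n_tokens * d_out
    return flops / (bytes_per_value * values_moved)


# Training-like step: thousands of tokens share one read of the weights.
train_like = matmul_arithmetic_intensity(n_tokens=4096, d_in=4096, d_out=4096)

# Decode-like step: one new token, but the full weight matrix is still read.
decode_like = matmul_arithmetic_intensity(n_tokens=1, d_in=4096, d_out=4096)

print(f"training-like: {train_like:.1f} FLOPs/byte")
print(f"decode-like:   {decode_like:.2f} FLOPs/byte")
```

Under these assumptions the many-token case does over a thousand FLOPs per byte moved, while the single-token case does about one, so the time for a decode step is likely dominated by reading weights from memory rather than by arithmetic.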