AI From Scratch / Lesson 12 / ~75 minutes

KV Cache, Flash Attention & Inference Optimization

Training is parallel and FLOP-bound. Autoregressive inference is serial and memory-bandwidth-bound. Different bottleneck, different tricks.
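To make the contrast concrete, here is a rough back-of-the-envelope sketch rather than a measurement. It estimates arithmetic intensity (FLOPs per byte moved) for one dense layer as the number of tokens processed per pass changes. The function name `arithmetic_intensity`, the square `d_model x d_model` weight matrix, and the 2-byte (fp16) element size are all assumptions chosen for illustration.

```python
def arithmetic_intensity(n_tokens: int, d_model: int, bytes_per_elem: int = 2) -> float:
    """Approximate FLOPs per byte for an (n_tokens x d_model) @ (d_model x d_model) matmul.

    Hypothetical helper for illustration: it ignores caches, kernel fusion,
    and attention itself, and counts only one dense layer.
    """
    # Each output element needs d_model multiply-adds, i.e. 2 * d_model FLOPs.
    flops = 2 * n_tokens * d_model * d_model
    # The weights are read once per pass, regardless of how many tokens share them.
    weight_bytes = d_model * d_model * bytes_per_elem
    # Activations: read the input once and write the output once.
    activation_bytes = 2 * n_tokens * d_model * bytes_per_elem
    return flops / (weight_bytes + activation_bytes)


d_model = 4096  # assumed hidden size
for n_tokens in (1, 16, 256, 4096):
    intensity = arithmetic_intensity(n_tokens, d_model)
    print(f"{n_tokens:>5} tokens per pass: {intensity:8.1f} FLOPs/byte")
```

With one token per pass, as in step-by-step decoding, the layer does about one FLOP per byte it reads, so the hardware mostly waits on memory. With thousands of tokens per pass, as in training, the same weights are reused across all of them and the work becomes compute-bound.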

Build · Python · No prerequisites
