Phase 07: Transformers Deep Dive
AI From Scratch · Lesson 12 · ~75 minutes

KV Cache, Flash Attention & Inference Optimization

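Training is parallel and FLOP-bound: every token of a sequence is processed at once. Inference is serial and memory-bound: tokens are generated one at a time, and each step has to stream the model weights from memory again. Different bottleneck, different tricks.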
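To make the "memory-bound" claim concrete, here is a minimal roofline-style estimate. It assumes a single fp16 linear layer y = x @ W with a 4096-wide hidden size and round, illustrative hardware numbers. PEAK_FLOPS, PEAK_BANDWIDTH and the helper names are hypothetical and not taken from this lesson or from any specific GPU. The sketch counts FLOPs per byte moved for one decoded token versus a 2048-token training batch.

```python
# Roofline-style estimate for one linear layer y = x @ W, stored in fp16.
# Hardware numbers are illustrative assumptions, not a specific GPU.
PEAK_FLOPS = 1e15        # assumed peak compute, FLOP/s
PEAK_BANDWIDTH = 3e12    # assumed memory bandwidth, bytes/s
BYTES_PER_ELEMENT = 2    # fp16 / bf16


def arithmetic_intensity(tokens: int, d_in: int, d_out: int) -> float:
    """FLOPs performed per byte moved to/from memory for x @ W."""
    flops = 2 * tokens * d_in * d_out  # one multiply + one add per weight per token
    elements = tokens * d_in + d_in * d_out + tokens * d_out  # read x, read W, write y
    return flops / (elements * BYTES_PER_ELEMENT)


def bottleneck(intensity: float) -> str:
    """Compare against the ridge point, where compute time equals memory time."""
    ridge = PEAK_FLOPS / PEAK_BANDWIDTH  # about 333 FLOP/byte with the numbers above
    return "memory-bound" if intensity < ridge else "compute-bound"


d = 4096
for label, tokens in [("decode, 1 token", 1), ("training, 2048 tokens", 2048)]:
    ai = arithmetic_intensity(tokens, d, d)
    print(f"{label}: {ai:.1f} FLOP/byte -> {bottleneck(ai)}")
# decode, 1 token: 1.0 FLOP/byte -> memory-bound
# training, 2048 tokens: 1024.0 FLOP/byte -> compute-bound
```

With a single token, each weight is read from memory to do just two FLOPs, so intensity sits near 1 FLOP/byte, far below the assumed ridge point. Processing many tokens per weight read raises intensity roughly in proportion to the token count, which is why batching many decode requests together is one way to push inference back toward the compute-bound regime.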

Build · Python · No prerequisites