AI From Scratch/Lesson 12/~120 minutes
Inference Optimization
Two phases define LLM inference. Prefill processes the entire prompt in parallel, so it is compute-bound. Decode generates tokens one at a time and rereads the model weights for every token, so it is memory-bandwidth-bound. Every optimization targets one or both.
Build / Python / No prerequisites
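
The compute-bound versus memory-bound split can be made concrete with a rough arithmetic-intensity estimate (FLOPs per byte moved) for a single weight matmul. This is a minimal sketch, not a profiler: the function names, the layer size `d_model=4096`, and the accelerator figures (`PEAK_FLOPS`, `PEAK_BANDWIDTH`) are hypothetical values chosen for illustration, and the model ignores attention, the KV cache, and on-chip caching.

```python
def arithmetic_intensity(n_tokens: int, d_model: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte for one (n_tokens x d_model) @ (d_model x d_model) matmul."""
    flops = 2 * n_tokens * d_model * d_model              # one multiply + one add per weight per token
    weight_bytes = d_model * d_model * bytes_per_elem      # weights are read once
    act_bytes = 2 * n_tokens * d_model * bytes_per_elem    # read the input, write the output
    return flops / (weight_bytes + act_bytes)


# Hypothetical accelerator numbers (assumptions, not a real device spec).
PEAK_FLOPS = 300e12        # 300 TFLOP/s
PEAK_BANDWIDTH = 2e12      # 2 TB/s
RIDGE = PEAK_FLOPS / PEAK_BANDWIDTH  # intensity above which compute is the limit


def bound(n_tokens: int, d_model: int = 4096) -> str:
    """Classify the matmul as compute-bound or memory-bound on the assumed device."""
    if arithmetic_intensity(n_tokens, d_model) >= RIDGE:
        return "compute-bound"
    return "memory-bound"


if __name__ == "__main__":
    # Prefill sees the whole prompt at once; decode sees one new token per step.
    for phase, n_tokens in [("prefill", 2048), ("decode", 1)]:
        ai = arithmetic_intensity(n_tokens, 4096)
        print(f"{phase:8s} tokens={n_tokens:5d} intensity={ai:8.2f} FLOP/byte -> {bound(n_tokens)}")
```

Under these assumptions a 2048-token prefill reuses each weight across every token and lands well above the ridge point, while single-token decode does roughly one FLOP per byte read and sits far below it. That gap is why batching and weight quantization mainly help decode, whereas faster kernels mainly help prefill.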