Phase 10: LLMs from Scratch
AI From Scratch/Lesson 12/~120 minutes

Inference Optimization

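LLM inference has two phases. Prefill processes the whole prompt in parallel and is typically compute-bound. Decode generates tokens one at a time and is typically memory-bound, because each step re-reads the model weights to produce a single token. Every optimization targets one or both.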
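To make the compute-bound versus memory-bound distinction concrete, here is a minimal roofline-style sketch. It assumes a simplified cost model (about 2 FLOPs per parameter per token, weights read from memory once per forward pass, KV cache and activation traffic ignored). The hardware numbers `PEAK_FLOPS` and `MEM_BANDWIDTH` and the function names are hypothetical, chosen only for illustration.

```python
def arithmetic_intensity(tokens_per_pass: int, bytes_per_param: float = 2.0) -> float:
    """FLOPs performed per byte of weights read in one forward pass.

    Simplified model (an assumption, not a measurement):
    - each parameter costs ~2 FLOPs (multiply + add) per token
    - weights are read from memory once per pass
    - KV cache and activation traffic are ignored
    The parameter count cancels out, so it does not appear here.
    """
    return 2.0 * tokens_per_pass / bytes_per_param


def classify(tokens_per_pass: int, peak_flops: float, mem_bandwidth: float,
             bytes_per_param: float = 2.0) -> str:
    """Compare arithmetic intensity against the hardware ridge point."""
    ridge = peak_flops / mem_bandwidth  # FLOPs/byte where the two limits meet
    intensity = arithmetic_intensity(tokens_per_pass, bytes_per_param)
    return "compute-bound" if intensity >= ridge else "memory-bound"


# Hypothetical accelerator: 300 TFLOP/s peak, 2 TB/s memory bandwidth.
PEAK_FLOPS = 300e12
MEM_BANDWIDTH = 2e12

# Prefill: a 512-token prompt is processed in one parallel pass (fp16 weights).
print("prefill:", classify(512, PEAK_FLOPS, MEM_BANDWIDTH))

# Decode: one new token per pass at batch size 1.
print("decode: ", classify(1, PEAK_FLOPS, MEM_BANDWIDTH))
```

Under these assumptions, prefill does hundreds of FLOPs per byte of weights read while decode does about one, which is why batching more sequences per decode step raises utilization.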

Build / Python / No prerequisites