Phase 17: Infrastructure & Production
AI From Scratch / Lesson 08 / ~60 minutes

Inference Metrics — TTFT, TPOT, ITL, Goodput, P99

Four metrics decide whether an inference deployment is working. TTFT (time to first token) is prefill time plus queueing delay plus network latency. TPOT (time per output token, equivalently ITL, inter-token latency) is the memory-bound decode cost per token. End-to-end latency is TTFT plus TPOT times output length. Throughpu...
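
The end-to-end relation stated above (TTFT plus TPOT times output length) can be written out as a small function. This is a minimal sketch: the function name `e2e_latency_ms`, the millisecond units, and the sample numbers are assumptions for illustration, not values from the lesson.

```python
def e2e_latency_ms(ttft_ms: float, tpot_ms: float, output_tokens: int) -> float:
    """End-to-end latency as TTFT + TPOT * output length (the formula in the summary)."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return ttft_ms + tpot_ms * output_tokens


# Hypothetical request: 200 ms to first token, 20 ms per token, 100 output tokens.
print(e2e_latency_ms(ttft_ms=200.0, tpot_ms=20.0, output_tokens=100))  # 2200.0
```

With these made-up numbers the decode phase accounts for 2000 of the 2200 ms, which is why TPOT tends to dominate for long outputs while TTFT dominates for short ones.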

Learn: Python (stdlib, toy percentile calculator and goodput reporter)
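
A toy percentile calculator and goodput reporter of the kind named above might look like the following. This is a sketch under stated assumptions, not the lesson's own code: it uses the nearest-rank percentile method, and it takes goodput to mean the rate of requests that meet both a TTFT and a TPOT SLO within a measurement window. The `RequestSample` type, the SLO thresholds, and the sample data are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestSample:
    """One completed request (hypothetical record shape)."""
    ttft_ms: float
    tpot_ms: float


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def goodput_rps(samples, ttft_slo_ms, tpot_slo_ms, window_s):
    """Requests per second that met BOTH SLOs (assumed definition of goodput)."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    good = sum(
        1 for s in samples
        if s.ttft_ms <= ttft_slo_ms and s.tpot_ms <= tpot_slo_ms
    )
    return good / window_s


# Made-up samples collected over a 2-second window.
samples = [
    RequestSample(ttft_ms=100.0, tpot_ms=20.0),
    RequestSample(ttft_ms=150.0, tpot_ms=25.0),
    RequestSample(ttft_ms=200.0, tpot_ms=60.0),  # misses the TPOT SLO
    RequestSample(ttft_ms=900.0, tpot_ms=22.0),  # misses the TTFT SLO
]
ttfts = [s.ttft_ms for s in samples]

print("P50 TTFT (ms):", percentile(ttfts, 50))  # 150.0
print("P99 TTFT (ms):", percentile(ttfts, 99))  # 900.0
print("goodput (req/s):", goodput_rps(samples, 500.0, 50.0, window_s=2.0))  # 1.0
```

With only four samples, P99 collapses to the maximum, so a tail percentile is only meaningful over a much larger sample. The example also shows why goodput and raw throughput diverge: four requests completed in the window, but only two met both SLOs.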