AI From Scratch / Lesson 08 / ~60 minutes
Inference Metrics — TTFT, TPOT, ITL, Goodput, P99
Four metrics decide whether an inference deployment is working. TTFT (time to first token) is queue wait plus prefill plus network time. TPOT (time per output token, equivalently ITL, inter-token latency) is the memory-bound decode cost per token. End-to-end latency is TTFT plus TPOT times output length. Throughpu...
Learn: Python (stdlib; toy percentile calculator and goodput reporter)
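A minimal sketch of what such a toy calculator might look like, using only the standard library. The `Request` record, the function names, and the SLO thresholds are all hypothetical and chosen for illustration. Goodput is assumed here to mean the rate of requests that meet both a TTFT and a TPOT target; the lesson's own definition may differ. The percentile uses the nearest-rank method, which is one of several common conventions.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """One completed request (hypothetical record layout)."""
    ttft_s: float        # time to first token, in seconds
    tpot_s: float        # mean time per output token, in seconds
    output_tokens: int   # number of generated tokens


def e2e_latency(r: Request) -> float:
    # End-to-end latency as stated in the summary: TTFT + TPOT * output length.
    return r.ttft_s + r.tpot_s * r.output_tokens


def percentile(values, p):
    # Nearest-rank percentile: the smallest sample such that at least
    # p percent of all samples are less than or equal to it.
    if not values:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def goodput(requests, window_s, ttft_slo_s, tpot_slo_s):
    # Assumed definition: requests per second that meet BOTH SLO targets.
    good = sum(
        1 for r in requests
        if r.ttft_s <= ttft_slo_s and r.tpot_s <= tpot_slo_s
    )
    return good / window_s


if __name__ == "__main__":
    # Made-up sample data, not measurements.
    sample = [
        Request(0.20, 0.02, 100),
        Request(0.35, 0.03, 200),
        Request(1.50, 0.02, 50),   # slow first token
        Request(0.25, 0.08, 120),  # slow decode
    ]
    latencies = [e2e_latency(r) for r in sample]
    print("P50 end-to-end (s):", percentile(latencies, 50))
    print("P99 end-to-end (s):", percentile(latencies, 99))
    print("Goodput (req/s):", goodput(sample, 2.0, 0.5, 0.05))
```

With only a handful of samples, P99 under nearest-rank is simply the maximum, so tail percentiles only become meaningful once the sample count is well into the hundreds.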