AI From Scratch, Lesson 08 (~60 minutes)

Inference Metrics — TTFT, TPOT, ITL, Goodput, P99

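Four metrics decide whether an inference deployment is working. TTFT (time to first token) is queue wait plus prefill plus network time. TPOT (time per output token, which equals the mean ITL, or inter-token latency) is the memory-bound decode cost per token. End-to-end latency is TTFT plus TPOT times the number of output tokens after the first, E2E = TTFT + TPOT × (N − 1), because the first token is already counted in TTFT. Throughpu...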
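Here is a minimal sketch of how the per-request numbers come out of a streamed response. It assumes the client records when the request was sent and when each output token arrived. The `RequestTrace` type and `request_metrics` function are hypothetical and are not the lesson's own code. The gaps between tokens telescope, so their mean equals TPOT and E2E = TTFT + TPOT × (N − 1) holds exactly.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RequestTrace:
    # Hypothetical trace format: client-side timestamps in seconds.
    sent_at: float            # when the request left the client
    token_times: list[float]  # arrival time of each streamed output token


def request_metrics(trace: RequestTrace) -> dict[str, float]:
    """Return TTFT, TPOT, mean ITL, and end-to-end latency for one request."""
    if not trace.token_times:
        raise ValueError("request produced no tokens")
    n = len(trace.token_times)
    # TTFT covers queueing, prefill, and network time up to the first token.
    ttft = trace.token_times[0] - trace.sent_at
    e2e = trace.token_times[-1] - trace.sent_at
    # ITL: gap between each pair of consecutive streamed tokens.
    itls = [b - a for a, b in zip(trace.token_times, trace.token_times[1:])]
    # TPOT averages decode time over the N - 1 tokens after the first.
    tpot = (e2e - ttft) / (n - 1) if n > 1 else 0.0
    mean_itl = sum(itls) / len(itls) if itls else 0.0
    return {"ttft": ttft, "tpot": tpot, "mean_itl": mean_itl, "e2e": e2e}
```

Take a request sent at t = 0 s whose four tokens arrive at 0.25, 0.30, 0.35 and 0.40 s. It has a TTFT of about 0.25 s, a TPOT (and mean ITL) of about 0.05 s, and an end-to-end latency of about 0.40 s.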

Python (stdlib): toy percentile calculator and goodput reporter
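The lesson's toy tool is not reproduced here, so what follows is only a sketch of a stdlib percentile calculator and goodput reporter under two assumptions. First, percentiles use the nearest-rank method. This is one of several definitions; `statistics.quantiles` interpolates instead. Second, goodput is taken to mean completed requests per second that meet both a TTFT SLO and a TPOT SLO. The function names `percentile`, `count_within_slo`, `goodput` and `report` are hypothetical.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not values:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    # 1-based rank; multiplying before dividing limits float rounding error.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def count_within_slo(ttfts, tpots, ttft_slo, tpot_slo) -> int:
    """Number of requests whose TTFT and TPOT both meet their SLOs (seconds)."""
    if len(ttfts) != len(tpots):
        raise ValueError("ttfts and tpots must describe the same requests")
    return sum(1 for t, d in zip(ttfts, tpots) if t <= ttft_slo and d <= tpot_slo)


def goodput(ttfts, tpots, duration_s, ttft_slo, tpot_slo) -> float:
    """Assumed definition: SLO-meeting requests per second of the measured run."""
    return count_within_slo(ttfts, tpots, ttft_slo, tpot_slo) / duration_s


def report(ttfts, tpots, duration_s, ttft_slo, tpot_slo) -> str:
    """Format P50/P90/P99 for TTFT and TPOT (in ms) plus goodput."""
    lines = []
    for name, samples in (("TTFT", ttfts), ("TPOT", tpots)):
        p50, p90, p99 = (percentile(samples, p) for p in (50, 90, 99))
        lines.append(
            f"{name}: p50={p50 * 1000:.1f} ms  p90={p90 * 1000:.1f} ms  "
            f"p99={p99 * 1000:.1f} ms"
        )
    met = count_within_slo(ttfts, tpots, ttft_slo, tpot_slo)
    rate = goodput(ttfts, tpots, duration_s, ttft_slo, tpot_slo)
    lines.append(f"goodput: {rate:.2f} req/s ({met}/{len(ttfts)} requests within SLO)")
    return "\n".join(lines)
```

Nearest-rank always returns a latency that some request actually observed. The trade-off is that with fewer than 100 samples the P99 is simply the slowest request. A meaningful tail number therefore needs a run with many requests.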
