AI From Scratch/Lesson 14/30 hours

Capstone 14 — Speculative-Decoding Inference Server

EAGLE-3 in vLLM 0.7 delivers a 2.5-3x throughput gain on real traffic. P-EAGLE (AWS 2026) pushed parallel speculation even further. SGLang's SpecForge trained draft heads at scale. Red Hat's Speculators hub published aligned drafts for common open m...

Capstone, Python (serving), C++, CUDA (kernel inspection), YAML (configs)
