AI From Scratch / Lesson 14 / 30 hours
Capstone 14 — Speculative-Decoding Inference Server
EAGLE-3 in vLLM 0.7 ships 2.5-3x throughput on real traffic. P-EAGLE (AWS 2026) pushed parallel speculation even further. SGLang's SpecForge trained draft heads at scale. Red Hat's Speculators hub published aligned drafts for common open m...
Capstone | Python (serving) | C++ | CUDA (kernel inspection) | YAML (configs)