Loading lesson page...
AI From Scratch/Lesson 16/~60 minutes
Speculative Decoding — Draft, Verify, Repeat
Autoregressive decoding is serial. Each token waits for the previous one. Speculative decoding breaks the chain: a cheap model drafts N tokens, the expensive model verifies all N in one forward pass. When the draft is right you paid one bi...
BuildPython