AI From Scratch/Lesson 17/~75 minutes
Disaggregated Prefill/Decode — NVIDIA Dynamo and llm-d
Prefill is compute-bound; decode is memory-bound. Running both phases on the same GPU leaves one of those two resources underused at any given moment. Disaggregation splits them onto separate pools and transfers the KV cache between them over NIXL (RDMA/InfiniBand, with TCP as a fallback). NVIDIA Dyna...
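A rough back-of-the-envelope sketch can make the compute-bound versus memory-bound split concrete. It estimates the arithmetic intensity (FLOPs per byte of weights read) of a single linear layer and compares it with a hardware ridge point. The function names, the simplification that only weight reads count as memory traffic, and the `PEAK_FLOPS` and `MEM_BW` figures are all illustrative assumptions, not measurements of any particular GPU.

```python
def arithmetic_intensity(tokens: int, bytes_per_param: int = 2) -> float:
    """FLOPs per byte of weights read for one linear layer.

    Simplifying assumption: activation and KV cache traffic are ignored.
    A matmul of `tokens` rows against a d_in x d_out weight costs
    2 * tokens * d_in * d_out FLOPs and reads d_in * d_out * bytes_per_param
    bytes, so the layer dimensions cancel out.
    """
    return 2 * tokens / bytes_per_param


def bound(tokens: int, peak_flops: float, mem_bandwidth: float,
          bytes_per_param: int = 2) -> str:
    """Classify a workload against the roofline ridge point (FLOPs per byte)."""
    ridge = peak_flops / mem_bandwidth
    if arithmetic_intensity(tokens, bytes_per_param) >= ridge:
        return "compute-bound"
    return "memory-bound"


# Hypothetical accelerator: 1 PFLOP/s of fp16 compute, 3 TB/s of memory bandwidth.
PEAK_FLOPS = 1.0e15
MEM_BW = 3.0e12

# Prefill: the whole prompt (here 2048 tokens) shares one pass over the weights.
print("prefill:", bound(2048, PEAK_FLOPS, MEM_BW))

# Decode: one new token per sequence per pass over the weights (batch size 1 here).
print("decode: ", bound(1, PEAK_FLOPS, MEM_BW))
```

Under these assumptions prefill lands far above the ridge point and single-sequence decode far below it, which is the asymmetry that motivates giving each phase its own pool. Batching more sequences into a decode step raises `tokens` and moves decode toward the ridge, so the gap in practice depends on batch size.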