AI From Scratch · Lesson 11 · ~60 minutes
Multi-Region LLM Serving and KV Cache Locality
Round-robin load balancing is actively harmful for LLM inference with prefix (KV) caching. A request that does not land on the node already holding its cached prefix pays the full prefill cost: roughly 800 ms at P50 on a long prompt, versus ~80 ms with a cache hit. In 2026 th...
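To make the cost concrete, here is a small simulation that compares round-robin routing with prefix-affinity routing. It is a sketch under stated assumptions, not a model of any particular serving stack. Each node has an LRU cache that holds a fixed number of whole prefixes, a hit requires an exact prefix match, and latencies reuse the P50 figures above (80 ms hit, 800 ms miss). The names `NodeCache` and `simulate` are hypothetical. The affinity policy is one simple scheme (a sticky table that pins each new prefix to the least-loaded node) chosen so the result is deterministic. Real routers may use hashing or prefix trees instead.

```python
from collections import OrderedDict
from itertools import cycle

# Assumed P50 prefill latencies, taken from the figures quoted above.
HIT_MS = 80
MISS_MS = 800


class NodeCache:
    """Hypothetical per-node prefix cache with LRU eviction (a simplified KV cache)."""

    def __init__(self, capacity: int):
        self.capacity = capacity          # max number of whole prefixes held
        self.entries = OrderedDict()      # prefix -> None, ordered by recency

    def lookup_or_fill(self, prefix: str) -> bool:
        """Return True on a hit; on a miss, 'prefill' and insert the prefix."""
        if prefix in self.entries:
            self.entries.move_to_end(prefix)      # mark as most recently used
            return True
        self.entries[prefix] = None
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)      # evict least recently used
        return False


def simulate(requests, num_nodes, capacity, policy):
    """Return mean prefill latency (ms) for a request stream under a routing policy."""
    caches = [NodeCache(capacity) for _ in range(num_nodes)]
    rr = cycle(range(num_nodes))
    sticky = {}                 # hypothetical routing table: prefix -> node
    load = [0] * num_nodes      # prefixes pinned to each node
    total_ms = 0
    for prefix in requests:
        if policy == "round_robin":
            node = next(rr)
        elif policy == "prefix_affinity":
            if prefix not in sticky:
                # Pin a new prefix to the least-loaded node (lowest index on ties).
                choice = min(range(num_nodes), key=lambda n: load[n])
                sticky[prefix] = choice
                load[choice] += 1
            node = sticky[prefix]
        else:
            raise ValueError(f"unknown policy: {policy}")
        total_ms += HIT_MS if caches[node].lookup_or_fill(prefix) else MISS_MS
    return total_ms / len(requests)


# 7 shared prefixes, cycled 10 times, over 4 nodes that each hold 2 prefixes.
prefixes = [f"prefix-{k}" for k in range(7)]
requests = prefixes * 10
print(simulate(requests, num_nodes=4, capacity=2, policy="round_robin"))      # 800.0
print(simulate(requests, num_nodes=4, capacity=2, policy="prefix_affinity"))  # 152.0
```

Because 7 and 4 share no factor, round-robin sends every prefix to every node, so each node cycles through 7 prefixes with room for only 2 and LRU evicts each one before it is reused: every request misses. Affinity routing partitions the prefixes so each node holds at most 2, and after the first pass every request hits. The 8 total cache slots are used once instead of being spent on redundant copies.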