AI From Scratch/Lesson 06/~75 minutes

SGLang and RadixAttention for Prefix-Heavy Workloads

SGLang treats the KV cache as a first-class, reusable resource stored in a radix tree. Where vLLM schedules requests FCFS (first-come, first-served) by default, SGLang's cache-aware scheduler prioritizes requests with longer shared prefixes, effecti...
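
To make the contrast between FCFS and cache-aware ordering concrete, here is a minimal sketch in Python. It is not SGLang's implementation: the names `PrefixCache`, `match_len`, `fcfs_order`, and `longest_prefix_order` are hypothetical, the tree is a plain per-token trie rather than a compressed radix tree, and no KV tensors are stored. It only illustrates how a scheduler could rank waiting requests by how many leading tokens are already cached.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # One child per next token id. A real radix tree would compress
    # single-child chains into one edge holding several tokens.
    children: dict = field(default_factory=dict)


class PrefixCache:
    """Toy stand-in for a KV cache indexed by token prefix (hypothetical)."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, tokens):
        """Record that the KV entries for this token sequence are cached."""
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, TrieNode())

    def match_len(self, tokens):
        """Count the leading tokens of `tokens` that are already cached."""
        node, matched = self.root, 0
        for tok in tokens:
            node = node.children.get(tok)
            if node is None:
                break
            matched += 1
        return matched


def fcfs_order(waiting):
    """First-come, first-served: keep arrival order."""
    return list(waiting)


def longest_prefix_order(cache, waiting):
    """Cache-aware order: longest cached prefix first.

    sorted() is stable, so requests with equal match length keep
    their arrival order.
    """
    return sorted(waiting, key=lambda toks: -cache.match_len(toks))


cache = PrefixCache()
cache.insert([1, 2, 3, 4])  # e.g. a shared system prompt already served

waiting = [[9, 9], [1, 2, 7], [1, 2, 3, 4, 5]]  # arrival order
print(fcfs_order(waiting))                   # [[9, 9], [1, 2, 7], [1, 2, 3, 4, 5]]
print(longest_prefix_order(cache, waiting))  # [[1, 2, 3, 4, 5], [1, 2, 7], [9, 9]]
```

In this toy setup the request that reuses all four cached tokens runs first, so only its final token needs fresh prefill work, while the request with no cached prefix waits. A production scheduler would also have to handle eviction, memory limits, and fairness, none of which this sketch models.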

