AI From Scratch/Lesson 06/~75 minutes
SGLang and RadixAttention for Prefix-Heavy Workloads
SGLang treats the KV cache as a first-class, reusable resource stored in a radix tree. Where vLLM schedules requests FCFS (first-come, first-served), SGLang's cache-aware scheduler prioritizes requests with longer shared prefixes — effecti...
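To make the contrast with FCFS concrete, here is a minimal sketch of prefix-aware ordering. It is not SGLang's implementation: it uses a plain per-token trie rather than a compressed radix tree, it stores no KV tensors, and the names `PrefixCache`, `match_len`, `fcfs_order`, and `cache_aware_order` are hypothetical, chosen only for illustration. It assumes requests are lists of token ids and that "longer shared prefix" means more leading tokens already present in the cache.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # One child per token id. A real radix tree would compress chains of
    # single-child nodes into one edge; this sketch keeps one token per edge.
    children: dict = field(default_factory=dict)


class PrefixCache:
    """Toy stand-in for a prefix-indexed KV cache (tracks token ids only)."""

    def __init__(self):
        self.root = Node()

    def insert(self, tokens):
        # Record that the KV entries for this token sequence are cached.
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, Node())

    def match_len(self, tokens):
        # Count how many leading tokens of the request are already cached.
        node, matched = self.root, 0
        for tok in tokens:
            node = node.children.get(tok)
            if node is None:
                break
            matched += 1
        return matched


def fcfs_order(waiting):
    # First-come, first-served: keep arrival order.
    return list(waiting)


def cache_aware_order(waiting, cache):
    # Longest cached prefix first. sorted() is stable, so requests with
    # equal match length keep their arrival order.
    return sorted(waiting, key=lambda toks: -cache.match_len(toks))


cache = PrefixCache()
cache.insert([1, 2, 3, 4])                     # e.g. a shared system prompt
waiting = [[9, 9], [1, 2, 7], [1, 2, 3, 4, 5]]  # in arrival order

print(fcfs_order(waiting))
print(cache_aware_order(waiting, cache))
```

In this toy queue the third request reuses four cached tokens, the second reuses two, and the first reuses none, so the cache-aware order runs them in the reverse of their arrival order, while FCFS leaves the order unchanged.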
Learn