Phase 10: LLMs from Scratch
AI From Scratch / Lesson 17 / ~60 minutes

Native Sparse Attention (DeepSeek NSA)

At 64k tokens, attention eats 70-80% of decode latency. Every open-model lab has a plan to fix it. DeepSeek's NSA (ACL 2025 best paper) is the one that stuck: three parallel attention branches — compressed coarse-grained tokens, selectivel...
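To make the first branch named above concrete, here is a minimal stdlib sketch of attention over compressed, coarse-grained tokens. It is an illustration under stated assumptions, not DeepSeek's implementation: plain mean pooling stands in for the learned compression a trained model would use, and the names `mean_pool_blocks`, `compressed_attention`, and `block_size` are hypothetical. It covers a single query and a single head, and it omits the other branches and the gating that combines them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool_blocks(vectors, block_size):
    """Collapse each run of `block_size` vectors into one averaged vector.

    ASSUMPTION: mean pooling is a simplification chosen for this sketch.
    """
    pooled = []
    for start in range(0, len(vectors), block_size):
        block = vectors[start:start + block_size]
        dim = len(block[0])
        pooled.append([sum(v[d] for v in block) / len(block) for d in range(dim)])
    return pooled


def compressed_attention(query, keys, values, block_size):
    """Attend from one query to block-compressed keys and values.

    Returns the output vector and the attention weights, one per block.
    """
    ck = mean_pool_blocks(keys, block_size)
    cv = mean_pool_blocks(values, block_size)
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in ck]
    weights = softmax(scores)
    dim = len(cv[0])
    out = [sum(w * v[d] for w, v in zip(weights, cv)) for d in range(dim)]
    return out, weights


if __name__ == "__main__":
    keys = [[0.0, 0.0]] * 8
    values = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
    out, weights = compressed_attention([1.0, 1.0], keys, values, block_size=4)
    print(len(weights), out)
```

With 8 cached tokens and `block_size=4`, the query scores 2 compressed tokens instead of 8 raw ones. That reduction in the number of keys read per decode step is the cost saving this branch is after.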

Build / Python (stdlib) / No prerequisites