AI From Scratch/Lesson 17/~60 minutes
Native Sparse Attention (DeepSeek NSA)
At 64k tokens, attention eats 70-80% of decode latency. Every open-model lab has a plan to fix it. DeepSeek's NSA (ACL 2025 best paper) is the one that stuck: three parallel attention branches — compressed coarse-grained tokens, selectivel...
Build/Python (stdlib)/No prerequisites
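To make the "compressed coarse-grained tokens" idea concrete, here is a minimal stdlib sketch of one query attending over block-averaged keys and values. It is an illustration under stated assumptions, not DeepSeek's implementation: plain mean pooling stands in for whatever compression NSA actually uses, causal masking is ignored, and the names `mean_pool_blocks`, `compressed_attention`, and `block_size` are hypothetical.

```python
import math


def mean_pool_blocks(vectors, block_size):
    """Collapse each run of `block_size` vectors into one averaged vector.

    Assumption: mean pooling is a stand-in for the real compression step.
    """
    pooled = []
    for start in range(0, len(vectors), block_size):
        block = vectors[start:start + block_size]
        dim = len(block[0])
        pooled.append([sum(v[i] for v in block) / len(block) for i in range(dim)])
    return pooled


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def compressed_attention(query, keys, values, block_size):
    """Attend one query over block-averaged keys and values (no causal mask)."""
    coarse_keys = mean_pool_blocks(keys, block_size)
    coarse_values = mean_pool_blocks(values, block_size)
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in coarse_keys]
    weights = softmax(scores)
    dim = len(coarse_values[0])
    output = [sum(w * v[i] for w, v in zip(weights, coarse_values)) for i in range(dim)]
    return output, weights


# Toy usage: 8 tokens compressed into 2 coarse tokens of 4 each.
keys = [[float(i % 3), 1.0] for i in range(8)]
values = [[float(i), 0.0] for i in range(8)]
out, weights = compressed_attention([1.0, 0.0], keys, values, block_size=4)
print(len(weights), "coarse tokens scored instead of", len(keys))
```

The point of the sketch is the cost shape: the query scores `len(keys) / block_size` coarse tokens rather than every key, which is where a compressed branch saves work at long context lengths.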