Loading lesson page...
AI From Scratch/Lesson 15/~60 minutes
Attention Variants — Sliding Window, Sparse, Differential
Full attention is a circle. Every token sees every token, and memory pays the price. Four variants bend the shape of the circle and recover half the cost.
BuildPythonNo prerequisites