Phase 07: Transformers Deep Dive
AI From Scratch/Lesson 15/~60 minutes

Attention Variants — Sliding Window, Sparse, Differential

Full attention is a circle. Every token sees every token, and memory pays the price. Four variants bend the shape of the circle and recover half the cost.

BuildPythonNo prerequisites
Loading lesson page...