Phase 07: Transformers Deep Dive
AI From Scratch / Lesson 03 / ~75 minutes

Multi-Head Attention

One attention head learns one relation at a time. Eight heads learn eight. Heads are nearly free: split the model dimension evenly across them and the parameter count does not change. Take more of them.
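To make the "free heads" claim concrete, here is a minimal pure-Python sketch of multi-head self-attention. It assumes the common setup in which each head works on an equal slice of the model dimension (d_head = d_model / num_heads) and biases are omitted. The helper names (`matmul`, `softmax`, `random_matrix`, `multi_head_attention`, `parameter_count`) and the random weights are illustrative stand-ins, not the lesson's own code.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    """Small random weights; a stand-in for learned parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Self-attention over x (seq_len x d_model) using num_heads heads."""
    d_model = len(x[0])
    assert d_model % num_heads == 0, "d_model must divide evenly across heads"
    d_head = d_model // num_heads

    # Project once, then slice the projections per head.
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)

    head_outputs = []
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        # Each head sees only its own slice of the projected features.
        q_h = [row[lo:hi] for row in q]
        k_h = [row[lo:hi] for row in k]
        v_h = [row[lo:hi] for row in v]
        # Scaled dot-product scores, shape seq_len x seq_len.
        k_t = [list(col) for col in zip(*k_h)]
        scores = [[s / math.sqrt(d_head) for s in row] for row in matmul(q_h, k_t)]
        weights = [softmax(row) for row in scores]
        head_outputs.append(matmul(weights, v_h))

    # Concatenate heads along the feature axis, then mix them with w_o.
    concat = [sum((head[i] for head in head_outputs), []) for i in range(len(x))]
    return matmul(concat, w_o)

def parameter_count(d_model):
    """Four d_model x d_model matrices (Q, K, V, output); independent of head count."""
    return 4 * d_model * d_model

rng = random.Random(0)
seq_len, d_model = 5, 16
x = random_matrix(seq_len, d_model, rng)
w_q, w_k, w_v, w_o = (random_matrix(d_model, d_model, rng) for _ in range(4))

# Same weights, same parameter budget, different number of heads.
out_1 = multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads=1)
out_8 = multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads=8)
```

Both calls use the same four weight matrices, so the parameter count is identical. What changes is that eight heads compute eight separate attention patterns, each over a 2-dimensional slice, instead of one pattern over all 16 dimensions. The trade-off in this setup is that every added head narrows the slice each head works with.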

Build · Python · No prerequisites