Phase 07: Transformers Deep Dive
AI From Scratch/Lesson 09/~45 minutes

Vision Transformers (ViT)

An image is a grid of patches. A sentence is a grid of tokens. The same transformer eats both.

BuildPython
Loading lesson page...