Phase 07: Transformers Deep Dive

AI From Scratch/Lesson 09/~45 minutes

Vision Transformers (ViT)

An image is a grid of patches. A sentence is a grid of tokens. The same transformer eats both.

BuildPython

Loading lesson page...