Loading lesson page...
AI From Scratch/Lesson 58/~90 minutes
Vision Encoder Patches
A vision model that reads pixels needs a tokenizer for pixels. Patch embedding is that tokenizer. Cut the image into a grid of squares, flatten each square, project it through one linear layer, then add a 2D position signal so the transfor...
BuildPythonNo prerequisites