Phase 04: Computer Vision
AI From Scratch / Lesson 18 / ~45 minutes

Open-Vocabulary Vision — CLIP

Train an image encoder and a text encoder together so that matching (image, caption) pairs land close together in a shared embedding space, while mismatched pairs are pushed apart. That is the whole trick.

Build + Use · Python · No prerequisites
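The "whole trick" in the tagline can be made concrete as a symmetric contrastive loss. Below is a minimal pure-Python sketch. It assumes each encoder has already turned a batch of N images and their N captions into embedding vectors, with pair i at index i in both lists. The function names (`l2_normalize`, `similarity_matrix`, `cross_entropy`, `clip_loss`, `zero_shot_classify`) are hypothetical. The fixed temperature of 0.07 is an illustrative assumption, since CLIP itself learns this value during training. A real implementation would use batched tensor operations rather than Python lists.

```python
import math


def l2_normalize(v):
    """Scale a (non-zero) vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity_matrix(image_embs, text_embs, temperature=0.07):
    """logits[i][j] = cos(image_i, text_j) / temperature.

    The temperature is assumed fixed here; CLIP learns it instead.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(im, tx)) / temperature for tx in txts]
            for im in imgs]


def cross_entropy(logits, target):
    """-log softmax(logits)[target], computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def clip_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss: pair i's correct match is index i in both directions."""
    logits = similarity_matrix(image_embs, text_embs, temperature)
    n = len(logits)
    # Image -> text: each row should put its probability mass on the diagonal entry.
    loss_i2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text -> image: the same idea applied to the columns (rows of the transpose).
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = sum(cross_entropy(columns[j], j) for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


def zero_shot_classify(image_emb, label_embs):
    """Open-vocabulary use: return the index of the label embedding
    (e.g. of the caption "a photo of a dog") most similar to the image."""
    img = l2_normalize(image_emb)
    scores = [sum(a * b for a, b in zip(img, l2_normalize(t))) for t in label_embs]
    return max(range(len(scores)), key=scores.__getitem__)
```

For example, `clip_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]])` is close to zero because each image's most similar caption is its own, while swapping the two captions gives roughly 1/0.07 ≈ 14.3. `zero_shot_classify` shows why this counts as open-vocabulary: the classes are just caption embeddings, so any label you can write as text becomes a class without retraining.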