AI From Scratch / Lesson 18 / ~45 minutes
Open-Vocabulary Vision — CLIP
Train an image encoder and a text encoder together so that matching (image, caption) pairs land close together in a shared embedding space, while mismatched pairs land far apart. That is the whole trick.
Build + Use · Python · No prerequisites
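The one-line summary above can be made concrete with a small sketch of a symmetric contrastive loss, the kind of objective usually associated with CLIP-style training. This is an illustration under stated assumptions, not the lesson's own implementation: the function names, the 2-dimensional toy embeddings, and the temperature of 0.07 are all hypothetical choices, and the "encoders" are replaced by hand-written vectors so the example runs with the standard library only.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy(logits, target):
    """Negative log-softmax of the target entry (numerically stable)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    Row k of image_embs is assumed to match row k of text_embs.
    The temperature value is an assumption chosen for illustration.
    """
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    n = len(imgs)
    # sims[i][j]: scaled cosine similarity between image i and caption j.
    sims = [[sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
            for img in imgs]
    # Each image should pick out its own caption (rows) ...
    loss_i2t = sum(cross_entropy(sims[k], k) for k in range(n)) / n
    # ... and each caption should pick out its own image (columns).
    loss_t2i = sum(cross_entropy([sims[r][k] for r in range(n)], k)
                   for k in range(n)) / n
    return (loss_i2t + loss_t2i) / 2

# Hypothetical toy embeddings standing in for encoder outputs.
aligned_images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[0.9, 0.1], [0.1, 0.9]]
swapped_texts = [aligned_texts[1], aligned_texts[0]]

print("aligned pairs loss:", contrastive_loss(aligned_images, aligned_texts))
print("swapped pairs loss:", contrastive_loss(aligned_images, swapped_texts))
```

When matching pairs point in nearly the same direction, the loss is close to zero; swapping the captions makes it large. Training nudges both encoders so that real data looks like the first case.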