AI From Scratch/Lesson 02/~180 minutes
CLIP and Contrastive Vision-Language Pretraining
OpenAI's CLIP (2021) showed that a single idea was big enough to power the next five years of vision-language work: align an image encoder and a text encoder in a shared vector space using only noisy web image-caption pairs and a contrastive loss. Zero supervised labels....
Build: Python (stdlib), InfoNCE + sigmoid loss implementations
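To make the contrastive objective concrete, here is a minimal stdlib-only sketch of the two losses named above. It is an illustration, not the lesson's own implementation. The function names (`info_nce_loss`, `sigmoid_loss`, `similarity_matrix`) are hypothetical. The defaults are assumptions taken from commonly cited initial values: a temperature of 0.07 for the CLIP-style loss, and a scale of 10 with a bias of -10 for a SigLIP-style pairwise sigmoid loss. Embeddings are plain lists of floats, and row `i` of the image batch is assumed to be paired with row `i` of the text batch.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity_matrix(image_embs, text_embs):
    """Return sims[i][j] = cosine similarity of image i and text j."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]


def _mean_diag_cross_entropy(logits):
    """Mean over rows of -log softmax(row)[i], i.e. the matching pair is the target."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        lse = m + math.log(sum(math.exp(x - m) for x in row))
        total += lse - row[i]
    return total / len(logits)


def info_nce_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE: average of image-to-text and text-to-image cross-entropy.

    temperature=0.07 is an assumed default, not a value given in the lesson text.
    """
    sims = similarity_matrix(image_embs, text_embs)
    logits = [[s / temperature for s in row] for row in sims]
    logits_t = [list(col) for col in zip(*logits)]  # transpose for text-to-image
    return 0.5 * (_mean_diag_cross_entropy(logits) + _mean_diag_cross_entropy(logits_t))


def sigmoid_loss(image_embs, text_embs, scale=10.0, bias=-10.0):
    """Pairwise sigmoid loss: every (image, text) pair is an independent binary problem.

    scale and bias are assumed defaults; in training they would be learned.
    """
    sims = similarity_matrix(image_embs, text_embs)
    n = len(sims)
    total = 0.0
    for i in range(n):
        for j in range(n):
            label = 1.0 if i == j else -1.0  # +1 for matched pairs, -1 otherwise
            z = label * (scale * sims[i][j] + bias)
            # -log(sigmoid(z)) == softplus(-z), written in a stable form
            total += max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
    return total / n


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    matched_texts = [[1.0, 0.0], [0.0, 1.0]]
    swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
    print("InfoNCE matched:", info_nce_loss(images, matched_texts))
    print("InfoNCE swapped:", info_nce_loss(images, swapped_texts))
    print("Sigmoid matched:", sigmoid_loss(images, matched_texts))
    print("Sigmoid swapped:", sigmoid_loss(images, swapped_texts))
```

In both cases the loss should be lower when each image is paired with its own caption than when the captions are swapped. The main design difference is that InfoNCE normalizes each row and column with a softmax over the whole batch, while the sigmoid variant scores each pair independently.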