Phase 12: Multimodal AI
AI From Scratch/Lesson 04/~120 minutes

Flamingo and Gated Cross-Attention for Few-Shot VLMs

DeepMind's Flamingo (2022) did two things before anyone else. It showed a single model could process arbitrarily interleaved sequences of images, videos, and text. And it showed VLMs could learn in-context — give a few-shot prompt with thr...

No prerequisites