Phase 12: Multimodal AI
AI From Scratch/Lesson 19/~180 minutes

Audio-Language Models: the Whisper to Audio Flamingo 3 Arc

Whisper (Radford et al., December 2022) settled speech recognition — 680k hours of weakly-supervised multilingual speech, a simple encoder-decoder transformer, a benchmark that made every subsequent ASR release cite it. But recognition is...

BuildNo prerequisites
Loading lesson page...