Loading lesson page...
AI From Scratch/Lesson 05/~75 minutes
Whisper — Architecture & Fine-Tuning
Whisper is a 30-second-window transformer encoder-decoder, trained on 680k hours of multilingual weakly-supervised audio-text pairs. One architecture, multiple tasks, robust across 99 languages. The 2026 reference ASR.
BuildPython