Phase 06: Speech & Audio
AI From Scratch/Lesson 05/~75 minutes

Whisper — Architecture & Fine-Tuning

Whisper is a 30-second-window transformer encoder-decoder, trained on 680k hours of multilingual weakly-supervised audio-text pairs. One architecture, multiple tasks, robust across 99 languages. The 2026 reference ASR.

BuildPython
Loading lesson page...