AI From Scratch/Lesson 07/~75 minutes
Text-to-Speech (TTS) — From Tacotron to F5 and Kokoro
ASR maps speech to text; TTS runs the inverse mapping, text to speech. The 2026 stack has three stages: text → tokens, tokens → mel spectrogram, mel spectrogram → waveform. Each stage has a default model that fits on a laptop.
Build/Python
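To make the three-stage shape concrete, here is a minimal sketch of the data flow in plain Python. Every function is a toy stand-in, not a real model: the character vocabulary, the fixed five frames per token, the one-hot "mel" frames, and the sine-wave vocoder are all assumptions chosen for illustration, as are the constants (24 kHz sample rate, hop length 256, 80 mel bins). Only the interfaces between stages mirror the pipeline described above.

```python
import math

# Illustrative constants (assumptions, not taken from any specific model).
SAMPLE_RATE = 24_000      # output samples per second
HOP_LENGTH = 256          # waveform samples produced per mel frame
N_MELS = 80               # mel bins per frame
FRAMES_PER_TOKEN = 5      # toy fixed duration; real models predict durations

# Toy character vocabulary; real front ends usually emit phonemes.
VOCAB = {ch: i + 1 for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz '")}


def text_to_tokens(text):
    """Stage 1: text -> tokens. Lowercase and drop unknown characters."""
    return [VOCAB[ch] for ch in text.lower() if ch in VOCAB]


def tokens_to_mel(tokens):
    """Stage 2: tokens -> mel. Returns a [frames][N_MELS] grid.

    Toy acoustic model: each token lights up a single mel bin
    for FRAMES_PER_TOKEN consecutive frames.
    """
    mel = []
    for tok in tokens:
        frame = [1.0 if b == tok % N_MELS else 0.0 for b in range(N_MELS)]
        mel.extend(list(frame) for _ in range(FRAMES_PER_TOKEN))
    return mel


def mel_to_waveform(mel):
    """Stage 3: mel -> waveform. Emits HOP_LENGTH samples per frame.

    Toy vocoder: a sine whose frequency depends on the loudest bin.
    """
    wave = []
    for frame in mel:
        hot = max(range(N_MELS), key=frame.__getitem__)
        freq = 100.0 + 40.0 * hot  # arbitrary bin-to-Hz mapping
        start = len(wave)
        wave.extend(
            math.sin(2.0 * math.pi * freq * (start + n) / SAMPLE_RATE)
            for n in range(HOP_LENGTH)
        )
    return wave


def synthesize(text):
    """Chain the three stages: text -> tokens -> mel -> waveform."""
    return mel_to_waveform(tokens_to_mel(text_to_tokens(text)))
```

Calling synthesize("Hi there") yields 8 tokens, 40 mel frames, and 10,240 samples, which is about 0.43 seconds at the assumed 24 kHz rate. The point of the sketch is that each stage can be swapped independently as long as the shapes at the boundaries stay the same.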