AI From Scratch/Lesson 15/~75 minutes
Streaming Speech-to-Speech — Moshi, Hibiki, and Full-Duplex Dialogue
Between 2024 and 2026, voice AI was redefined. Moshi is a single model that listens and speaks simultaneously at roughly 200 ms latency. Hibiki performs speech-to-speech translation chunk by chunk. Both abandon the ASR → LLM → TTS pipeline for a unified full-duplex...
Learn/Python/No prerequisites