Phase 12: Multimodal AI
AI From Scratch / Lesson 20 / ~180 minutes

Omni Models: Qwen2.5-Omni and the Thinker-Talker Split

GPT-4o's product demo in May 2024 was disruptive not because of the underlying model but because of the product shape: a voice interface where you talk, the model sees what the camera sees, and it talks back in under 250 ms. The open ecosy...

Build | Python (stdlib; streaming pipeline latency simulator + VAD loop) | No prerequisites
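
To make the 250 ms figure concrete, here is a minimal stdlib sketch of the kind of latency arithmetic a streaming pipeline simulator might perform. It is not the lesson's own code. The stage names (`vad_endpoint`, `audio_encoder`, `thinker_llm`, `talker_codec`) and every millisecond value are hypothetical placeholders chosen for illustration, not measurements of Qwen2.5-Omni or GPT-4o. The sketch compares time to first audio when each stage waits for the previous one to finish against a pipeline where each stage starts on the first chunk it receives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One pipeline stage. All timings are hypothetical, in milliseconds."""
    name: str
    first_chunk_ms: float  # delay from first input chunk to first output chunk
    full_ms: float         # delay from first input chunk to complete output


def sequential_first_audio_ms(stages):
    """Time to first audio when each stage waits for the full upstream output."""
    if not stages:
        return 0.0
    *upstream, last = stages
    # Upstream stages must finish entirely; the last stage only needs
    # to emit its first chunk before the user hears something.
    return sum(s.full_ms for s in upstream) + last.first_chunk_ms


def streaming_first_audio_ms(stages):
    """Time to first audio when each stage starts on the first upstream chunk."""
    return sum(s.first_chunk_ms for s in stages)


# Hypothetical stage budget (assumed values, for illustration only).
STAGES = [
    Stage("vad_endpoint", first_chunk_ms=30, full_ms=30),
    Stage("audio_encoder", first_chunk_ms=40, full_ms=120),
    Stage("thinker_llm", first_chunk_ms=90, full_ms=900),
    Stage("talker_codec", first_chunk_ms=60, full_ms=600),
]
TARGET_MS = 250  # the response-time figure quoted in the lesson intro

if __name__ == "__main__":
    for label, fn in (("sequential", sequential_first_audio_ms),
                      ("streaming", streaming_first_audio_ms)):
        total = fn(STAGES)
        verdict = "within" if total <= TARGET_MS else "over"
        print(f"{label:>10}: {total:.0f} ms ({verdict} the {TARGET_MS} ms target)")
```

Under these assumed numbers, only the streaming arrangement fits the budget, because the first-chunk delays add up while the full generation times drop out of the critical path. Real stages overlap less cleanly, so treat the totals as a lower bound on a streaming design rather than a prediction.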