AI From Scratch/Lesson 16/~120 minutes

MIO and Any-to-Any Streaming Multimodal Models

GPT-4o ships a product most open models cannot replicate: an agent that hears voice, sees video, and speaks back in real time. The open-ecosystem answer by late 2024 was MIO (Wang et al., September 2024). MIO tokenizes text, image, speech,...

Learn

Loading lesson page...