AI From Scratch · Lesson 16 · ~120 minutes
MIO and Any-to-Any Streaming Multimodal Models
GPT-4o offers something most open models cannot replicate: an agent that hears voice, sees video, and speaks back in real time. One open-ecosystem answer by late 2024 was MIO (Wang et al., September 2024). MIO tokenizes text, image, speech,...
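Once every modality is turned into discrete tokens, a single causal language model can read and emit all of them through one shared vocabulary. The sketch below shows one generic way to build that vocabulary. Each modality's codebook gets a disjoint id range, and non-text spans are wrapped in boundary tokens. The codebook sizes, the `<img>`/`<spch>` markers, and the helper names (`interleave`, `split_modality`) are hypothetical illustrations. They are not MIO's actual tokenizers or configuration.

```python
# Hypothetical codebook sizes; NOT MIO's real configuration.
SIZES = {"text": 32000, "image": 8192, "speech": 1024}

# Hypothetical special tokens that mark where a non-text span starts and ends.
SPECIALS = ["<img>", "</img>", "<spch>", "</spch>"]


def build_offsets(sizes):
    """Assign each modality a contiguous, disjoint id range (dicts keep insertion order)."""
    offsets, start = {}, 0
    for name, size in sizes.items():
        offsets[name] = start
        start += size
    return offsets, start


OFFSETS, BASE_VOCAB = build_offsets(SIZES)
SPECIAL_IDS = {tok: BASE_VOCAB + i for i, tok in enumerate(SPECIALS)}
VOCAB_SIZE = BASE_VOCAB + len(SPECIALS)


def to_unified(modality, codes):
    """Shift a modality-local code sequence into the shared id space."""
    size = SIZES[modality]
    for c in codes:
        if not 0 <= c < size:
            raise ValueError(f"{modality} code {c} outside [0, {size})")
    return [OFFSETS[modality] + c for c in codes]


def interleave(segments):
    """Flatten (modality, local_codes) segments into one sequence for a causal LM."""
    seq = []
    for modality, codes in segments:
        ids = to_unified(modality, codes)
        if modality == "image":
            seq += [SPECIAL_IDS["<img>"], *ids, SPECIAL_IDS["</img>"]]
        elif modality == "speech":
            seq += [SPECIAL_IDS["<spch>"], *ids, SPECIAL_IDS["</spch>"]]
        else:
            seq += ids  # text tokens need no wrapper in this sketch
    return seq


def split_modality(token_id):
    """Route a generated id back to (modality, local_code) by range lookup."""
    for name, size in SIZES.items():
        lo = OFFSETS[name]
        if lo <= token_id < lo + size:
            return name, token_id - lo
    return "special", token_id - BASE_VOCAB


seq = interleave([("text", [5, 17]), ("image", [0, 8191]), ("speech", [3])])
print(seq)  # [5, 17, 41216, 32000, 40191, 41217, 41218, 40195, 41219]
print(split_modality(40195))  # ('speech', 3)
```

With a layout like this, the output head is one softmax over `VOCAB_SIZE` ids. A decoder can then hand each generated id to the matching modality's detokenizer by range lookup, which is what `split_modality` does.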