Phase 12: Multimodal AI
AI From Scratch/Lesson 14/~120 minutes

Show-o and Discrete-Diffusion Unified Models

Transfusion mixes continuous and discrete representations. Show-o (Xie et al., August 2024) goes the other way: text tokens use causal next-token prediction, image tokens use masked discrete diffusion in the spirit of MaskGIT. Both sit ins...
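The split between the two objectives can be sketched in a few lines. This is a minimal illustration, not Show-o's actual training code: the `MASK_ID` value, the helper names, and the MaskGIT-style cosine mask schedule are assumptions chosen for the example, and the hand-off between the last text token and the first image token is simply left without a target.

```python
import math
import random

MASK_ID = -1    # hypothetical id for the [MASK] token (assumption)
IGNORE = None   # marks positions that contribute no loss


def text_targets(text_tokens):
    """Causal next-token prediction: position i is trained to predict token i+1."""
    # The last text position gets no target in this simplified sketch.
    return text_tokens[1:] + [IGNORE]


def mask_image_tokens(image_tokens, rng):
    """MaskGIT-style corruption: hide a random subset and predict only that subset."""
    r = rng.random()                       # r in [0, 1)
    ratio = math.cos(math.pi / 2 * r)      # cosine schedule, ratio in (0, 1]
    n_mask = max(1, round(ratio * len(image_tokens)))
    masked = set(rng.sample(range(len(image_tokens)), n_mask))
    inputs = [MASK_ID if i in masked else t for i, t in enumerate(image_tokens)]
    targets = [t if i in masked else IGNORE for i, t in enumerate(image_tokens)]
    return inputs, targets


def build_example(text_tokens, image_tokens, seed=0):
    """Concatenate text and image tokens into one sequence with per-position targets."""
    rng = random.Random(seed)
    img_in, img_tgt = mask_image_tokens(image_tokens, rng)
    return text_tokens + img_in, text_targets(text_tokens) + img_tgt


if __name__ == "__main__":
    inputs, targets = build_example([10, 11, 12], [100, 101, 102, 103])
    print(inputs)
    print(targets)
```

Text positions always see their real token and are scored on the next one, while image positions are scored only where the input was replaced by the mask token. A real model would compute cross-entropy over the non-`IGNORE` positions of a single shared sequence.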
