Phase 12: Multimodal AI
AI From Scratch/Lesson 13/~180 minutes

Transfusion: Autoregressive Text + Diffusion Image in One Transformer

Chameleon and Emu3 bet everything on discrete tokens. They work, but the quantization bottleneck is visible: image quality plateaus below that of continuous-space diffusion models. Transfusion (Meta, Zhou et al., August 2024) takes the opposi...
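To make the quantization bottleneck concrete, here is a minimal sketch, not the tokenizer used by Chameleon or Emu3. It snaps continuous "patch latents" to the nearest entry of a codebook and measures what is lost. Everything in it is an assumption made for illustration: the random Gaussian latents, the untrained random codebook, the dimension `DIM`, and the helper names `quantize` and `mean_sq_error`. A real VQ tokenizer learns its codebook, but the rounding step has the same shape.

```python
import random


def quantize(vec, codebook):
    """Return the codebook entry nearest to vec (squared Euclidean distance)."""
    return min(codebook, key=lambda c: sum((a - b) ** 2 for a, b in zip(vec, c)))


def mean_sq_error(vectors, codebook):
    """Average squared error after snapping each vector to its nearest code."""
    total = 0.0
    for v in vectors:
        q = quantize(v, codebook)
        total += sum((a - b) ** 2 for a, b in zip(v, q))
    return total / len(vectors)


rng = random.Random(0)
DIM = 4  # hypothetical latent dimension per image patch

# Stand-ins for continuous image-patch latents (assumption: unit Gaussian).
patches = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(200)]

# One large random codebook; smaller codebooks are prefixes of it, so a
# larger codebook can never reconstruct worse than a smaller one.
full_codebook = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(256)]

errors = {}
for size in (4, 32, 256):
    errors[size] = mean_sq_error(patches, full_codebook[:size])
    print(f"codebook size {size:>3}: mean squared error {errors[size]:.4f}")
```

The error shrinks as the codebook grows but does not reach zero, because every patch is rounded to one of a finite set of codes. A model that operates on the continuous latents directly skips this rounding step, which is the gap the paragraph above refers to.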
