Phase 12: Multimodal AI
AI From Scratch/Lesson 15/~120 minutes

Janus-Pro: Decoupled Encoders for Unified Multimodal Models

Unified multimodal models have an unavoidable tension. Understanding wants semantic features — SigLIP or DINOv2 output vectors rich with concept-level information. Generation wants reconstruction-friendly codes — VQ tokens that compose bac...
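To make the tension concrete, here is a minimal sketch of the decoupled idea in plain Python. It is not Janus-Pro's implementation. Random matrices stand in for a semantic encoder such as SigLIP and for a learned VQ codebook, and every name (`encode_for_understanding`, `encode_for_generation`, `DIM`, `CODEBOOK_SIZE`, and so on) is a hypothetical one chosen for illustration. The point it shows is that the same image patches take two different routes into one shared embedding width: continuous features for understanding, and discrete code ids (which can be mapped back to patch space) for generation.

```python
import random

random.seed(0)

DIM = 4            # assumed shared embedding width of the language model
PATCH = 3          # toy patch feature size
CODEBOOK_SIZE = 8  # toy VQ codebook size


def rand_matrix(rows, cols):
    """Random stand-in for learned weights (illustration only)."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a rows x cols matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


# Understanding path: continuous features projected to the shared width.
UNDERSTAND_PROJ = rand_matrix(DIM, PATCH)


def encode_for_understanding(patches):
    return [matvec(UNDERSTAND_PROJ, p) for p in patches]


# Generation path: nearest-neighbour vector quantization into discrete ids,
# with a separate embedding table that maps each id to the shared width.
CODEBOOK = rand_matrix(CODEBOOK_SIZE, PATCH)
GEN_EMBED = rand_matrix(CODEBOOK_SIZE, DIM)


def quantize(patch):
    """Return the index of the codebook entry closest to the patch."""
    dists = [sum((a - b) ** 2 for a, b in zip(patch, code)) for code in CODEBOOK]
    return dists.index(min(dists))


def encode_for_generation(patches):
    ids = [quantize(p) for p in patches]
    return ids, [GEN_EMBED[i] for i in ids]


def decode_codes(ids):
    """Map discrete ids back to patch space (the reconstruction side)."""
    return [CODEBOOK[i] for i in ids]


patches = rand_matrix(5, PATCH)  # five toy image patches
understanding_tokens = encode_for_understanding(patches)
code_ids, generation_tokens = encode_for_generation(patches)
reconstruction = decode_codes(code_ids)
```

Both paths emit vectors of width `DIM`, so a single downstream model could consume either one, while only the generation path yields ids that `decode_codes` can turn back into patches.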

Build/No prerequisites