AI From Scratch/Lesson 08/~180 minutes
LLaVA-OneVision: Single-Image, Multi-Image, Video in One Model
Before LLaVA-OneVision (Li et al., August 2024), the open-VLM world had separate lineages: LLaVA-1.5 for single images, multi-image models like Mantis and VILA, and video models like Video-LLaVA and Video-LLaMA. Each won its benchmark and faile...
Build: Python (stdlib, token budget solver + curriculum planner) / No prerequisites