AI From Scratch · Lesson 04 · 30 hours
Capstone 04 — Multimodal Document QA (Vision-First PDF, Tables, Charts)
The 2026 document-QA frontier moved away from OCR-then-text and toward vision-first late interaction. ColPali, ColQwen2.5, and ColQwen3-omni treat each PDF page as an image, embed it with multi-vector late interaction, and let the query at...
Capstone · Python (pipeline) · TypeScript (viewer UI)
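To make the multi-vector late-interaction idea in the summary concrete, here is a minimal sketch of MaxSim scoring, the rule commonly used by ColBERT-style retrievers: each query token vector is matched to its most similar page patch vector, and those maxima are summed. The sketch does not call any real model. The function names (`maxsim_score`, `rank_pages`) and the 2-D toy vectors are assumptions for illustration only; real page and query embeddings would come from a vision-language encoder.

```python
from math import sqrt

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_vecs, page_vecs):
    """Late-interaction score: for every query vector, take the best
    cosine similarity against any page patch vector, then sum."""
    page = [normalize(p) for p in page_vecs]
    return sum(max(dot(normalize(q), p) for p in page) for q in query_vecs)

def rank_pages(query_vecs, pages):
    """Return page ids sorted from best to worst MaxSim score."""
    scores = {pid: maxsim_score(query_vecs, vecs) for pid, vecs in pages.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy data (hypothetical): two query token vectors, two pages of patch vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_1": [[1.0, 0.0], [1.0, 0.0]],  # covers only the first query token
    "page_2": [[1.0, 0.0], [0.0, 1.0]],  # covers both query tokens
}

ranking = rank_pages(query, pages)
print(ranking)
```

Because every query token looks for its own best patch, a page that covers all parts of the question outranks one that matches a single token strongly. That property is what lets a question attend to a specific table cell or chart region without an OCR step.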