Phase 19: Capstone Projects
AI From Scratch / Lesson 04 / 30 hours

Capstone 04 — Multimodal Document QA (Vision-First PDF, Tables, Charts)

The 2026 document-QA frontier moved away from OCR-then-text and toward vision-first late interaction. ColPali, ColQwen2.5, and ColQwen3-omni treat each PDF page as an image, embed it with multi-vector late interaction, and let the query at...
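As a rough illustration of the late-interaction idea described above, the following is a minimal sketch of ColBERT-style MaxSim scoring, the scoring rule that ColPali-type retrievers build on. It assumes each page and each query has already been embedded into a list of vectors by some model; the embedding step is not shown. The function names (`maxsim_score`, `rank_pages`) and the toy two-dimensional vectors are hypothetical and chosen only for illustration, not taken from any library.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], page_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: for every query vector, take its best-matching
    page vector (max cosine similarity), then sum those maxima."""
    if not query_vecs or not page_vecs:
        raise ValueError("query and page must each contain at least one vector")
    q = [normalize(v) for v in query_vecs]
    p = [normalize(v) for v in page_vecs]
    return sum(max(dot(qv, pv) for pv in p) for qv in q)


def rank_pages(query_vecs: Sequence[Vector], pages: Sequence[Sequence[Vector]]) -> List[int]:
    """Return page indices ordered from highest to lowest MaxSim score."""
    scores = [maxsim_score(query_vecs, page) for page in pages]
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy data (hypothetical): two query token vectors, two pages of patch vectors.
    query = [[1.0, 0.0], [0.0, 1.0]]
    page_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # covers both query directions
    page_b = [[1.0, 0.0], [1.0, 0.0]]              # covers only the first
    print(rank_pages(query, [page_b, page_a]))
```

In a real pipeline the page vectors would come from image patches and the query vectors from query tokens, and scoring would typically run as batched tensor operations rather than Python loops; the pure-Python version here only makes the per-token max and the final sum explicit.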

Capstone, Python (pipeline), TypeScript (viewer UI)