AI From Scratch · Lesson 25 · ~240 minutes
Multimodal Agents and Computer-Use (Capstone)
The 2026 frontier product is a multimodal agent that reads screenshots, clicks buttons, navigates web UIs, fills forms, and completes workflows end-to-end. SeeClick and CogAgent (2024) established GUI grounding (mapping a natural-language instruction to the on-screen element or coordinates to act on) as the core primitive. Ferret-UI added mo...
Capstone · Python (stdlib action schema + agent loop skeleton) · No prerequisites
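As a rough picture of what a stdlib "action schema + agent loop skeleton" can look like, here is a minimal sketch. It is not the lesson's own code. The action names (`Click`, `TypeText`, `Done`), the JSON wire format, `FakeScreen`, `scripted_policy`, and `run_agent` are all hypothetical. A real agent would swap the scripted policy for a multimodal model call and the fake screen for actual screenshot capture and input injection. The loop follows the observe, decide, act cycle described above and stops when the model emits `Done` or the step budget runs out.

```python
import json
from dataclasses import dataclass
from typing import Callable, List, Optional, Union

# Hypothetical action schema: names and fields are illustrative assumptions.
@dataclass(frozen=True)
class Click:
    x: int
    y: int

@dataclass(frozen=True)
class TypeText:
    text: str

@dataclass(frozen=True)
class Done:
    answer: str

Action = Union[Click, TypeText, Done]
_ACTION_TYPES = {"click": Click, "type": TypeText, "done": Done}


def parse_action(raw: str) -> Action:
    """Parse model output such as {"action": "click", "x": 10, "y": 20}."""
    data = json.loads(raw)
    kind = data.pop("action")
    cls = _ACTION_TYPES.get(kind)
    if cls is None:
        raise ValueError(f"unknown action: {kind!r}")
    return cls(**data)  # dataclass __init__ raises TypeError on missing/extra fields


class FakeScreen:
    """Hypothetical stand-in for a GUI: records actions instead of driving a display."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def screenshot(self) -> bytes:
        # A real agent would capture pixels here; this returns a placeholder.
        return f"screen after {len(self.log)} actions".encode()

    def execute(self, action: Action) -> None:
        self.log.append(repr(action))


# A policy maps (goal, screenshot, history) to a JSON action string.
Policy = Callable[[str, bytes, List[Action]], str]


def run_agent(goal: str, policy: Policy, screen: FakeScreen,
              max_steps: int = 10) -> Optional[str]:
    history: List[Action] = []
    for _ in range(max_steps):
        obs = screen.screenshot()                           # 1. observe
        action = parse_action(policy(goal, obs, history))   # 2. decide
        if isinstance(action, Done):                        # 3. stop on Done
            return action.answer
        screen.execute(action)                              # 4. act
        history.append(action)
    return None  # step budget exhausted without a Done action


def scripted_policy(goal: str, obs: bytes, history: List[Action]) -> str:
    # Hypothetical stand-in for a multimodal model: replays a fixed script.
    script = [
        '{"action": "click", "x": 120, "y": 48}',
        '{"action": "type", "text": "hello"}',
        '{"action": "done", "answer": "form submitted"}',
    ]
    return script[len(history)]


if __name__ == "__main__":
    screen = FakeScreen()
    print(run_agent("fill the search box", scripted_policy, screen))  # form submitted
    print(screen.log)  # ['Click(x=120, y=48)', "TypeText(text='hello')"]
```

Keeping the schema as frozen dataclasses behind a single `parse_action` entry point means malformed model output fails loudly at one place, and the `max_steps` budget guarantees the loop terminates even if the model never emits `Done`.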