Phase 14: Agent Engineering
AI From Scratch/Lesson 20/~60 minutes

Benchmarks: WebArena and OSWorld

WebArena tests web-agent capability across four self-hosted apps. OSWorld tests desktop-agent capability across Ubuntu, Windows, and macOS. At release (2023–2024), both showed a large gap between best-in-class agents and humans. The gap is narrow...
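
To make "gap between agents and humans" concrete, here is a minimal stdlib sketch that assumes each benchmark run reduces to a pass/fail outcome per task and that the gap is the difference in success rates. The `Episode` type, the helper names, and the sample outcomes are all hypothetical placeholders for illustration; they are not WebArena or OSWorld APIs or results.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One attempt at one benchmark task (hypothetical record shape)."""
    task_id: str
    success: bool  # True if the task's checker accepted the final state


def success_rate(episodes):
    """Fraction of episodes marked successful; 0.0 for an empty list."""
    if not episodes:
        return 0.0
    return sum(e.success for e in episodes) / len(episodes)


def gap_to_human(agent_rate, human_rate):
    """Absolute difference between human and agent success rates."""
    return human_rate - agent_rate


# Placeholder outcomes for four tasks, NOT real benchmark numbers.
agent_runs = [
    Episode("t1", True),
    Episode("t2", False),
    Episode("t3", False),
    Episode("t4", False),
]
human_runs = [
    Episode("t1", True),
    Episode("t2", True),
    Episode("t3", True),
    Episode("t4", False),
]

agent_rate = success_rate(agent_runs)
human_rate = success_rate(human_runs)
gap = gap_to_human(agent_rate, human_rate)
print(f"agent={agent_rate:.2f} human={human_rate:.2f} gap={gap:.2f}")
```

With the placeholder data above this prints `agent=0.25 human=0.75 gap=0.50`. Real harnesses decide `success` with task-specific checkers, so this only illustrates the final aggregation step.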

Learn/Python (stdlib)/No prerequisites