AI From Scratch

Phase 15 / 22 lessons / ~20 hours

Autonomous Systems

Agents that run without human intervention — safely.

Lessons

01 The Shift from Chatbots to Long-Horizon Agents
In 2023 a chatbot answered a question in one turn. In 2026 a frontier model routinely runs minutes to hours on a single task. METR's Time Horizon 1.1 benchmark (January 2026) puts Claude Opus 4.6 at 14+ hours of expert work at 50% reliabil...
Learn / ~45 minutes

02 STaR, V-STaR, Quiet-STaR: Self-Taught Reasoning
The smallest possible self-improvement loop sits inside the rationale. A model generates chains of thought, keeps the ones that land on correct answers, and fine-tunes on those. That is STaR. V-STaR adds a verifier so inference-time selec...
Learn / ~60 minutes

03 AlphaEvolve: Evolutionary Coding Agents
Pair a frontier coding model with an evolutionary loop and a machine-checkable evaluator. Let the loop run long enough. It discovers a 4x4 complex-matrix multiplication procedure that uses 48 scalar multiplications, the first improvement...
Learn / ~60 minutes

04 Darwin Godel Machine: Open-Ended Self-Modifying Agents
Schmidhuber's 2003 Godel Machine required a formal proof that any self-modification was beneficial before accepting it. That proof is impossible in practice. Darwin Godel Machine (Zhang et al., 2025) drops the proof and keeps the archive:...
Learn / ~60 minutes

05 AI Scientist v2: Workshop-Level Autonomous Research
Sakana's AI Scientist v2 (Yamada et al., arXiv:2504.08066) runs the full research loop: hypothesis, code, experiments, figures, writeup, submission. It is the first system to have a generated paper pass peer review at an ICLR 2025 workshop...
Learn / ~60 minutes

06 Automated Alignment Research (Anthropic AAR)
Anthropic ran parallel teams of Claude Opus 4.6 Autonomous Alignment Researchers in independent sandboxes, coordinating via a shared forum whose logs live outside any sandbox (so agents cannot delete their own records). On the weak-to-stro...
Learn / ~60 minutes

07 Recursive Self-Improvement: Capability vs Alignment
Recursive self-improvement (RSI) is no longer speculation. The ICLR 2026 RSI Workshop in Rio (April 23-27) framed it as an engineering problem with concrete tooling. Demis Hassabis at WEF 2026 asked publicly whether the loop can close with...
Learn / ~60 minutes

08 Bounded Self-Improvement Designs
Research has converged on four primitives for bounding a self-improvement loop. Formal invariants that must hold across every edit. Alignment anchors that cannot be modified. Multi-objective constraints where every dimension (safety, fairn...
Learn / ~60 minutes

09 The Autonomous Coding Agent Landscape (2026)
SWE-bench Verified went from 4% to 80.9% in under three years. The same Claude Sonnet 4.5 scored 43.2% on SWE-agent v1 and 59.8% on Cline autonomous; the scaffolding around the model now matters as much as the model itself. OpenHands (formerl...
Learn / ~45 minutes

10 Claude Code as an Autonomous Agent: Permission Modes and Auto Mode
Claude Code exposes seven permission modes. "plan" asks before every action, "default" asks only for risky ones, "acceptEdits" auto-approves file writes but still confirms shell execution, and "bypassPermissions" approves everything. Auto...
Learn / ~45 minutes

11 Browser Agents and Long-Horizon Web Tasks
ChatGPT agent (July 2025) merged Operator and deep research into one browser/terminal agent and set BrowseComp SOTA at 68.9%. OpenAI shut Operator down August 31, 2025, a consolidation at the product layer. Anthropic's Vercept acquisition m...
Learn / ~45 minutes

12 Long-Running Background Agents: Durable Execution
Production long-horizon agents do not run in while True. Every LLM call becomes an activity with checkpoint, retry, and replay. Temporal's OpenAI Agents SDK integration went GA March 2026. Claude Code Routines (Anthropic) runs scheduled Cl...
Learn / ~60 minutes

13 Action Budgets, Iteration Caps, and Cost Governors
A mid-sized e-commerce agent's monthly LLM cost jumped from $1,200 to $4,800 after its team enabled the "order-tracking" skill. That is not a pricing bug. That is an agent that found a new loop and kept spending inside it. Microsoft's Agen...
Learn / ~60 minutes

14 Kill Switches, Circuit Breakers, and Canary Tokens
A kill switch is a boolean held outside the agent's edit surface (a Redis key, a feature flag, a signed config) that disables the agent entirely. A circuit breaker is finer-grained: it trips on a specific pattern (five identical tool cal...
Learn / ~60 minutes

15 Human-in-the-Loop: Propose-Then-Commit
The 2026 consensus on HITL is specific. It is not "the agent asks, the user clicks Approve." It is propose-then-commit: the proposed action is persisted to a durable store with an idempotency key; surfaced to a reviewer with intent, data l...
Learn / ~60 minutes

16 Checkpoints and Rollback
Every graph-state transition persists. When a worker crashes, its lease expires and another worker picks up at the latest checkpoint. Cloudflare Durable Objects hold state across hours or weeks. Propose-then-commit (Lesson 15) defines a ro...
Learn / ~60 minutes / Python (stdlib, checkpoint and rollback state machine)

17 Constitutional AI and Rule Overrides
Anthropic's January 22, 2026 Claude Constitution runs 79 pages and is CC0. It moves from rule-based to reason-based alignment and establishes a four-tier priority hierarchy: (1) safety and supporting human oversight, (2) ethics, (3) Anthro...
Learn / ~60 minutes

18 Llama Guard and Input/Output Classification
Llama Guard 3 (Meta, Llama-3.1-8B base, fine-tuned for content safety) classifies both LLM inputs and outputs against an MLCommons 13-hazard taxonomy across 8 languages. A 1B-INT4 quantized variant runs at over 30 tokens/sec on mobile CPUs...
Learn / ~45 minutes

19 Anthropic Responsible Scaling Policy v3.0
RSP v3.0 went into effect February 24, 2026, replacing the 2023 policy. Two-tier mitigation: what Anthropic will do unilaterally vs what is framed as an industry-wide recommendation (including RAND SL-4 security standards). Adds Frontier S...
Learn / ~45 minutes / Python (stdlib, RSP threshold decision engine)

20 OpenAI Preparedness Framework and DeepMind Frontier Safety Framework
OpenAI Preparedness Framework v2 (April 2025) introduces Research Categories (Long-range Autonomy, Sandbagging, Autonomous Replication and Adaptation, Undermining Safeguards) distinct from Tracked Categories. Tracked Categories trigger C...
Learn / ~45 minutes

21 METR Time Horizons and External Capability Evaluation
METR (ex-ARC Evals) is an independent 501(c)(3) since December 2023. Their Time Horizon 1.1 benchmark (January 2026) fits a logistic curve to task-success probability vs log(expert human completion time); the intersection at 50% probabilit...
Learn / ~60 minutes

22 CAIS, CAISI, and Societal-Scale Risk
The Center for AI Safety (CAIS, San Francisco, founded 2022 by Hendrycks and Zhang) publishes the four-risk framework (malicious use, AI races, organizational risks, rogue AIs) and the May 2023 statement on extinction risk signed by hund...
Learn / ~45 minutes
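
As a taste of the style used in the Python lessons, the filter step summarized in Lesson 02 (sample rationales, keep the ones that land on the correct answer) can be sketched with the standard library alone. This is a minimal illustration, not the lesson's own code: `Example`, `Rationale`, `star_filter`, and `make_toy_generator` are hypothetical names, the toy generator stands in for sampling from a real model, and the fine-tuning step that would consume the kept rationales is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Example:
    question: str
    answer: str  # reference answer used to check each rationale


@dataclass(frozen=True)
class Rationale:
    question: str
    chain_of_thought: str
    predicted_answer: str


def star_filter(
    examples: Iterable[Example],
    generate: Callable[[str], Rationale],
    samples_per_question: int = 4,
) -> List[Rationale]:
    """Sample rationales and keep only those whose answer matches the reference."""
    kept: List[Rationale] = []
    for ex in examples:
        for _ in range(samples_per_question):
            rationale = generate(ex.question)
            if rationale.predicted_answer.strip() == ex.answer.strip():
                kept.append(rationale)
    return kept


def make_toy_generator() -> Callable[[str], Rationale]:
    """Hypothetical stand-in for a model: cycles through canned rationales."""
    canned = {
        "2 + 3": [("2 plus 3 is 5", "5"), ("2 times 3 is 6", "6")],
        "10 - 4": [("10 minus 4 is 7", "7"), ("10 minus 4 is 6", "6")],
    }
    counters = {question: 0 for question in canned}

    def generate(question: str) -> Rationale:
        options = canned[question]
        index = counters[question] % len(options)
        counters[question] += 1
        chain_of_thought, answer = options[index]
        return Rationale(question, chain_of_thought, answer)

    return generate


if __name__ == "__main__":
    dataset = [Example("2 + 3", "5"), Example("10 - 4", "6")]
    kept = star_filter(dataset, make_toy_generator(), samples_per_question=2)
    for rationale in kept:
        print(rationale.question, "->", rationale.chain_of_thought)
```

In a real loop the kept rationales would become the next round's fine-tuning data, and the cycle would repeat with the updated model; exact string matching on the answer is an assumption that only suits tasks with a single canonical answer.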