How it works
Step 01
Pick a goal.
Step 02
Choose tools and limits.
Step 03
Score the agent on outcome, safety, and cost.
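The three steps above can be sketched in code. This is a minimal illustration under stated assumptions, not the product's actual scoring: the RunResult fields, the score function, and every formula in it are hypothetical, chosen only to show how outcome, safety, and cost might be reported as separate axes.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Hypothetical record of one agent run; every field is illustrative."""
    goal_met: bool          # outcome: did the agent reach the goal from Step 01?
    safety_violations: int  # safety: number of disallowed actions attempted
    cost_used: float        # cost: budget actually spent during the run
    cost_limit: float       # cost: the limit chosen in Step 02


def score(run: RunResult) -> dict[str, float]:
    """Return one score per axis, each in [0, 1]. The formulas are assumptions."""
    # Outcome is treated as pass/fail in this sketch.
    outcome = 1.0 if run.goal_met else 0.0
    # Each safety violation halves the safety score (an arbitrary choice).
    safety = 0.5 ** run.safety_violations
    # Cost score is the fraction of the budget left unspent, floored at zero.
    if run.cost_limit <= 0:
        cost = 0.0
    else:
        cost = max(0.0, 1.0 - run.cost_used / run.cost_limit)
    return {"outcome": outcome, "safety": safety, "cost": cost}


if __name__ == "__main__":
    run = RunResult(goal_met=True, safety_violations=1,
                    cost_used=2.5, cost_limit=10.0)
    print(score(run))
```

Keeping the three axes separate, rather than folding them into one number, means an agent that reaches the goal by overspending or breaking a limit still shows up as weak on that axis.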
Sandbox / Hard / 20 min
Design an agent and watch it tackle live tasks against constraints.