Claude Agent SDK: The Gather, Act, Verify Loop

Source: Anthropic Engineering, Building Agents with the Claude Agent SDK (Anthropic, 2025)
Series: The 10 Agent Whitepapers Every Builder Should Read
TL;DR
Anthropic's single most important agent post of 2025 compresses everything they learned building Claude Code into one primitive: a three-phase loop, Gather then Act then Verify, repeated until the goal is met. The Claude Agent SDK is the runtime that makes that loop cheap to build on any task, not just coding. If you only internalize one framework from this list, make it this one. Everything else in the top 10 is a specialization of this pattern.
1. What it is
The Claude Agent SDK is Anthropic's open framework for building agents that use a computer the way a programmer does. Write files, run commands, read outputs, iterate. It's the exact scaffolding under Claude Code, extracted and generalized so you can point it at any domain: research agents, support agents, data-pipeline agents, DevOps agents.
The SDK ships three things:
| Piece | What it gives you |
|---|---|
| The loop runtime | A conversational turn engine that handles tool_use / tool_result exchanges, parallel calls, and context compaction automatically |
| A tool catalog | Bash, file edit/read, web fetch, computer use, plus MCP server plug-in points |
| Control surfaces | Permission modes, hooks, subagent spawn, session forking, cost/turn caps |
The canonical loop
┌────────────────────────────────────────────────────┐
│                                                    │
│   ┌──────────┐     ┌──────────┐     ┌──────────┐   │
│   │  GATHER  │────▶│   ACT    │────▶│  VERIFY  │   │
│   │  context │     │  tools   │     │  output  │   │
│   └──────────┘     └──────────┘     └──────────┘   │
│        ▲                                  │        │
│        └───────────── repeat ─────────────┘        │
│                                                    │
└────────────────────────────────────────────────────┘
Each phase is a design decision, not just a runtime step. The quality of your agent is almost entirely a function of how well-tuned each phase is for the task.
2. Why it matters
Before this SDK, "building an agent" usually meant one of three bad options:
- LangChain-style chains: brittle, hard to debug, poor at long-horizon tasks
- Hand-rolled while loops over the Messages API: works, but every team reinvents hooks, permissions, compaction, and subagents
- Proprietary no-code agent builders: fast to demo, impossible to evolve
The Agent SDK collapses the best practices Anthropic discovered building Claude Code (a production agent that writes real code every day at Anthropic and at thousands of customer teams) into a reusable primitive. The lesson they publicly documented: the three-phase loop beats everything else for action-oriented tasks, and you should stop trying to invent your own.
Why Gather then Act then Verify wins:
- Gather prevents the #1 agent failure mode: hallucinating state. Before any destructive call, the agent pulls fresh truth from the world.
- Act is where you encode what the agent can do. The set of tools is the agent's personality.
- Verify catches the #2 failure mode: confident-but-wrong outputs. Agents that check their work beat agents that don't, every time.
- Repeat lets long horizons collapse into a sequence of short, verifiable steps.
Agents that skip Verify are demos. Agents that do all three are products.
3. How to do it
3.1. Install the SDK
pip install claude-agent-sdk # Python
npm install @anthropic-ai/claude-agent-sdk # TypeScript
Set ANTHROPIC_API_KEY and you're live. The SDK speaks to Claude Opus 4.7, Sonnet 4.6, and Haiku 4.5. Pick Opus for the orchestrator, Haiku for subagents.
3.2. The minimum viable agent (Python)
import anyio
from claude_agent_sdk import ClaudeSDKClient, ClaudeAgentOptions

options = ClaudeAgentOptions(
    model="claude-opus-4-7",
    system_prompt="You are a senior engineer. Gather, act, verify.",
    allowed_tools=["bash", "read_file", "edit_file", "web_fetch"],
    permission_mode="acceptEdits",  # auto-approve edits, prompt for bash
    max_turns=40,
)

async def main():
    async with ClaudeSDKClient(options=options) as client:
        async for msg in client.query(
                "Find every TODO comment in ./src, categorize them "
                "by severity, and write a triage report to TODOS.md."):
            print(msg)

anyio.run(main)
That's the entire loop. The SDK handles the tool_use to tool_result back-and-forth, parallel calls, and context compaction.
3.3. Phase 1, Gather context well
Anthropic's recommendation order for retrieval (strongest first):
- Agentic search: let Claude run grep, rg, tail, and ls itself. Transparent, debuggable, no infrastructure.
- Semantic search: add a vector DB only when grep latency becomes a problem. Less transparent, harder to maintain.
- Subagents: spawn a sub-Claude to read a huge file and return only the relevant excerpt. Protects the main context window.
- Compaction: automatic summarization of old turns when context fills.
Layout tip: treat your repo/docs folder structure as context engineering. A top-level CLAUDE.md that explains the codebase is the single highest-leverage optimization you can make. Claude reads it first, every run.
# CLAUDE.md at the project root
# Project: invoice-reconciler
# Stack: Python 3.12, Polars, DuckDB, FastAPI
# Entry points: src/main.py, src/api/routes.py
# Data: ./fixtures/*.csv — always test new logic against these
# Run tests with: pytest -q
# Do NOT modify: src/legacy/* (being deprecated)
3.4. Phase 2, design tools like an API
Rule: tools are the agent's vocabulary. An agent with a sloppy tool API will produce sloppy work.
| Good tool | Bad tool |
|---|---|
| search_invoices(vendor, date_range, min_amount) | query_db(sql) |
| send_slack_message(channel, thread_ts, text) | slack_tool(action, payload) |
| refund_order(order_id, reason_code) | do_action(type, data) |
Prefer typed, named, purpose-built tools over generic ones. Claude picks tools based on name and description. Bad names equal bad picks.
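A hedged sketch of what a typed, purpose-built tool looks like with the SDK's @tool decorator (the same decorator used in the reference architecture below). The invoice lookup behind it is a placeholder you would supply yourself:

from claude_agent_sdk import tool, create_sdk_mcp_server

# Hypothetical example: one narrow, well-named tool instead of a generic query_db(sql).
# `lookup_invoices` is a stand-in for your own data-access code.
@tool(
    "search_invoices",
    "Finds invoices for a vendor within a date range, above a minimum amount.",
    {"vendor": str, "date_range": str, "min_amount": float},
)
async def search_invoices(args):
    rows = lookup_invoices(args["vendor"], args["date_range"], args["min_amount"])
    return {"content": [{"type": "text", "text": f"Found {len(rows)} matching invoices."}]}

invoice_server = create_sdk_mcp_server("billing", "1.0", tools=[search_invoices])

Claude only ever sees the name, the description, and the typed parameters; that short surface is the entire interface it reasons over.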
When to use code instead of tools: for multi-step transformations (filter, join, format, write Excel), have the agent write a Python script rather than chain twelve tool calls. Code is precise, composable, and infinitely reusable. The SDK's Bash tool makes this one line.
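For example, the kind of throwaway script the agent might write for a filter-join-export job. This is a sketch; the file names, columns, and the Polars dependency are assumptions, not from the post:

import polars as pl

# One script instead of a dozen chained tool calls:
# filter, join, and export in a single pass. Paths and columns are illustrative.
invoices = pl.read_csv("fixtures/invoices.csv").filter(pl.col("amount") >= 100)
vendors = pl.read_csv("fixtures/vendors.csv")
report = invoices.join(vendors, on="vendor_id", how="left")
report.write_excel("reconciliation_report.xlsx")  # needs the xlsxwriter extra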
MCP as the standard integration layer: wrap your 3rd-party connectors (Slack, GitHub, Drive, Asana, Postgres) as MCP servers. Write once, reuse everywhere. See the 10 Best MCP Servers guide for a starter kit.
3.5. Phase 3, verify or you're shipping a demo
Three verification styles, cheapest first:
- Rules-based: the cheapest and most reliable. ruff check ., pytest, mypy, schema validators. If the linter is happy, you're 80% there.
- Visual: screenshot the rendered output (HTML email, chart, UI). Feed the image back to Claude with "does this match the intent?" (a sketch follows the stacking example below).
- LLM-as-judge: another Claude call grading the output against a rubric. Slowest, fuzziest, but the only option for subjective criteria.
Stack them when stakes are high:
import subprocess

# `changed_files`, `diff`, and `client` come from the surrounding agent run.
# 1. Rules
subprocess.run(["ruff", "check", *changed_files], check=True)
# 2. Tests
subprocess.run(["pytest", "-q"], check=True)
# 3. Judge (only if the above pass)
async for verdict in client.query(f"Rate this diff 1-10 for clarity: {diff}"):
    print(verdict)
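The visual tier can be just as small. A sketch using the raw Anthropic Messages API rather than the Agent SDK, assuming output.png is a screenshot of the rendered artifact and reusing the model name from the examples above:

import base64
import anthropic

# Assumed: output.png is a screenshot of the rendered output you want checked.
screenshot = base64.b64encode(open("output.png", "rb").read()).decode()

judge = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
resp = judge.messages.create(
    model="claude-opus-4-7",
    max_tokens=200,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64", "media_type": "image/png", "data": screenshot}},
            {"type": "text",
             "text": "Does this rendered output match the intent of the task? Answer yes or no, then explain."},
        ],
    }],
)
print(resp.content[0].text)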
3.6. Control surfaces worth knowing
- permission_mode: default (ask for everything), acceptEdits (auto-approve file edits), bypassPermissions (dangerous; sandboxed runs only), plan (read-only mode that produces a plan, no side effects; a plan-then-execute sketch follows this list).
- Hooks: PreToolUse / PostToolUse callbacks. Use PreToolUse to block edits to .env, secrets.json, anywhere off-limits. Most common: a secret-file blocklist.
- Subagents: spawn_subagent(prompt, allowed_tools=[...]). Use them for parallel research or to isolate a noisy context chain from the main thread.
- Session forking: checkpoint a session, try a risky action, roll back if it fails.
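One combination worth showing: run the same task twice, first in plan mode as a read-only dry run, then with edits auto-approved. A sketch, reusing the options shape from section 3.2; the task string and tool list are illustrative:

import anyio
from claude_agent_sdk import ClaudeSDKClient, ClaudeAgentOptions

# Shared settings; only the permission mode changes between the two passes.
base = dict(
    model="claude-opus-4-7",
    allowed_tools=["bash", "read_file", "edit_file"],
    max_turns=40,
)

async def plan_then_execute(task: str):
    # Pass 1: read-only, produces a plan and no side effects.
    async with ClaudeSDKClient(options=ClaudeAgentOptions(permission_mode="plan", **base)) as client:
        async for msg in client.query(f"Plan how you would: {task}"):
            print(msg)
    # Review the plan here (human or hook), then:
    # Pass 2: execute with file edits auto-approved.
    async with ClaudeSDKClient(options=ClaudeAgentOptions(permission_mode="acceptEdits", **base)) as client:
        async for msg in client.query(task):
            print(msg)

anyio.run(plan_then_execute, "Migrate the config loader to pydantic v2.")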
3.7. Diagnostic checklist when the agent underperforms
Anthropic's own 4-question debug ladder:
- Missing information? Restructure your search API or add a new gathering tool.
- Repeated failures at one step? Promote that rule from the system prompt into a tool parameter constraint.
- Can't self-correct? Give it a more creative fallback tool (e.g., "if X fails, try Y").
- Inconsistent performance across runs? Build a representative eval set and run it in CI (a minimal harness sketch follows). Eval-driven agent development is the highest-leverage skill.
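A minimal sketch of what that eval set might look like, assuming pytest plus rule-based checks; the tasks, checks, and option values are illustrative, not from the post:

import subprocess

import anyio
import pytest
from claude_agent_sdk import ClaudeSDKClient, ClaudeAgentOptions

options = ClaudeAgentOptions(
    model="claude-opus-4-7",
    allowed_tools=["bash", "read_file", "edit_file"],
    permission_mode="acceptEdits",
    max_turns=40,
)

# Illustrative cases: (task prompt, rule-based check run after the agent finishes).
CASES = [
    ("Find every TODO comment in ./src and write a triage report to TODOS.md.",
     lambda: "TODO" in open("TODOS.md").read()),
    ("Add type hints to src/utils.py without breaking the test suite.",
     lambda: subprocess.run(["pytest", "-q"]).returncode == 0),
]

@pytest.mark.parametrize("task,check", CASES)
def test_agent_case(task, check):
    async def run_case():
        async with ClaudeSDKClient(options=options) as client:
            async for _ in client.query(task):
                pass  # drain the turn stream; we grade side effects, not transcripts
    anyio.run(run_case)
    assert check(), f"Rule-based check failed for: {task}"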
4. Reference architecture, production-ready agent skeleton
import anyio
from claude_agent_sdk import (
    ClaudeSDKClient, ClaudeAgentOptions, tool, create_sdk_mcp_server
)

@tool("refund_order", "Refunds a Shopify order.", {"order_id": str, "reason": str})
async def refund_order(args):
    order_id, reason = args["order_id"], args["reason"]
    # ... call Shopify API
    return {"content": [{"type": "text", "text": f"Refunded {order_id}"}]}

refund_server = create_sdk_mcp_server("ops", "1.0", tools=[refund_order])

options = ClaudeAgentOptions(
    model="claude-opus-4-7",
    system_prompt=open("CLAUDE.md").read(),
    mcp_servers={"ops": refund_server},
    allowed_tools=["bash", "read_file", "edit_file", "web_fetch",
                   "mcp__ops__refund_order"],
    permission_mode="acceptEdits",
    max_turns=60,
    hooks={
        "PreToolUse": [{
            "matcher": "edit_file",
            "hooks": [lambda ctx: None if ".env" not in ctx["path"]
                      else {"block": "secrets are read-only"}]
        }]
    },
)

async def run():
    async with ClaudeSDKClient(options=options) as client:
        async for msg in client.query(
                "Investigate refund ticket #4412 end-to-end."):
            print(msg)

anyio.run(run)
5. Key takeaways
- Gather, Act, Verify, Repeat is the whole game. Every production agent follows this loop; every failing agent is missing one phase.
- Tools are the vocabulary. Typed, named, purpose-built tools beat generic ones.
- Verify or you're shipping a demo. Rules-based checks first, LLM-judge last.
- Context engineering beats prompt engineering. CLAUDE.md and folder structure move the needle more than clever wording.
- Subagents and hooks are free wins. Parallelism plus safety for about 10 lines of config.
- Eval-driven development. Build a test set before you iterate, or you'll optimize on vibes.
If you ship one agent after reading this list, ship it on this SDK. Every other paper on this list plugs into the same loop.