White Paper

Build an AI Coding Agent with the Claude Agent SDK

Jake McCluskey

Source topic: "AI Code Review Agent" was one of the 5 resume projects in the @datasciencebrain carousel. This goes further. Build a full coding agent that can read, write, test, and iterate on a codebase.
Stack: Claude Agent SDK (Python/TypeScript), the same framework Claude Code itself is built on.

Why the Agent SDK beats rolling your own

The Claude Agent SDK ships with:

  • File read/write/edit tools
  • Bash execution
  • Subagent spawning
  • Built-in permission/approval flow
  • Tool-use loop handling (same loop from tool-use-fundamentals, but managed for you)

Rolling your own is fine for one-tool demos. Real coding agents need all of the above, and hand-built versions grow buggy around the edge cases (streaming, long outputs, tool errors).
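For contrast, here is a stripped-down sketch of the loop a hand-rolled agent has to maintain. The `model` callable and message dicts are hypothetical stand-ins so the sketch runs offline; a real version would wrap the Anthropic Messages API and also handle streaming, truncated output, and tool errors:

```python
def tool_use_loop(model, tools, messages, max_turns=10):
    """Call the model, run any requested tools, feed results back, repeat.

    `model` is any callable taking the message list and returning a dict
    like {"text": ..., "tool_calls": [{"name": ..., "input": {...}}]}.
    """
    for _ in range(max_turns):
        reply = model(messages)
        messages.append({"role": "assistant", "content": reply})
        if not reply.get("tool_calls"):
            return reply["text"]  # no more tool requests: the model is done
        for call in reply["tool_calls"]:
            # Execute each requested tool and append its result for the model
            result = tools[call["name"]](**call["input"])
            messages.append({"role": "tool", "name": call["name"], "content": result})
    raise RuntimeError("agent exceeded max_turns without finishing")
```

Every branch here (unknown tool name, oversized result, model error) is an edge case the SDK already handles for you.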

Build a codebase Q&A + auto-fix agent

1. Install

pip install claude-agent-sdk anthropic
export ANTHROPIC_API_KEY=sk-ant-...

2. Core agent: reads your repo, answers questions, optionally edits

import asyncio
from claude_agent_sdk import (
    ClaudeSDKClient,
    ClaudeAgentOptions,
    AssistantMessage,
    TextBlock,
    ToolUseBlock,
)

async def main():
    options = ClaudeAgentOptions(
        system_prompt=(
            "You are a senior engineer helping maintain a codebase. "
            "Read files with the Read tool, search with Grep/Glob. "
            "When asked to fix something, use the Edit tool — NEVER rewrite whole files unnecessarily. "
            "Always verify fixes by reading the file back or running tests."
        ),
        model="claude-opus-4-7",
        allowed_tools=["Read", "Write", "Edit", "Glob", "Grep", "Bash"],
        cwd="/path/to/your/repo",
        permission_mode="acceptEdits",  # or "default" for approval prompts
    )

    async with ClaudeSDKClient(options=options) as client:
        await client.query("Find all the places we handle payment webhooks and summarize the flow.")

        # Responses arrive as typed message objects; text and tool-use
        # blocks live on AssistantMessage.content.
        async for msg in client.receive_response():
            if isinstance(msg, AssistantMessage):
                for block in msg.content:
                    if isinstance(block, TextBlock):
                        print(block.text, end="", flush=True)
                    elif isinstance(block, ToolUseBlock):
                        print(f"\n[using {block.name}: {block.input}]")

asyncio.run(main())

3. Level up: structured multi-step task

TASK = """
There's a bug where the `/api/checkout` endpoint returns 500 when cart is empty.
1. Find the endpoint handler
2. Read it and identify why it crashes on empty cart
3. Fix it with the minimum change
4. Find the relevant tests and run them
5. Report the fix and test output
"""

async def fix_bug():
    options = ClaudeAgentOptions(
        system_prompt="You are a focused bug-fixing agent. Do exactly what's asked, nothing more.",
        model="claude-opus-4-7",
        allowed_tools=["Read", "Edit", "Glob", "Grep", "Bash"],
        cwd="./my-app",
        permission_mode="acceptEdits",
        max_turns=30,
    )

    async with ClaudeSDKClient(options=options) as client:
        await client.query(TASK)
        async for msg in client.receive_response():
            # ... stream to user or log
            ...

asyncio.run(fix_bug())
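One common choice for that streaming body is an audit log. A minimal sketch, assuming each message can be reduced to a plain dict first (adapt the field names to the SDK's actual message objects):

```python
import json
import time

def log_message(msg: dict, path: str = "agent_audit.jsonl") -> None:
    """Append one agent message to a JSONL audit log, one record per line."""
    record = {"ts": time.time(), **msg}  # timestamp plus whatever the message carried
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```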

4. Subagent pattern: delegate isolated work

For large tasks, spawn subagents to keep the main agent's context clean:

options = ClaudeAgentOptions(
    system_prompt=(
        "You are a project lead. For any isolated subtask (exploration, review, test generation), "
        "delegate to a subagent via the Agent tool with a self-contained prompt. "
        "Synthesize their results yourself."
    ),
    allowed_tools=["Read", "Edit", "Glob", "Grep", "Bash", "Agent"],
    cwd="./my-app",
)

Now when Claude hits a research subtask, it calls the Agent tool with a fresh, self-contained prompt. The subagent does the work in its own context window and returns only a report to the main agent.

5. Custom tool: expose project-specific operations

The SDK lets you add tools beyond file I/O by serving them to Claude via an in-process MCP server. Example: exposing your project's test runner:

from claude_agent_sdk import tool, create_sdk_mcp_server, ClaudeAgentOptions

@tool("run_unit_tests", "Run the project's unit tests.", {"test_path": str})
async def run_unit_tests(args):
    import subprocess
    result = subprocess.run(
        ["pytest", args.get("test_path", "tests/"), "-q", "--tb=short"],
        capture_output=True, text=True, timeout=120,
    )
    # Tool results are MCP content blocks; keep output tails bounded.
    return {"content": [{
        "type": "text",
        "text": f"exit={result.returncode}\n"
                f"{result.stdout[-2000:]}\n{result.stderr[-1000:]}",
    }]}

server = create_sdk_mcp_server(name="project", version="1.0.0", tools=[run_unit_tests])

options = ClaudeAgentOptions(
    system_prompt="...",
    mcp_servers={"project": server},
    # Custom tools appear under mcp__<server>__<tool>, alongside built-ins:
    allowed_tools=["Read", "Edit", "Grep", "mcp__project__run_unit_tests"],
)

The permission model (this is where roll-your-own fails)

Three modes matter:

Mode                 Behavior                                        Use case
default              Prompt user before any write/bash               Interactive dev
acceptEdits          Auto-accept file edits, still prompt on bash    CI/automation runs
bypassPermissions    Silent execution                                Only inside containers you trust
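The table reduces to a small decision function. This is an illustrative reconstruction of the behavior, not the SDK's internal code:

```python
def requires_approval(mode: str, tool_name: str) -> bool:
    """Whether a tool call should pause for user approval under each mode."""
    if mode == "bypassPermissions":
        return False  # silent execution: never prompt
    if mode == "acceptEdits":
        return tool_name == "Bash"  # edits auto-accepted, bash still gated
    # "default": gate anything that mutates state or runs commands
    return tool_name in {"Write", "Edit", "Bash"}
```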

For a CI bot (e.g., PR review agent that auto-fixes):

options = ClaudeAgentOptions(
    permission_mode="acceptEdits",
    # Scoped to a specific branch in a Git worktree:
    cwd="/tmp/review-worktree",
)

For an interactive assistant, keep permission_mode at "default" so users approve every change.

Hooks: auto-inject policies

Hooks let you run deterministic code at fixed points in the agent loop. Example: reject edits to secret files.

from claude_agent_sdk import ClaudeAgentOptions, HookMatcher

async def protect_secrets(input_data, tool_use_id, context):
    path = input_data.get("tool_input", {}).get("file_path", "")
    if ".env" in path or "secrets/" in path:
        return {
            "hookSpecificOutput": {
                "hookEventName": "PreToolUse",
                "permissionDecision": "deny",
                "permissionDecisionReason": "Cannot edit secret files",
            }
        }
    return {}  # empty result means no objection

options = ClaudeAgentOptions(
    # ...
    hooks={"PreToolUse": [HookMatcher(matcher="Edit", hooks=[protect_secrets])]},
)

Hook types: PreToolUse, PostToolUse, UserPromptSubmit, Stop. Good for:

  • Safety rails (block destructive commands)
  • Auto-formatting after every edit
  • Logging for audit trails
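A predicate for the first bullet might look like this. The pattern list is illustrative; a match would feed a deny decision in a PreToolUse hook on the Bash tool:

```python
import re

# Shell commands an agent should never run unreviewed; extend to taste.
DESTRUCTIVE_PATTERNS = [
    r"\brm\s+-rf\b",
    r"\bgit\s+push\s+--force\b",
    r"\bdrop\s+table\b",
]

def check_bash_command(command: str):
    """Return the matched destructive pattern, or None if the command looks safe."""
    for pattern in DESTRUCTIVE_PATTERNS:
        if re.search(pattern, command, re.IGNORECASE):
            return pattern
    return None
```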

Real-world use cases

  1. PR auto-reviewer. Trigger on GitHub webhook. The agent clones the PR branch into a worktree, reviews files, and posts comments or suggests fixes (combines with Project 4 from the resume carousel).
  2. Documentation generator. Agent reads codebase, writes and updates README.md and ARCHITECTURE.md from actual code.
  3. Migration assistant. "Convert all class components to hooks." The agent enumerates, edits, runs tests, and reports.
  4. Test writer. Agent reads untested functions, generates tests, runs pytest, fixes failures.

Cost reality check

An agent that reads a medium codebase, edits, and runs tests typically burns 50K-200K tokens per task. At Claude Opus 4.7 rates, that's roughly $0.75 to $3.00 per task. Use Sonnet for simpler tasks (10x cheaper, often sufficient):

options = ClaudeAgentOptions(
    model="claude-sonnet-4-6",   # cheaper, still strong at code
    # ...
)

Or use Haiku for pure scan-and-classify jobs.
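The arithmetic behind these estimates is just tokens times per-million-token rates. A helper that takes the rates as parameters (pricing changes, so nothing is hardcoded; the figures in any example call are assumptions to check against the current price sheet):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  in_rate_per_mtok: float, out_rate_per_mtok: float) -> float:
    """USD cost of one task given per-million-token input/output rates."""
    return (input_tokens * in_rate_per_mtok
            + output_tokens * out_rate_per_mtok) / 1_000_000
```

For example, 150K input and 10K output tokens at $15/$75 per million comes to $3.00.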

Resume angle

"Built a production coding agent on the Claude Agent SDK: subagent delegation for isolated research, custom project-aware tools (test runners, lint checks), hook-based policy enforcement (blocked edits to secrets, auto-format post-edit), and three-tier permission modes for dev/CI/sandboxed use. Same framework as Claude Code itself."