How Do I Build a Research Pipeline with the Anthropic Agent SDK?

The Agent SDK is how you stop being the human loop between Claude and "the internet plus your tools." Instead of you typing "search this, then check that, then summarize," the SDK runs an agent that does all three itself — tool call, reason, tool call, reason, until the task is done. For research workflows, it's the difference between "this took me an afternoon" and "a cron job produced the report before I was awake." Here's a production-grade research pipeline you can have running tonight.
Why this matters
A research pipeline is the canonical agent use case. You give the agent a research brief, tools to search and read, and a writing model. It iterates: decide what to look up, look it up, notice gaps, look those up too, synthesize. The human isn't in the loop until the report is drafted.
The Agent SDK gives you the orchestration primitives — tool definitions, tool execution, the reasoning loop, stopping conditions, usage tracking — without you reinventing them. You get to focus on the brief and the tools; the SDK runs the loop.
Before you start
You need:
- Python 3.10+ or Node 20+. This guide uses Python for the first example; the Node SDK is equivalent.
- An Anthropic API key with Claude access. Generate from console.anthropic.com.
- A web search tool you trust — Brave Search API, Tavily, or SerpAPI all work. I'll use Tavily; swap as needed.
- A research brief to test with. A good first one: "State of LLM evaluation tools as of Q2 2026." Narrow scope, public signal, checkable output.
Step 1: Install the SDK
```bash
pip install anthropic
pip install tavily-python  # or brave-search / serpapi
```

Set env vars:

```bash
export ANTHROPIC_API_KEY="sk-ant-..."
export TAVILY_API_KEY="tvly-..."
```

Step 2: Define your tools
The agent needs tools. For a research pipeline, minimum viable:
```python
import os

from anthropic import Anthropic
from tavily import TavilyClient

client = Anthropic()
tavily = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

tools = [
    {
        "name": "web_search",
        "description": "Search the web for current information. Returns top results with title, URL, and snippet.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query"},
                "num_results": {"type": "integer", "default": 5},
            },
            "required": ["query"],
        },
    },
    {
        "name": "fetch_page",
        "description": "Fetch the content of a specific URL and return its text.",
        "input_schema": {
            "type": "object",
            "properties": {
                "url": {"type": "string", "description": "The URL to fetch"},
            },
            "required": ["url"],
        },
    },
]

def execute_tool(name, tool_input):
    if name == "web_search":
        return tavily.search(
            tool_input["query"],
            max_results=tool_input.get("num_results", 5),
        )
    if name == "fetch_page":
        # Use tavily's extract, or requests + BeautifulSoup
        return tavily.extract(tool_input["url"])
    raise ValueError(f"Unknown tool: {name}")
```

Keep the tool set small at first. Two or three is enough for a research pipeline. More tools = more decisions for the agent = more chances to wander off the brief.
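If you'd rather not depend on Tavily for extraction, the `requests + BeautifulSoup` route mentioned in the comment can be approximated with the standard library alone. A sketch (the skipped-tag list and 5000-char cap are arbitrary defaults):

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

class _TextExtractor(HTMLParser):
    """Collects visible text, skipping script/style/nav/footer subtrees."""
    SKIP = {"script", "style", "nav", "footer"}

    def __init__(self):
        super().__init__()
        self.parts, self.skip_depth = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)

def html_to_text(html: str, max_chars: int = 5000) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    # Collapse whitespace and truncate so one page can't flood the context
    return " ".join(" ".join(parser.parts).split())[:max_chars]

def fetch_page_stdlib(url: str) -> str:
    req = Request(url, headers={"User-Agent": "research-agent/0.1"})
    with urlopen(req, timeout=10) as resp:
        return html_to_text(resp.read().decode("utf-8", errors="replace"))
```

The truncation in `html_to_text` doubles as the context-size guard discussed under "Where this breaks."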
Step 3: Run the agent loop
The loop: send messages plus tools; if the response's stop_reason is tool_use, execute the requested tool(s), append the results, and send again. Repeat until stop_reason is end_turn.
```python
def run_research(brief: str, max_iterations: int = 15):
    messages = [
        {
            "role": "user",
            "content": f"""You are a research assistant. Produce a thorough report
on the following brief. Use the web_search and fetch_page tools to gather
current information. Cite every specific claim with a URL.

Stop and produce the final report when you have enough material — don't
over-research. Aim for 800-1500 words.

Brief: {brief}""",
        }
    ]

    for i in range(max_iterations):
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=8096,
            tools=tools,
            messages=messages,
        )

        if response.stop_reason == "end_turn":
            # Agent is done; extract the final text
            final_text = ""
            for block in response.content:
                if block.type == "text":
                    final_text += block.text
            return final_text

        if response.stop_reason == "tool_use":
            # Execute tools, append results to messages
            messages.append({"role": "assistant", "content": response.content})
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = execute_tool(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": str(result),
                    })
            messages.append({"role": "user", "content": tool_results})
            continue

        # Unexpected stop reason (e.g. max_tokens); bail with the real cause
        raise RuntimeError(f"Unexpected stop_reason: {response.stop_reason}")

    raise RuntimeError(f"Exceeded {max_iterations} iterations without completing")
```

Replace the model name with whichever Claude model your account has access to. The SDK auto-handles model-specific differences.
Step 4: Run it on a real brief
```python
if __name__ == "__main__":
    report = run_research("State of LLM evaluation tools as of Q2 2026")
    print(report)
```

Watch the agent work (add a `print(f"iter {i}: {response.stop_reason}")` inside the loop if you want the play-by-play). You'll see it search, fetch, search again, synthesize. It usually converges in 4-8 iterations.
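If you want that play-by-play without sprinkling prints through the loop, a small logging wrapper around `execute_tool` works too. A sketch; the logger name and format are arbitrary:

```python
import functools
import json
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("research-agent")

def logged(tool_executor):
    """Wrap a tool executor so every call and its result size are logged."""
    @functools.wraps(tool_executor)
    def wrapper(name, tool_input):
        log.info("tool=%s input=%s", name, json.dumps(tool_input)[:200])
        result = tool_executor(name, tool_input)
        log.info("tool=%s result_chars=%d", name, len(str(result)))
        return result
    return wrapper

# execute_tool = logged(execute_tool)
```

This also satisfies the logging rail in Step 6: the log of tool calls and results is what you'll read when a report goes sideways.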
Step 5: Add structured output
Freeform reports are fine for reading. For pipelines that feed into anything downstream (a CMS, a database, another agent), force structured output:
```python
# Append to the initial prompt:
"""
Return the final report as valid JSON with this shape:

{
  "title": "string",
  "tldr": "string, max 2 sentences",
  "sections": [
    {"heading": "string", "body": "string", "sources": ["url1", "url2"]}
  ],
  "key_facts": ["fact 1 with inline citation", ...]
}

Do not include any prose outside the JSON block.
"""
```

Then parse the output in your pipeline. Schema validation catches hallucinated structure.
For stricter enforcement, use tool use for structured output: define a submit_report tool with the schema, force the agent to call it to end the task, and read the structured input from that tool call.
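A sketch of what that `submit_report` tool could look like, mirroring the JSON shape from Step 5; the name and schema are illustrative:

```python
submit_report_tool = {
    "name": "submit_report",
    "description": "Submit the final research report. Call exactly once when done.",
    "input_schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "tldr": {"type": "string", "description": "Max 2 sentences"},
            "sections": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "heading": {"type": "string"},
                        "body": {"type": "string"},
                        "sources": {"type": "array", "items": {"type": "string"}},
                    },
                    "required": ["heading", "body", "sources"],
                },
            },
            "key_facts": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["title", "tldr", "sections", "key_facts"],
    },
}

# In the loop: when block.name == "submit_report", treat block.input as the
# final structured report and return it instead of executing a tool.
```

Because the API validates tool input against the schema, the structure arrives already enforced; no JSON parsing of freeform text required.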
Step 6: Wrap with safety rails
Production research agents need limits:
- Iteration cap — as above, fail if it runs too long.
- Token budget — sum `usage.input_tokens + usage.output_tokens` across iterations; bail if over budget.
- Rate limiting — if your search tool charges per call, cap calls per run.
- Domain allowlist — reject `fetch_page` URLs outside a list of domains you trust. Prevents the agent from fetching adversarial pages.
- Logging — log every tool call, its input, and its output. When a report goes off the rails, the log is how you debug.
Skip these in a notebook; have them on by the time you schedule the agent or point it at real work.
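A sketch combining the token budget, tool-call cap, and domain allowlist in one tracker (the numbers and allowlist entries are placeholders; tune them to your costs and sources):

```python
from urllib.parse import urlparse

class RunBudget:
    """Token budget + tool-call cap + domain allowlist for one research run."""

    def __init__(self, token_budget=150_000, max_tool_calls=12,
                 allowed_domains=("docs.anthropic.com",)):
        self.token_budget = token_budget
        self.max_tool_calls = max_tool_calls
        self.allowed_domains = allowed_domains
        self.tokens_used = 0
        self.tool_calls = 0

    def record_usage(self, usage):
        # usage is response.usage from each Messages API call
        self.tokens_used += usage.input_tokens + usage.output_tokens
        if self.tokens_used > self.token_budget:
            raise RuntimeError(f"Token budget exceeded: {self.tokens_used}")

    def check_tool_call(self, name, tool_input):
        self.tool_calls += 1
        if self.tool_calls > self.max_tool_calls:
            raise RuntimeError("Tool-call cap exceeded")
        if name == "fetch_page":
            host = urlparse(tool_input["url"]).netloc
            if not any(host == d or host.endswith("." + d)
                       for d in self.allowed_domains):
                raise ValueError(f"Domain not allowlisted: {host}")
```

Call `record_usage(response.usage)` after each `client.messages.create` and `check_tool_call(...)` before each `execute_tool`; let the exceptions abort the run.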
Verify it worked
1. One complete run produces a report. On a narrow, testable brief, the final text should cover the topic with real citations. Spot-check two citations — they should exist and say roughly what the report claims.
2. Iteration count is sane. Most briefs should finish in 4-10 iterations. If yours is hitting 15, either the brief is too broad or the agent is over-researching. Tighten the stop criterion in the prompt.
3. Cost per run is predictable. Log usage each iteration. A typical research run is $0.05-$0.30 in input + output tokens. If yours is $5, something's wrong — check for tool results that are dumping huge raw HTML into context.
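For the cost check, a rough per-iteration estimate helps. The per-million-token prices below are assumptions; verify them against the current pricing page for your model:

```python
# Assumed USD prices per million tokens for claude-sonnet-4-5; check pricing.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}

def iteration_cost(usage) -> float:
    """Estimate dollar cost of one iteration from response.usage."""
    return (usage.input_tokens / 1_000_000 * PRICE_PER_MTOK["input"]
            + usage.output_tokens / 1_000_000 * PRICE_PER_MTOK["output"])
```

Summing this across iterations is how you catch the "tool result dumped 200KB of HTML into context" failure before the invoice does.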
Where this breaks
- Unbounded fetching. A page with 50KB of content eats context every iteration. Either truncate in the tool execution (`content[:5000]`), or have the tool extract just the main article text, not the whole HTML.
- Looping on unproductive queries. The agent keeps searching the same thing with slight variations because results are mediocre. Fix: add a tool-use count per query prefix, or include in the system prompt "if a search returns low-quality results, pivot the query substantially instead of tweaking."
- Citations that don't exist. The agent can fabricate a URL that looks right but isn't. Always validate URLs in post-processing — HEAD request each cited URL; flag 404s.
- The agent deciding the brief is "done" too early. Common if the initial search returns a lot of surface-level results. Mitigate by requiring a minimum section count or a minimum citation count in the output schema.
- Tool result schema drift. Your search provider changes their response format. The agent gets unfamiliar data and behaves weirdly. Normalize tool results to a stable internal schema, regardless of provider.
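The HEAD-request citation check above can be sketched with the standard library (the User-Agent string is arbitrary; some sites reject HEAD, so consider falling back to GET on 405 responses):

```python
import concurrent.futures
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

def validate_citations(urls, timeout=5):
    """HEAD-request each cited URL; return (url, status) pairs that look dead."""
    def check(url):
        try:
            req = Request(url, method="HEAD",
                          headers={"User-Agent": "cite-checker/0.1"})
            with urlopen(req, timeout=timeout) as resp:
                return url, resp.status
        except HTTPError as e:
            return url, e.code
        except (URLError, ValueError):
            return url, None

    dead = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
        for url, status in pool.map(check, urls):
            if status is None or status >= 400:
                dead.append((url, status))
    return dead
```

Run it over every URL extracted from the final report; a non-empty return means the report needs a revision pass before it ships.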
What to try next
- How Do I Build an MCP Server That Lets Claude Query My Postgres Database? — the same agent pattern, pointed at your internal database instead of the web.
- How Do I Cut My Anthropic Bill in Half Using the Batch API? — when you're running the agent on 100 briefs overnight, batch is how you afford it.
- How Do I Schedule Claude Code to Run Overnight Jobs? — wrap this agent in a scheduled runner and you've got a research team.
Let's talk about your AI + SEO stack
If you'd rather skip the how-to and have it shipped for you, that's what I do. Start a conversation and we'll figure out the fastest path to results.
Let's Talk