A single agent that's "good at everything" tends to be okay at nothing. The pattern that's been working, for me and for every team I've watched ship real agent systems, is to break the work into specialized sub-agents and let a supervisor route between them. LangGraph is the framework for the routing; LangSmith is how you actually debug when it goes sideways. Here's how to build a working multi-agent system you can put in front of users.
Why this matters
If you've tried building one mega-agent with 17 tools in its toolkit, you already know the failure mode: it picks the wrong tool half the time, hallucinates parameters the other half, and you can't tell why. Multi-agent systems work because each agent has a small, focused brief, and a supervisor with one job (routing) does the picking.
LangGraph treats your agent system as a graph. Each node is an agent or a deterministic function. Edges define who can hand off to whom. State flows through the graph and is logged at every step. LangSmith picks up that log and makes it inspectable: every agent call, every tool call, every prompt, every output.
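If the graph model feels abstract, here is a library-free sketch of the idea: nodes return partial updates, a router per node picks the next hop, and every step is logged. This is not LangGraph's API; `run_graph`, the node names, and the `max_steps` guard are all made up for illustration.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, str]
Node = Callable[[State], State]              # returns a partial update to merge into state
Router = Callable[[State], Optional[str]]    # returns the next node name, or None to stop


def run_graph(nodes: Dict[str, Node], routers: Dict[str, Router],
              entry: str, state: State, max_steps: int = 10) -> List[str]:
    """Run nodes until a router returns None; return the log of visited nodes."""
    log: List[str] = []
    current: Optional[str] = entry
    while current is not None:
        if len(log) >= max_steps:
            raise RuntimeError(f"exceeded {max_steps} steps")
        state.update(nodes[current](state))   # merge the node's partial update
        log.append(current)                   # record the step that just ran
        current = routers[current](state)     # the edge decides who runs next
    return log


# Two toy deterministic nodes standing in for agents.
nodes = {
    "upper": lambda s: {"text": s["text"].upper()},
    "exclaim": lambda s: {"text": s["text"] + "!"},
}
routers = {
    "upper": lambda s: "exclaim",
    "exclaim": lambda s: None,
}

state = {"text": "hello"}
visited = run_graph(nodes, routers, "upper", state)
print(visited, state)
```

The real framework adds typed state, reducers, and tracing on top, but the shape is the same: small functions, explicit hand-offs, one shared state.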
Before you start
You need:
- Python 3.10+ and a clean virtualenv.
- An OpenAI or Anthropic API key. Examples use Anthropic.
- A LangSmith account (free tier is fine). Sign up at smith.langchain.com to get an API key.
- About 45 minutes for the full build.

python -m venv .venv && source .venv/bin/activate
pip install langgraph langchain-anthropic langsmith

Set both keys:
export ANTHROPIC_API_KEY=sk-ant-...
export LANGSMITH_API_KEY=lsv2_pt_...
export LANGSMITH_TRACING=true
export LANGSMITH_PROJECT=multi-agent-demo

Step 1: Define your agents' jobs
The discipline that makes multi-agent work: write down each agent's job in one sentence before you write any code. If you can't compress it, the agent is too broad.
For this guide, we'll build a research-and-write team:
- Researcher: finds 3 sources on a topic, returns notes.
- Writer: turns notes into a 300-word draft.
- Editor: cuts hype and tightens to 200 words.
- Supervisor: picks who runs next.
Each is a single-purpose agent. Total system: ~80 lines of code.
Step 2: Build the agents as functions
Each agent is a function from state to state. We'll use a shared AgentState dict for messages and intermediate work.
from typing import Annotated, List, TypedDict

from langchain_anthropic import ChatAnthropic
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    # add_messages appends new messages instead of overwriting the list
    messages: Annotated[List[BaseMessage], add_messages]
    notes: str  # researcher output
    draft: str  # writer output
    final: str  # editor output

llm = ChatAnthropic(model="claude-sonnet-4-5-20250929", temperature=0)

def researcher(state: AgentState) -> dict:
    # The topic is the latest message, which is the user's request when the graph starts here
    topic = state["messages"][-1].content
    prompt = f"Find 3 short factual bullets on this topic and return as a numbered list:\n{topic}"
    res = llm.invoke([HumanMessage(content=prompt)])
    return {"notes": res.content, "messages": [AIMessage(content="Research done.", name="researcher")]}

def writer(state: AgentState) -> dict:
    prompt = f"Turn these notes into a 300-word draft in a friendly, jargon-free voice:\n{state['notes']}"
    res = llm.invoke([HumanMessage(content=prompt)])
    return {"draft": res.content, "messages": [AIMessage(content="Draft done.", name="writer")]}

def editor(state: AgentState) -> dict:
    prompt = f"Cut hype, tighten to ~200 words, no filler, no jargon:\n{state['draft']}"
    res = llm.invoke([HumanMessage(content=prompt)])
    return {"final": res.content, "messages": [AIMessage(content="Final done.", name="editor")]}

Each agent reads what it needs from state, writes what it produced, and returns only the diff. Clean.
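To make "returns the diff" concrete, here is a rough, library-free approximation of what happens to that diff: the messages key is appended to, every other key is overwritten. The `apply_update` helper is hypothetical and simplified. As I understand it, the real `add_messages` reducer also replaces messages that share an ID, which this sketch ignores.

```python
from typing import Any, Dict


def apply_update(state: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Approximate merge: 'messages' is appended to, every other key is overwritten."""
    merged = dict(state)  # copy so the caller's state is left untouched
    for key, value in update.items():
        if key == "messages":
            merged[key] = list(state.get(key, [])) + list(value)
        else:
            merged[key] = value
    return merged


# Plain strings stand in for message objects to keep the sketch dependency-free.
state = {"messages": ["user: topic"], "notes": "", "draft": "", "final": ""}
update = {"notes": "1. fact", "messages": ["researcher: Research done."]}

new_state = apply_update(state, update)
print(new_state["notes"], len(new_state["messages"]))
```

The practical upshot: an agent never needs to copy the whole state forward. It returns only the keys it changed.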
Step 3: Add the supervisor
The supervisor decides what runs next. Simplest version: it's a function that returns the name of the next node based on what's in state.
def supervisor(state: AgentState) -> str:
    # Route to the first stage whose output is still missing
    if not state.get("notes"):
        return "researcher"
    if not state.get("draft"):
        return "writer"
    if not state.get("final"):
        return "editor"
    return "END"

For a smarter supervisor, replace the deterministic logic with an LLM call that reads state and outputs a routing decision. Both work; start deterministic, upgrade when you actually need it.
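If you do upgrade to an LLM supervisor, one way to keep it safe is to validate the model's answer against the known node names and fall back to the deterministic rule when it replies with anything else. The sketch below assumes a hypothetical `ask_model` callable (prompt in, text out) so it runs without an API key; `llm_route`, `VALID_ROUTES`, and the prompt wording are my own placeholders, not part of LangGraph.

```python
from typing import Callable, Dict

VALID_ROUTES = ("researcher", "writer", "editor", "END")


def deterministic_route(state: Dict[str, str]) -> str:
    """Same rule as the simple supervisor: run the first stage with no output yet."""
    if not state.get("notes"):
        return "researcher"
    if not state.get("draft"):
        return "writer"
    if not state.get("final"):
        return "editor"
    return "END"


def llm_route(state: Dict[str, str], ask_model: Callable[[str], str]) -> str:
    """Ask a model for the next node; fall back if the answer is not a known route."""
    filled = [k for k in ("notes", "draft", "final") if state.get(k)]
    prompt = (
        "Pick the next step. Reply with exactly one of: "
        + ", ".join(VALID_ROUTES)
        + f"\nFields already filled: {filled or 'none'}"
    )
    answer = ask_model(prompt).strip()
    if answer in VALID_ROUTES:
        return answer
    return deterministic_route(state)  # free-form text is never used as a node name


# Stub models stand in for a real LLM call.
clean = llm_route({"notes": "x"}, lambda prompt: " writer\n")
chatty = llm_route({"notes": "x", "draft": "y"}, lambda prompt: "Let's call the writer again")
print(clean, chatty)
```

In the real graph, `ask_model` would be a thin wrapper around the same `llm.invoke([HumanMessage(content=prompt)]).content` call the agents use. Note that an LLM router can loop where the deterministic one can't, so pair it with a step ceiling.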
Step 4: Wire the graph
from langgraph.graph import StateGraph, END

graph = StateGraph(AgentState)
graph.add_node("researcher", researcher)
graph.add_node("writer", writer)
graph.add_node("editor", editor)

graph.set_entry_point("researcher")
graph.add_conditional_edges("researcher", supervisor, {"writer": "writer", "END": END})
graph.add_conditional_edges("writer", supervisor, {"editor": "editor", "END": END})
graph.add_conditional_edges("editor", supervisor, {"END": END})

app = graph.compile()

The conditional edges call supervisor(state) after each node runs and route based on the return value. Once final is populated, the graph hits END and returns the final state.
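One thing worth checking with hand-written path maps is whether every label the router can return is actually covered by the edge it's attached to. The library-free sketch below replays the routing rule against a few sample states and reports labels an edge doesn't map. `route`, `PATH_MAPS`, and `unmapped` are illustrative names, and the maps are copied by hand from the `add_conditional_edges` calls. My assumption is that an unmapped label surfaces as a run-time error in the real graph.

```python
from typing import Dict, Iterable, List, Set, Tuple

State = Dict[str, str]


def route(state: State) -> str:
    """Stand-in for the supervisor: first stage whose output is missing."""
    for key, node in (("notes", "researcher"), ("draft", "writer"), ("final", "editor")):
        if not state.get(key):
            return node
    return "END"


# Labels each edge accepts, copied from the add_conditional_edges calls.
PATH_MAPS: Dict[str, Set[str]] = {
    "researcher": {"writer", "END"},
    "writer": {"editor", "END"},
    "editor": {"END"},
}


def unmapped(source: str, states: Iterable[State]) -> List[Tuple[State, str]]:
    """Return (state, label) pairs where the label is missing from the source's path map."""
    return [(s, route(s)) for s in states if route(s) not in PATH_MAPS[source]]


happy = {"notes": "1. fact", "draft": "", "final": ""}
empty_notes = {"notes": "", "draft": "", "final": ""}  # researcher produced nothing

problems = unmapped("researcher", [happy, empty_notes])
print(problems)
```

Here the check flags the case where the researcher comes back empty: the rule answers "researcher", which the researcher's edge doesn't list. Whether you add a retry edge or route that case to END is a design choice; the point is to decide it on purpose.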
Step 5: Run it
init_state = {"messages": [HumanMessage(content="The business case for AI prompt caching.")], "notes": "", "draft": "", "final": ""}
result = app.invoke(init_state)
print(result["final"])

Run it and you'll see the researcher → writer → editor flow. The final output is in result["final"].
Verify it worked
Two checks:
- The script returns a coherent ~200-word piece. If it returns notes or a 300-word draft, the supervisor isn't routing past that step; re-check your conditional edges.
- LangSmith shows the trace. Open https://smith.langchain.com, find the multi-agent-demo project, and click the latest run. You should see four nested calls (entry, researcher, writer, editor) with prompts, responses, and latencies for each.
If the trace is missing, your LANGSMITH_TRACING and LANGSMITH_API_KEY env vars probably aren't loaded; re-source your shell or print them to confirm.
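A small sketch for the "print them to confirm" step, written so the keys themselves never land in your terminal history or logs. `missing_vars` and `masked` are hypothetical helpers. Treating LANGSMITH_PROJECT as required is my assumption: without it, I'd expect traces to land in a different project than multi-agent-demo.

```python
import os
from typing import List, Mapping, Sequence

REQUIRED = ("ANTHROPIC_API_KEY", "LANGSMITH_API_KEY", "LANGSMITH_TRACING", "LANGSMITH_PROJECT")


def missing_vars(env: Mapping[str, str], names: Sequence[str] = REQUIRED) -> List[str]:
    """Return the names that are unset or blank in env."""
    return [n for n in names if not env.get(n, "").strip()]


def masked(value: str) -> str:
    """Hide everything except the last four characters."""
    return "*" * max(len(value) - 4, 0) + value[-4:]


# Report against the real environment of the current process.
for name in REQUIRED:
    value = os.environ.get(name, "")
    print(f"{name}: {masked(value) if value.strip() else 'MISSING'}")
```

Run it in the same shell you launch the graph from; a variable exported in another terminal tab won't show up here.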
Where this breaks
- State growing unbounded. Long-running graphs accumulate messages forever. Trim state["messages"] periodically, or use add_messages with RemoveMessage to drop old turns.
- Supervisor loops. A bad supervisor keeps returning the same node and the graph loops. Always have a counter or a max-step ceiling on the graph: app = graph.compile().with_config(recursion_limit=15).
- Token bills. Multi-agent means multi-prompt: every agent call is a fresh API hit. Cache where possible (Claude prompt caching helps if your system prompts are stable) and don't make every node an LLM call when a deterministic function will do.
- LangSmith doesn't capture errors clearly. If an agent raises, the trace shows a red node but the error detail is sometimes truncated. Add an explicit try/except in each agent and write the exception to state so you can see it in the trace.
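Two of the fixes above, the try/except and the step counter, can be sketched without touching the agents themselves. This assumes you add two hypothetical keys to the state, `error` and `steps`, and that each node increments `steps` when it runs; `safe_agent` and `guarded_route` are my names, not LangGraph's.

```python
import functools
from typing import Any, Callable, Dict

State = Dict[str, Any]


def safe_agent(fn: Callable[[State], State]) -> Callable[[State], State]:
    """Wrap an agent so an exception becomes an 'error' entry in the returned update."""
    @functools.wraps(fn)
    def wrapper(state: State) -> State:
        try:
            return fn(state)
        except Exception as exc:  # broad on purpose: record whatever the agent raised
            return {"error": f"{fn.__name__}: {type(exc).__name__}: {exc}"}
    return wrapper


def guarded_route(state: State, route: Callable[[State], str], max_steps: int = 15) -> str:
    """Stop on a recorded error or a spent step budget; otherwise defer to the router."""
    if state.get("error") or state.get("steps", 0) >= max_steps:
        return "END"
    return route(state)


@safe_agent
def flaky_writer(state: State) -> State:
    # Simulates an agent whose model call failed.
    raise ValueError("model returned no content")


update = flaky_writer({"notes": "1. fact"})
print(update)
print(guarded_route(update, lambda s: "editor"))
```

Because the error lands in state, it shows up in the trace as ordinary output instead of a truncated exception, and the router can end the run cleanly instead of feeding an empty draft to the next agent.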
What to try next