How Do I Build a Multi-Agent Workflow with LangGraph and LangSmith?

A single agent that's "good at everything" tends to be okay at nothing. The pattern that's been working — for me and for every team I've watched ship real agent systems — is to break the work into specialized sub-agents and let a supervisor route between them. LangGraph is the framework for the routing; LangSmith is how you actually debug when it goes sideways. Here's how to build a working multi-agent system you can put in front of users.
Why this matters
If you've tried building one mega-agent with 17 tools in its toolkit, you already know the failure mode: it picks the wrong tool half the time, hallucinates parameters the other half, and you can't tell why. Multi-agent systems work because each agent has a small, focused brief — and a supervisor with one job (routing) does the picking.
LangGraph treats your agent system as a graph. Each node is an agent or a deterministic function. Edges define who can hand off to whom. State flows through the graph and is logged at every step. LangSmith picks up that log and makes it inspectable — every agent call, every tool call, every prompt, every output.
Before you start
You need:
- Python 3.10+ and a clean virtualenv.
- An OpenAI or Anthropic API key. Examples use Anthropic.
- A LangSmith account (free tier is fine). Sign up at smith.langchain.com to get an API key.
- About 45 minutes for the full build.
Create an environment and install the packages:

```
python -m venv .venv && source .venv/bin/activate
pip install langgraph langchain-anthropic langsmith
```

Set both keys:

```
export ANTHROPIC_API_KEY=sk-ant-...
export LANGSMITH_API_KEY=lsv2_pt_...
export LANGSMITH_TRACING=true
export LANGSMITH_PROJECT=multi-agent-demo
```

Step 1: Define your agents' jobs
The discipline that makes multi-agent work: write down each agent's job in one sentence before you write any code. If you can't compress it, the agent is too broad.
For this guide, we'll build a research-and-write team:
- Researcher — finds 3 sources on a topic, returns notes.
- Writer — turns notes into a 300-word draft.
- Editor — cuts hype and tightens to 200 words.
- Supervisor — picks who runs next.
Each is a single-purpose agent. Total system: ~80 lines of code.
Step 2: Build the agents as functions
Each agent is a function from state to state. We'll use a shared `AgentState` dict for messages and intermediate work.

```python
from typing import TypedDict, Annotated, List

from langchain_core.messages import BaseMessage, HumanMessage, AIMessage
from langchain_anthropic import ChatAnthropic
from langgraph.graph.message import add_messages


class AgentState(TypedDict):
    messages: Annotated[List[BaseMessage], add_messages]
    notes: str
    draft: str
    final: str


llm = ChatAnthropic(model="claude-sonnet-4-5-20250929", temperature=0)


def researcher(state: AgentState) -> dict:
    topic = state["messages"][-1].content
    prompt = f"Find 3 short factual bullets on this topic and return as a numbered list:\n{topic}"
    res = llm.invoke([HumanMessage(content=prompt)])
    return {"notes": res.content, "messages": [AIMessage(content="Research done.", name="researcher")]}


def writer(state: AgentState) -> dict:
    prompt = f"Turn these notes into a 300-word draft in a friendly, jargon-free voice:\n{state['notes']}"
    res = llm.invoke([HumanMessage(content=prompt)])
    return {"draft": res.content, "messages": [AIMessage(content="Draft done.", name="writer")]}


def editor(state: AgentState) -> dict:
    prompt = f"Cut hype, tighten to ~200 words, no filler, no jargon:\n{state['draft']}"
    res = llm.invoke([HumanMessage(content=prompt)])
    return {"final": res.content, "messages": [AIMessage(content="Final done.", name="editor")]}
```

Each agent reads what it needs from state, writes what it produced, and returns the diff. Clean.
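To see why returning a partial dict works, here's a simplified, hypothetical model of the merge LangGraph performs between steps (the real reducer machinery lives inside langgraph; `merge` and `append` are illustrative names, not library API):

```python
def append(old, new):
    # Reducer for the messages channel: accumulate instead of overwrite,
    # mirroring what add_messages does for AgentState.messages.
    return (old or []) + new


def merge(state, diff, reducers):
    # Keys with a registered reducer are combined with the old value;
    # plain keys like "notes" or "draft" are simply replaced.
    out = dict(state)
    for key, value in diff.items():
        reducer = reducers.get(key)
        out[key] = reducer(out.get(key), value) if reducer else value
    return out


state = {"messages": ["hi"], "notes": ""}
state = merge(
    state,
    {"notes": "3 bullets", "messages": ["Research done."]},
    {"messages": append},
)
# messages accumulated; notes overwritten
```

This is why a node never has to copy the whole state: it returns only the keys it changed, and the reducers decide how each key folds in.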
Step 3: Add the supervisor
The supervisor decides what runs next. Simplest version: it's a function that returns the name of the next node based on what's in state.
```python
def supervisor(state: AgentState) -> str:
    if not state.get("notes"):
        return "researcher"
    if not state.get("draft"):
        return "writer"
    if not state.get("final"):
        return "editor"
    return "END"
```

For a smarter supervisor, replace the deterministic logic with an LLM call that reads state and outputs a routing decision. Both work; start deterministic, upgrade when you actually need it.
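If you do upgrade, here's one hedged sketch of the LLM-routed version. The prompt, `parse_route`, and `call_model` are my names, not LangGraph API; `call_model` stands in for something like `lambda p: llm.invoke([HumanMessage(content=p)]).content`, injected so the routing logic stays testable without an API key:

```python
from typing import Callable

ROUTING_PROMPT = (
    "You are a supervisor for a research-and-write team.\n"
    "notes set: {has_notes}\ndraft set: {has_draft}\nfinal set: {has_final}\n"
    "Reply with exactly one of: researcher, writer, editor, END."
)


def parse_route(text: str) -> str:
    # Models rarely reply with a bare token; accept the first known
    # node name anywhere in the reply, and default to END otherwise.
    lowered = text.lower()
    for name in ("researcher", "writer", "editor"):
        if name in lowered:
            return name
    return "END"


def llm_supervisor(state: dict, call_model: Callable[[str], str]) -> str:
    prompt = ROUTING_PROMPT.format(
        has_notes=bool(state.get("notes")),
        has_draft=bool(state.get("draft")),
        has_final=bool(state.get("final")),
    )
    return parse_route(call_model(prompt))
```

The defensive `parse_route` matters more than the prompt: an unparseable reply falls through to END instead of looping the graph.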
Step 4: Wire the graph
```python
from langgraph.graph import StateGraph, END

graph = StateGraph(AgentState)
graph.add_node("researcher", researcher)
graph.add_node("writer", writer)
graph.add_node("editor", editor)

graph.set_entry_point("researcher")
graph.add_conditional_edges("researcher", supervisor, {"writer": "writer", "END": END})
graph.add_conditional_edges("writer", supervisor, {"editor": "editor", "END": END})
graph.add_conditional_edges("editor", supervisor, {"END": END})

app = graph.compile()
```

The conditional edges call supervisor(state) after each node runs and route based on the return value. Once `final` is populated, the graph hits END and returns the final state.
Step 5: Run it
```python
init_state = {
    "messages": [HumanMessage(content="The business case for AI prompt caching.")],
    "notes": "",
    "draft": "",
    "final": "",
}
result = app.invoke(init_state)
print(result["final"])
```

Run it and you'll see the researcher → writer → editor flow. The final output is in `result["final"]`.
Verify it worked
Two checks:
- The script returns a coherent ~200-word piece. If it returns notes or a 300-word draft, the supervisor isn't routing past that step — re-check your conditional edges.
- LangSmith shows the trace. Open https://smith.langchain.com, find the `multi-agent-demo` project, and click the latest run. You should see the parent graph run with researcher, writer, and editor nested inside it, with prompts, responses, and latencies for each.
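You can make the first check mechanical with a small smoke test. `smoke_check` is my helper, and the word-count bounds are my assumption (roughly ±25% around the 200-word target):

```python
def smoke_check(result: dict, low: int = 150, high: int = 260) -> int:
    # Fail loudly if a stage was skipped or the editor never tightened the draft.
    assert result.get("notes"), "researcher never ran"
    assert result.get("draft"), "writer never ran"
    words = len(result.get("final", "").split())
    assert low <= words <= high, f"final is {words} words, expected ~200"
    return words
```

Call `smoke_check(result)` right after `app.invoke(init_state)`; a failure message tells you which stage the supervisor skipped.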
If the trace is missing, your LANGSMITH_TRACING and LANGSMITH_API_KEY env vars aren't loaded — re-source your shell or print them to confirm.
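A quick way to confirm without squinting at your shell (`REQUIRED` and `missing_vars` are my names, not LangSmith API):

```python
import os

REQUIRED = ("ANTHROPIC_API_KEY", "LANGSMITH_API_KEY", "LANGSMITH_TRACING", "LANGSMITH_PROJECT")


def missing_vars(env=os.environ):
    # Names of required variables that are unset or empty in the environment.
    return [name for name in REQUIRED if not env.get(name)]


if __name__ == "__main__":
    gone = missing_vars()
    print("All set" if not gone else f"Missing: {', '.join(gone)}")
```

Run it in the same shell you run the graph from; env vars exported in one terminal don't exist in another.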
Where this breaks
- State growing unbounded. Long-running graphs accumulate messages forever. Trim `state["messages"]` periodically or use `add_messages` with `RemoveMessage` to drop old turns.
- Supervisor loops. A bad supervisor returns the same node twice and you get an infinite loop. Always have a counter or a max-step ceiling on the graph: `app = graph.compile().with_config(recursion_limit=15)`.
- Token bills. Multi-agent means multi-prompt: every agent call is a fresh API hit. Cache where possible (Claude prompt caching helps if your system prompts are stable) and don't make every node an LLM call when a deterministic function will do.
- LangSmith doesn't capture errors clearly. If an agent raises, the trace shows a red node but the error detail is sometimes truncated. Add an explicit `try/except` in each agent and write the exception to state so you can see it in the trace.
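For that last point, a hedged sketch: instead of editing every agent, wrap each one so exceptions land in state (this assumes you add an `error: str` field to AgentState; `safe` is my helper, not a LangGraph feature):

```python
def safe(agent):
    # Decorator: convert a raised exception into an "error" entry in the
    # returned state diff, so it's visible in the LangSmith trace instead
    # of killing the run.
    def wrapped(state):
        try:
            return agent(state)
        except Exception as exc:
            return {"error": f"{agent.__name__}: {exc!r}"}
    wrapped.__name__ = agent.__name__
    return wrapped


@safe
def flaky(state):
    raise ValueError("upstream API down")


result = flaky({})
```

Wire it in with `graph.add_node("researcher", safe(researcher))`, and have the supervisor route to END whenever `state.get("error")` is set.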
What to try next
- Swap the deterministic supervisor for an LLM router once the fixed pipeline feels limiting.
- Add message trimming with RemoveMessage before the graph runs long enough to need it.
- Turn on prompt caching for the stable system prompts to cut the token bill.
Let's talk about your AI + SEO stack
If you'd rather skip the how-to and have it shipped for you, that's what I do. Start a conversation and we'll figure out the fastest path to results.
Let's Talk