
How Do I Build a Multi-Agent Workflow with LangGraph and LangSmith?

Jake McCluskey · Intermediate · 45 min read

A single agent that's "good at everything" tends to be okay at nothing. The pattern that's been working — for me and for every team I've watched ship real agent systems — is to break the work into specialized sub-agents and let a supervisor route between them. LangGraph is the framework for the routing; LangSmith is how you actually debug when it goes sideways. Here's how to build a working multi-agent system you can put in front of users.

Why this matters

If you've tried building one mega-agent with 17 tools in its toolkit, you already know the failure mode: it picks the wrong tool half the time, hallucinates parameters the other half, and you can't tell why. Multi-agent systems work because each agent has a small, focused brief — and a supervisor with one job (routing) does the picking.

LangGraph treats your agent system as a graph. Each node is an agent or a deterministic function. Edges define who can hand off to whom. State flows through the graph and is logged at every step. LangSmith picks up that log and makes it inspectable — every agent call, every tool call, every prompt, every output.

Before you start

You need:

  • Python 3.10+ and a clean virtualenv.
  • An OpenAI or Anthropic API key. Examples use Anthropic.
  • A LangSmith account (free tier is fine). Sign up at smith.langchain.com to get an API key.
  • About 45 minutes for the full build.
Install the packages:

bash
python -m venv .venv && source .venv/bin/activate
pip install langgraph langchain-anthropic langsmith

Set both keys:

bash
export ANTHROPIC_API_KEY=sk-ant-...
export LANGSMITH_API_KEY=lsv2_pt_...
export LANGSMITH_TRACING=true
export LANGSMITH_PROJECT=multi-agent-demo

Step 1: Define your agents' jobs

The discipline that makes multi-agent work: write down each agent's job in one sentence before you write any code. If you can't compress it, the agent is too broad.

For this guide, we'll build a research-and-write team:

  • Researcher — finds 3 sources on a topic, returns notes.
  • Writer — turns notes into a 300-word draft.
  • Editor — cuts hype and tightens to 200 words.
  • Supervisor — picks who runs next.

Each is a single-purpose agent. Total system: ~80 lines of code.

Step 2: Build the agents as functions

Each agent is a function from state to state. We'll use a shared AgentState dict for messages and intermediate work.

python
from typing import TypedDict, Annotated, List
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage
from langchain_anthropic import ChatAnthropic
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    messages: Annotated[List[BaseMessage], add_messages]
    notes: str
    draft: str
    final: str

llm = ChatAnthropic(model="claude-sonnet-4-5-20250929", temperature=0)

def researcher(state: AgentState) -> dict:
    topic = state["messages"][-1].content
    prompt = f"Find 3 short factual bullets on this topic and return as a numbered list:\n{topic}"
    res = llm.invoke([HumanMessage(content=prompt)])
    return {"notes": res.content, "messages": [AIMessage(content="Research done.", name="researcher")]}

def writer(state: AgentState) -> dict:
    prompt = f"Turn these notes into a 300-word draft in a friendly, jargon-free voice:\n{state['notes']}"
    res = llm.invoke([HumanMessage(content=prompt)])
    return {"draft": res.content, "messages": [AIMessage(content="Draft done.", name="writer")]}

def editor(state: AgentState) -> dict:
    prompt = f"Cut hype, tighten to ~200 words, no filler, no jargon:\n{state['draft']}"
    res = llm.invoke([HumanMessage(content=prompt)])
    return {"final": res.content, "messages": [AIMessage(content="Final done.", name="editor")]}

Each agent reads what it needs from state, writes what it produced, returns the diff. Clean.

Step 3: Add the supervisor

The supervisor decides what runs next. Simplest version: it's a function that returns the name of the next node based on what's in state.

python
def supervisor(state: AgentState) -> str:
    if not state.get("notes"):
        return "researcher"
    if not state.get("draft"):
        return "writer"
    if not state.get("final"):
        return "editor"
    return "END"

For a smarter supervisor, replace the deterministic logic with an LLM call that reads state and outputs a routing decision. Both work; start deterministic, upgrade when you actually need it.
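Here's one shape that upgrade can take — a sketch, not the only way to do it. The model is injected as a parameter (anything with an .invoke method, like the ChatAnthropic instance from Step 2) so the routing logic stays unit-testable, and a parse_route helper validates the reply so a malformed answer can never send the graph into a loop. The prompt wording and helper names are illustrative:

```python
VALID_ROUTES = {"researcher", "writer", "editor", "END"}

def parse_route(raw: str) -> str:
    # Validate the model's reply against known node names; fall back to END
    # so an unexpected reply can't route the graph into a loop.
    token = raw.strip().strip("'\".") if raw else ""
    return token if token in VALID_ROUTES else "END"

def llm_supervisor(state: dict, model) -> str:
    # `model` is anything with .invoke(prompt) -> message, e.g. the
    # ChatAnthropic instance from Step 2. Injecting it keeps this testable.
    status = (
        f"notes: {'set' if state.get('notes') else 'empty'}, "
        f"draft: {'set' if state.get('draft') else 'empty'}, "
        f"final: {'set' if state.get('final') else 'empty'}"
    )
    prompt = (
        "You are a router. Given this pipeline state, reply with exactly one "
        f"of: researcher, writer, editor, END.\nState: {status}"
    )
    reply = model.invoke(prompt)
    return parse_route(reply.content)
```

To wire it into the graph, close over the model: graph.add_conditional_edges("researcher", lambda s: llm_supervisor(s, llm), {...}).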

Step 4: Wire the graph

python
from langgraph.graph import StateGraph, END

graph = StateGraph(AgentState)
graph.add_node("researcher", researcher)
graph.add_node("writer", writer)
graph.add_node("editor", editor)

graph.set_entry_point("researcher")
graph.add_conditional_edges("researcher", supervisor, {"writer": "writer", "END": END})
graph.add_conditional_edges("writer",     supervisor, {"editor": "editor", "END": END})
graph.add_conditional_edges("editor",     supervisor, {"END": END})

app = graph.compile()

The conditional edges call supervisor(state) after each node runs and route based on the return value. Once final is populated, the graph hits END and returns final state.
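If that loop feels abstract, here's a toy stand-in for what the compiled graph does at runtime — run a node, merge its returned diff into state, ask the router what's next. Plain Python, no LangGraph; the stub agents just fill their slot so you can watch the routing without burning tokens:

```python
def run_pipeline(state, nodes, router, entry):
    # Toy version of the compiled graph's loop: run a node, merge its
    # returned diff into state, then let the router pick the next node.
    current = entry
    order = []
    while current != "END":
        order.append(current)
        state = {**state, **nodes[current](state)}
        current = router(state)
    return state, order

# Stub agents standing in for the LLM-backed functions from Step 2.
nodes = {
    "researcher": lambda s: {"notes": "stub notes"},
    "writer": lambda s: {"draft": "stub draft"},
    "editor": lambda s: {"final": "stub final"},
}

def router(s):
    if not s.get("notes"):
        return "researcher"
    if not s.get("draft"):
        return "writer"
    if not s.get("final"):
        return "editor"
    return "END"

state, order = run_pipeline({"notes": "", "draft": "", "final": ""}, nodes, router, "researcher")
# order is ["researcher", "writer", "editor"]; state["final"] == "stub stub" is wrong — it's "stub final"
```

The real graph adds persistence, streaming, and tracing on top, but the control flow is exactly this shape.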

Step 5: Run it

python
init_state = {"messages": [HumanMessage(content="The business case for AI prompt caching.")], "notes": "", "draft": "", "final": ""}
result = app.invoke(init_state)
print(result["final"])

Run it and you'll see the researcher → writer → editor flow. The final output is in result["final"].

Verify it worked

Two checks:

  1. The script returns a coherent ~200-word piece. If it returns notes or a 300-word draft, the supervisor isn't routing past that step — re-check your conditional edges.
  2. LangSmith shows the trace. Open https://smith.langchain.com, find the multi-agent-demo project, and click the latest run. You should see the top-level graph run with a child run per node (researcher, writer, editor), each with its prompt, response, and latency.

If the trace is missing, your LANGSMITH_TRACING and LANGSMITH_API_KEY env vars aren't loaded — re-source your shell or print them to confirm.
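A quick way to confirm from the same shell (assumes a POSIX shell; grep prints nothing and exits non-zero if no key is set):

```shell
# List the relevant keys; empty output means they never made it into this shell.
env | grep -E 'LANGSMITH|ANTHROPIC'
```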

Where this breaks

  • State growing unbounded. Long-running graphs accumulate messages forever. Trim state["messages"] periodically or use add_messages with RemoveMessage to drop old turns.
  • Supervisor loops. A bad supervisor returns the same node twice and you get an infinite loop. Always set a max-step ceiling on the graph: app = graph.compile().with_config({"recursion_limit": 15}). When the limit is hit, LangGraph raises GraphRecursionError instead of spinning forever.
  • Token bills. Multi-agent means multi-prompt — every agent call is a fresh API hit. Cache where possible (Claude prompt caching helps if your system prompts are stable) and don't make every node an LLM call when a deterministic function will do.
  • LangSmith doesn't capture errors clearly. If an agent raises, the trace shows a red node but the error detail is sometimes truncated. Add explicit try/except in each agent and write the exception to state so you can see it in the trace.
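For that last point, one pattern is a small wrapper that converts a raised exception into state you can read in the trace. This is a sketch: the error key is an assumption you'd add to AgentState, and safe_node is a hypothetical helper, not a LangGraph API:

```python
import functools

def safe_node(fn):
    # Hypothetical wrapper: if the agent raises, record the failure in state
    # (visible in the LangSmith trace) instead of killing the whole graph.
    @functools.wraps(fn)
    def wrapped(state):
        try:
            return fn(state)
        except Exception as exc:
            return {"error": f"{fn.__name__}: {type(exc).__name__}: {exc}"}
    return wrapped

@safe_node
def flaky_agent(state):
    raise ValueError("upstream API timed out")

# flaky_agent({}) returns {'error': 'flaky_agent: ValueError: upstream API timed out'}
```

Decorate each agent with @safe_node when you add it to the graph, and the supervisor can check state.get("error") to route to a recovery node.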

What to try next

  • Swap the deterministic supervisor for an LLM router (Step 3) and compare its routing decisions in LangSmith.
  • Add a recovery node the supervisor can route to when an agent writes an error into state.
  • Mix models: keep Claude on the writing nodes and use a cheaper, faster model for routing.

Want this built for you instead?

Let's talk about your AI + SEO stack

If you'd rather skip the how-to and have it shipped for you, that's what I do. Start a conversation and we'll figure out the fastest path to results.

Questions from readers

Why use LangGraph over a plain Python orchestrator?

Two things: state management is built in (no hand-rolled context passing) and LangSmith integration is one env var. If you don't need state or tracing, plain Python is fine — but most multi-agent systems grow into needing both.

How many agents is too many?

Three to seven is the sweet spot. Beyond that, the supervisor routing gets fuzzy and debugging gets hard. If you find yourself wanting twelve, the problem usually wants a different shape (a single agent with twelve tools, not twelve agents).

Does LangSmith cost money?

Free tier covers small workloads (5,000 traces/month at the time of writing). Paid plans kick in for larger volume. Even on the free tier, the traces are the difference between debugging in 5 minutes vs. 5 hours.

Can I mix Anthropic and OpenAI agents in one graph?

Yes. Each node can use a different ChatModel. Useful when one agent needs Claude's reasoning and another can use a cheaper, faster model for a routing decision.

What happens if a node throws an exception?

The graph stops and LangSmith shows the failed node in red. Add try/except around each agent function and write the exception text to state — that way the supervisor can route to a recovery node instead of crashing.