AgentScope: Build AI Agents at Scale in Python
White Paper

Jake McCluskey
What it is

AgentScope is a production-ready open-source framework for building AI agents and multi-agent systems. It's not "another LangChain wrapper." It's a complete agent ecosystem purpose-built to take you from prototype to production without rewriting your stack when you scale.

In one sentence: if you've hit the wall where your toy agent breaks the moment you add a second agent, parallel tool calls, or real infrastructure, AgentScope is the framework designed for what comes next.

What you can actually do with it

  • Build agents in minutes. ReAct loops, tool use, persistent memory, all first-class.
  • Run multi-agent workflows. Orchestrator plus specialists, peer-to-peer, pipelines, fan-out/fan-in.
  • Add human-in-the-loop control. Approval gates, pause/resume, mid-task input.
  • Deploy anywhere. Local process, serverless function, Kubernetes cluster.
  • Monitor, debug, evaluate. Built-in tracing, replay, eval harness, dashboards.

Why you should be doing this

Three reasons it belongs in your stack:

1. "Prototype to production" is where most agent projects die

Many teams ship a LangChain demo in a week, then spend three months trying to make it reliable, observable, and multi-tenant. AgentScope bakes production concerns in from the first line of code: async-first runtime, typed messages, built-in tracing, and deployment adapters for serverless and K8s.

2. Multi-agent is the default, not an afterthought

As of 2025, the agent architectures that hold up on non-trivial tasks tend to be multi-agent with a clear coordination pattern; orchestrator plus specialists usually beats a single agent. AgentScope treats multi-agent as the primary API, not a plugin stapled onto a single-agent core.

3. Human-in-the-loop is a button, not a project

Every serious agent needs approval gates for irreversible actions (send email, place order, deploy code). In most frameworks, you build that yourself. AgentScope ships it.

How to do it, step by step

Step 1. Install

pip install agentscope

Requires Python 3.10+. Set your model key:

export ANTHROPIC_API_KEY=...   # or OPENAI_API_KEY / DASHSCOPE_API_KEY

Step 2. Your first agent in about 20 lines

import agentscope
from agentscope.agents import ReActAgent
from agentscope.models import AnthropicChatModel
from agentscope.tools import BasicToolkit

agentscope.init(
    model_configs=[AnthropicChatModel(
        config_name="claude", model_name="claude-opus-4-7"
    )],
)

toolkit = BasicToolkit()           # includes bash, file I/O, web fetch
agent   = ReActAgent(
    name="researcher",
    sys_prompt="You are a careful senior analyst. Gather, act, verify.",
    model_config_name="claude",
    toolkit=toolkit,
)

reply = agent("Summarize the top 3 news items about LLM agents this week.")
print(reply.content)

That's a complete ReAct agent with tools. No boilerplate.
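
For readers who want to see what a ReAct loop does under the hood, the sketch below shows the reason, act, observe cycle in plain Python. It is framework-independent and does not use AgentScope's API: `TOOLS`, `fake_model`, and `react_loop` are hypothetical names, and the scripted `fake_model` stands in for a real LLM call.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry: tool name -> callable taking one string argument.
TOOLS: Dict[str, Callable[[str], str]] = {
    "word_count": lambda text: str(len(text.split())),
}


def fake_model(history: List[str]) -> Tuple[str, str]:
    """Stand-in for an LLM. Returns (action, argument).

    It requests one tool call, then finishes using the observation.
    """
    observations = [h for h in history if h.startswith("observation:")]
    if not observations:
        return "word_count", "agents reason then act"
    count = observations[-1].split(":", 1)[1].strip()
    return "finish", f"The phrase has {count} words."


def react_loop(task: str, max_steps: int = 5) -> str:
    """Alternate between model decisions and tool calls until 'finish'."""
    history = [f"task: {task}"]
    for _ in range(max_steps):
        action, argument = fake_model(history)     # reason
        if action == "finish":
            return argument
        result = TOOLS[action](argument)           # act
        history.append(f"observation: {result}")   # observe
    return "Stopped: step budget exhausted."


answer = react_loop("Count the words in a phrase.")
print(answer)
```

The `max_steps` budget is the part worth keeping in any real loop: it bounds cost when the model never decides to finish.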

Step 3. Add memory

from agentscope.memory import TemporaryMemory

agent = ReActAgent(
    name="researcher",
    sys_prompt="...",
    model_config_name="claude",
    memory=TemporaryMemory(),   # in-process
    # or VectorMemory(persist_dir="./mem") for embedding-based long-term memory
)

agent("I'm building a SaaS in Next.js.")
agent("What stack did I just mention?")   # remembers
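
Conceptually, in-process memory is an append-only record of turns that is fed back to the model on each call. The sketch below illustrates that idea in plain Python; `ListMemory` and `StubAgent` are hypothetical stand-ins, not AgentScope classes, and no model is actually called.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ListMemory:
    """Hypothetical in-process memory: an append-only list of turns."""
    turns: List[str] = field(default_factory=list)

    def add(self, role: str, text: str) -> None:
        self.turns.append(f"{role}: {text}")

    def as_context(self) -> str:
        return "\n".join(self.turns)


@dataclass
class StubAgent:
    """Stand-in agent. A real one would send the context to an LLM."""
    memory: ListMemory = field(default_factory=ListMemory)

    def __call__(self, user_text: str) -> str:
        self.memory.add("user", user_text)
        context = self.memory.as_context()  # what a real model call would receive
        reply = f"I can see {len(context.splitlines())} turn(s) of context."
        self.memory.add("assistant", reply)
        return reply


stub = StubAgent()
stub("I'm building a SaaS in Next.js.")
second = stub("What stack did I just mention?")
print(second)
print(stub.memory.as_context())
```

Because the earlier turn is still in the context on the second call, a real model would be able to answer the follow-up question. Memory of this kind is lost when the process exits.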

Step 4. Multi-agent workflow, orchestrator plus specialists

from agentscope.pipelines import SequentialPipeline
from agentscope.message import Msg

planner   = ReActAgent(name="planner",   model_config_name="claude", ...)
writer    = ReActAgent(name="writer",    model_config_name="claude", ...)
reviewer  = ReActAgent(name="reviewer",  model_config_name="claude", ...)

pipeline = SequentialPipeline([planner, writer, reviewer])
result   = pipeline(Msg("user", "Write a post on LLM agent patterns.", "user"))
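
A sequential pipeline amounts to function composition: each stage receives the previous stage's output. The sketch below shows that behavior in plain Python with stub stages; `sequential` and the three stage functions are illustrative names and do not reflect how AgentScope implements its pipeline.

```python
from functools import reduce
from typing import Callable, List

Stage = Callable[[str], str]


def sequential(stages: List[Stage]) -> Stage:
    """Compose stages so the output of one becomes the input of the next."""
    def run(message: str) -> str:
        return reduce(lambda msg, stage: stage(msg), stages, message)
    return run


# Stub stages; real ones would be agents that call a model.
def plan_stage(msg: str) -> str:
    return f"outline({msg})"


def write_stage(msg: str) -> str:
    return f"draft({msg})"


def review_stage(msg: str) -> str:
    return f"reviewed({msg})"


compose = sequential([plan_stage, write_stage, review_stage])
final = compose("LLM agent patterns")
print(final)
```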

For parallel fan-out (e.g., three research specialists, then a synthesizer):

from agentscope.pipelines import parallel_pipeline

specialists = [finance_agent, tech_agent, regulatory_agent]
results     = parallel_pipeline(specialists, Msg("user", "Research Apple", "user"))
synthesis   = synthesizer(Msg("user", str(results), "user"))
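
The fan-out/fan-in pattern can be sketched with the standard library's `asyncio.gather`, which runs the specialists concurrently and returns their results in the order they were submitted. This is an illustration of the pattern only; `make_specialist` and `fan_out_fan_in` are hypothetical helpers, and the sleeps stand in for model and tool latency.

```python
import asyncio
from typing import Awaitable, Callable, List

Specialist = Callable[[str], Awaitable[str]]


def make_specialist(domain: str, delay: float) -> Specialist:
    """Build a stub specialist that 'researches' one domain."""
    async def specialist(query: str) -> str:
        await asyncio.sleep(delay)  # stands in for model and tool latency
        return f"{domain}: notes on {query}"
    return specialist


async def fan_out_fan_in(specialists: List[Specialist], query: str) -> str:
    # Fan out: start every specialist concurrently.
    results = await asyncio.gather(*(s(query) for s in specialists))
    # Fan in: a real synthesizer agent would summarize; here we just join.
    return " | ".join(results)


team = [
    make_specialist("finance", 0.02),
    make_specialist("tech", 0.01),
    make_specialist("regulatory", 0.03),
]
combined = asyncio.run(fan_out_fan_in(team, "Apple"))
print(combined)
```

Total latency is roughly that of the slowest specialist rather than the sum of all three, which is the main reason to fan out at all.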

Step 5. Human-in-the-loop for irreversible actions

from agentscope.tools import require_approval

@require_approval(prompt="Send this email?")
def send_email(to: str, subject: str, body: str) -> str:
    ...  # real send

toolkit.register(send_email)

When the agent calls send_email, AgentScope pauses and surfaces the call plus args to a human via console, web UI, or Slack (configurable). Approve and it sends. Reject and the agent replans.
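
To make the mechanism concrete, here is a minimal sketch of how an approval gate could be built as a decorator. It is not AgentScope's implementation: `require_approval_sketch`, `Rejected`, and the `approver` callback are hypothetical, and the policy function replaces the human reviewer so the example runs unattended.

```python
import functools
from typing import Any, Callable, Dict, List, Tuple

# An approver receives the prompt and the pending call, and returns True to allow it.
Approver = Callable[[str, Tuple[Any, ...], Dict[str, Any]], bool]


class Rejected(Exception):
    """Raised when the reviewer declines a tool call."""


def require_approval_sketch(prompt: str, approver: Approver):
    """Hypothetical gate: ask the approver before running the wrapped tool."""
    def decorator(tool: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(tool)
        def gated(*args: Any, **kwargs: Any) -> Any:
            if not approver(prompt, args, kwargs):
                raise Rejected(f"{tool.__name__} rejected by reviewer")
            return tool(*args, **kwargs)
        return gated
    return decorator


sent: List[str] = []


def internal_only(prompt: str, args: tuple, kwargs: dict) -> bool:
    # Stand-in for a human decision: approve internal recipients only.
    return kwargs.get("to", "").endswith("@example.com")


@require_approval_sketch("Send this email?", approver=internal_only)
def send_email_demo(to: str, subject: str, body: str) -> str:
    sent.append(to)  # a real tool would send here
    return f"sent to {to}"


approved = send_email_demo(to="ops@example.com", subject="Hi", body="Status update")
try:
    send_email_demo(to="someone@elsewhere.test", subject="Hi", body="Status update")
    outcome = "sent"
except Rejected:
    outcome = "rejected"
print(approved, outcome)
```

An agent loop would catch `Rejected` and feed it back to the model as an observation, which is what lets the agent replan instead of crashing.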

Step 6. Deploy: local, serverless, or K8s

Local (dev):

python my_agent.py

Serverless (e.g., AWS Lambda): AgentScope ships an ASGI adapter.

from agentscope.server import create_asgi_app
app = create_asgi_app(agent)   # wrap with Mangum for Lambda
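
For context on what an ASGI adapter provides, the sketch below is a hand-written ASGI application that exposes a stub agent over HTTP, driven locally without a server. It follows the ASGI HTTP message format but is not AgentScope's adapter; `run_agent`, `asgi_app`, and `call_once` are hypothetical names.

```python
import asyncio
import json
from typing import Any, Dict, List


def run_agent(prompt: str) -> str:
    """Stub agent; a real one would call a model."""
    return f"echo: {prompt}"


async def asgi_app(scope, receive, send) -> None:
    """Minimal ASGI app: read a JSON body, reply with the agent's answer."""
    assert scope["type"] == "http"  # this sketch handles HTTP requests only
    body = b""
    while True:
        event = await receive()
        body += event.get("body", b"")
        if not event.get("more_body", False):
            break
    prompt = json.loads(body or b"{}").get("prompt", "")
    payload = json.dumps({"reply": run_agent(prompt)}).encode()
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"application/json")],
    })
    await send({"type": "http.response.body", "body": payload})


async def call_once(prompt: str) -> List[Dict[str, Any]]:
    """Drive the app in-process, the way a server or Lambda shim would."""
    request = [{
        "type": "http.request",
        "body": json.dumps({"prompt": prompt}).encode(),
        "more_body": False,
    }]
    responses: List[Dict[str, Any]] = []

    async def receive() -> Dict[str, Any]:
        return request.pop(0)

    async def send(event: Dict[str, Any]) -> None:
        responses.append(event)

    await asgi_app({"type": "http", "method": "POST", "path": "/"}, receive, send)
    return responses


events = asyncio.run(call_once("hello"))
print(events[0]["status"], events[1]["body"].decode())
```

Any callable with this `(scope, receive, send)` signature can be served by an ASGI server locally, and with Mangum installed it is typically wrapped as `handler = Mangum(asgi_app)` for Lambda.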

Kubernetes: there's a Helm chart with a worker plus orchestrator split.

helm install my-agents agentscope/agents -f values.yaml

Same agent code runs in all three environments. You pick the deployment target per workload.

Step 7. Monitor, debug, evaluate

Tracing. Enable it once and you get a timeline of every agent turn, every tool call, every token.

agentscope.init(
    ...,
    studio_url="http://localhost:5000",   # local observability UI
)

Open localhost:5000 and you see a live dashboard: agent graphs, call traces, token usage, latency histograms.
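
The data behind a trace timeline is a list of timed events, one per agent turn or tool call. The sketch below records such events with a decorator in plain Python; `traced` and `TRACE` are hypothetical and unrelated to AgentScope's tracer or the Studio UI.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# In-memory event log; a real tracer would export these to a UI or OTel collector.
TRACE: List[Dict[str, Any]] = []


def traced(kind: str):
    """Record name, status, and duration for each call of the wrapped function."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            status = "ok"
            try:
                return fn(*args, **kwargs)
            except Exception:
                status = "error"
                raise
            finally:
                TRACE.append({
                    "kind": kind,
                    "name": fn.__name__,
                    "status": status,
                    "seconds": time.perf_counter() - start,
                })
        return wrapper
    return decorator


@traced("tool")
def lookup(ticker: str) -> str:
    return f"{ticker}: 42"  # stub tool result


@traced("agent_turn")
def agent_turn(question: str) -> str:
    return lookup(question.upper())


traced_reply = agent_turn("aapl")
for event in TRACE:
    print(event["kind"], event["name"], event["status"])
```

Events are appended when a call finishes, so the inner tool call is logged before the agent turn that contains it.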

Evaluation. Attach an eval set to any agent:

from agentscope.evals import Benchmark

bench = Benchmark.from_csv("support-cases.csv")
score = bench.run(agent, metrics=["exact_match", "llm_judge"])
print(score.summary())

Run it in CI. Block merges on regressions.
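
A CI gate of this kind reduces to three parts: a case file, a metric, and a threshold. The sketch below shows an exact-match benchmark over inline CSV data using only the standard library. The column names, `stub_agent`, `run_benchmark`, and the 0.9 threshold are assumptions for illustration and are not taken from AgentScope's eval module.

```python
import csv
import io
from typing import Callable

# Inline stand-in for a cases file; the column names are an assumption.
CASES_CSV = """input,expected
2+2,4
capital of France,Paris
3*3,9
"""


def stub_agent(question: str) -> str:
    """Stub agent with one deliberate wrong answer (3*3)."""
    answers = {"2+2": "4", "capital of France": "paris", "3*3": "6"}
    return answers.get(question, "")


def exact_match(predicted: str, expected: str) -> bool:
    # Case- and whitespace-insensitive comparison.
    return predicted.strip().lower() == expected.strip().lower()


def run_benchmark(agent: Callable[[str], str], csv_text: str) -> float:
    """Return the fraction of cases where the agent's answer matches."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    passed = sum(exact_match(agent(r["input"]), r["expected"]) for r in rows)
    return passed / len(rows)


THRESHOLD = 0.9  # hypothetical minimum score for a merge to proceed
score_value = run_benchmark(stub_agent, CASES_CSV)
gate_passed = score_value >= THRESHOLD
print(f"score={score_value:.2f} gate_passed={gate_passed}")
# In CI, exiting with a non-zero status when gate_passed is False fails the job.
```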

Replay. Any trace can be replayed after a bug fix to confirm the same case now passes.
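
Replay can be pictured as re-running recorded inputs against the current code and listing the cases that still fail. The sketch below does this in plain Python with a deliberately buggy handler and its fix; the trace format and the names `replay`, `buggy_handler`, and `fixed_handler` are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical recorded trace: inputs seen in production plus the expected result.
recorded_trace: List[Dict[str, str]] = [
    {"input": "  Refund order 17 ", "expected": "refund:17"},
    {"input": "refund order 23", "expected": "refund:23"},
]


def buggy_handler(text: str) -> str:
    # Bug: splitting on a single space breaks on padding and keeps capitals.
    parts = text.split(" ")
    return f"{parts[0]}:{parts[-1]}"


def fixed_handler(text: str) -> str:
    # Fix: normalize whitespace and case before parsing.
    parts = text.strip().lower().split()
    return f"{parts[0]}:{parts[-1]}"


def replay(handler: Callable[[str], str], trace: List[Dict[str, str]]) -> List[str]:
    """Return the recorded inputs that the handler still gets wrong."""
    return [case["input"] for case in trace
            if handler(case["input"]) != case["expected"]]


failures_before = replay(buggy_handler, recorded_trace)
failures_after = replay(fixed_handler, recorded_trace)
print(len(failures_before), len(failures_after))
```

With a real agent the model's responses would also need to be recorded or pinned, since a nondeterministic model call can make a replay pass or fail for reasons unrelated to the fix.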

Architecture at a glance

┌────────────────────────────────────────────────────────────┐
│                                                            │
│   User request                                             │
│        │                                                   │
│        ▼                                                   │
│   ┌─────────────┐                                          │
│   │ Orchestrator│ ─┐                                       │
│   └─────────────┘  │  plans                                │
│        │           │                                       │
│        ├─▶ ReActAgent(finance)  ──uses──▶ Tools + Memory   │
│        ├─▶ ReActAgent(web)      ──uses──▶ Tools + Memory   │
│        └─▶ ReActAgent(writer)   ──uses──▶ Tools + Memory   │
│                                                            │
│   ┌─────────────┐                                          │
│   │ H-i-L gate  │ ◀── approval required for risky tools    │
│   └─────────────┘                                          │
│                                                            │
│   ┌─────────────┐                                          │
│   │   Tracer    │ ── all events ──▶ Studio UI / OTel       │
│   └─────────────┘                                          │
│                                                            │
│   Deploys to: local | serverless | K8s (same code)         │
│                                                            │
└────────────────────────────────────────────────────────────┘

When to pick AgentScope vs. alternatives

Use case                                              | Best choice
Code-first multi-agent, Python-native, deploy to K8s  | AgentScope
Visual drag-and-drop builder, OpenAI-only             | AgentKit
Browser/web agent with human steering UX              | Magentic-UI
Single-agent code assistant                           | Claude Agent SDK
LangChain ecosystem integration required              | LangGraph

One thing to watch out for

Multi-agent looks cool in a demo. In production, it's only worth the overhead when the task actually has parallelizable subtasks, a verification step, or specialization that pays off. A well-designed single ReAct agent beats a poorly designed five-agent swarm. Start single, and add agents only when you measure a clear win. (This isn't an AgentScope-specific warning. It's the #1 lesson from teams that have shipped multi-agent systems.)

What changes once you have this

Before:

  • Demo works. Prod crashes. You rebuild twice.
  • Multi-agent means "copy the same class three times and hope."
  • Human approval is a hacked-in input("y/n").
  • No traces when it fails. No evals when it regresses.

After:

  • Same code path on a laptop, in Lambda, and on K8s.
  • Multi-agent is one import.
  • H-i-L is a decorator.
  • Studio UI shows the full trace with one init argument (studio_url).
  • Evals block bad deploys in CI.

You stop building around your agent framework and start building with it.

Common questions

What is AgentScope and how is it different from LangChain?

AgentScope is a production-ready open-source framework for building AI agents and multi-agent systems in Python. It is not a wrapper around LangChain; it is a complete agent ecosystem purpose-built to take you from prototype to production without rewriting your stack when you scale. Production concerns are built in from the first line of code: an async-first runtime, typed messages, built-in tracing, and deployment adapters for serverless and Kubernetes.

Can AgentScope deploy to Kubernetes and serverless environments with the same code?

Yes, the same agent code runs in local processes, serverless functions like AWS Lambda, and Kubernetes clusters without modification. AgentScope ships an ASGI adapter for serverless deployment and a Helm chart with a worker plus orchestrator split for Kubernetes. You pick the deployment target per workload.

How does AgentScope implement human-in-the-loop approval for agent actions?

AgentScope provides a require_approval decorator that you attach to any tool function. When an agent calls a tool marked with this decorator, AgentScope pauses and surfaces the call plus arguments to a human via console, web UI, or Slack. The human can approve (and the action executes) or reject (and the agent replans).

What built-in observability and evaluation features does AgentScope include?

AgentScope includes built-in tracing that captures a timeline of every agent turn, tool call, and token when you enable a studio_url. The framework ships a local observability UI with agent graphs, call traces, token usage, and latency histograms. It also includes an evaluation harness that runs benchmark sets with metrics like exact_match and llm_judge, plus replay capability to re-run traces after bug fixes.

When should I use multi-agent architecture versus a single agent in AgentScope?

Multi-agent is only worth the overhead when the task actually has parallelizable subtasks, a verification step, or specialization that pays off. A well-designed single ReAct agent beats a poorly-designed five-agent swarm. Start with a single agent and add agents only when you measure a clear win, as this is the number one lesson from every team that has shipped multi-agent systems.
