
How Do I Build AI Agents at Scale with AgentScope?

Jake McCluskey · Intermediate · 30 min read

If you've shipped an LLM prototype that works on your laptop and dies the moment a real user touches it, AgentScope is the framework that's been missing. It's an open-source agent runtime built by Alibaba's research team for the part everyone skips — making agents survive in production. Here's how to install it, build your first ReAct agent, wire in tools and memory, and run a multi-agent workflow you can actually deploy.

Why this matters

Most agent code I see is one Python file, one OpenAI client, and a while True loop with a TODO comment that says "add error handling later." That's fine for a demo. It falls apart the second you need parallel tool calls, persistent memory, human approval, or a deployment story that isn't "I run it on my MacBook."

AgentScope handles those for you. It's a real framework — not a toy, not a wrapper around LangChain — designed to take an agent from notebook to Kubernetes without rewriting it.

Before you start

You need:

  • Python 3.10+ installed.
  • An LLM provider key — OpenAI, Anthropic, DashScope, or any OpenAI-compatible endpoint. I'll use Anthropic in the examples; swap as needed.
  • A clean virtualenv. Don't pip install this into your system Python.
  • About 30 minutes for the full walkthrough.

Step 1: Install AgentScope

```bash
python -m venv .venv
source .venv/bin/activate
pip install agentscope
```

That gives you the core. If you want the development extras (Studio dashboard, evaluator, the works), install with the option group:

```bash
pip install "agentscope[full]"
```

Set your API key as an env var so you don't paste it into code:

```bash
export ANTHROPIC_API_KEY=sk-ant-...
```

Step 2: Build a single ReAct agent

ReAct (Reasoning + Acting) is the loop where the agent thinks, picks a tool, runs it, sees the result, and decides what to do next. AgentScope ships a ReActAgent class that handles the whole loop for you.

```python
import agentscope
from agentscope.agents import ReActAgent
from agentscope.service import ServiceToolkit

agentscope.init(
    model_configs=[{
        "config_name": "claude_sonnet",
        "model_type": "anthropic_chat",
        "model_name": "claude-sonnet-4-5-20250929",
    }]
)

researcher = ReActAgent(
    name="Researcher",
    sys_prompt="You research topics and cite sources.",
    model_config_name="claude_sonnet",
    service_toolkit=ServiceToolkit(),  # no tools yet; we add some in Step 3
    max_iters=8,
)

msg = agentscope.message.Msg(
    "user",
    "What's the cache hit rate sweet spot for Claude prompt caching?",
    role="user",
)
response = researcher(msg)
print(response.content)
```

Three things doing real work here:

  • agentscope.init registers model configs once for the whole process — no client construction in every agent.
  • ReActAgent runs the think → act → observe loop until the model stops calling tools or hits max_iters.
  • The Msg object is AgentScope's message envelope. Every interaction passes through it, which is what makes logging, persistence, and replay tractable later.
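To make the loop concrete, here is a simplified sketch of the think → act → observe control flow that a ReAct agent runs. This is not AgentScope's actual implementation — just the shape of the loop, with a stubbed model and a toy tool standing in for the real thing:

```python
def react_loop(model, tools, task, max_iters=8):
    """Minimal ReAct loop: run until the model stops calling tools."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_iters):
        step = model(history)                         # "think": model proposes an action
        if step.get("tool") is None:
            return step["content"]                    # no tool call -> final answer
        result = tools[step["tool"]](**step["args"])  # "act": run the tool
        history.append({"role": "tool", "content": result})  # "observe"
    return "stopped: hit max_iters"

# Stubbed model: calls the tool once, then answers from the observation.
def fake_model(history):
    if any(m["role"] == "tool" for m in history):
        return {"tool": None, "content": f"Answer based on: {history[-1]['content']}"}
    return {"tool": "double", "args": {"n": 21}}

answer = react_loop(fake_model, {"double": lambda n: n * 2}, "What is 21 doubled?")
print(answer)  # -> Answer based on: 42
```

The real `ReActAgent` does the same dance, with the LLM picking tools from the toolkit schema and `max_iters` as the safety ceiling.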

Step 3: Add tools the agent can call

A ReAct agent without tools is just a chatbot with extra steps. Pass plain Python functions as tools and AgentScope auto-generates the schema from the type hints and docstring.

```python
from agentscope.service import ServiceToolkit, ServiceResponse, ServiceExecStatus
import requests

def get_pypi_downloads(package: str) -> ServiceResponse:
    """Get last-month download count for a PyPI package.

    Args:
        package (str): Exact package name on PyPI.
    """
    r = requests.get(f"https://pypistats.org/api/packages/{package}/recent", timeout=10)
    if r.status_code != 200:
        return ServiceResponse(ServiceExecStatus.ERROR, f"HTTP {r.status_code}")
    data = r.json()["data"]
    return ServiceResponse(ServiceExecStatus.SUCCESS, data)

toolkit = ServiceToolkit()
toolkit.add(get_pypi_downloads)

researcher = ReActAgent(
    name="Researcher",
    sys_prompt="You research Python packages.",
    model_config_name="claude_sonnet",
    service_toolkit=toolkit,
    max_iters=8,
)
```

Now when the agent reasons "I need download stats for agentscope," it will call your function and observe the result before deciding what to say.
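The schema generation itself is less magic than it sounds. Here's a simplified sketch of how a toolkit can derive a tool schema from type hints and the docstring — illustrative only, AgentScope's actual generation is more complete, and `function_schema` is a name I'm inventing for the sketch:

```python
import inspect
from typing import get_type_hints

PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def function_schema(fn):
    """Build a minimal tool schema from a function's signature + docstring."""
    hints = get_type_hints(fn)
    params = {
        name: {"type": PY_TO_JSON.get(hint, "string")}
        for name, hint in hints.items()
        if name != "return"
    }
    return {
        "name": fn.__name__,
        # First docstring line becomes the tool description the model sees.
        "description": (inspect.getdoc(fn) or "").split("\n")[0],
        "parameters": {
            "type": "object",
            "properties": params,
            "required": list(params),
        },
    }

def get_pypi_downloads(package: str) -> dict:
    """Get last-month download count for a PyPI package."""
    ...

schema = function_schema(get_pypi_downloads)
print(schema["parameters"]["properties"])  # {'package': {'type': 'string'}}
```

This is why accurate type hints and a real docstring matter: they're the only interface description the model ever sees.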

Step 4: Add memory so the agent remembers across turns

Out of the box, AgentScope agents have a temporary memory buffer that holds the current conversation. The first thing to tune is its size, so long sessions don't grow without bound:

```python
from agentscope.memory import TemporaryMemory

researcher.memory = TemporaryMemory(config={"max_size": 20})
```

For production, swap in a persistent store: Redis or a vector DB (Milvus, Qdrant). The memory interface is intentionally small (add, get, clear at its core), so you can drop in your own implementation.
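To show how little that interface asks, here's a minimal file-backed memory with the same small surface. It's an illustrative sketch, not a subclass of AgentScope's memory base class — `FileMemory` is a name I'm making up:

```python
import json
import os
import tempfile
from pathlib import Path

class FileMemory:
    """Tiny persistent memory: add / get / clear, backed by a JSON file."""

    def __init__(self, path, max_size=100):
        self.path = Path(path)
        self.max_size = max_size

    def _load(self):
        if self.path.exists():
            return json.loads(self.path.read_text())
        return []

    def add(self, msg):
        msgs = self._load()
        msgs.append(msg)
        # Rotate: keep only the newest max_size messages.
        self.path.write_text(json.dumps(msgs[-self.max_size:]))

    def get(self):
        return self._load()

    def clear(self):
        self.path.write_text("[]")

mem = FileMemory(os.path.join(tempfile.mkdtemp(), "memory.json"), max_size=2)
mem.add({"role": "user", "content": "hello"})
mem.add({"role": "assistant", "content": "hi"})
mem.add({"role": "user", "content": "bye"})
print(len(mem.get()))  # 2 -- oldest message rotated out
```

A Redis-backed version is the same three methods against `LPUSH`/`LRANGE` instead of a file.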

Step 5: Run a multi-agent workflow

This is where AgentScope earns the install. Build a small team — researcher, writer, editor — and let a supervisor coordinate them with a workflow primitive.

```python
from agentscope.agents import ReActAgent
from agentscope.message import Msg
from agentscope.pipelines import sequential_pipeline

researcher = ReActAgent(name="Researcher", sys_prompt="Pull facts and cite sources.", model_config_name="claude_sonnet", service_toolkit=toolkit)
writer = ReActAgent(name="Writer", sys_prompt="Turn research notes into a 300-word brief in Jake's voice.", model_config_name="claude_sonnet")
editor = ReActAgent(name="Editor", sys_prompt="Cut hype, tighten, return final.", model_config_name="claude_sonnet")

brief_topic = Msg("user", "Best practices for monorepo Claude Code usage", role="user")
final = sequential_pipeline([researcher, writer, editor], brief_topic)
print(final.content)
```

sequential_pipeline passes each agent's output to the next as input. AgentScope also has MsgHub (broadcast pattern), forlooppipeline (iterate), and a graph-based workflow if you need branching logic.
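Conceptually, a sequential pipeline is just a left fold over the agents: each one consumes the previous one's output message. A sketch with plain callables standing in for agents makes the semantics obvious:

```python
from functools import reduce

def sequential(agents, msg):
    """Fold the message through each agent in order."""
    return reduce(lambda m, agent: agent(m), agents, msg)

# Toy "agents": each appends its contribution to the message.
researcher = lambda m: m + " | facts"
writer = lambda m: m + " | draft"
editor = lambda m: m + " | final"

out = sequential([researcher, writer, editor], "topic")
print(out)  # topic | facts | draft | final
```

The broadcast (`MsgHub`) and loop pipelines are different traversal orders over the same message-passing idea.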

Verify it worked

Two quick checks:

  1. Run the single-agent script. You should see reasoning + tool calls printed, ending in a final answer that references the tool result.
  2. Open AgentScope Studio. If you installed the full extras, run as_studio in a terminal and open http://localhost:5000. You'll see every agent run, every tool call, latency per step, and token usage. This is the production debugging surface most agent code is missing.

```bash
as_studio
```

If both work, you're set up.

Where this breaks

  • Token bills. ReAct loops can run away — set max_iters to a sane ceiling (8–12) and watch your token usage in Studio.
  • Tool function failures. If a tool returns garbage, the agent will reason itself in circles. Always return ServiceExecStatus.ERROR with a clear message — the agent uses it to course-correct.
  • Model drift between providers. A prompt that runs clean on Claude may loop on a smaller open model. Keep the model config in one place so you can swap and re-run benchmarks.
  • Memory unbounded growth. TemporaryMemory keeps everything until cleared. For long-running agents, set max_size or rotate.
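On the tool-failure point: a defensive wrapper that converts exceptions into a clear status-plus-message pair keeps the agent from chewing on stack traces. This is a sketch with a plain tuple return — adapt it to return `ServiceResponse` when wiring into AgentScope:

```python
import functools

def safe_tool(fn):
    """Wrap a tool so failures become a clear (status, message) result."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return ("success", fn(*args, **kwargs))
        except Exception as e:
            # A short, specific error message lets the agent course-correct
            # instead of reasoning in circles on garbage output.
            return ("error", f"{fn.__name__} failed: {e}")
    return wrapper

@safe_tool
def divide(a: float, b: float) -> float:
    """Divide a by b."""
    return a / b

print(divide(6, 3))  # ('success', 2.0)
print(divide(1, 0))  # ('error', 'divide failed: division by zero')
```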

Want this built for you instead?

Let's talk about your AI + SEO stack

If you'd rather skip the how-to and have it shipped for you, that's what I do. Start a conversation and we'll figure out the fastest path to results.

Let's Talk

Frequently asked

Is AgentScope a replacement for LangGraph or AgentSDK?

It's a different bet. AgentScope ships more out of the box (Studio dashboard, deployment templates, evaluator), whereas LangGraph stays minimal and explicit. If you want a framework that gets you to production fast, pick AgentScope. If you want to compose every primitive yourself, pick LangGraph.

Does AgentScope work with Claude?

Yes. The model_type 'anthropic_chat' uses Anthropic's API directly. Set ANTHROPIC_API_KEY and pass model_name 'claude-sonnet-4-5-20250929' (or whatever model you're using).

Can I run AgentScope on Kubernetes?

That's one of the design goals. The framework includes deployment templates for local, serverless, and K8s. The agent code itself doesn't change between targets — you swap the runtime config.

How much does it cost to run a small AgentScope deployment?

Compute is whatever your host charges. The real cost is LLM tokens: every agent step is an API call. Set max_iters tight (8–12) and use prompt caching where the system prompt is stable.
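Back-of-envelope math makes the ceiling concrete. The prices and token counts below are illustrative assumptions, not current Anthropic pricing — plug in your provider's real numbers:

```python
# Assumed prices for illustration only -- check your provider's rate card.
INPUT_PRICE_PER_M = 3.00    # $ per 1M input tokens
OUTPUT_PRICE_PER_M = 15.00  # $ per 1M output tokens

def run_cost(iters, input_tokens_per_step, output_tokens_per_step):
    """Estimated dollar cost of one agent run of `iters` ReAct steps."""
    inp = iters * input_tokens_per_step / 1_000_000 * INPUT_PRICE_PER_M
    out = iters * output_tokens_per_step / 1_000_000 * OUTPUT_PRICE_PER_M
    return round(inp + out, 4)

# 8 iterations, ~4k input tokens per step (prompt + history), ~500 output
print(run_cost(8, 4_000, 500))  # 0.156
```

Note that input grows each step as history accumulates, so the flat per-step number here understates long runs — another reason to keep max_iters tight.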

Should I use AgentScope for a single-agent app?

Probably overkill. AgentScope earns its complexity once you have 3+ agents collaborating. For a single agent, the Anthropic SDK plus a 100-line ReAct loop is enough.