How Do I Build AI Agents at Scale with AgentScope?
How-To Guide

How Do I Build AI Agents at Scale with AgentScope?

Jake McCluskeyUpdated Intermediate30 min
Back to guides

If you've shipped an LLM prototype that works on your laptop and dies the moment a real user touches it, AgentScope is the framework that's been missing. It's an open-source agent runtime built by Alibaba's research team for the part everyone skips, making agents survive in production. Here's how to install it, build your first ReAct agent, wire in tools and memory, and run a multi-agent workflow you can actually deploy.

Why this matters

Most agent code I see is one Python file, one OpenAI client, and a while True loop with a TODO comment that says "add error handling later." That's fine for a demo. It falls apart the second you need parallel tool calls, persistent memory, human approval, or a deployment story that isn't "I run it on my MacBook."

AgentScope handles those for you. It's a real framework, not a toy, not a wrapper around LangChain, designed to take an agent from notebook to Kubernetes without rewriting it.

Before you start

You need:

  • Python 3.10+ installed.
  • An LLM provider key, OpenAI, Anthropic, DashScope, or any OpenAI-compatible endpoint. I'll use Anthropic in the examples; swap as needed.
  • A clean virtualenv. Don't pip install this into your system Python.
  • About 30 minutes for the full walkthrough.

Step 1: Install AgentScope

bash
python -m venv .venv
source .venv/bin/activate
pip install agentscope

That gives you the core. If you want the development extras (Studio dashboard, evaluator, the works), install with the option group:

bash
pip install "agentscope[full]"

Set your API key as an env var so you don't paste it into code:

bash
export ANTHROPIC_API_KEY=sk-ant-...

Step 2: Build a single ReAct agent

ReAct (Reasoning + Acting) is the loop where the agent thinks, picks a tool, runs it, sees the result, and decides what to do next. AgentScope ships a ReActAgent class that handles the whole loop for you.

python
import agentscope
from agentscope.agents import ReActAgent
from agentscope.models import AnthropicChatModel

agentscope.init(
    model_configs=[{
        "config_name": "claude_sonnet",
        "model_type": "anthropic_chat",
        "model_name": "claude-sonnet-4-5-20250929",
    }]
)

researcher = ReActAgent(
    name="Researcher",
    sys_prompt="You research topics and cite sources.",
    model_config_name="claude_sonnet",
    tools=[],
    max_iters=8,
)

response = researcher(agentscope.message.Msg("user", "What's the cache hit rate sweet spot for Claude prompt caching?", role="user"))
print(response.content)

Three things doing real work here:

  • agentscope.init registers model configs once for the whole process, no client construction in every agent.
  • ReActAgent runs the think → act → observe loop until the model stops calling tools or hits max_iters.
  • The Msg object is AgentScope's message envelope. Every interaction passes through it, which is what makes logging, persistence, and replay tractable later.

Step 3: Add tools the agent can call

A ReAct agent without tools is just a chatbot with extra steps. Pass plain Python functions as tools and AgentScope auto-generates the schema from the type hints and docstring.

python
from agentscope.service import ServiceToolkit, ServiceResponse, ServiceExecStatus
import requests

def get_pypi_downloads(package: str) -> ServiceResponse:
    """Get last-month download count for a PyPI package.

    Args:
        package (str): Exact package name on PyPI.
    """
    r = requests.get(f"https://pypistats.org/api/packages/{package}/recent")
    if r.status_code != 200:
        return ServiceResponse(ServiceExecStatus.ERROR, f"HTTP {r.status_code}")
    data = r.json()["data"]
    return ServiceResponse(ServiceExecStatus.SUCCESS, data)

toolkit = ServiceToolkit()
toolkit.add(get_pypi_downloads)

researcher = ReActAgent(
    name="Researcher",
    sys_prompt="You research Python packages.",
    model_config_name="claude_sonnet",
    service_toolkit=toolkit,
    max_iters=8,
)

Now when the agent reasons "I need download stats for agentscope," it will call your function and observe the result before deciding what to say.

Step 4: Add memory so the agent remembers across turns

Out of the box, AgentScope agents have a temporary memory buffer that holds the current conversation. For longer-lived state, swap in a persistent store:

python
from agentscope.memory import TemporaryMemory

researcher.memory = TemporaryMemory(config={"max_size": 20})

For production, point memory at Redis or a vector DB (Milvus, Qdrant). The memory interface is intentionally small, three methods (add, get, clear), so you can drop in your own.

Step 5: Run a multi-agent workflow

This is where AgentScope earns the install. Build a small team, researcher, writer, editor, and let a supervisor coordinate them with a workflow primitive.

python
from agentscope.agents import ReActAgent
from agentscope.pipelines import sequential_pipeline

researcher = ReActAgent(name="Researcher", sys_prompt="Pull facts and cite sources.", model_config_name="claude_sonnet", service_toolkit=toolkit)
writer     = ReActAgent(name="Writer",     sys_prompt="Turn research notes into a 300-word brief in Jake's voice.", model_config_name="claude_sonnet")
editor     = ReActAgent(name="Editor",     sys_prompt="Cut hype, tighten, return final.", model_config_name="claude_sonnet")

brief_topic = agentscope.message.Msg("user", "Best practices for monorepo Claude Code usage", role="user")
final = sequential_pipeline([researcher, writer, editor], brief_topic)
print(final.content)

sequential_pipeline passes each agent's output to the next as input. AgentScope also has MsgHub (broadcast pattern), forlooppipeline (iterate), and a graph-based workflow if you need branching logic.

Verify it worked

Two quick checks:

  1. Run the single-agent script. You should see reasoning + tool calls printed, ending in a final answer that references the tool result.
  2. Open AgentScope Studio. If you installed the full extras, run as_studio in a terminal and open http://localhost:5000. You'll see every agent run, every tool call, latency per step, and token usage. This is the production debugging surface most agent code is missing.
bash
as_studio

If both work, you're set up.

Where this breaks

  • Token bills. ReAct loops can run away, set max_iters to a sane ceiling (8-12) and watch your token usage in Studio.
  • Tool function failures. If a tool returns garbage, the agent will reason itself in circles. Always return ServiceExecStatus.ERROR with a clear message, the agent uses it to course-correct.
  • Model drift between providers. A prompt that runs clean on Claude may loop on a smaller open model. Keep the model config in one place so you can swap and re-run benchmarks.
  • Memory unbounded growth. TemporaryMemory keeps everything until cleared. For long-running agents, set max_size or rotate.

What to try next

Want this built for you instead?

Let's talk about your AI + SEO stack

If you'd rather skip the how-to and have it shipped for you, that's what I do. Start a conversation and we'll figure out the fastest path to results.

Let's Talk
Questions from readers

Frequently asked

Is AgentScope a replacement for LangGraph or AgentSDK?

It's a different bet. AgentScope ships more out of the box (Studio dashboard, deployment templates, evaluator), where LangGraph stays minimal and explicit. If you want a framework that gets you to production fast, AgentScope. If you want to compose every primitive yourself, LangGraph.

Does AgentScope work with Claude?

Yes. The model_type 'anthropic_chat' uses Anthropic's API directly. Set ANTHROPIC_API_KEY and pass model_name 'claude-sonnet-4-5-20250929' (or whatever model you're using).

Can I run AgentScope on Kubernetes?

That's one of the design goals. The framework includes deployment templates for local, serverless, and K8s. The agent code itself doesn't change between targets — you swap the runtime config.

How much does it cost to run a small AgentScope deployment?

Compute is whatever your host charges. The real cost is LLM tokens — every agent step is an API call. Set max_iters tight (8-12) and use prompt caching where the system prompt is stable.

Should I use AgentScope for a single-agent app?

Probably overkill. AgentScope earns its complexity once you have 3+ agents collaborating. For a single agent, the Anthropic SDK plus a 100-line ReAct loop is enough.

GUIDED IMPLEMENTATION

Want help running this in your business?

The guide above is the playbook. If you'd rather have someone walk it through with you (or just build the thing), book a 30-min scoping call. We'll map your stack, name the realistic timeline, and tell you straight if it's a fit.