
How Do I Write the Canonical Claude Tool-Calling Loop in Python?

Jake McCluskey · Intermediate · 40 min read

If you understand one Claude pattern in Python, make it the tool-calling loop. Every agent, every workflow, every "Claude uses my API" integration is some variation of this same 40-line loop. Once you can write it from memory, you can build anything. Here's the canonical implementation with all the edge cases that trip people up the first time.

Why this matters

Tool use in Claude is conceptually simple — Claude responds with a tool_use block instead of text, you execute the tool, you send the result back, Claude continues. But the first time you write the loop, three things bite:

  • Constructing the next messages array correctly (assistant content from the response, user content as tool results).
  • Handling multiple tool calls in a single response.
  • Knowing when to stop.

Get those right, and the loop handles whatever Claude throws at it, turn after turn. Get them wrong, and you get confusing 400 errors or infinite loops. This guide is the 40 lines and the gotchas.

Before you start

You need:

  • Python 3.10+.
  • An Anthropic API key (export ANTHROPIC_API_KEY=sk-ant-...).
  • 10 minutes. The loop itself is short; the gotchas are where the time goes.

Step 1: Install the SDK

bash
pip install anthropic

Step 2: Define one tool

Start with one. Resist the urge to define five until the loop is solid with one.

python
from anthropic import Anthropic

client = Anthropic()

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
            },
            "required": ["city"],
        },
    },
]

def execute_tool(name: str, tool_input: dict) -> str:
    if name == "get_weather":
        # In real code, call an API. For the example, return canned data.
        return f"The weather in {tool_input['city']} is 72F and sunny."
    raise ValueError(f"Unknown tool: {name}")

Step 3: Write the loop

python
def run(prompt: str, max_iterations: int = 10) -> str:
    messages = [{"role": "user", "content": prompt}]

    for _ in range(max_iterations):
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=4096,
            tools=tools,
            messages=messages,
        )

        # Append assistant response to history regardless of stop reason
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason == "end_turn":
            # Extract final text and return
            final = "".join(
                block.text for block in response.content if block.type == "text"
            )
            return final

        if response.stop_reason == "tool_use":
            # Execute every tool_use block in the response
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    try:
                        result = execute_tool(block.name, block.input)
                        tool_results.append({
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": result,
                        })
                    except Exception as e:
                        tool_results.append({
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": f"Error: {e}",
                            "is_error": True,
                        })

            messages.append({"role": "user", "content": tool_results})
            continue

        # Any other stop reason (max_tokens, stop_sequence) — bail
        return f"Unexpected stop: {response.stop_reason}"

    raise RuntimeError(f"Exceeded {max_iterations} iterations")

That's the complete loop. Let's walk the gotchas.

Step 4: Gotcha — pass response.content verbatim to messages

The biggest mistake people make is reconstructing the assistant message from response.content[0].text or similar. Don't. Pass response.content directly. It already has the mixed text and tool_use blocks in the correct shape, and any reconstruction will lose the tool_use IDs you need to reference in the next turn.

Correct:

python
messages.append({"role": "assistant", "content": response.content})

Wrong:

python
# This loses tool_use blocks entirely and breaks the next turn.
text = "".join(b.text for b in response.content if b.type == "text")
messages.append({"role": "assistant", "content": text})

Step 5: Gotcha — tool results go as a user message with a list of tool_result blocks

The API wants tool results as a single user message with one or more tool_result blocks, not as multiple messages. Even if Claude called five tools in one response, you respond with one message that has five tool_result entries.

python
messages.append({
    "role": "user",
    "content": [
        {"type": "tool_result", "tool_use_id": "toolu_1", "content": "..."},
        {"type": "tool_result", "tool_use_id": "toolu_2", "content": "..."},
    ],
})

Every tool_use_id from the assistant turn must have a matching tool_result. Miss one, and the next API call errors with "unexpected tool_use_id."
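
A cheap guard catches a missing result before the API does. This sketch assumes the response and tool_results variables from the loop in Step 3:

python
# Sanity check before appending the tool-result message: every tool_use id
# from the assistant turn needs exactly one matching tool_result.
tool_use_ids = {b.id for b in response.content if b.type == "tool_use"}
result_ids = {r["tool_use_id"] for r in tool_results}
missing = tool_use_ids - result_ids
if missing:
    raise RuntimeError(f"No tool_result for: {missing}")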

Step 6: Gotcha — tool_result content can be a string or a list

content in a tool_result can be a string (most common) or a list of content blocks (useful for returning images or multimodal output). Start with strings; move to the list form only if you have a concrete reason.

For structured data, stringify it:

python
import json
tool_results.append({
    "type": "tool_result",
    "tool_use_id": block.id,
    "content": json.dumps(result),  # Claude parses the JSON in-conversation
})
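
If a tool produces an image, the list form lets you return it as content blocks alongside text. A minimal sketch; image_b64 is a hypothetical base64-encoded PNG string your tool would produce:

python
tool_results.append({
    "type": "tool_result",
    "tool_use_id": block.id,
    "content": [
        {"type": "text", "text": "Radar snapshot:"},
        {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/png",
                "data": image_b64,  # hypothetical: base64 PNG from your tool
            },
        },
    ],
})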

Step 7: Gotcha — stopping conditions

The loop stops when response.stop_reason == "end_turn". Other stop reasons exist — max_tokens (you hit the output limit), stop_sequence (rare), or errors. Handle them explicitly:

python
if response.stop_reason == "max_tokens":
    # Output was cut off. Either raise max_tokens and retry, or accept partial.
    ...
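
If you take the raise-and-retry route, one hedged way to wire it into the loop from Step 3, assuming you turn the hard-coded 4096 into a local max_tokens variable that run() passes to client.messages.create:

python
if response.stop_reason == "max_tokens":
    if max_tokens < 16384:
        messages.pop()   # drop the truncated assistant message before retrying
        max_tokens *= 2  # retry the same turn with a bigger output budget
        continue
    return "Gave up: output was cut off even after raising max_tokens."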

Always cap total iterations. An agent can loop indefinitely if its prompt is underspecified. 10-20 iterations is plenty for most tasks.

Step 8: Run it

python
if __name__ == "__main__":
    out = run("What's the weather in Austin, and should I bring a jacket?")
    print(out)

Claude should: call get_weather(city="Austin"), receive the result, and respond with both the weather and the jacket recommendation. If that works end to end, the loop is correct.

Verify it worked

1. One round trip works. The weather prompt above should produce exactly one tool call and one final answer. If it loops multiple times, your tool result content might be too vague.

2. Multiple tool calls in one turn work. Ask "What's the weather in Austin and Seattle?" Claude often calls get_weather twice in one response. Your loop must execute both and return both results in a single user message.

3. Errors are handled gracefully. Throw an exception in execute_tool. The is_error: true flag should let Claude recover — either retry differently or report the error in the final answer. If the loop crashes, wrap execute_tool properly.
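
One quick way to run that third check is a throwaway sketch: make the tool fail on purpose and confirm the loop still returns an answer instead of crashing.

python
# Throwaway test: force a failure to confirm the is_error path works.
def execute_tool(name: str, tool_input: dict) -> str:
    raise RuntimeError("weather service unavailable")

print(run("What's the weather in Austin?"))
# Expected: the loop completes and Claude's answer mentions the failure.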

Where this breaks

  • Mismatched tool_use_ids. Every tool_use in a turn needs a matching tool_result in the next. Missing one errors immediately. Iterate all tool_use blocks before moving on.
  • Forgetting to stringify tool output. Raw Python objects in content fields produce confusing errors. Always str() or json.dumps().
  • Infinite loops on vague prompts. "Research everything about X" with web-search tools can loop endlessly. Always cap iterations and include stop criteria in the system prompt ("stop when you have 3-5 sources").
  • Exceeding the model's context on long chains. Every iteration appends to messages. After 15 tool calls with verbose results, you can blow through context. Strategies: summarize older turns into a single assistant message ("Earlier in the conversation, Claude searched for X and found Y"), or use prompt caching on the tool-definition prefix so the repeated prefix at least stays cheap (a sketch follows this list).
  • Streaming mismatch. The example above uses non-streaming. Streaming tool use is supported but requires a different API — client.messages.stream() with event handlers. Start non-streaming; migrate only when you need it.
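
On the caching point above, a hedged sketch of marking the tool-definition prefix for prompt caching. It doesn't shrink the context, but it keeps you from paying full price to resend the same tool definitions every iteration; the placement assumes the tools list from Step 2 and a prefix large enough to meet the minimum cacheable size:

python
# Hedged sketch: cache_control on the last tool asks the API to cache the whole
# tools prefix across iterations (subject to a minimum cacheable prompt size).
cached_tools = [dict(t) for t in tools]
cached_tools[-1]["cache_control"] = {"type": "ephemeral"}

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=4096,
    tools=cached_tools,
    messages=messages,
)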

Frequently asked

Why can't I just concatenate the assistant's text to messages?

The response content is a list of blocks — some text, some tool_use. Concatenating just the text drops the tool_use blocks, which breaks the tool_use_id <-> tool_result linkage Claude needs in the next turn. Always pass response.content as-is to messages.append.

What if Claude calls multiple tools at once?

Execute all of them, then append a single user message containing one tool_result block per tool_use_id. Missing any tool_result errors the next request with 'unexpected tool_use_id.' All-or-nothing: every tool_use in a turn gets a corresponding result.

How do I handle tool errors?

Return the error as a tool_result with is_error: true. Claude sees the failure and usually either retries differently or gracefully reports the failure to the user. Swallowing the exception silently is the worst option.

Should I always use max_iterations?

Yes. A loose prompt plus a bad tool can produce an infinite loop in production. 10-20 iterations is a safe cap; if your task genuinely needs more, reconsider whether it should be multiple tasks.

Can I stream tool use instead of waiting for the full response?

Yes, with client.messages.stream(), but the event model is different — you handle tool_use and text events as they arrive. Start with non-streaming while you learn the loop; move to streaming when you need the responsive UX.
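
For reference, a minimal streaming sketch, assuming the same tools and messages from Step 3. The text prints as it arrives; stream.get_final_message() then returns the full message, tool_use blocks included, so the rest of the loop is unchanged:

python
# Minimal streaming sketch: print text as it arrives, then recover the complete
# message and handle response.stop_reason exactly as in Step 3.
with client.messages.stream(
    model="claude-sonnet-4-5",
    max_tokens=4096,
    tools=tools,
    messages=messages,
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    response = stream.get_final_message()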