
How Do I Give My Claude Agent Persistent Memory Across Sessions?

Jake McCluskey · Advanced · 45 min read

Agents are great for one-shot tasks. They're limited for anything that needs to remember what happened yesterday. A research agent that forgets what it already searched, a support agent that asks the same customer the same questions every conversation — those are agents without persistent memory. Here's how to give a Claude-powered agent a real memory store it reads from at the start of every turn and writes to when it learns something new.

Why this matters

"Conversation memory" in a chat app is just appending messages to a list. That breaks down the moment you want memory that:

  • Survives a process restart.
  • Is queryable ("what do I know about user X?").
  • Can be updated without replaying the whole conversation.
  • Scales beyond one conversation's worth of context.

A real memory store — a database or vector index the agent reads and writes to — solves all of these. The agent queries it at the start of every task, writes updates when it learns something new, and operates as if it has a brain that persists between runs.

Before you start

You need:

  • A working agent built with the Anthropic SDK (Python or Node). The pattern from How Do I Build a Research Pipeline with the Agent SDK is a good starting point.
  • A Postgres or SQLite database — persistent memory doesn't need to be fancy. Start with a single memories table.
  • 45 minutes, because tool design is where the time goes.

Step 1: Design the memory schema

What does "memory" mean for your agent? Three common shapes:

  • Key-value facts: "user preferred pronouns: they/them" — good for small, structured info.
  • Free-text notes with tags: "reviewed quarterly report on 2026-03-15; finance flagged ambiguity in deferred revenue recognition" — good for growing context.
  • Vector-searchable memory: embeddings of past interactions retrieved by semantic similarity — good for large history searches.

Start with free-text notes. It's the most flexible and the least over-engineered. Schema:

sql
CREATE TABLE memories (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  agent_id TEXT NOT NULL,            -- which agent/thread this belongs to
  tag TEXT NOT NULL,                  -- e.g. "user_preference", "project_state"
  content TEXT NOT NULL,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX ON memories(agent_id, tag);

One table. Enough for a surprising amount of production use.
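If you start on SQLite instead of Postgres, the same shape works with text-based ids and timestamps. A minimal sketch (the table mirrors the Postgres schema above; `:memory:` is used here for illustration, use a file path for real persistence):

```python
import sqlite3
import uuid
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")  # use a file path for real persistence
conn.execute("""
    CREATE TABLE memories (
        id TEXT PRIMARY KEY,
        agent_id TEXT NOT NULL,
        tag TEXT NOT NULL,
        content TEXT NOT NULL,
        created_at TEXT NOT NULL,
        updated_at TEXT NOT NULL
    )
""")
conn.execute("CREATE INDEX idx_memories_agent_tag ON memories(agent_id, tag)")

# SQLite has no gen_random_uuid(), so generate ids in application code.
now = datetime.now(timezone.utc).isoformat()
conn.execute(
    "INSERT INTO memories VALUES (?, ?, ?, ?, ?, ?)",
    (str(uuid.uuid4()), "agent-1", "user_preference", "prefers plain markdown", now, now),
)
row = conn.execute(
    "SELECT content FROM memories WHERE agent_id = ? AND tag = ?",
    ("agent-1", "user_preference"),
).fetchone()
```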

Step 2: Expose memory as agent tools

The agent interacts with memory through tools, not by the runtime magically injecting context. This gives the agent agency over what to remember and lets you audit the reasoning.

python
tools = [
    {
        "name": "recall_memory",
        "description": "Retrieve stored memories. Use at the start of a task to load context about the user/project/domain. Filter by tag if you know what you're looking for.",
        "input_schema": {
            "type": "object",
            "properties": {
                "tag": {"type": "string", "description": "Optional tag filter"},
                "limit": {"type": "integer", "default": 20},
            },
        },
    },
    {
        "name": "write_memory",
        "description": "Save a new memory. Use sparingly — only for facts that will matter across future sessions. Never write transient info (current time, this session's mood).",
        "input_schema": {
            "type": "object",
            "properties": {
                "tag": {"type": "string"},
                "content": {"type": "string"},
            },
            "required": ["tag", "content"],
        },
    },
    {
        "name": "update_memory",
        "description": "Replace the content of an existing memory by id. Use when you learn something that contradicts a prior memory.",
        "input_schema": {
            "type": "object",
            "properties": {
                "id": {"type": "string"},
                "content": {"type": "string"},
            },
            "required": ["id", "content"],
        },
    },
]

Three operations: read, write, update. Notice there's no delete — memories age out via retention policy, not agent choice. That's a safety default; loosen it if your use case requires it.
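Once defined, the tools need to be wired into the conversation loop: when a response stops with `stop_reason == "tool_use"`, execute each `tool_use` block and send back matching `tool_result` blocks. A minimal dispatcher, sketched here against plain dicts shaped like the API's content blocks (the Python SDK returns typed objects with the same fields; `execute_tool` is the handler from the next step):

```python
import json

def handle_tool_calls(content_blocks, agent_id, execute_tool):
    """Turn the tool_use blocks from one assistant message into tool_result
    blocks suitable for the next user message."""
    results = []
    for block in content_blocks:
        if block.get("type") != "tool_use":
            continue  # skip text/thinking blocks
        output = execute_tool(block["name"], block["input"], agent_id)
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": json.dumps(output),
        })
    return results
```

In the agent loop you'd append the assistant message, then a user message whose content is this returned list, and call the API again until `stop_reason` is no longer `"tool_use"`.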

Step 3: Implement the tool handlers

python
import uuid

# `db` is assumed to be your app's database helper (e.g. a thin psycopg
# wrapper exposing fetch_all/execute); swap in your own connection layer.

def execute_tool(name, tool_input, agent_id):
    if name == "recall_memory":
        query = "SELECT id, tag, content, updated_at FROM memories WHERE agent_id = %s"
        params = [agent_id]
        if tool_input.get("tag"):
            query += " AND tag = %s"
            params.append(tool_input["tag"])
        query += " ORDER BY updated_at DESC LIMIT %s"
        params.append(tool_input.get("limit", 20))
        rows = db.fetch_all(query, params)
        return [
            {"id": str(r["id"]), "tag": r["tag"], "content": r["content"],
             "updated": r["updated_at"].isoformat()}
            for r in rows
        ]

    if name == "write_memory":
        memory_id = uuid.uuid4()
        db.execute(
            "INSERT INTO memories (id, agent_id, tag, content) VALUES (%s, %s, %s, %s)",
            (memory_id, agent_id, tool_input["tag"], tool_input["content"]),
        )
        return {"id": str(memory_id), "status": "stored"}

    if name == "update_memory":
        db.execute(
            "UPDATE memories SET content = %s, updated_at = NOW() WHERE id = %s AND agent_id = %s",
            (tool_input["content"], tool_input["id"], agent_id),
        )
        return {"id": tool_input["id"], "status": "updated"}

Two guardrails worth enforcing in the tool handlers, not just the prompt:

  • agent_id scoping — memories from one agent's context never leak to another's. Every query filters by agent_id. This is structurally the same concern as tenant isolation in multi-tenant SaaS.
  • Rate limits per memory write — the agent will sometimes panic-write duplicate notes. Enforce a minimum interval or a content-hash dedup.
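Both guardrails can be sketched as a pre-write check. This version keeps state in process memory for brevity; in production, back the hash set and last-write timestamp with the database so they survive restarts (`MIN_WRITE_INTERVAL` is an illustrative value):

```python
import hashlib
import time

_last_write = {}   # agent_id -> timestamp of last successful write
_seen_hashes = {}  # agent_id -> set of content hashes already stored

MIN_WRITE_INTERVAL = 5.0  # seconds; tune to your workload

def guard_memory_write(agent_id, content, now=None):
    """Return None if the write may proceed, or a refusal reason string."""
    now = time.monotonic() if now is None else now
    # Normalize lightly so trivially-rephrased duplicates still collide.
    digest = hashlib.sha256(content.strip().lower().encode()).hexdigest()
    if digest in _seen_hashes.setdefault(agent_id, set()):
        return "duplicate: identical memory already stored"
    last = _last_write.get(agent_id)
    if last is not None and now - last < MIN_WRITE_INTERVAL:
        return "rate_limited: wait before writing another memory"
    _seen_hashes[agent_id].add(digest)
    _last_write[agent_id] = now
    return None
```

Returning the refusal reason as a tool result (rather than silently dropping the write) lets Claude see why the write was rejected and move on.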

Step 4: Write a system prompt that uses memory deliberately

The tools are useless if the agent doesn't know when to call them. Your system prompt:

text
You are a persistent agent with memory that spans sessions.

Memory usage rules:
- At the START of every new task, call recall_memory() with no tag to
  load general context. Then call recall_memory(tag=<relevant>) if
  there's a specific domain.
- Write a new memory ONLY for facts that will matter in future sessions.
  Good: "The user prefers plain markdown over rich formatting."
  Bad: "The user just said hello."
- When you learn something that contradicts a prior memory, use
  update_memory, don't write a duplicate.
- Memory is not a notebook. Keep it terse, factual, and tagged.

Available tags in use so far: user_preference, project_state,
past_decision, contact_info.

The good/bad examples are what separate agents that write useful memories from agents that fill the store with noise.

Step 5: Run a multi-session test

Start the agent. Have a short conversation where it learns one concrete thing — "I prefer plain markdown." Watch it write a memory.

Kill the process. Restart. Ask a question. The agent should start with a recall_memory call, see the preference, and format its response accordingly.

If it doesn't recall, either:

  • The system prompt isn't strong enough on "call recall_memory first."
  • The agent_id isn't consistent across sessions (very common bug — make sure you're scoping to a stable identifier like user ID, not session ID).
  • The tool description is too vague; Claude isn't sure when to use it.

Iterate on the prompt and tool descriptions until the agent reliably reads memory at the start and writes selectively throughout.

Step 6: Add retention and compaction

After a month, the memory store grows. Without maintenance, recall_memory returns irrelevant old notes and Claude's reasoning degrades.

Two strategies:

  • Aging. Delete memories older than N days unless they've been updated or accessed recently. Add a last_accessed_at column updated on recall.
  • Compaction. Periodically have Claude itself read all memories for an agent and produce a condensed summary. Replace the verbose notes with the summary. Done weekly or monthly.
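The aging pass can be sketched as a small scheduled job. Shown here against SQLite for a self-contained example; the Postgres version is the same DELETE with `NOW() - interval '90 days'` arithmetic. It assumes the `last_accessed_at` column described above:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 90  # tune to your workload

def age_out_memories(conn, retention_days=RETENTION_DAYS):
    """Delete memories neither updated nor accessed within the retention window.
    ISO-8601 timestamps compare correctly as strings, so plain < works."""
    cutoff = (datetime.now(timezone.utc) - timedelta(days=retention_days)).isoformat()
    cur = conn.execute(
        "DELETE FROM memories "
        "WHERE updated_at < ? AND (last_accessed_at IS NULL OR last_accessed_at < ?)",
        (cutoff, cutoff),
    )
    conn.commit()
    return cur.rowcount  # number of memories aged out
```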

Compaction is the elegant version. Set it up as a scheduled job using the pattern from How Do I Schedule Claude Code to Run Overnight Jobs?.

Verify it worked

1. Memory survives a restart. Kill the process after writing a memory. Restart. Confirm the agent recalls it.

2. Agent-ID scoping is enforced. Create two agent IDs. Write a memory under one. Query under the other. The memory must not appear. This is a privacy-critical check.

3. The agent writes sparingly. Run ten tasks. Count the memories written. If it's 50+, the system prompt isn't doing its job and the store will rot fast.

Where this breaks

  • Memory contamination across users. If your agent_id is derived from anything a user can spoof (URL param, client-sent header), they can access another user's memories. Derive agent_id server-side from authenticated identity only.
  • The agent trusting memory over reality. A memory that says "user prefers X" can be six months stale. Include updated_at in the recall result so the agent can weigh freshness. For high-stakes decisions, have the agent confirm the memory is still accurate.
  • PII in memory. Anything the agent writes is persisted. If users drop sensitive data in chat ("my SSN is..."), the agent might write it to memory. Filter at write time — reject tool inputs matching patterns for SSN, credit card, etc.
  • Memory bloat with redundant writes. A chatty agent will write "user seems happy" and "user is happy today" and "today user appeared pleased" as three separate memories. Content-hash dedup in the write handler.
  • Cascading hallucinations. Agent writes an incorrect memory in session 1, reads it in session 2, doubles down in session 3. Mitigation: make memory writes visible to the user (or at least loggable), and have an occasional "verify memory against reality" task that flags stale or wrong entries.
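The write-time PII filter mentioned above can be sketched as a regex gate in the `write_memory` handler. The two patterns here are illustrative only; a real deployment needs a broader pattern set or a dedicated scrubbing library:

```python
import re

# Illustrative patterns: US-format SSN and a loose 13-16 digit card number.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def contains_pii(content):
    """Return the name of the first matching PII pattern, or None."""
    for name, pattern in PII_PATTERNS.items():
        if pattern.search(content):
            return name
    return None
```

In the `write_memory` handler, reject the write and return the pattern name as the tool result so Claude knows why the memory wasn't stored.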

Frequently asked

Should I use a vector database for agent memory?

Not initially. A simple Postgres table with tag filters handles 90% of real use cases and is 10x easier to debug. Add vector search when you have a specific need — semantic recall across a large memory store, or fuzzy matching user queries to stored facts. Don't start there.

How do I prevent memories from leaking across users?

Scope every memory to an agent_id derived from authenticated identity server-side. Never let a client pass its own agent_id. Every recall_memory query filters by agent_id in the WHERE clause; missing that is how cross-user leaks happen.

When should the agent write vs update a memory?

Write new memories for distinct new facts. Update existing memories when you learn something that contradicts or refines a prior one. Duplicate writes bloat the store and degrade recall quality. The system prompt should make this explicit with concrete examples.

How do I keep the memory store from growing forever?

Two strategies: aging (delete unaccessed entries after N days) and compaction (monthly scheduled job where Claude summarizes all memories for an agent and replaces the verbose ones with the summary). Compaction keeps the useful context while trimming noise.

What happens if a memory turns out to be wrong?

Bake in a 'verify memory' task that occasionally asks the agent to check key memories against current reality (recent messages, external data sources). Flag inconsistencies for human review. Without this, an incorrect memory can propagate across sessions indefinitely — a bug agents are uniquely good at amplifying.