---
name: post-mortem
description: Capture a recently resolved incident or painful debug session as durable project memory. Writes a structured feedback file into the project's memory dir so future Claude sessions remember the lesson and don't repeat the mistake.
trigger: /post-mortem
---

# /post-mortem

Painful bugs and operational disasters carry the most learning per minute. Without this skill, those lessons evaporate within a week — you remember "something about the publish cron" but not what specifically broke or how the fix worked. Post-mortem captures the lesson while it's still vivid and writes it as project memory, so the next Claude session starts already knowing.

The output is not a doc you read once. It's a memory file (`feedback_<topic>.md`) that I, the next Claude, will load when you start a new session. The "When to remember this" section at the bottom is the trigger — that's how the lesson gets retrieved later.

## Usage

`/post-mortem <topic>` — short slug for the file (e.g. `publish-cron-race`, `stuck-pending-claims`)

Or just `/post-mortem` and I'll suggest a slug from recent context.

## What it's for

You hit incidents that took real time to debug. Recent examples on Elite AI Advantage alone:
- Publish cron racing manual drain attempts
- Bootstrap seed re-introducing deleted IG accounts on every deploy
- Stuck PENDING blogs invisible to the claim filter (attempts >= MAX bug)
- Force-drain UI not updating until Railway deploy caught up

Each of those is now a one-time fix in code, but none of them left written memory — so if a similar bug pattern shows up next month, future-Claude has to rediscover the failure mode from scratch. Post-mortem fixes that.

## What You Must Do When Invoked

### Step 1 — Identify the project + incident

If a topic slug was provided, use it. If not, infer from recent context — the last ~10 turns of conversation usually point at one specific problem. Confirm with Jake in one short question if it's truly ambiguous (otherwise commit and write it).

Find the project memory dir:
- `pwd` to get project path
- Slugify path (`/` and ` ` → `-`)
- Target: `~/.claude/projects/<slug>/memory/feedback_<topic>.md`
- If memory dir doesn't exist, create it with `mkdir -p`
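The path derivation above can be sketched in Node terms (a minimal sketch; the slug rule follows this skill's convention, and `memoryTarget` is a hypothetical helper, not a Claude Code API):

```typescript
import * as path from "node:path";
import * as os from "node:os";

// Hypothetical helper: turn a project path into the feedback file target.
// Slug rule per Step 1: `/` and ` ` both become `-`.
function memoryTarget(projectPath: string, topic: string): string {
  const slug = projectPath.replace(/[/ ]/g, "-");
  return path.join(
    os.homedir(),
    ".claude",
    "projects",
    slug,
    "memory",
    `feedback_${topic}.md`,
  );
}
```

For `/Users/jake/my app` and topic `publish-cron-race`, the slug becomes `-Users-jake-my-app` and the file lands under that project's `memory/` dir. Create the dir with `mkdir -p` (or `fs.mkdirSync(..., { recursive: true })`) before writing.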

### Step 2 — Gather the timeline

Pull from:
- Recent conversation context (the most reliable source)
- Recent git log: `git log --since="3 days ago" --pretty=format:"%h %s (%cr)"`
- Recent commits' diffs for the actual fix: `git show <hash>` on the fix commit(s)

Identify:
- **Symptom** — what was visible to Jake
- **Root cause** — what was actually broken under the hood
- **Fix** — the specific code/config change, with file:line references
- **Why it happened** — the design pattern or assumption that allowed it

### Step 3 — Write the feedback file

Output structure (write this exact shape to the memory dir):

```markdown
# Feedback: <topic>

**Date:** <YYYY-MM-DD>
**Project:** <project name>
**Severity:** <minor / moderate / painful / disaster>

## What happened
<3–6 bullet timeline. Plain language. What Jake saw, in order.>

## Root cause
<1–3 sentences. The actual underlying issue. Not the symptom.>

## What we changed
<File-by-file list of the fix. Include file paths and (where stable) function names.
Example:
- `src/lib/instagram-pipeline/runner.ts` — split stale-claim recovery into two paths: `attempts >= MAX` → FAILED, `attempts < MAX` → PENDING
- `prisma/seed-instagram-accounts.ts` — added bootstrap-only guard checking `existingCount > 0`>

## Why it happened (deeper)
<1–2 paragraphs. The design pattern, the assumption, the edge case. This is the part future-Claude needs.>

## Prevention
<Concrete: what code pattern, what review check, what test would have caught this. If the answer is "nothing realistic," say so honestly.>

## When to remember this
<2–4 trigger conditions for future sessions. Format as "if you see X, recall this."
Example:
- If working on the InstagramPipeline runner or claim logic, remember stale-claim recovery must split on attempts.
- If touching `prisma/seed-*.ts`, check bootstrap guards exist or admin deletions get reversed every deploy.
- If a queue-style table has `attempts < MAX_ATTEMPTS` filters, audit whether stale rows can land at MAX without failing.>
```

### Step 4 — Update MEMORY.md index

If `~/.claude/projects/<slug>/memory/MEMORY.md` exists, append a one-line link:

```markdown
- [<topic>](feedback_<topic>.md) — <one-line summary>
```

Match the existing line style in MEMORY.md (don't introduce a new format).
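As a sketch of the append logic (hypothetical `indexFeedback` helper, assuming the one-line link format shown above; the real step is just a conditional file append):

```typescript
import * as fs from "node:fs";

// Hypothetical helper: append an index line to MEMORY.md, but only
// if the index already exists. Returns whether a line was written.
function indexFeedback(memoryDir: string, topic: string, summary: string): boolean {
  const indexPath = `${memoryDir}/MEMORY.md`;
  if (!fs.existsSync(indexPath)) return false; // no index, nothing to update
  fs.appendFileSync(indexPath, `- [${topic}](feedback_${topic}.md) — ${summary}\n`);
  return true;
}
```

The existence check matters: this step never creates MEMORY.md, it only extends one that's already there.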

### Step 5 — Confirm and stop

Print a 3-line confirmation:
- File written: `<path>`
- MEMORY.md updated: yes/no
- Next likely trigger: `<one of the "when to remember this" lines>`

Then stop. Don't propose more work.

## Calibration for Jake's projects

- **Tone:** factual, not apologetic. No "we should have caught this earlier" hand-wringing. State what happened and what changed.
- **Granularity:** if the fix was a one-line typo, the post-mortem is one paragraph. If the fix was a system redesign, the post-mortem can be a page. Match the scope.
- **Severity scale:**
  - `minor` — under 30 minutes, single file, no user impact
  - `moderate` — under 2 hours, multi-file or multi-stage thinking
  - `painful` — half a day plus, required design rethink
  - `disaster` — user-visible damage, lost work, late-night debugging

## What to avoid

- **No blame.** Code didn't "fail us," a design didn't "betray us." The pattern allowed the bug. State it neutrally.
- **No retroactive coaching.** Don't write "Jake should have…" — that's not what feedback files are for.
- **No copying full diffs.** Reference file:line, don't paste 200 lines of code into memory.
- **No vague triggers.** "If you're working on the pipeline…" is too broad. Be specific: "If editing `claimNextQueued` or any stale-recovery logic…"
- **No skipping the "When to remember" section.** That's the entire point. Without it, the file is a journal entry, not memory.

## Example output (abbreviated)

```markdown
# Feedback: stuck-pending-claims

**Date:** 2026-04-23
**Project:** Elite AI Advantage
**Severity:** painful

## What happened
- Admin reported one PENDING blog couldn't be drained.
- Manual force-drain didn't pick it up.
- Inspection showed status=PENDING but attempts=3 (MAX).

## Root cause
`claimNextQueued` increments attempts on claim. If the worker dies mid-write (Railway redeploy), stale recovery flips the row back to PENDING but leaves attempts=3. Subsequent claims filter on `attempts < MAX`, so the row is invisible forever.

## What we changed
- `src/lib/instagram-pipeline/runner.ts` — split stale recovery: `attempts >= MAX` → FAILED, `attempts < MAX` → PENDING. Plus a sweep for already-stuck PENDINGs at MAX.
- `forceDrain` now also runs the sweep for retroactive cleanup.

## Why it happened (deeper)
The claim filter assumed PENDING + attempts=MAX is impossible (since claim moves it to PROCESSING first). Mid-write death broke that assumption. Generic claim logic, generic stale recovery, but the two together produced an impossible state.

## Prevention
Any queue with `attempts < MAX` claim filters needs explicit handling for "stale row at MAX" — either fail it, or treat it as eligible. Adding this to the IG pipeline doesn't generalize; the rule belongs in the design notes for any future queue.

## When to remember this
- If editing `claimNextQueued` or any stale-recovery logic in instagram-pipeline, remember the two-path split.
- If building a new queue with attempt limits, design the stale-at-MAX state explicitly.
- If a PENDING blog won't drain, suspect attempts=MAX before suspecting the claim logic.
```
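The two-path split in that example can be sketched in isolation (all names here — `QueueRow`, `recoverStale`, `MAX_ATTEMPTS` — are illustrative, not the real `runner.ts` API):

```typescript
const MAX_ATTEMPTS = 3;

type Status = "PENDING" | "PROCESSING" | "FAILED";

interface QueueRow {
  id: string;
  status: Status;
  attempts: number;
  claimedAt: number | null; // epoch ms; null when unclaimed
}

// Hypothetical stale-claim recovery with the two-path split.
function recoverStale(rows: QueueRow[], staleBefore: number): void {
  for (const row of rows) {
    const stale =
      row.status === "PROCESSING" &&
      row.claimedAt !== null &&
      row.claimedAt < staleBefore;
    if (!stale) continue;
    // A row already at the attempt cap must FAIL. Flipping it back to
    // PENDING would leave it invisible to any claim query that filters
    // on `attempts < MAX_ATTEMPTS` — the exact stuck state in the example.
    row.status = row.attempts >= MAX_ATTEMPTS ? "FAILED" : "PENDING";
    row.claimedAt = null;
  }
}
```

The design point is that recovery must enumerate every reachable (status, attempts) combination, including the "impossible" one that a mid-write death creates.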

That's the shape. Vivid, specific, future-actionable.
