---
name: unstuck
description: Structured debugging for when you've been chasing a bug for 20+ minutes without progress. Forces a hypothesis tree — ranked, with cheapest-test-first ordering — instead of more spelunking. Output is a plan, not a fix. Use when the failure mode is "I keep reading code and getting nowhere."
trigger: /unstuck
---

# /unstuck

When you've been chasing a bug for 20+ minutes and you're no closer, the failure mode is almost never "I haven't read enough code." It's that you started spelunking before you had a hypothesis tree. /unstuck forces the structure: state the symptom, list ranked hypotheses, run the cheapest test first.

The skill is read-only — it produces a plan, not a fix. The fix happens after, when you know which hypothesis was right.

## Usage

`/unstuck <one-line description of what's wrong>`

Examples:
- `/unstuck Instagram pipeline isn't draining manually`
- `/unstuck Prompt outputs are sometimes blank`
- `/unstuck Build passes locally, fails on Railway`

## What it's for

You've been hunting a bug on FUEL or EAA for >20 minutes. You've tried a few things. You're now reading the same files over and over. This is the universal sign you don't have a hypothesis tree — you have intuitions and you're testing them in random order.

The skill replaces "keep reading code" with: stop, structure, test cheapest first. Most bugs collapse to one of three causes once you actually rank them.

## What You Must Do When Invoked

### Step 1 — Capture the symptom

In one sentence, state what's actually wrong. Be specific: "the publish cron is running but no rows transition from PENDING to COMPLETED" beats "publishing is broken."

If the user only gave a vague description, ask one tight clarifying question. Just one — don't loop on "could you tell me more."

### Step 2 — Capture what's been tried

Pull from the recent conversation context. List 2–5 things already attempted with their outcomes. The point is to NOT re-suggest what failed.

If nothing's been tried yet (skill invoked early), say so explicitly.

### Step 3 — Build the hypothesis tree

List 3–5 hypotheses for what's actually wrong. For each:
- **Hypothesis** — one sentence
- **Why plausible** — what evidence supports it (file:line if possible)
- **Cheapest test** — fastest way to confirm or deny (DB query, log line, single command)

Rank by `(likelihood × cheapness of test)`. The first thing you test is the one that's both probably right AND cheap to check.

If you can't generate 3 hypotheses, you don't understand the system well enough — say so honestly. Don't pad the list.
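
The ranking is just arithmetic, but it can help to see the shape. A minimal TypeScript sketch (illustrative only; none of these names come from a real codebase):

```typescript
// Illustrative only: the skill itself outputs markdown, not code.
// Ranking rule: score = likelihood * cheapness, highest score tested first.

type Hypothesis = {
  claim: string;      // one sentence
  evidence: string;   // why plausible, file:line if possible
  test: string;       // cheapest way to confirm or deny
  likelihood: number; // 0-1: how probable you think it is
  cheapness: number;  // 0-1: 1 = one query or log line, 0 = an afternoon of setup
};

// The first test is the one that's both probably right AND cheap to check.
const runOrder = (hs: Hypothesis[]): Hypothesis[] =>
  [...hs].sort((a, b) => b.likelihood * b.cheapness - a.likelihood * a.cheapness);
```

The product form is the point: a near-certain hypothesis that takes an afternoon to test loses to a coin-flip that's one query.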

### Step 4 — Output the plan

```markdown
# /unstuck — <one-line symptom>

**Already tried:** <2–5 bullets, each with outcome>

## Hypothesis tree

1. **<hypothesis>** — *<likelihood>* — <why plausible>
   Test: <cheapest check, with file:line or query>

2. **<hypothesis>** — *<likelihood>* — <why plausible>
   Test: <...>

(3–5 total)

## Run order
1. Test hypothesis #<N> first — cheapest of the high-likelihood ones.
2. If it fails, hypothesis #<M> next.
3. If two tests fail, stop and zoom out — the framing is probably wrong.

## What "zoom out" looks like
<1–2 sentences. What would suggest the bug is in a different system entirely?>
```

### Step 5 — Stop

Don't run the tests yet. Output the plan and stop. Jake decides which to start with — usually #1, but sometimes he has context that changes the ranking.

If two tests fail in the next session, treat the framing as suspect. Look in a different system layer (deploy environment, model version, DB replica) — the bug is probably not where you've been looking.

## Calibration for Jake's projects

- **EAA bugs:** usually queue/cron/race conditions, occasionally Railway deploy environment, rarely Next.js framework. Rank accordingly.
- **FUEL bugs:** usually prompt-output behavior (non-deterministic), occasionally model-version differences, rarely true bugs in the prompt structure itself. For prompt-output bugs, switch to /prompt-fix instead.
- **For both:** when "works locally, fails in prod" is the symptom, the answer is almost always env vars, schema drift, or build artifacts. Test that family first.
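
For that family, the cheapest test is usually a parity check rather than more code-reading. A hedged sketch of the env-var leg (the variable names are hypothetical, not FUEL's or EAA's actual config; schema drift and build artifacts need their own checks):

```typescript
// Hypothetical env parity check. Run the same script locally and on Railway,
// then diff the output. REQUIRED is a placeholder list, not a real config.
const REQUIRED = ['DATABASE_URL', 'OPENAI_API_KEY', 'NEXT_PUBLIC_APP_URL'];

const missing = REQUIRED.filter((name) => !process.env[name]);
if (missing.length > 0) {
  console.error(`missing env vars: ${missing.join(', ')}`);
  process.exit(1);
}
console.log('all required env vars present');
```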

## What to avoid

- **No deep dive into hypothesis #1 before listing the others.** The point is the ranking, not the first idea you had.
- **No "could be many things" cop-out.** Force a ranking. A confidence-30% top pick beats a flat list.
- **No running tests inside this skill.** Output the plan, stop. Tests run in normal mode.
- **No padding.** 3 hypotheses is fine. Don't write 5 when you have 3.

## Example output (abbreviated)

```markdown
# /unstuck — Manual force-drain doesn't pick up one stuck PENDING blog

**Already tried:**
- Re-deployed Railway, no change
- Looked at runner.ts, claim logic looks right
- Confirmed blog status is PENDING in DB

## Hypothesis tree

1. **attempts is at MAX, claim filter excludes it** — *high* — Stale-claim recovery may have flipped status to PENDING without resetting attempts; claim filter is `attempts < MAX`.
   Test: `SELECT status, attempts, "scheduledFor" FROM "QueuedPost" WHERE id = '<id>'`

2. **scheduledFor is in the future** — *medium* — Claim filter `scheduledFor <= NOW()` would skip it.
   Test: same query, check scheduledFor.

3. **Different table than the UI shows** — *low* — Multiple queue tables; the UI hits one but the cron hits another.
   Test: confirm table name in both runner.ts and UI.

## Run order
1. Test #1 first. If attempts=3, that's it — stale recovery left it at MAX.
2. If not, test #2.
3. If both fail, framing is wrong — look at row visibility (different DB, replica lag, wrong table).

## What "zoom out" looks like
If two tests fail, the bug is probably not in claim logic at all — it's in row visibility or environment.
```
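
For context, the claim filter that hypotheses #1 and #2 are probing would look something like this. A hypothetical Prisma-style reconstruction, assuming the field names from the example above (the real logic lives in runner.ts and may differ):

```typescript
// Hypothetical reconstruction of the claim filter, not the actual runner.ts.
// Assumes a Prisma model QueuedPost with status, attempts, and scheduledFor.
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();
const MAX_ATTEMPTS = 3; // assumption: the real limit lives in runner.ts

async function claimNext() {
  return prisma.queuedPost.findFirst({
    where: {
      status: 'PENDING',
      attempts: { lt: MAX_ATTEMPTS },    // hypothesis #1: excludes a row stale recovery left at MAX
      scheduledFor: { lte: new Date() }, // hypothesis #2: skips a future-dated row
    },
    orderBy: { scheduledFor: 'asc' },
  });
}
```

If both filters pass and the row still isn't picked up, that points at hypothesis #3: the query and the UI are reading different tables, or different databases entirely.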

That's the shape. Hypothesis-tree first, evidence-grounded, cheapest test first.
