---
name: unstuck
description: Structured debugging for when you've been chasing a bug for 20+ minutes without progress. Forces a hypothesis tree — ranked, with cheapest-test-first ordering — instead of more spelunking. Output is a plan, not a fix. Use when the failure mode is "I keep reading code and getting nowhere."
trigger: /unstuck
---

# /unstuck

When you've been chasing a bug for 20+ minutes and you're no closer, the failure mode is almost never "I haven't read enough code." It's that you started spelunking before you had a hypothesis tree. /unstuck forces the structure: state the symptom, list ranked hypotheses, run the cheapest test first.

The skill is read-only — it produces a plan, not a fix. The fix happens after, when you know which hypothesis was right.

## Usage

`/unstuck <one-line description of what's wrong>`

Examples:
- `/unstuck Instagram pipeline isn't draining manually`
- `/unstuck Prompt outputs are sometimes blank`
- `/unstuck Build passes locally, fails on Railway`

## What it's for

You've been hunting a bug on FUEL or EAA for >20 minutes. You've tried a few things. You're now reading the same files over and over. This is the universal sign you don't have a hypothesis tree — you have intuitions and you're testing them in random order.

The skill replaces "keep reading code" with: stop, structure, test cheapest first. Most bugs collapse to one of three causes once you actually rank them.

## What You Must Do When Invoked

### Step 1 — Capture the symptom

In one sentence, state what's actually wrong. Be specific: "the publish cron is running but no rows transition from PENDING to COMPLETED" beats "publishing is broken."

If the user only gave a vague description, ask one tight clarifying question. Just one — don't loop on "could you tell me more."

### Step 2 — Capture what's been tried

Pull from the recent conversation context. List 2–5 things already attempted with their outcomes. The point is to NOT re-suggest what failed.

If nothing's been tried yet (skill invoked early), say so explicitly.

### Step 3 — Build the hypothesis tree

List 3–5 hypotheses for what's actually wrong. For each:
- **Hypothesis** — one sentence
- **Why plausible** — what evidence supports it (file:line if possible)
- **Cheapest test** — fastest way to confirm or deny (DB query, log line, single command)

Rank by `(likelihood × cheapness of test)`. The first thing you test is the one that's both probably right AND cheap to check.

If you can't generate 3 hypotheses, you don't understand the system well enough — say so honestly. Don't pad the list.
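
The ranking is just arithmetic, but it can help to see the shape. A minimal TypeScript sketch (illustrative only; none of these names come from a real codebase):

```typescript
// Illustrative only: the skill itself outputs markdown, not code.
// Ranking rule: score = likelihood * cheapness, highest score tested first.

type Hypothesis = {
  claim: string;      // one sentence
  evidence: string;   // why plausible, file:line if possible
  test: string;       // cheapest way to confirm or deny
  likelihood: number; // 0-1: how probable you think it is
  cheapness: number;  // 0-1: 1 = one query or log line, 0 = an afternoon of setup
};

// The first test is the one that's both probably right AND cheap to check.
const runOrder = (hs: Hypothesis[]): Hypothesis[] =>
  [...hs].sort((a, b) => b.likelihood * b.cheapness - a.likelihood * a.cheapness);
```

The product form is the point: a near-certain hypothesis that takes an afternoon to test loses to a coin-flip that's one query.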

### Step 4 — Output the plan

```markdown
# /unstuck — <one-line symptom>

**Already tried:** <2–5 bullets, each with outcome>

## Hypothesis tree

1. **<hypothesis>** — *<likelihood>* — <why plausible>
   Test: <cheapest check, with file:line or query>

2. **<hypothesis>** — *<likelihood>* — <why plausible>
   Test: <...>

(3–5 total)

## Run order
1. Test hypothesis #<N> first — cheapest of the high-likelihood ones.
2. If it fails, hypothesis #<M> next.
3. If two tests fail, stop and zoom out — the framing is probably wrong.

## What "zoom out" looks like
<1–2 sentences. What would suggest the bug is in a different system entirely?>
```

### Step 5 — Stop

Don't run the tests yet. Output the plan and stop. Jake decides which to start with — usually #1, but sometimes he has context that changes the ranking.

If two tests fail in the next session, treat the framing as suspect. Look in a different system layer (deploy environment, model version, DB replica) — the bug is probably not where you've been looking.

## Calibration for Jake's projects

- **EAA bugs:** usually queue/cron/race conditions, occasionally Railway deploy environment, rarely Next.js framework. Rank accordingly.
- **FUEL bugs:** usually prompt-output behavior (non-deterministic), occasionally model-version differences, rarely true bugs in the prompt structure itself. For prompt-output bugs, switch to /prompt-fix instead.
- **For both:** when "works locally, fails in prod" is the symptom, the answer is almost always env vars, schema drift, or build artifacts. Test that family first.
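
For that family, the cheapest test is usually a parity check rather than more code-reading. A hedged sketch of the env-var leg (the variable names are hypothetical, not FUEL's or EAA's actual config; schema drift and build artifacts need their own checks):

```typescript
// Hypothetical env parity check. Run the same script locally and on Railway,
// then diff the output. REQUIRED is a placeholder list, not a real config.
const REQUIRED = ['DATABASE_URL', 'OPENAI_API_KEY', 'NEXT_PUBLIC_APP_URL'];

const missing = REQUIRED.filter((name) => !process.env[name]);
if (missing.length > 0) {
  console.error(`missing env vars: ${missing.join(', ')}`);
  process.exit(1);
}
console.log('all required env vars present');
```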

## What to avoid

- **No deep dive into hypothesis #1 before listing the others.** The point is the ranking, not the first idea you had.
- **No "could be many things" cop-out.** Force a ranking. A confidence-30% top pick beats a flat list.
- **No running tests inside this skill.** Output the plan, stop. Tests run in normal mode.
- **No padding.** 3 hypotheses is fine. Don't write 5 when you have 3.

## Example output (abbreviated)

```markdown
# /unstuck — Manual force-drain doesn't pick up one stuck PENDING blog

**Already tried:**
- Re-deployed Railway, no change
- Looked at runner.ts, claim logic looks right
- Confirmed blog status is PENDING in DB

## Hypothesis tree

1. **attempts is at MAX, claim filter excludes it** — *high* — Stale-claim recovery may have flipped status to PENDING without resetting attempts; claim filter is `attempts < MAX`.
   Test: `SELECT status, attempts, "scheduledFor" FROM "QueuedPost" WHERE id = '<id>'`

2. **scheduledFor is in the future** — *medium* — Claim filter `scheduledFor <= NOW()` would skip it.
   Test: same query, check scheduledFor.

3. **Different table than the UI shows** — *low* — Multiple queue tables; the UI hits one but the cron hits another.
   Test: confirm table name in both runner.ts and UI.

## Run order
1. Test #1 first. If attempts=3, that's it — stale recovery left it at MAX.
2. If not, test #2.
3. If both fail, framing is wrong — look at row visibility (different DB, replica lag, wrong table).

## What "zoom out" looks like
If two tests fail, the bug is probably not in claim logic at all — it's in row visibility or environment.
```
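
For context, the claim filter that hypotheses #1 and #2 are probing would look something like this. A hypothetical Prisma-style reconstruction, assuming the field names from the example above (the real logic lives in runner.ts and may differ):

```typescript
// Hypothetical reconstruction of the claim filter, not the actual runner.ts.
// Assumes a Prisma model QueuedPost with status, attempts, and scheduledFor.
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();
const MAX_ATTEMPTS = 3; // assumption: the real limit lives in runner.ts

async function claimNext() {
  return prisma.queuedPost.findFirst({
    where: {
      status: 'PENDING',
      attempts: { lt: MAX_ATTEMPTS },    // hypothesis #1: excludes a row stale recovery left at MAX
      scheduledFor: { lte: new Date() }, // hypothesis #2: skips a future-dated row
    },
    orderBy: { scheduledFor: 'asc' },
  });
}
```

If both filters pass and the row still isn't picked up, that points at hypothesis #3: the query and the UI are reading different tables, or different databases entirely.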

That's the shape. Hypothesis-tree first, evidence-grounded, cheapest test first.
