---
name: prompt-fix
description: Structured prompt debugging via single-variable ablation. Replaces vibe-iteration ("change phrasing, re-run, eyeball, repeat") with: name the failure mode precisely, define success, propose 3 single-variable ablations, run each, compare. Built for FUEL but works for any prompt tuning session.
trigger: /prompt-fix
---

# /prompt-fix

When a prompt isn't producing what you want, the failure mode is rewriting the whole thing and re-running. You change three things at once, the output gets slightly better or slightly worse, you can't tell which change helped, you change three more things. /prompt-fix forces single-variable ablations: change one thing per run, log results, compare.

The skill is built for FUEL — your prompt repository — but works for any prompt-tuning session, including the EAA blog generator and the SKILL.md files of other skills.

## Usage

`/prompt-fix <prompt-name-or-path>` — point at the prompt that's misbehaving.

Or just `/prompt-fix` and describe the issue conversationally — the skill will pull the prompt from context.

## What it's for

FUEL is a prompt repo. You debug prompts often. The classic failure mode is:
- Output is wrong in some way
- You rewrite phrasing
- Re-run, output is different (LLMs aren't deterministic)
- Can't tell if your change actually helped or you got lucky
- Try again with more changes
- Now you've changed 6 things and the prompt from 10 minutes ago is gone

/prompt-fix replaces vibe-iteration with: name the failure mode precisely, define success precisely, propose 3 single-variable ablations, run each, compare.

## What You Must Do When Invoked

### Step 1 — Name the failure mode precisely

Vague: "the prompt isn't very good."
Precise: "output sometimes returns markdown when it should return JSON" / "reasoning chain skips step 3 about half the time" / "tone is too formal for the audience."

Force a precise statement. If the user gave a vague one, ask one tight question to sharpen it.

Common failure modes:
- **Format** — output structure wrong (JSON missing fields, markdown when plain text wanted)
- **Reasoning** — model skips steps or reaches wrong conclusion
- **Consistency** — output varies wildly across runs
- **Tone / register** — too formal, too casual, wrong audience
- **Hallucination** — model invents facts
- **Length** — too short or too long
- **Refusal** — model refuses or hedges when it shouldn't

### Step 2 — Define success concretely

What would the right output look like? Either:
- A specific example output (best — paste one)
- A checklist of properties (acceptable when no example exists)

If you can't define success, you can't measure ablations. Don't skip this.
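A checklist of properties can be made executable. A minimal sketch, assuming a prompt whose success criteria are "raw JSON, no code fences" (the checks shown are illustrative; swap in whatever properties define success for your prompt):

```python
import json

# Three backticks, built with chr() so this block can mention fences safely.
FENCE = chr(96) * 3

def no_code_fences(output: str) -> bool:
    """Success property: output is not wrapped in a markdown fence."""
    return FENCE not in output

def parses_as_json(output: str) -> bool:
    """Success property: output is valid JSON as-is."""
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False

def score(output: str) -> dict:
    """Map each success property to pass/fail for one model output."""
    checks = [no_code_fences, parses_as_json]
    return {c.__name__: c(output) for c in checks}
```

Each run of an ablation then gets scored the same way, which is what makes the Step 5 comparison honest.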

### Step 3 — Hypothesize the cause

1–3 hypotheses about which part of the prompt is responsible:
- "The system prompt is too vague about format"
- "The few-shot example contradicts the instruction"
- "Temperature too high for this kind of task"
- "Instruction order is wrong — task description after examples"

### Step 4 — Propose 3 single-variable ablations

Each ablation changes ONE thing relative to the baseline.

```
Baseline: <current prompt as-is>

Ablation A: baseline + <one specific change>
Ablation B: baseline + <one different change>
Ablation C: baseline + <one different change>
```

If you find yourself wanting to change two things in one ablation, split it into two.
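The single-variable discipline can be enforced mechanically. A sketch, assuming the prompt is representable as a dict of parts (the field names and prompt text here are illustrative, not from any real repo):

```python
# Hypothetical baseline: each key is one part of the prompt you might ablate.
baseline = {
    "system": "You are a research helper. Return JSON.",
    "few_shot": '{"angles": []}',
    "temperature": 0.7,
}

def ablate(base: dict, key: str, value) -> dict:
    """Return a copy of the baseline with exactly one field changed."""
    variant = dict(base)
    variant[key] = value
    return variant

ablations = {
    "A": ablate(baseline, "system",
                baseline["system"] + " Output raw JSON only."),
    "B": ablate(baseline, "few_shot", '{"angles": [{"title": "..."}]}'),
    "C": ablate(baseline, "temperature", 0.3),
}

# Sanity check: every ablation differs from baseline in exactly one key.
for name, variant in ablations.items():
    diffs = [k for k in baseline if variant[k] != baseline[k]]
    assert len(diffs) == 1, f"Ablation {name} changes {len(diffs)} things"
```

If an ablation can't be expressed as one `ablate()` call, that's the signal to split it.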

### Step 5 — Run and compare

Run baseline + 3 ablations. Ideally 3 runs each (LLM non-determinism); 1 run each is acceptable for a quick scan.

Output a comparison table:

```
            Format   Reasoning   Tone   Notes
Baseline      ✗        ✓          ✓      misses JSON close brace
Ablation A    ✓        ✓          ✓      fixes format, reasoning preserved
Ablation B    ✓        ✓          ✗      tone shifted formal
Ablation C    ✗        ✓          ✓      change didn't help
```

Pick the winner. If multiple ablations help different things, the next step is composing them — but do that as a NEW iteration, not in this skill.
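The tally behind a table cell like "2/3 yes" is just a failure count over runs. A sketch, assuming you've collected the raw outputs per variant (the outputs below are fabricated placeholders, not real model responses):

```python
# Three backticks, built with chr() so this block can mention fences safely.
FENCE = chr(96) * 3

def has_fences(output: str) -> bool:
    return FENCE in output

# Illustrative raw outputs: three runs per variant.
runs = {
    "Baseline":   [FENCE + '{"x": 1}' + FENCE, '{"x": 1}',
                   FENCE + '{"x": 1}' + FENCE],
    "Ablation A": ['{"x": 1}', '{"x": 1}', '{"x": 1}'],
}

def compare(runs: dict, check) -> dict:
    """Failure count per variant, in the table's '2/3 yes' format."""
    return {name: f"{sum(check(o) for o in outs)}/{len(outs)} yes"
            for name, outs in runs.items()}

print(compare(runs, has_fences))
# {'Baseline': '2/3 yes', 'Ablation A': '0/3 yes'}
```

One column of the table per check; run every variant through the same checks so cells are comparable.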

### Step 6 — Save the result

If the user wants the winning prompt saved, write it as a new version (don't overwrite the working one). Note the change in 1 line:

```
v2: tightened format instruction in system prompt (Ablation A)
```
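The never-overwrite rule can be wired into the save step. A sketch following the `prompt.v2.md` naming convention from the Calibration section below; the directory layout and the change-note-as-comment format are assumptions:

```python
from pathlib import Path

def save_new_version(prompt_dir: Path, name: str, text: str, note: str) -> Path:
    """Save the winning prompt as the next version file; never overwrite.

    The unsuffixed original counts as v1, so the first save is v2.
    """
    existing = sorted(prompt_dir.glob(f"{name}.v*.md"))
    next_n = len(existing) + 2
    path = prompt_dir / f"{name}.v{next_n}.md"
    path.write_text(f"<!-- {note} -->\n{text}")
    return path
```

Putting the one-line change note at the top of the file keeps the version history readable without a separate changelog.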

## Calibration for FUEL

- **Versioning matters.** Don't overwrite the working version. Save winning ablations as new files (`prompt.v2.md`, `prompt.v3.md`).
- **LLM non-determinism is the enemy of measurement.** Run each ablation at least twice if possible. One-shot results lie.
- **Watch the model version.** A prompt that worked on Sonnet 3.7 may behave differently on Sonnet 4.5. Note the model in any saved version.
- **Cross-platform check:** if a prompt is meant to work in both Claude Code and the API, ablations should run in both — system prompt handling differs.

## What to avoid

- **No multi-variable ablations.** "Try changing the system prompt AND the examples" is the failure mode this skill exists to prevent.
- **No skipping the success definition.** "I'll know it when I see it" is how you end up in vibe-mode.
- **No running ablations before listing all 3.** Otherwise you'll cut the list short after the first one works and miss a better one.
- **No claiming success on n=1.** LLMs vary run-to-run. If you can only run once, say so honestly.

## Example output (abbreviated)

```markdown
# /prompt-fix — research-helper.md

## Failure mode
Output sometimes includes markdown code fences around the JSON, breaking the downstream parser.

## Success looks like
Pure JSON, no fences, no preamble:
{"angles": [{"title": "...", "thesis": "..."}]}

## Hypotheses
1. System prompt doesn't explicitly forbid code fences.
2. Few-shot example uses fences in its display format.

## Ablations
- Baseline: as-is
- A: add line "Output raw JSON only. Do not wrap in code fences." to system prompt
- B: rewrite few-shot to show output without fences
- C: lower temperature 0.7 → 0.3

## Results (3 runs each)
            Fences   Format   Reasoning
Baseline    2/3 yes   ✓        ✓
A           0/3 yes   ✓        ✓     ← winner
B           1/3 yes   ✓        ✓
C           0/3 yes   ✓        ✗     reasoning got rigid

## Pick
Ablation A. Save as research-helper.v2.md.
```

That's the shape. Single-variable, measured, no vibes.
