fix(skills): plans reference the spec instead of restating it#1715
Open
arittr wants to merge 3 commits into
Open
fix(skills): plans reference the spec instead of restating it#1715arittr wants to merge 3 commits into
arittr wants to merge 3 commits into
Conversation
…#1) writing-plans told agents to "document everything they need to know" assuming zero context — every agent in the 2026-06-09 six-agent quorum sweep obeyed and restated the entire spec inline in the plan (cost-spec-plan-duplication failed 5/5 completed agents; pi's plan was 683 lines of duplicated spec). - writing-plans: state the division of labor — spec owns WHAT/WHY, plan owns HOW; cite the spec by path/section, never restate it. "Zero context" means mechanically executable steps, not duplication. Add a **Spec:** line to the plan header template. - brainstorming: close the path loophole the re-run exposed — claude shortened docs/superpowers/specs/ to docs/specs/ in 2/2 runs; both path mentions now explicitly forbid the shortening. TDD evidence (quorum): - RED: batch-20260609T023452Z-68aa et al — 5/5 agents fail - GREEN: cost-spec-plan-duplication-claude-20260609T234142Z-9625 pass (plan: "this plan does not restate them" + spec cited by path; both docs in docs/superpowers/) - Canary: triggering-writing-plans-claude pass (skill still fires) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This was referenced Jun 10, 2026
5 tasks
… reference rule
Adversarial review findings (C1, C2, C3, C5, A8, F3):
- "never restate" did not cover paraphrase/summary — the actual failure
mode in the RED evidence; now "never restate, paraphrase, or summarize".
- The No Placeholders intra-plan repetition mandate gave a symmetric
argument for re-inlining the spec; the rule now draws the line:
repetition WITHIN the plan is required, copying FROM the spec is not.
- Drift argument was invertible ("snapshot to avoid drift"); now states
snapshots hide drift.
- **Spec:** header gets a no-spec branch (state requirements once in
the header, not per task) instead of inviting "no spec, rule is moot".
- Brainstorming path bullet: an existing differently-named docs dir is
not a "user preference" override.
- Execution Handoff now notes review fanout scales (forward-ref to
SDD's Proportionality rule) instead of promising unconditional
two-stage review.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Eval-caught regression: the no-spec branch added to the **Spec:**
header gave the agent a sanctioned path to skip the spec doc entirely
("avoiding duplication by skipping the spec" —
cost-spec-plan-duplication-claude-20260610T213934Z-8e5b, fail). The
branch is now scoped: if brainstorming happened the spec exists and
must be cited; "none — requirements:" applies only when requirements
arrived conversationally and no spec doc was ever produced. The
reference-discipline paragraph states the same rule up front.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Collaborator
Author
|
This change is part of the following stack: Change managed by git-spice. |
5 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Who is submitting this PR? (required)
claude-fable-5[1m])devcheckout); quorum eval lab (superpowers-evals) as the testing apparatus; unrelated local ops plugins (decision-log, episodic-memory, superpowers-chrome, primeradiant-ops)What problem are you trying to solve?
In a six-agent behavioral eval sweep (quorum/Gauntlet lab, 2026-06-09; agents: claude, codex, antigravity, kimi, opencode/gpt-5.5, pi), the
cost-spec-plan-duplicationscenario — naive user brainstorms a small feature, then says "Looks good, please write the plan" — failed every agent that completed it (5/5). Each plan restated the entire spec inline instead of referencing it:The agents were obeying the skill: writing-plans' overview says "assuming the engineer has zero context… Document everything they need to know." Self-contained-by-design plans necessarily duplicate the spec — doubling document bytes/tokens and letting the two artifacts drift apart. The eval and the skill text were in direct contradiction; no agent could satisfy both.
A second defect surfaced while verifying the fix: claude shortened the spec path
docs/superpowers/specs/to plaindocs/specs/in 2/2 runs (the path sits mid-checklist and loses to the model'sdocs/specsprior).What does this PR change?
States the division of labor in
writing-plans: the spec owns WHAT/WHY, the plan owns HOW — cite the spec by path, never restate it ("zero context" now explicitly means mechanically executable steps, not duplication), and adds a**Spec:**line to the plan header template. Inbrainstorming, closes the observed path loophole: bothdocs/superpowers/specs/mentions now explicitly forbid shortening todocs/specs/.Is this change appropriate for the core library?
Yes — general-purpose planning discipline. It applies to any project that produces a spec and a plan, is not tied to any domain, tool, or third-party service, and changes no workflow structure — only the redundancy rule between two artifacts the core skills already produce.
What alternatives did you consider?
Does this PR contain multiple unrelated changes?
Two skills, one dependency: the brainstorming path fix was exposed by this change's own eval cycle (first post-edit run passed the reference criterion but failed the path criterion, 2/2 reproducible), and the same scenario's deterministic checks gate both (
file-exists docs/superpowers/specs/*.md+ the reference criterion). The GREEN evidence below requires the pair.Existing PRs
Environment tested
New harness support (required if this PR adds a new harness)
N/A — no harness changes.
Evaluation
cost-spec-plan-duplicationpost-edit: reference criterion flipped to pass, exposed the path defect; (2) after the path fix: full pass (runcost-spec-plan-duplication-claude-20260609T234142Z-9625); (3) regression canarytriggering-writing-plans: pass (skill still auto-fires;Skill(superpowers:writing-plans)in the session log).docs/superpowers/locations, all deterministic post-checks green.Rigor
superpowers:writing-skillsand completed adversarial pressure testing (paste results below)RED→GREEN→REFACTOR per writing-skills: RED = 5-agent baseline failures with verbatim rationalizations (above) against the skill text unchanged since May 4; GREEN = minimal edit, scenario passes; REFACTOR = the path loophole found in GREEN-attempt-1 closed with explicit counters ("exactly this path — not
docs/specs/") and re-verified. The pressure scenario is adversarial by construction: a scripted naive user who never names skills and never reminds the agent of spec content, graded by an independent QA agent plus deterministic filesystem checks.Human review
Hardening round (adversarial review fleet + eval re-verification)
Three parallel reviewers (red-team, cross-corpus consistency, evidence verification) audited the stack. Findings addressed here by two follow-up commits:
**Spec:**header gained a no-spec branch — which the eval immediately caught being abused: the agent skipped writing the spec entirely "to avoid duplication" (cost-spec-plan-duplication-claude-20260610T213934Z-8e5b, FAIL — a reviewer-recommended fix that misfired, caught only because every change re-runs the eval). Re-scoped: if brainstorming happened the spec exists and is cited; "none — requirements:" applies only when requirements arrived conversationally and no spec doc was ever produced. Re-run: clean pass (cost-spec-plan-duplication-claude-20260610T221053Z-bf8e— "The plan contains implementation HOW details… rather than restating the WHAT/WHY").Final verification battery on the assembled 4-PR stack: 5/5 pass (cost-checkbox/claude, cost-spec-plan/claude, cost-trivial-task-review-fanout/opencode, sdd-rejects-extra-features/claude, triggering-writing-plans/claude).