fix(skills): description-level exceptions are authoritative in the routing rule by arittr · Pull Request #1732 · obra/superpowers

arittr · 2026-06-10T22:31:44Z

Stacked on #1718 (brainstorming gate exception) — top of the SUP-333 stack (#1715 → #1716 → #1718 → this). Targets dev per the template requirement.

Who is submitting this PR? (required)

| Field | Value |
|---|---|
| Your model + version | Claude Fable 5 (`claude-fable-5[1m]`) |
| Harness + version | Claude Code 2.1.169 |
| All plugins installed | superpowers (this repo, `dev` checkout); quorum eval lab (`superpowers-evals`) as the testing apparatus; unrelated local ops plugins (decision-log, episodic-memory, superpowers-chrome, primeradiant-ops) |
| Human partner who reviewed this diff | Drew Ritter (@drewritter) |

What problem are you trying to solve?

An adversarial review fleet (three parallel reviewers: red-team, cross-corpus consistency, evidence verification) audited the SUP-333 stack and found that using-superpowers' routing rule contradicts #1718's description-level exception in both directions:

- Compliant agents re-impose the cost failure: "even a 1% chance a skill might apply → ABSOLUTELY MUST invoke" mandates invoking brainstorming for any trivial request (there is always ≥1% doubt), and the invocation itself is the measured cost event. "If an invoked skill turns out to be wrong, you don't need to use it" even endorses invoke-then-exit, the exact behavior the eval fails.
- Cost-optimizing agents get a free skip: "the skill is overkill" is a tabled, forbidden rationalization, but "the skill's own description says it doesn't apply" was unaddressed, leaving any skip arguably sanctioned with no counter.
- "Instructions say WHAT, not HOW. 'Add X' … doesn't mean skip workflows" names the exception's canonical case ("add a basic checkbox") as a violation.

What does this PR change?

Adds one paragraph to The Rule: a documented exception in a skill's own description is authoritative — not invoking is compliance, not rationalization; any doubt about the exception's conditions means invoke; only the skill's description can define such an exception (agents cannot infer one). The "Add X doesn't mean skip workflows" line gains the matching qualifier.
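The paragraph added to The Rule can be read as a small decision procedure. The sketch below is one hedged interpretation in Python, not code from superpowers: the rule itself is prose addressed to the agent, and the `Skill` and `Assessment` types and the `should_invoke` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Skill:
    name: str
    # Exception text documented in the skill's own description, if any.
    # (Hypothetical field; the real description is free-form text.)
    description_exception: Optional[str] = None


@dataclass(frozen=True)
class Assessment:
    might_apply: bool               # the "even a 1% chance" test from The Rule
    exception_conditions_met: bool  # agent's reading of the documented exception
    any_doubt: bool                 # any doubt about the exception's conditions


def should_invoke(skill: Skill, assessment: Assessment) -> bool:
    """Hypothetical reading of the routing rule; True means invoke the skill."""
    if not assessment.might_apply:
        return False
    # Only the skill's own description can define an exception;
    # an agent cannot infer one.
    if skill.description_exception is None:
        return True
    # Any doubt about the exception's conditions means invoke.
    if assessment.any_doubt or not assessment.exception_conditions_met:
        return True
    # The documented exception clearly applies: not invoking is compliance.
    return False
```

Under this reading, the only path to a skip runs through a documented exception with no doubt attached; every other branch falls back to invoking, which is how the paragraph scopes the 1% rule without weakening it.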

Is this change appropriate for the core library?

Yes — it defines how the routing layer treats description-level scoping for ALL skills, which #1718 introduces and future skills may use. Without it the bootstrap text and skill descriptions give contradictory instructions.

What alternatives did you consider?

- Leave it unreconciled. Rejected: both failure directions are live (the eval demonstrated the compliant-agent direction; the red-team demonstrated the free-skip direction).
- Weaken the 1% rule itself. Rejected: the rule is correct for the 99% case; the fix scopes it rather than weakens it ("any doubt means invoke" is preserved verbatim inside the new paragraph).
- Put the reconciliation in brainstorming instead. Rejected: the contradiction lives in the routing layer's text; future description-level exceptions would hit it again.

Does this PR contain multiple unrelated changes?

No — one rule, two coordinated touches in one file (the rule paragraph and the User Instructions line that contradicted it).

Existing PRs

I have reviewed all open AND closed PRs for duplicates or prior art
Related PRs: fix(skills): plans reference the spec instead of restating it — end to end #1715, fix(skills): SDD review fanout scales with the change #1716, fix(skills): brainstorming nothing-to-design exception with authoritative description routing #1718 (the stack this completes); none found touching the routing rule's exception semantics.

Environment tested

| Harness (e.g. Claude Code, Cursor) | Harness version | Model | Model version/ID |
|---|---|---|---|
| Claude Code (agent under test) | 2.1.169 | Claude Opus | claude-opus-4-8 |
| opencode (agent under test) | 1.16.2 | GPT | openai/gpt-5.5 |

New harness support (required if this PR adds a new harness)

N/A — no harness changes.

Evaluation

Initial prompt: quorum eval scenarios with scripted naive users (no skills named by the driver).
Eval sessions after the change: the full 5-run verification battery ran on the assembled stack including this change: cost-checkbox-over-trigger/claude pass (the exception fires through the routing layer — no brainstorming invocation), triggering-writing-plans/claude pass (the 1% rule still triggers skills that DO apply), cost-spec-plan-duplication/claude pass (brainstorming still gates real features), cost-trivial-task-review-fanout/opencode pass, sdd-rejects-extra-features/claude pass (run IDs in the stack PRs).
Before/after: before, the routing text and the description exception contradicted; after, both eval directions hold simultaneously — skip fires only for the documented exception, triggers fire everywhere else.

Rigor

If this is a skills change: I used superpowers:writing-skills and completed adversarial pressure testing (paste results below)
This change was tested adversarially, not just on the happy path
I did not modify carefully-tuned content (Red Flags table, rationalizations, "human partner" language) without extensive evals showing the change is an improvement

The Red Flags table is untouched; the new paragraph closes the "description exempts me" gap the table could not cover, with the doubt-means-invoke backstop preserved. Adversarial findings (D1/D2) and both exploit directions are documented above; the 5-run battery is the post-change evidence.

Human review

A human has reviewed the COMPLETE proposed diff before submission

Round 3: staff-review refinements + evidence

Refinements in this PR's follow-up commit:

- The skill_flow digraph now routes through a "Skill's own description exempts this request?" diamond (no/any-doubt → invoke). This stack's own evidence says agents follow flowcharts literally, and the chart previously contradicted the rule.
- The <EXTREMELY-IMPORTANT> block gains a one-line deferral to The Rule (previously it read unconditional, contradicting the rule in the same always-loaded file).
- writing-skills now distinguishes negative triggering conditions (scope: allowed, and required at the description per this rule) from workflow summaries (still forbidden), so a future editor applying its checklist does not strip the exception and silently regress the cost evals.

Final-text evidence: the exception routes correctly where supported. cost-checkbox-over-trigger skip: claude 3/3, codex ✓, antigravity ✓ (kimi does not pick up description exceptions; unchanged from baseline). The 1% rule still triggers skills that apply: triggering-writing-plans/claude passes 3/3; triggering-writing-plans/codex fails, byte-for-byte matching its pre-existing documented signature (loads sibling skills, skips the mandated one; predates this stack, tracked separately).

Merge guidance: merge together with #1718 (see its note).

Round 4: stated-scan mandate

The exception-skip path now requires a visible artifact: one line naming the exception and the empty tripwire scan, written before the first action ("If you did not write the scan line, you did not scan — invoke the skill instead"). The flowchart's exempt edge routes through the scan statement; the red-flags table counters "too trivial to scan". Rationale: the measured boundary leaks were silent non-consultation — externalizing the scan forces it to happen, at a cost of one sentence on the trivial path. Measured effect recorded on #1718 (boundary cells 1/3 → 2/3 with the trivial path intact at 2/2).
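A minimal sketch of the stated-scan requirement, assuming the skip decision is modeled as a function of the tripwires that hit and the line the agent actually wrote. The function `exempt_path` and its parameters are hypothetical names for illustration; the example scan-line wording is the one quoted in this stack's commit message.

```python
from typing import Optional, Sequence

# Example scan line, as quoted in the commit message for this round.
EXAMPLE_SCAN_LINE = (
    "Skipping brainstorming per its exception: no security/deletion/"
    "schema/new-file tripwires; outcome fully specified"
)


def exempt_path(tripwires_hit: Sequence[str], scan_line: Optional[str]) -> str:
    """Return "skip" or "invoke" for a request that looks exempt.

    Hypothetical model: the skip is valid only when the tripwire scan came
    back empty AND the agent wrote the scan line before its first action.
    """
    # Any tripwire that hits sends the request back to the skill.
    if tripwires_hit:
        return "invoke"
    # "If you did not write the scan line, you did not scan."
    if scan_line is None or not scan_line.strip():
        return "invoke"
    return "skip"
```

The design point is that the written line is a precondition of the skip rather than a courtesy: a silent skip and a failed scan are treated the same way.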

…#1)
writing-plans told agents to "document everything they need to know" assuming zero context; every agent in the 2026-06-09 six-agent quorum sweep obeyed and restated the entire spec inline in the plan (cost-spec-plan-duplication failed 5/5 completed agents; pi's plan was 683 lines of duplicated spec).
- writing-plans: state the division of labor (spec owns WHAT/WHY, plan owns HOW); cite the spec by path/section, never restate it. "Zero context" means mechanically executable steps, not duplication. Add a **Spec:** line to the plan header template.
- brainstorming: close the path loophole the re-run exposed (claude shortened docs/superpowers/specs/ to docs/specs/ in 2/2 runs); both path mentions now explicitly forbid the shortening.
TDD evidence (quorum):
- RED: batch-20260609T023452Z-68aa et al, 5/5 agents fail
- GREEN: cost-spec-plan-duplication-claude-20260609T234142Z-9625 pass (plan: "this plan does not restate them" + spec cited by path; both docs in docs/superpowers/)
- Canary: triggering-writing-plans-claude pass (skill still fires)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

… reference rule
Adversarial review findings (C1, C2, C3, C5, A8, F3):
- "never restate" did not cover paraphrase/summary, the actual failure mode in the RED evidence; now "never restate, paraphrase, or summarize".
- The No Placeholders intra-plan repetition mandate gave a symmetric argument for re-inlining the spec; the rule now draws the line: repetition WITHIN the plan is required, copying FROM the spec is not.
- Drift argument was invertible ("snapshot to avoid drift"); now states snapshots hide drift.
- **Spec:** header gets a no-spec branch (state requirements once in the header, not per task) instead of inviting "no spec, rule is moot".
- Brainstorming path bullet: an existing differently-named docs dir is not a "user preference" override.
- Execution Handoff now notes review fanout scales (forward-ref to SDD's Proportionality rule) instead of promising unconditional two-stage review.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Eval-caught regression: the no-spec branch added to the **Spec:** header gave the agent a sanctioned path to skip the spec doc entirely ("avoiding duplication by skipping the spec" — cost-spec-plan-duplication-claude-20260610T213934Z-8e5b, fail). The branch is now scoped: if brainstorming happened the spec exists and must be cited; "none — requirements:" applies only when requirements arrived conversationally and no spec doc was ever produced. The reference-discipline paragraph states the same rule up front. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

arittr · 2026-06-10T22:31:45Z

This change is part of the following stack:

fix(skills): plans reference the spec instead of restating it — end to end #1715
- fix(skills): SDD review fanout scales with the change #1716
  - fix(skills): brainstorming nothing-to-design exception with authoritative description routing #1718
    - fix(skills): description-level exceptions are authoritative in the routing rule #1732 ◀

Change managed by git-spice.

…ting-plans spec gap
Staff-review findings (4-reviewer panel):
- Reference paragraph rewritten 170→123 words preserving every behavioral condition (paraphrase/summarize coverage, no-skip guard, WHAT-WHY/HOW split, No Placeholders boundary, drift counter, zero-context rescope); fixes the "(brainstorming did)" syntax.
- **Spec:** header bracket: cut the never-skip sermon duplicated from the Overview (same loaded document); the conditional none-branch stays.
- executing-plans Step 1 now reads the spec the plan cites; plans are no longer self-contained, and the non-subagent execution path was never told (the eval only exercised the SDD consumer).
- writing-plans plan-location preference line gets the same existing-dir-is-not-a-preference guard as the spec path.
- brainstorming: deduplicate the docs/specs/ prohibition (step 6 parenthetical stays; After-the-Design bullet was the second statement in one file).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

subagent-driven-development mandated implementer + two-stage review + final reviewer unconditionally; agy and opencode each dispatched 4 subagents for a one-line console.log in the 2026-06-09 quorum sweep (cost-trivial-task-review-fanout), and the agents that passed did so only by disobeying the skill.
- Proportionality rule: when the entire plan is one trivial, fully-specified mechanical change, implement directly, verify, commit, with no review fanout. "When in doubt, it is not trivial." Within a multi-task plan the full pipeline still applies to every task regardless of size.
- Flowchart gets the trivial-exit diamond (the failing agents follow the flowchart literally; prose alone would not redirect them).
- Red Flags "never skip reviews" amended to reference the exception so the skill does not contradict itself.
TDD evidence (quorum):
- RED: agy 025324Z + opencode batches, 4 dispatches for 1 line
- GREEN: cost-trivial-task-review-fanout-opencode-20260610T002518Z-f3f5 pass (0 dispatches, $0.04, change landed on main checkout)
- Canary: sdd-rejects-extra-features-claude-20260610T002901Z-458a pass: multi-task plan still runs implementer + two-stage review per task (tool-called Agent ✓, spec reviewer as YAGNI gate after each task)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…es cited specs
Adversarial + consistency review findings (B1, B2, B3, B5, F1):
- Red Flags line read literally licensed skipping reviews on trivial tasks INSIDE multi-task plans; now states the only exception is a whole-plan trivial change and never-skip within multi-task plans.
- "a one-line edit" example blessed one-line behavioral changes (e.g. adding "|| user.isOwner"); dropped. Trivial is now defined as a property of the diff (no logic/control-flow/behavior change), not of the plan's self-description. The "nothing for review to catch" justification proved too much; replaced with the cost argument.
- "verify it" was undefined on the trivial path; now concrete (run tests/command, confirm output, verification-before-completion).
- Flowchart diamond now matches the prose: "fully-specified" + "any doubt = no" (the failing agents execute the flowchart literally).
- New Spec Context section + prompt-template updates: the controller reads the spec cited in the plan header and pastes cited sections into implementer/spec-reviewer prompts; the spec reviewer's diff-only rule gets a spec-document exception. Without this, the stack's reference-not-restate rule starves the SDD pipeline of requirements.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…ualify constant bumps
Staff-review findings (4-reviewer panel):
- CONTRADICTION FIX: Spec Context said "Subagents never read the spec file themselves" while spec-reviewer-prompt grants exactly that access. Now: implementers never read it; the spec reviewer may, at the cited path.
- "a constant bump" was an unqualified trivial example: a one-line BCRYPT_ROUNDS or session-TTL change is a security-posture change; now qualified "with no security or behavioral consequences" (matching brainstorming's config-change qualifier). The diff-property definition adds "nothing security-relevant".
- Proportionality rewritten 146→~115 words (house style; one statement of the multi-task containment instead of two).
- Red Flags Never-line trimmed 33→14 words (pointer to Proportionality instead of third in-file restatement).
- Prompt-template rationale tails cut (the controller just read Spec Context; subagents need the pasted text, not the policy rationale).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…gn (SUP-333 #3)
The HARD-GATE ("EVERY project regardless of perceived simplicity") plus the anti-pattern list naming "a config change" made design+approval mandatory even for fully-specified trivial asks; all 6 agents in the 2026-06-09 quorum sweep ran a multi-option design flow for "a basic checkbox, nothing fancy" (cost-checkbox-over-trigger failed 6/6).
Two layers, because routing happens before skill content is read (GREEN attempt 1 proved it: the agent invoked the skill on the description's mandate and only then saw the in-skill exception, and the invocation itself is the cost event):
- description: carve-out visible at skill-selection time: zero open design decisions, fully specified trivial change → implement directly without invoking.
- HARD-GATE: matching exception with objective re-gating tripwires (new file/dependency, schema/API/data question, >1 plausible interpretation, user frames it as a feature/project), and the anti-pattern section now distinguishes "seems simple" (a rationalization when decisions exist) from "contains every decision" (the exception). "A config change" moves from the all-of-them list to the exception's example.
The repo's acceptance test ("Let's make a react todo list" must auto-trigger brainstorming) is unaffected: a react todo list leaves many decisions open and todo lists remain in the anti-pattern list.
TDD evidence (quorum):
- RED: cost-checkbox-over-trigger fails 6/6 agents (batch 2026-06-09); GREEN attempt 1 with in-skill exception only: still fail (invoked via description, then asked a clarifying question)
- GREEN: cost-checkbox-over-trigger-claude-20260610T004320Z-a30e pass: no brainstorming invocation, agent cited the exception verbatim, checkbox landed in 31s
- Canary: cost-spec-plan-duplication-claude-20260610T004506Z-22ea pass: a real feature still triggers the full brainstorm→spec→plan flow (and the stacked writing-plans reference discipline holds)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…+ rationalization counters
Adversarial review findings (A1-A7, D3):
- BLOCKER A1: the re-gating tripwires lived only in the HARD-GATE, but the skip decision happens at the description (our own GREEN-attempt-1 evidence). The description now carries the tripwires: adds a file/dependency, touches schema/API/persisted data, deletes or disables anything, alters behavior/security posture, >1 plausible reading.
- A2: "a schema/API/data question" was defeated by "the user answered the question"; now touch-based ("even if the user stated the desired outcome").
- A3: destructive changes and behavior/security-visible changes had no tripwire (pure removals were structurally invisible); both added. "a literal config value change" example now qualified ("with no security or behavioral consequences").
- A4: the checkbox example no longer teaches hedge-phrase = fully specified ("where the context leaves nothing to choose").
- A5: "EVERY project regardless of perceived simplicity" now ends "with exactly one exception below" instead of contradicting it.
- A6: rationalization table added (codebase-pattern, infer-the-obvious, hedge-phrase, asking-wastes-time).
- A7: anti-pattern opener is a claim again ("Anything with open decisions goes through this process").
- D3: exception states TDD and verification-before-completion still apply, so the fast path does not read as zero-oversight.
Description: 689 chars (limit 1024), YAML-validated.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…tion
Staff-review findings (4-reviewer panel):
- The tripwire list existed twice in this file (description + HARD-GATE) and the copies had already drifted after one editing round: the framing tripwire and the security qualifier lived only in the HARD-GATE, which the skip decision never reads (our own GREEN-attempt-1 evidence). The description is now the single authoritative list; the HARD-GATE exception defers to it.
- Security-posture fix: the "beyond the literally stated value" escape no longer applies to security; touching auth, sessions, permissions, CORS, or crypto re-gates EVEN when the value is exactly as stated (the harm of "set CORS to *" IS the stated value). User-visible behavior keeps the beyond-the-stated-change scope (a requested checkbox is the stated change; that is the point of the exception).
- The framing tripwire moves into the description where it can act.
- Anti-pattern final clause cut (was the 4th in-file statement of the exception's condition).
- Description: 886/1024 chars, YAML-validated.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Eval-caught leak (cost-remove-export-boundary-claude, first run): the agent reasoned "the user already decided the deletion, so no design decision is open" and silently removed a working feature, reading the tripwires as indicators of open decisions rather than unconditional re-gates. The deletion tripwire now carries the same rider as the security one ("even when the deletion is exactly what was asked"), and the rationalization table counters the exact quoted escape.
Description: 950/1024 chars.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…e stated
Measured boundary leak (cost-session-timeout-boundary: claude surfaced the security tradeoff 1/3; cost-remove-export-boundary 1/2 post-rider): in failing runs the agent acted without consulting the exception at all. Two prompt defects fixed:
- Ordering: the description granted "implement it directly" BEFORE the tripwire list; a skimming agent got the permission and stopped reading. The tripwires now come first and the permission is earned ("Only when NO tripwire hits...").
- Observability: the skip was silent. It now requires a stated one-line scan before implementing, which forces the scan to actually happen (the routing-layer mandate lands in the companion using-superpowers commit).
"timeouts" added to the security examples, the literal failing case.
Description: 971/1024 chars, YAML-validated.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…uting rule (SUP-333 #4)
Adversarial review findings D1/D2: the 1%-chance invocation rule and the "Add X doesn't mean skip workflows" line contradicted the new brainstorming description exception in both directions: a compliant agent re-imposes the cost failure (invocation itself is the measured cost event), while a cost-optimizing agent could treat any skip as sanctioned. The routing skill now states: a documented exception in a skill's own description defines that skill's scope (compliance, not rationalization); any doubt about the exception's conditions means invoke; and only the description can define one, so agents cannot infer exceptions.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…ception rule; writing-skills carve-out
Staff-review findings (4-reviewer panel):
- The skill_flow digraph still routed "yes, even 1%" straight to invoke with no exception branch, and this stack's own evidence says agents follow flowcharts literally. The flow now passes through "Skill's own description exempts this request?" with no/any-doubt → invoke.
- The <EXTREMELY-IMPORTANT> block ("you cannot rationalize your way out of this") read unconditional; one parenthetical defers to The Rule's single carve-out without weakening the block.
- Trimmed the redundant "the description defines the skill's scope" clause from The Rule paragraph.
- writing-skills' "descriptions must not carry process" doctrine would have had a future editor strip the brainstorming exception and silently regress the cost evals; it now distinguishes negative triggering conditions (scope: allowed and, per the routing rule, required at the description) from workflow summaries (still forbidden).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Measured boundary leaks (session-timeout 1/3, remove-export 1/2 on claude) traced to silent skips: the agent acted without consulting the exception, and nothing made the consultation observable. The skip now requires a stated one-line scan before the first action ("Skipping brainstorming per its exception: no security/deletion/schema/new-file tripwires; outcome fully specified"); externalizing the scan forces it to happen. Flowchart routes the exempt path through the scan statement; red-flags table counters "too trivial to scan". The trivial path stays fast: the ceremony is one sentence, not a design flow.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

arittr · 2026-06-11T06:49:15Z

Consolidated into #1718 per independent-mergeability restructure: the routing-rule change and the brainstorming description exception are one behavioral mechanism (the stated-scan protocol spans both files), and shipping them separately left an active contradiction window between merges. #1718 now carries both, plus the full eval evidence.

arittr and others added 3 commits June 9, 2026 16:52

arittr mentioned this pull request Jun 10, 2026

fix(skills): brainstorming nothing-to-design exception with authoritative description routing #1718

Draft

5 tasks

arittr and others added 8 commits June 10, 2026 18:24

arittr force-pushed the drew/sup-333-3-brainstorming-triviality-gate branch from aff9195 to 87ddfac Compare June 11, 2026 02:16

arittr force-pushed the drew/sup-333-4-description-exceptions-authoritative branch from 9d0ac38 to 36e289e Compare June 11, 2026 02:16

This was referenced Jun 11, 2026

fix(skills): SDD review fanout scales with the change #1716

Draft

fix(skills): plans reference the spec instead of restating it — end to end #1715

Open

arittr and others added 4 commits June 10, 2026 22:49

arittr force-pushed the drew/sup-333-4-description-exceptions-authoritative branch from 36e289e to a96fb01 Compare June 11, 2026 06:05

arittr force-pushed the drew/sup-333-3-brainstorming-triviality-gate branch from 6535b2d to 7857e05 Compare June 11, 2026 06:49

arittr closed this Jun 11, 2026

arittr deleted the drew/sup-333-4-description-exceptions-authoritative branch June 11, 2026 07:21

arittr wants to merge 15 commits into drew/sup-333-3-brainstorming-triviality-gate from drew/sup-333-4-description-exceptions-authoritative
