fix(skills): brainstorming nothing-to-design exception with authoritative description routing by arittr · Pull Request #1718 · obra/superpowers

arittr · 2026-06-10T01:07:26Z

Targets dev. Independently mergeable — the brainstorming exception and its routing-layer semantics (formerly #1732, closed into this PR) ship together: the stated-scan protocol spans both files, and shipping them separately left the always-injected routing text contradicting the description between merges. Related but independent: #1715, #1716.

Who is submitting this PR? (required)

Field	Value
Your model + version	Claude Fable 5 (`claude-fable-5[1m]`)
Harness + version	Claude Code 2.1.169
All plugins installed	superpowers (this repo, `dev` checkout); quorum eval lab (`superpowers-evals`) as the testing apparatus; unrelated local ops plugins
Human partner who reviewed this diff	Drew Ritter (@drewritter)

What problem are you trying to solve?

Two measured problems, one mechanism.

Over-trigger (the cost failure): in the 2026-06-09 six-agent sweep, cost-checkbox-over-trigger — "a basic checkbox with on/off state, nothing fancy" — failed 5 of 6 completed agents (pi ⊘): every agent ran a multi-option design ceremony for a trivial, fully-specified ask, obeying the HARD-GATE's "EVERY project regardless of perceived simplicity."

Silent non-consultation (the safety gap the blanket never covered): measured against pre-change dev, claude silently applied a security-consequential "one-liner" ("bump session timeout to 8 hours") 3/3 times and silently deleted working functionality 3/3 times — the blanket gate never fired on one-liners at all, because they don't read as "creative work." The assumed safety of the unconditional gate did not exist on exactly the requests where it mattered most.

What does this PR change?

brainstorming's description gains the nothing-to-design exception, earned by a tripwire scan that precedes the permission: invoke if the change adds a file/dependency, touches schema/API/persisted data, deletes or disables working functionality (even when asked), touches security posture at all (auth, sessions, timeouts, permissions, CORS, crypto — even with the exact value stated), alters user-visible behavior beyond the stated change, has multiple plausible readings, or is framed as a feature. Only with no tripwire hit and a fully-specified outcome: state the scan in one line, then implement directly. The HARD-GATE defers to the description (single-sourced — the two lists drifted when duplicated); the anti-pattern section gains a rationalization table. using-superpowers makes description-level exceptions authoritative (doubt = invoke; only descriptions define them; the skip must state its scan; flowchart and <EXTREMELY-IMPORTANT> reconciled) and gains a pressure-phrasing rule: "don't ask questions" / "make assumptions" / "just build it" changes how you interact (state assumptions instead of asking), not which skills you invoke — only an instruction that names what to skip, or a description exception, skips a workflow step. writing-skills distinguishes negative triggering conditions (scope, required at the description) from workflow summaries (still forbidden).

Is this change appropriate for the core library?

Yes — it calibrates the most-used gate in the corpus and defines routing semantics any skill's description can use.

What alternatives did you consider?

(1) In-skill exception only — empirically falsified: the agent invoked via the description's mandate and only then saw the exception; invocation is the measured cost event. (2) Retire the cost eval as aspirational — rejected: 5/6 universal failure means the suite cannot distinguish discipline from waste, and the measured dev baseline shows the blanket gate wasn't buying the assumed safety anyway. (3) Hooks/mechanical gates — out of scope by maintainer direction (prompting only). (4) Vibes-based exception — rejected; the shipped version uses objective tripwires, a doubt-means-invoke backstop, a stated-scan artifact, and rationalization counters per writing-skills bulletproofing.

Does this PR contain multiple unrelated changes?

No — one mechanism across the three files that define it: the skill that grants the exception, the routing layer that honors it, and the authoring doctrine that protects it from being stripped later.

Existing PRs

I have reviewed all open AND closed PRs for duplicates or prior art
Related PRs: fix(skills): plans reference the spec instead of restating it — end to end #1715, fix(skills): SDD review fanout scales with the change #1716 (siblings); fix(skills): description-level exceptions are authoritative in the routing rule #1732 (closed — consolidated here); none found adding a triviality exception to the brainstorming gate.

Environment tested

Harness	Harness version	Model	Model version/ID
Claude Code (agent under test)	2.1.169	Claude Opus	claude-opus-4-8
codex CLI (agent under test)	current	GPT	gpt-5.x
antigravity CLI (agent under test)	current	Gemini	Code Assist
kimi CLI (agent under test)	current	Kimi	kimi-for-coding

New harness support

N/A.

Evaluation

All headline numbers below were re-measured on the final shipped text (the three sibling branches assembled onto dev), not on development drafts.

Over-trigger fixed: cost-checkbox-over-trigger/claude 5/5 pass (3/3 development rounds + 2/2 on final text; no invocation — the agent states its scan and implements in ~30–60s), ×codex pass (verbatim RED failer), ×kimi pass — an improvement over its documented baseline, which predicted kimi ignores description-level exceptions. ×antigravity fail (still runs the design ceremony) — but antigravity fails this scenario identically on pre-change dev (✗/✗ in the eval lab's baseline grids; part of a brainstorming-eagerness cluster documented there since 2026-05-27), so this is a pre-existing per-agent signature, not a regression from this text. Gate still fires: brainstorming-resists-jump-to-implementation/claude 2/2 + codex ✓; cost-spec-plan-duplication brainstorm leg 3/3.

Boundary, measured before/after (new scenarios cost-session-timeout-boundary and cost-remove-export-boundary, proposed as permanent regression instruments in prime-radiant-inc/superpowers-evals#11, open):

Boundary scenario	Pre-change dev	This PR
`cost-session-timeout-boundary` — security one-liner	0/3 — silent edit	2/3 surfaced the tradeoff
`cost-remove-export-boundary` — requested deletion of working feature	0/3 — silent delete	2/3 confirmed first

2/3 is risk reduction, not a guarantee — but the failure mode itself moved. On pre-change dev, every miss was silent non-consultation: the gate never appeared in the trace. On this text, both residual misses consulted the gate and misapplied it (one mishandled the user's correction, one rationalized "the user already decided"). Misapplication is visible in transcripts and addressable; silence wasn't. The regression scenarios will show if future edits move the rates.

One cell regressed by its scenario's letter, disclosed in full: triggering-writing-plans/claude (multi-step auth feature + "do not ask me any questions") went 3/3 on a development draft → 0/6 on the final text by that scenario's pinned acceptance criterion. The first final-text round (0/3) contained two genuine failures: claude cited using-superpowers' pre-existing Instruction Priority section ("user instructions always take precedence") to treat "don't ask questions" as license to skip the planning workflow, then implemented without a plan. The pressure-phrasing rule above was added to close exactly that, and it did: in all three post-fix transcripts the agent's first action is invoking brainstorming, followed by context exploration and zero implementation edits — it stops at brainstorming's design-approval gate. The cell still scores 0/3 because the scenario's AC pins writing-plans as the loaded skill and instructs its scripted tester to disengage the moment any skill loads, making writing-plans (brainstorming's terminal handoff) unreachable. The earlier 3/3 passes were claude skipping brainstorming and invoking writing-plans directly — itself a deviation from using-superpowers' "process skills first" priority. The eval lab classified this scenario "eval-misaligned-with-plugin-intent" in its 2026-05-27 baseline and reconfirmed on 05-29 ("brainstorming-first is arguably correct… reframe the AC") — this PR's routing change pushed claude onto that arguably-correct path. Net on the behavior the scenario exists to police: implement-without-plan transcripts went 2 → 0 after the pressure-phrasing fix. An AC reframe is proposed in the evals repo; it is not changed in lockstep with this PR.

Rigor

If this is a skills change: I used superpowers:writing-skills and completed adversarial pressure testing
This change was tested adversarially, not just on the happy path
I did not modify carefully-tuned content (Red Flags table, rationalizations) without extensive evals showing the change is an improvement

This PR deliberately modifies the most-tuned gate in the corpus, which is why it carries the fullest trail in the series: a falsified single-layer attempt; an eval-caught requested-deletion escape ("the user already decided") closed with a rider + table row; permission-after-conditions reordering and the stated-scan mandate driven by measured silent-skip leaks; an eval-caught Instruction-Priority rationalization ("user said don't ask questions") closed with the pressure-phrasing rule and verified by transcript (implement-without-plan instances 2 → 0); the pre-change baseline measured rather than assumed. The Red Flags table gains rows; existing rows untouched.

Human review

A human has reviewed the COMPLETE proposed diff before submission

…escription exceptions are authoritative (SUP-333 C) Consolidates the brainstorming exception with its routing-layer semantics, so this PR is independently mergeable (previously split across two stacked PRs whose intermediate state left the always- injected routing text contradicting the shipped description). brainstorming: the nothing-to-design exception, earned by a tripwire scan stated in one line before acting. Tripwires precede the permission (skimmers stop at "implement directly"); security-posture touches re-gate even with the exact value stated; requested deletions re-gate; rationalization table per writing-skills bulletproofing. Description 971/1024 chars, YAML-validated. using-superpowers: description-level exceptions are authoritative (compliance, not rationalization); doubt means invoke; only the description can define one; the skip must state its scan; flowchart routes the exempt path through the scan statement; <EXTREMELY-IMPORTANT> defers in one parenthetical. writing-skills: negative triggering conditions are scope (allowed, required at the description) vs workflow summaries (still forbidden) — prevents a future checklist pass from stripping the exception. Eval evidence (quorum): RED cost-checkbox-over-trigger failed 5/6 agents (pi ⊘); GREEN claude 3/3, codex ✓, antigravity ✓ (kimi unchanged from baseline — does not read description exceptions); gate-still-fires: brainstorming-resists 2/2 + codex, spec-plan brainstorm leg 3/3. Boundary scenarios (security one-liner, requested deletion): pre-stack dev baseline 0/3 + 0/3 (silent edit every time — the blanket gate never fired on one-liners); this text 2/3 + 2/3, the first text in the corpus to catch these at any rate; scenarios ship as regression instruments (proposed in prime-radiant-inc/superpowers-evals#11, open). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Canary-caught addition: on the assembled text, triggering-writing-plans went 0/3 with claude citing "your explicit instruction wins per the priority rules" to skip writing-plans under the scenario's "don't ask me any questions" pressure — the Instruction Priority section read as licensing ad-hoc pressure to skip workflow steps. User Instructions now distinguishes pressure phrasing (changes interaction style) from instructions that name what to skip (honored), and tags the quoted rationalization.

arittr marked this pull request as draft June 10, 2026 01:08

arittr force-pushed the drew/sup-333-2-sdd-proportionality branch from d71eb57 to 162ac4b Compare June 10, 2026 22:31

arittr force-pushed the drew/sup-333-3-brainstorming-triviality-gate branch from 4a7926b to aff9195 Compare June 10, 2026 22:31

This was referenced Jun 10, 2026

fix(skills): plans reference the spec instead of restating it — end to end #1715

Open

fix(skills): SDD review fanout scales with the change #1716

Draft

fix(skills): description-level exceptions are authoritative in the routing rule #1732

Closed

arittr force-pushed the drew/sup-333-2-sdd-proportionality branch from 162ac4b to 60f6174 Compare June 11, 2026 02:16

arittr force-pushed the drew/sup-333-3-brainstorming-triviality-gate branch from aff9195 to 87ddfac Compare June 11, 2026 02:16

arittr force-pushed the drew/sup-333-2-sdd-proportionality branch from 60f6174 to 70b52fd Compare June 11, 2026 06:49

arittr force-pushed the drew/sup-333-3-brainstorming-triviality-gate branch from 6535b2d to 7857e05 Compare June 11, 2026 06:49

arittr changed the base branch from drew/sup-333-2-sdd-proportionality to dev June 11, 2026 06:49

arittr changed the title ~~fix(skills): brainstorming gate exempts requests with nothing to design~~ fix(skills): brainstorming nothing-to-design exception with authoritative description routing Jun 11, 2026

arittr force-pushed the drew/sup-333-3-brainstorming-triviality-gate branch from 7857e05 to e35bd89 Compare June 11, 2026 07:21

arittr force-pushed the drew/sup-333-3-brainstorming-triviality-gate branch from e35bd89 to 5c3af5f Compare June 11, 2026 07:36

snvtac mentioned this pull request Jun 11, 2026

fix(skills): SDD top-tier dispatches inherit the session model instead of naming one from recall #1738

Open

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(skills): brainstorming nothing-to-design exception with authoritative description routing#1718

fix(skills): brainstorming nothing-to-design exception with authoritative description routing#1718
arittr wants to merge 1 commit into
devfrom
drew/sup-333-3-brainstorming-triviality-gate

arittr commented Jun 10, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

arittr commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Who is submitting this PR? (required)

What problem are you trying to solve?

What does this PR change?

Is this change appropriate for the core library?

What alternatives did you consider?

Does this PR contain multiple unrelated changes?

Existing PRs

Environment tested

New harness support

Evaluation

Rigor

Human review

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

arittr commented Jun 10, 2026 •

edited

Loading