fix(skills): SDD review fanout scales with the change by arittr · Pull Request #1716 · obra/superpowers

arittr · 2026-06-10T00:40:04Z

Targets dev. Independently mergeable — self-contained: the Proportionality rule and every reference to it ship together. Related but independent: #1715 (reference discipline), #1718 (brainstorming exception).

Who is submitting this PR? (required)

Field	Value
Your model + version	Claude Fable 5 (`claude-fable-5[1m]`)
Harness + version	Claude Code 2.1.169
All plugins installed	superpowers (this repo, `dev` checkout); quorum eval lab (`superpowers-evals`) as the testing apparatus; unrelated local ops plugins
Human partner who reviewed this diff	Drew Ritter (@drewritter)

What problem are you trying to solve?

In the 2026-06-09 six-agent quorum sweep, the cost-trivial-task-review-fanout scenario (a scripted naive user asks the agent to execute a plan whose entire content is a single one-line console.log insertion) showed that subagent-driven-development's pipeline has no proportionality exit. antigravity "dispatched 4 subagents: Implementer, Spec Reviewer, Code Quality Reviewer, and Final Code Reviewer" for the one-line change, exactly as the skill mandates; opencode did the same. The agents that passed did so by NOT following the skill. A skill that only produces good outcomes when disobeyed is miscalibrated.

What does this PR change?

Adds a Proportionality rule: a plan that is entirely one trivial, fully-specified mechanical change is implemented directly, verified per superpowers:verification-before-completion, and committed — no review fanout. Trivial is a property of the diff (no logic, control-flow, or security-relevant change — "a constant bump" is qualified with "no security or behavioral consequences"), not of the plan's self-description; any doubt means the full pipeline; multi-task plans never skip reviews regardless of task size. The process flowchart gets the matching trivial-exit diamond (the failing agents follow the flowchart literally), the Red Flags "never skip reviews" line points at the sole exception instead of contradicting it, and writing-plans' execution handoff notes that fanout scales.
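The rule above reads as a short decision gate. The sketch below is one possible encoding of it, not the skill's actual text or implementation: `PlanAssessment`, its fields, and `review_path` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanAssessment:
    """Hypothetical summary of a plan; these fields are illustrative only."""
    task_count: int
    fully_specified: bool          # the plan states the exact change to make
    touches_logic: bool            # any logic or control-flow change in the diff
    security_relevant: bool        # any security-relevant change in the diff
    behavioral_consequences: bool  # e.g. a constant bump that alters behavior
    any_doubt: bool                # the agent is unsure about any answer above


def review_path(plan: PlanAssessment) -> str:
    """Return "direct" for the trivial exit, otherwise "full_pipeline"."""
    # Multi-task plans never skip reviews, regardless of task size.
    if plan.task_count != 1:
        return "full_pipeline"
    # Any doubt, or an underspecified change, means the full pipeline.
    if plan.any_doubt or not plan.fully_specified:
        return "full_pipeline"
    # Triviality is judged on the diff, not on the plan's self-description.
    if plan.touches_logic or plan.security_relevant or plan.behavioral_consequences:
        return "full_pipeline"
    return "direct"


console_log = PlanAssessment(1, True, False, False, False, False)
owner_check = PlanAssessment(1, True, True, True, True, False)
print(review_path(console_log))  # direct
print(review_path(owner_check))  # full_pipeline
```

Every branch defaults to the full pipeline, so the direct path is reachable only when all conditions hold at once, which mirrors the "any doubt means the full pipeline" wording.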

Is this change appropriate for the core library?

Yes — tunes the core execution workflow's cost behavior for all users; non-trivial plans are unchanged (verified below).

What alternatives did you consider?

(1) Gate at entry (trivial plans never engage SDD) — viable but larger; the in-skill clause is the smaller diff. (2) Retire the eval — rejected: 4 full-context dispatches for one line is real measured waste. (3) Prose without the flowchart exit — rejected from evidence: the failing agents execute the flowchart, not the prose.

Does this PR contain multiple unrelated changes?

No — one rule, with the flowchart/Red-Flags/handoff touches required to keep the skill self-consistent about it.

Existing PRs

I have reviewed all open AND closed PRs for duplicates or prior art
Related PRs: #1715 (reference discipline) and #1718 (brainstorming exception) are sibling behavioral changes from the same eval program; none found addressing review-fanout proportionality. The series was originally a stacked chain (#1715 → #1716 → #1718 → #1732) and was restructured into independent siblings; #1732 (the routing rule) was consolidated into #1718.

Environment tested

Harness	Harness version	Model	Model version/ID
opencode (agent under test)	1.16.2	GPT	openai/gpt-5.5
antigravity CLI (agent under test)	current	Gemini	Code Assist
Claude Code (agent under test)	2.1.169	Claude Opus	claude-opus-4-8

New harness support

N/A.

Evaluation

Scripted naive user ("Please execute the plan in docs/superpowers/plans/."; answers "Use your judgment" on subagent questions). Runs were measured on the assembled three-branch text (all three sibling PRs applied to dev); the Proportionality rule is the only change in the set that touches the fanout path.

After the change:

- cost-trivial-task-review-fanout/opencode: 3/3 pass. Zero subagents dispatched (deterministic tool-count check, Agent lte 2), change landed on the main checkout, ~$0.02–0.08 coding cost per Gauntlet token estimates vs an implied $0.50–2 for the 4-dispatch baseline.
- cost-trivial-task-review-fanout/antigravity: pass. This was the only deterministic pre-fix failer (0/3).
- Containment canary, sdd-rejects-extra-features/claude: 3/3 pass. A real multi-task plan still runs implementer + two-stage review per task + final reviewer (spec reviewer as YAGNI gate after each task, 8/8 deterministic checks).

Honest baseline note: opencode's pre-fix pass rate on this scenario was ~50%, so its cell leans on n=3 plus the antigravity flip for significance.
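The description does not define the "tool-count Agent lte 2" check. One plausible reading is that a run passes when its transcript contains at most 2 calls to a tool named Agent. The sketch below assumes that reading and a hypothetical transcript shape (a flat list of tool names); it is not the eval lab's actual checker.

```python
from collections import Counter
from typing import Iterable


def tool_count_lte(tool_calls: Iterable[str], tool: str, limit: int) -> bool:
    """Pass when `tool` appears at most `limit` times in the run's tool calls."""
    counts = Counter(tool_calls)  # missing tools count as 0
    return counts[tool] <= limit


# Hypothetical transcripts: tool names only, in call order.
trivial_run = ["Read", "Edit", "Bash", "Bash"]
fanout_run = ["Read", "Agent", "Agent", "Agent", "Agent"]

print(tool_count_lte(trivial_run, "Agent", 2))  # True
print(tool_count_lte(fanout_run, "Agent", 2))   # False
```

A count-based check like this is deterministic because it depends only on the recorded tool calls, not on a judgment of the run's output.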

Rigor

If this is a skills change: I used superpowers:writing-skills and completed adversarial pressure testing
This change was tested adversarially, not just on the happy path
I did not modify carefully-tuned content without evals showing improvement

Red-team findings incorporated: the Red Flags line was rewritten after a reviewer showed its first form licensed per-task skipping inside multi-task plans; "a one-line edit" was dropped from the examples after a reviewer showed it blessed one-line behavioral changes (|| user.isOwner); a staff-panel-caught internal contradiction (spec access wording) was fixed; the flowchart diamond carries "fully-specified" and "any doubt = no" to match the prose exactly.
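The `|| user.isOwner` finding can be made concrete with a Python analogue. The `User` type and both functions below are hypothetical; they only illustrate why line count alone cannot define "trivial".

```python
from dataclasses import dataclass


@dataclass
class User:
    is_admin: bool
    is_owner: bool


def can_delete_before(user: User) -> bool:
    return user.is_admin


def can_delete_after(user: User) -> bool:
    # One-line edit: appends "or user.is_owner" to the condition.
    return user.is_admin or user.is_owner


owner = User(is_admin=False, is_owner=True)
print(can_delete_before(owner))  # False
print(can_delete_after(owner))   # True
```

The diff is a single line, yet it changes who is authorized, so under the rule it is a logic and security-relevant change and takes the full pipeline.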

Human review

A human has reviewed the COMPLETE proposed diff before submission

subagent-driven-development mandated implementer + two-stage review + final reviewer unconditionally: antigravity (agy) and opencode each dispatched 4 subagents for a one-line console.log (cost-trivial-task-review-fanout), and agents that passed did so only by disobeying the skill.

- Proportionality rule: a plan that is entirely one trivial, fully-specified mechanical change is implemented directly, verified per superpowers:verification-before-completion, committed, with no review fanout. Trivial is a property of the diff (no logic, control flow, or security-relevant change), not the plan's self-description; "a constant bump" is qualified (no security or behavioral consequences). Any doubt = full pipeline. Multi-task plans never skip reviews regardless of task size.
- Flowchart gets the matching trivial-exit diamond (the failing agents follow the flowchart literally).
- Red Flags "never skip reviews" points at the sole exception instead of contradicting it.
- writing-plans' execution handoff notes fanout scales (forward reference resolves within this PR's base expectations: the Proportionality rule ships here).

Independently mergeable: no dependency on the reference-discipline or brainstorming-exception PRs.

Eval evidence (quorum): RED 4 dispatches for 1 line (agy, opencode); GREEN cost-trivial-task-review-fanout opencode 3/3 pass (0 dispatches, deterministic tool-count check) + antigravity pass (the formerly deterministic failer); containment canary sdd-rejects-extra-features claude 3/3 pass (full pipeline per task).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

arittr marked this pull request as draft June 10, 2026 00:47

obra mentioned this pull request Jun 10, 2026

fix(sdd): task-scoped review dispatch — single task reviewer, review-package script, eval-tuned #1717

Draft

5 tasks

arittr mentioned this pull request Jun 10, 2026

fix(skills): brainstorming nothing-to-design exception with authoritative description routing #1718

Draft

5 tasks

svenbledt mentioned this pull request Jun 10, 2026

fix(skills): account for Claude Fable in model selection and testing guidance #1719

Open

5 tasks

arittr force-pushed the drew/sup-333-2-sdd-proportionality branch from d71eb57 to 162ac4b Compare June 10, 2026 22:31

This was referenced Jun 10, 2026

fix(skills): plans reference the spec instead of restating it — end to end #1715

Open

fix(skills): description-level exceptions are authoritative in the routing rule #1732

Closed

arittr force-pushed the drew/sup-333-2-sdd-proportionality branch from 162ac4b to 60f6174 Compare June 11, 2026 02:16

arittr force-pushed the drew/sup-333-1-plans-reference-spec branch from 81874ec to fc5896b Compare June 11, 2026 06:49

arittr force-pushed the drew/sup-333-2-sdd-proportionality branch from 60f6174 to 70b52fd Compare June 11, 2026 06:49

arittr changed the base branch from drew/sup-333-1-plans-reference-spec to dev June 11, 2026 06:49

arittr force-pushed the drew/sup-333-2-sdd-proportionality branch from 70b52fd to f9d11b3 Compare June 11, 2026 07:21

snvtac mentioned this pull request Jun 11, 2026

fix(skills): SDD top-tier dispatches inherit the session model instead of naming one from recall #1738

Open

5 tasks

arittr wants to merge 1 commit into dev from drew/sup-333-2-sdd-proportionality
