Skip to content

fix(skills): SDD review fanout scales with the change#1716

Draft
arittr wants to merge 1 commit into
devfrom
drew/sup-333-2-sdd-proportionality
Draft

fix(skills): SDD review fanout scales with the change#1716
arittr wants to merge 1 commit into
devfrom
drew/sup-333-2-sdd-proportionality

Conversation

@arittr

@arittr arittr commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Targets dev. Independently mergeable — self-contained: the Proportionality rule and every reference to it ship together. Related but independent: #1715 (reference discipline), #1718 (brainstorming exception).

Who is submitting this PR? (required)

Field Value
Your model + version Claude Fable 5 (claude-fable-5[1m])
Harness + version Claude Code 2.1.169
All plugins installed superpowers (this repo, dev checkout); quorum eval lab (superpowers-evals) as the testing apparatus; unrelated local ops plugins
Human partner who reviewed this diff Drew Ritter (@drewritter)

What problem are you trying to solve?

In the 2026-06-09 six-agent quorum sweep, cost-trivial-task-review-fanout — a scripted naive user asks the agent to execute a plan whose entire content is one one-line console.log insertion — showed subagent-driven-development's pipeline has no proportionality exit: antigravity "dispatched 4 subagents: Implementer, Spec Reviewer, Code Quality Reviewer, and Final Code Reviewer" for the one-line change, exactly as the skill mandates; opencode identically. The agents that passed did so by NOT following the skill. A skill that only produces good outcomes when disobeyed is miscalibrated.

What does this PR change?

Adds a Proportionality rule: a plan that is entirely one trivial, fully-specified mechanical change is implemented directly, verified per superpowers:verification-before-completion, and committed — no review fanout. Trivial is a property of the diff (no logic, control-flow, or security-relevant change — "a constant bump" is qualified with "no security or behavioral consequences"), not of the plan's self-description; any doubt means the full pipeline; multi-task plans never skip reviews regardless of task size. The process flowchart gets the matching trivial-exit diamond (the failing agents follow the flowchart literally), the Red Flags "never skip reviews" line points at the sole exception instead of contradicting it, and writing-plans' execution handoff notes that fanout scales.

Is this change appropriate for the core library?

Yes — tunes the core execution workflow's cost behavior for all users; non-trivial plans are unchanged (verified below).

What alternatives did you consider?

(1) Gate at entry (trivial plans never engage SDD) — viable but larger; the in-skill clause is the smaller diff. (2) Retire the eval — rejected: 4 full-context dispatches for one line is real measured waste. (3) Prose without the flowchart exit — rejected from evidence: the failing agents execute the flowchart, not the prose.

Does this PR contain multiple unrelated changes?

No — one rule, with the flowchart/Red-Flags/handoff touches required to keep the skill self-consistent about it.

Existing PRs

Environment tested

Harness Harness version Model Model version/ID
opencode (agent under test) 1.16.2 GPT openai/gpt-5.5
antigravity CLI (agent under test) current Gemini Code Assist
Claude Code (agent under test) 2.1.169 Claude Opus claude-opus-4-8

New harness support

N/A.

Evaluation

Scripted naive user ("Please execute the plan in docs/superpowers/plans/."; answers "Use your judgment" on subagent questions). Runs were measured on the assembled three-branch text (all three sibling PRs applied to dev); the Proportionality rule is the only change in the set that touches the fanout path. After the change: cost-trivial-task-review-fanout/opencode 3/3 pass — zero subagents dispatched (deterministic tool-count Agent lte 2), change landed on the main checkout, ~$0.02–0.08 coding cost per Gauntlet token estimates vs implied $0.50–2 for the 4-dispatch baseline; ×antigravity pass — the only deterministic pre-fix failer (0/3). Containment canary: sdd-rejects-extra-features/claude 3/3 pass — a real multi-task plan still runs implementer + two-stage review per task + final reviewer (spec reviewer as YAGNI gate after each task, 8/8 deterministic checks). Honest baseline note: opencode's pre-fix pass rate on this scenario was ~50%, so its cell leans on n=3 plus the antigravity flip for significance.

Rigor

  • If this is a skills change: I used superpowers:writing-skills and completed adversarial pressure testing
  • This change was tested adversarially, not just on the happy path
  • I did not modify carefully-tuned content without evals showing improvement

Red-team findings incorporated: the Red Flags line was rewritten after a reviewer showed its first form licensed per-task skipping inside multi-task plans; "a one-line edit" was dropped from the examples after a reviewer showed it blessed one-line behavioral changes (|| user.isOwner); a staff-panel-caught internal contradiction (spec access wording) was fixed; the flowchart diamond carries "fully-specified" and "any doubt = no" to match the prose exactly.

Human review

  • A human has reviewed the COMPLETE proposed diff before submission

subagent-driven-development mandated implementer + two-stage review +
final reviewer unconditionally — antigravity (agy) and opencode each dispatched 4
subagents for a one-line console.log (cost-trivial-task-review-fanout),
and agents that passed did so only by disobeying the skill.

- Proportionality rule: a plan that is entirely one trivial,
  fully-specified mechanical change is implemented directly, verified
  per superpowers:verification-before-completion, committed — no
  review fanout. Trivial is a property of the diff (no logic, control
  flow, or security-relevant change), not the plan's self-description;
  "a constant bump" is qualified (no security or behavioral
  consequences). Any doubt = full pipeline. Multi-task plans never
  skip reviews regardless of task size.
- Flowchart gets the matching trivial-exit diamond (the failing agents
  follow the flowchart literally).
- Red Flags "never skip reviews" points at the sole exception instead
  of contradicting it.
- writing-plans' execution handoff notes fanout scales (forward
  reference resolves within this PR's base expectations: the
  Proportionality rule ships here).

Independently mergeable: no dependency on the reference-discipline or
brainstorming-exception PRs.

Eval evidence (quorum): RED 4 dispatches for 1 line (agy, opencode);
GREEN cost-trivial-task-review-fanout opencode 3/3 pass (0 dispatches,
deterministic tool-count check) + antigravity pass (the formerly
deterministic failer); containment canary sdd-rejects-extra-features
claude 3/3 pass (full pipeline per task).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant