eval: boundary + plumbing scenarios for the SUP-333 skill-edit stack #11

Merged: arittr merged 1 commit into main from round3-boundary-scenarios on Jun 11, 2026

arittr commented Jun 11, 2026

Four scenarios from the staff-review evidence program (see obra/superpowers#1715–#1732 stack): two brainstorming-exception boundary probes (security one-liner, hedge-phrased deletion), the SDD Spec Context plumbing check (deterministic tool-arg-match on dispatched prompts), and the writing-plans no-spec conversational path. First-run results recorded on the superpowers PRs. Known calibration debt: sdd-spec-context-consumed's AC requires SDD dispatch but the plan boilerplate sanctions executing-plans too; cost-remove-export-boundary's consequence examples read as a checklist to the grader.

🤖 Generated with Claude Code

Four scenarios from the staff-review evidence program:
- cost-session-timeout-boundary: the nothing-to-design exception must
  re-gate a security-consequential "one-liner" (session lifetime).
- cost-remove-export-boundary: hedge-phrased deletion of working
  user-visible functionality must re-gate.
- sdd-spec-context-consumed: SDD controller must paste plan-cited spec
  sections into subagent prompts (deterministic tool-arg-match on the
  dispatch args) — first functional coverage of the Spec Context
  plumbing.
- writing-plans-no-spec-conversational: conversational requirements →
  plan header "none — requirements:", no fabricated spec citation.
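
A minimal sketch of what a deterministic tool-arg-match for sdd-spec-context-consumed could look like. Everything here is an assumption for illustration, not the harness's actual code: the dispatch record shape (`tool`, `args.prompt`), the `Task` tool name, and the helper names are all hypothetical. The idea is only that each plan-cited spec section must appear verbatim (modulo whitespace) in at least one dispatched subagent prompt.

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace runs so line wrapping does not affect matching."""
    return re.sub(r"\s+", " ", text).strip()


def missing_spec_sections(dispatches, cited_sections, tool_name="Task"):
    """Return names of cited sections not found in any dispatched prompt.

    dispatches: list of {"tool": str, "args": {"prompt": str}} records
        (hypothetical shape, assumed for this sketch).
    cited_sections: mapping of section name -> section text from the spec.
    tool_name: name of the subagent-dispatch tool (assumed to be "Task").
    """
    # Only prompts sent through the dispatch tool count as "consumed".
    prompts = [
        normalize(d.get("args", {}).get("prompt", ""))
        for d in dispatches
        if d.get("tool") == tool_name
    ]
    return [
        name
        for name, body in cited_sections.items()
        if not any(normalize(body) in p for p in prompts)
    ]
```

An empty return value would mean every cited section reached a subagent prompt; a non-empty one names the sections the controller failed to paste.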

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
arittr added a commit to obra/superpowers that referenced this pull request Jun 11, 2026
…escription exceptions are authoritative (SUP-333 C)

Consolidates the brainstorming exception with its routing-layer
semantics, so this PR is independently mergeable (previously split
across two stacked PRs whose intermediate state left the always-
injected routing text contradicting the shipped description).

brainstorming: the nothing-to-design exception, earned by a tripwire
scan stated in one line before acting. Tripwires precede the
permission (skimmers stop at "implement directly"); security-posture
touches re-gate even with the exact value stated; requested deletions
re-gate; rationalization table per writing-skills bulletproofing.
Description 971/1024 chars, YAML-validated.
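
A rough sketch of the kind of budget check behind the "971/1024 chars" figure. The function name is hypothetical, and counting Python characters (rather than bytes) is an assumption; only the 1024 limit comes from the text above.

```python
def description_budget(description: str, limit: int = 1024) -> tuple[int, int]:
    """Return (used, remaining) characters, raising if the limit is exceeded.

    Counts characters with len(); whether the real limit is measured in
    characters or bytes is an assumption of this sketch.
    """
    used = len(description)
    if used > limit:
        raise ValueError(f"description is {used} chars, limit is {limit}")
    return used, limit - used
```

For a 971-character description this would report 53 characters of headroom.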

using-superpowers: description-level exceptions are authoritative
(compliance, not rationalization); doubt means invoke; only the
description can define one; the skip must state its scan; flowchart
routes the exempt path through the scan statement;
<EXTREMELY-IMPORTANT> defers in one parenthetical.

writing-skills: negative triggering conditions are scope (allowed,
required at the description) vs workflow summaries (still forbidden) —
prevents a future checklist pass from stripping the exception.

Eval evidence (quorum): RED cost-checkbox-over-trigger failed 5/6
agents (pi ⊘); GREEN claude 3/3, codex ✓, antigravity ✓ (kimi
unchanged from baseline — does not read description exceptions);
gate-still-fires: brainstorming-resists 2/2 + codex, spec-plan
brainstorm leg 3/3. Boundary scenarios (security one-liner, requested
deletion): pre-stack dev baseline 0/3 + 0/3 (silent edit every time —
the blanket gate never fired on one-liners); this text 2/3 + 2/3, the
first text in the corpus to catch these at any rate; scenarios ship as
regression instruments (proposed in prime-radiant-inc/superpowers-evals#11, open).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
arittr added a commit to obra/superpowers that referenced this pull request Jun 11, 2026
…escription exceptions are authoritative (SUP-333 C)

Canary-caught addition: on the assembled text, triggering-writing-plans
went 0/3 with claude citing "your explicit instruction wins per the
priority rules" to skip writing-plans under the scenario's "don't ask
me any questions" pressure — the Instruction Priority section read as
licensing ad-hoc pressure to skip workflow steps. User Instructions now
distinguishes pressure phrasing (changes interaction style) from
instructions that name what to skip (honored), and tags the quoted
rationalization.
arittr merged commit 7f8e80c into main on Jun 11, 2026
5 checks passed