From 1a2cfea593dfa72255beb05077ce8315c6ab3bad Mon Sep 17 00:00:00 2001
From: citizen204 <cxi9371@outlook.com>
Date: Fri, 5 Jun 2026 22:40:00 +0930
Subject: [PATCH] fix(writing-skills): replace count anchors with
 standard-based language

Replace '3+ combined pressures' count targets with qualitative standards
throughout SKILL.md and testing-skills-with-subagents.md (5 sites).

The count was doing gating work the actual standard could carry alone:
'scenarios that reliably trigger the rationalizations you want to prevent.'
A scenario with two strong, well-aligned pressures can trigger the
rationalization more reliably than four weak unaligned ones.

Closes #1487
---
 skills/writing-skills/SKILL.md                         |  2 +-
 skills/writing-skills/testing-skills-with-subagents.md | 10 +++++-----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/skills/writing-skills/SKILL.md b/skills/writing-skills/SKILL.md
index d7393dc790..b818fc4987 100644
--- a/skills/writing-skills/SKILL.md
+++ b/skills/writing-skills/SKILL.md
@@ -598,7 +598,7 @@ Deploying untested skills = deploying untested code. It's a violation of quality
 **IMPORTANT: Create a todo for EACH checklist item below.**
 
 **RED Phase - Write Failing Test:**
-- [ ] Create pressure scenarios (3+ combined pressures for discipline skills)
+- [ ] Create pressure scenarios that reliably trigger the rationalizations you want to prevent (combine pressures that make the rationalization feel genuinely compelling — the test for "enough" is whether an agent without the skill consistently takes the rationalization path, not a count)
 - [ ] Run scenarios WITHOUT skill - document baseline behavior verbatim
 - [ ] Identify patterns in rationalizations/failures
 
diff --git a/skills/writing-skills/testing-skills-with-subagents.md b/skills/writing-skills/testing-skills-with-subagents.md
index a5acfeac86..fb17482616 100644
--- a/skills/writing-skills/testing-skills-with-subagents.md
+++ b/skills/writing-skills/testing-skills-with-subagents.md
@@ -48,7 +48,7 @@ This is identical to TDD's "write failing test first" - you MUST see what agents
 
 **Process:**
 
-- [ ] **Create pressure scenarios** (3+ combined pressures)
+- [ ] **Create pressure scenarios** that reliably trigger the rationalizations you want to prevent (combine pressures that make the rationalization feel genuinely compelling)
 - [ ] **Run WITHOUT skill** - give agents realistic task with pressures
 - [ ] **Document choices and rationalizations** word-for-word
 - [ ] **Identify patterns** - which excuses appear repeatedly?
@@ -137,7 +137,7 @@ Forces explicit choice.
 | **Social** | Looking dogmatic, seeming inflexible |
 | **Pragmatic** | "Being pragmatic vs dogmatic" |
 
-**Best tests combine 3+ pressures.**
+**Best tests combine pressures that make the rationalization feel genuinely compelling.** The diagnostic: does an agent without the skill, run on this scenario, consistently take the rationalization path you're trying to close? A scenario with two strong, well-aligned pressures can do this more reliably than four weak unaligned ones.
 
 **Why this works:** See persuasion-principles.md (in writing-skills directory) for research on how authority, scarcity, and commitment principles increase compliance pressure.
 
@@ -310,7 +310,7 @@ Meta-test: "Skill was clear, I should follow it"
 Before deploying skill, verify you followed RED-GREEN-REFACTOR:
 
 **RED Phase:**
-- [ ] Created pressure scenarios (3+ combined pressures)
+- [ ] Created pressure scenarios that reliably trigger the rationalizations you want to prevent
 - [ ] Ran scenarios WITHOUT skill (baseline)
 - [ ] Documented agent failures and rationalizations verbatim
 
@@ -340,8 +340,8 @@ Running only academic tests, not real pressure scenarios.
 ✅ Fix: Use pressure scenarios that make agent WANT to violate.
 
 **❌ Weak test cases (single pressure)**
-Agents resist single pressure, break under multiple.
-✅ Fix: Combine 3+ pressures (time + sunk cost + exhaustion).
+Agents resist single pressure, break under combined pressures.
+✅ Fix: Combine pressures that make the rationalization feel genuinely compelling — the standard is whether an agent without the skill consistently takes the rationalization path, not a count.
 
 **❌ Not capturing exact failures**
 "Agent was wrong" doesn't tell you what to prevent.