docs(spec): arm the full response-review stack end-to-end [DRAFT, approved:false] by JKHeadley · Pull Request #389 · JKHeadley/instar

JKHeadley · 2026-05-25T22:21:21Z

Review surface only — approved: false, docs-only, no src/ changes. Not for merge until Justin ratifies.

Draft spec for the dark-guard problem on the response-review safety stack, co-designed with instar-codey (Threadline thread 33fbbe35-065b-4024-88bf-acb4779480e6).

What it covers

The response-review Stop-hook gate (catches unjustified self-termination, unsupported claims, tone violations) can be silently dark even when every individual layer reports healthy. Four independently-darkenable layers, all verified on JKHeadley/main @ v1.2.80:

L1 Claude: response-review.js is written to disk and listed as managed, but never added to settings.json Stop[] (PostUpdateMigrator.ts:1766/1885); the only Stop[].unshift, at :2258, inserts the autonomous hook.
L1 Codex: the slot carries enabled=false on a matching trusted_hash, and arm rule F3 never re-enables it (codexHookArm.ts:16) → dark forever.
L2 Config: if responseReview.enabled is falsy, the hook calls exit(0) before reaching the server (response-review.js:33-38).
L3 Runtime gate: CoherenceGate is constructed only if responseReview.enabled && sharedIntelligence; otherwise /review/evaluate → 501 (server.ts:7311, routes.ts:13131; truth source CapabilityIndex.ts:553).
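A fail-closed reading of these four layers could look like the sketch below. It assumes a hypothetical `StackSnapshot` that already holds what each layer reports. The type, field and function names (`probeLayers`, `stackVerdict`) are illustrative and are not instar's code. The sketch encodes the conditions above:

* L1 depends on the harness.
* L3 needs both `responseReview.enabled` and `sharedIntelligence`, so an install with the config off is dark at L2 and L3 together.

```typescript
// Hypothetical model of the four layers; names are illustrative, not instar's types.
type Harness = "claude" | "codex";
type LayerId = "L1-claude" | "L1-codex" | "L2-config" | "L3-runtime-gate";

interface StackSnapshot {
  harness: Harness;
  claudeStopCommands: string[]; // commands registered in settings.json Stop[]
  codexSlot: { enabled: boolean; hashMatches: boolean } | null; // Codex hook slot state
  responseReviewEnabled: boolean; // responseReview.enabled
  sharedIntelligence: boolean; // whether sharedIntelligence is available
}

interface LayerResult {
  layer: LayerId;
  live: boolean;
  reason: string;
}

function probeLayers(s: StackSnapshot): LayerResult[] {
  const results: LayerResult[] = [];
  if (s.harness === "claude") {
    // Being on disk and listed as managed is not enough: the hook must be in Stop[].
    const inStop = s.claudeStopCommands.some((cmd) => cmd.includes("response-review.js"));
    results.push({ layer: "L1-claude", live: inStop, reason: inStop ? "registered in Stop[]" : "absent from Stop[]" });
  } else {
    // enabled=false stays dark even when the trusted_hash matches.
    const slot = s.codexSlot;
    const live = slot !== null && slot.enabled && slot.hashMatches;
    let reason = "slot enabled";
    if (slot === null) reason = "no hook slot";
    else if (!slot.enabled) reason = "slot enabled=false";
    else if (!slot.hashMatches) reason = "trusted_hash mismatch";
    results.push({ layer: "L1-codex", live, reason });
  }
  // L2: the hook exits 0 before calling the server when the flag is falsy.
  const l2 = s.responseReviewEnabled;
  results.push({ layer: "L2-config", live: l2, reason: l2 ? "responseReview.enabled" : "responseReview.enabled is falsy" });
  // L3: the gate exists only when both conditions hold; otherwise /review/evaluate answers 501.
  const gate = s.responseReviewEnabled && s.sharedIntelligence;
  results.push({ layer: "L3-runtime-gate", live: gate, reason: gate ? "CoherenceGate constructed" : "CoherenceGate not constructed (501)" });
  return results;
}

// Fail closed: pass only if every layer is live; otherwise name every dark layer.
function stackVerdict(s: StackSnapshot): { pass: boolean; dark: LayerId[] } {
  const dark = probeLayers(s).filter((r) => !r.live).map((r) => r.layer);
  return { pass: dark.length === 0, dark };
}
```

A real probe would fill these fields from three sources:

* settings.json.
* The Codex hook config.
* For L3, the truth source the spec names (CapabilityIndex.ts).

The sketch shows only the aggregation rule: there is no single-green pass.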

Key requirements

Layered liveness model that fails closed if any layer is dark and names the dark layer (no single-green pass).
Atomic trusted_hash re-stamp on every managed-hook body rewrite (primary defense if codey's repro confirms Mode B drift-quarantine).
response-review/UnjustifiedStopGate org-policy-pinned: boot-time enabled=false → reassert + audit trail; non-pinned hooks surface as drift instead.
Migration parity for existing agents; test matrix with two independent assertions per slot + an end-to-end chain test.
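The boot-time rule for pinned slots could be sketched as below. It assumes a hypothetical in-memory list of hook slots and an audit log. `reconcileAtBoot`, `HookSlot` and the pinned names in the usage are illustrative; the real pinned set is whatever org policy specifies. The rule works like this:

* A pinned slot found with enabled=false is re-enabled, and an audit entry is recorded.
* A non-pinned slot found disabled is left unchanged and reported as drift.

```typescript
// Hypothetical hook-slot and audit shapes; not instar's actual types.
interface HookSlot {
  name: string;
  enabled: boolean;
}

interface AuditEntry {
  at: string; // ISO timestamp of the reassertion
  hook: string;
  action: "reasserted-enabled";
}

interface BootReport {
  slots: HookSlot[]; // slot states after reconciliation
  audit: AuditEntry[]; // one entry per pinned slot that had to be re-enabled
  drift: string[]; // non-pinned slots found disabled: surfaced, not changed
}

function reconcileAtBoot(slots: HookSlot[], pinned: ReadonlySet<string>, now: Date): BootReport {
  const audit: AuditEntry[] = [];
  const drift: string[] = [];
  const next = slots.map((slot): HookSlot => {
    if (slot.enabled) return slot;
    if (pinned.has(slot.name)) {
      // Org policy overrides local state for pinned safety slots, and the override is recorded.
      audit.push({ at: now.toISOString(), hook: slot.name, action: "reasserted-enabled" });
      return { ...slot, enabled: true };
    }
    drift.push(slot.name); // never silently flip a non-pinned hook
    return slot;
  });
  return { slots: next, audit, drift };
}
```

Keeping drift separate from reassertion means only the pinned safety slots are ever re-enabled automatically. Every other disabled hook stays visible for a human decision.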

Open before ratification

codey's Mode A vs Mode B clean-room repro (config.toml diffs across a hash-drifting body edit).
Full [hooks.state] block from codey.
Justin's approved: true.
External cross-model round (/ultrareview).

Files: docs/specs/arm-the-full-response-review-stack.md + .eli16.md companion.

🤖 Generated with Claude Code

…roved:false) Draft spec for the dark-guard problem on the response-review safety stack, co-designed with instar-codey (thread 33fbbe35). Four independently-darkenable layers (Claude absent-from-Stop, Codex trust-disabled, config-off, gate unconstructed); layered liveness model that fails closed if any layer is dark; org-policy-pinned safety slots with boot-time reassertion + audit; test matrix asserting each layer AND the end-to-end chain. Awaiting Justin ratification + codey's Mode A/B clean-room repro. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel · 2026-05-25T22:21:24Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
instar	Ready	Preview, Comment	May 25, 2026 10:28pm


Extends the arm-the-full-response-review-stack draft with the design points from the latest thread (33fbbe35) that were not yet captured:

- §1 dual-install corroboration: server floor (L2/L3) dark on both echo (Claude) and codey (Codex) installs; fleet-wide, not Codex-specific.
- §1.3 four-state taxonomy + explicit dependency order (server floor first, harness second); host-arming is necessary but insufficient.
- §1.2 provenance non-claim: we do not assert the installer wrote enabled=false; live config remediated 15:14:24 PDT, no pre-fix history.
- R3 "armed-but-dark" named as a first-class non-pass health state; distinguish server-config-dark from server-intelligence-dark.
- R6 interim behavior: fail-open WITH an explicit dark-state signal, never silent success (signal-vs-authority).
- §4.5 reversible self-test as the named, spec-owned acceptance procedure (baseline -> floor -> arm -> trigger -> assert produced+consumed -> restore).
- ELI16 companion updated to match.

Still DRAFT (approved:false), awaiting Justin's ratification per the instar-dev gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
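The §4.5 sequence could be sketched as a function whose restore step always runs. It assumes a hypothetical `StackControls` interface that wraps each step; none of these method names come from instar. Two design points:

* Arming follows the stated dependency order: server floor first, harness second.
* `finally` restores the baseline even if a step throws.

```typescript
// Hypothetical wrapper around each self-test step; not an instar API.
interface StackControls<S> {
  snapshot(): Promise<S>; // baseline: capture current config + hook state
  armServerFloor(): Promise<void>; // L2/L3: enable responseReview, ensure the gate is constructed
  armHarness(): Promise<void>; // L1: register/enable the Stop hook for this harness
  trigger(): Promise<string>; // emit a probe response the gate must flag; returns a probe id
  verdictProduced(id: string): Promise<boolean>; // server produced a review verdict for the probe
  verdictConsumed(id: string): Promise<boolean>; // the hook acted on that verdict
  restore(baseline: S): Promise<void>; // put the install back exactly as it was
}

async function reversibleSelfTest<S>(c: StackControls<S>): Promise<{ produced: boolean; consumed: boolean }> {
  const baseline = await c.snapshot();
  try {
    await c.armServerFloor(); // server floor first...
    await c.armHarness(); // ...harness second
    const id = await c.trigger();
    // Both must hold: a verdict that is produced but never consumed is still armed-but-dark.
    const produced = await c.verdictProduced(id);
    const consumed = await c.verdictConsumed(id);
    return { produced, consumed };
  } finally {
    await c.restore(baseline); // reversible even on failure
  }
}
```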

vercel Bot deployed to Preview May 25, 2026 22:21

spec(response-review): fold codey msg-2 invariant — trusted_hash from…

a9ba783

… final render bytes + four-part artifact identity; add repro evidence package Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
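The invariant in this commit title is that trusted_hash is computed from the final render bytes. The sketch below assumes a hypothetical `{{key}}` template renderer and a `StampedHook` record, and it does not model the four-part artifact identity. It shows two things:

* Hashing the template, or any pre-render form, would disagree with what lands on disk.
* So the body and the hash are produced together from the same bytes.

The caller is assumed to persist both in one atomic step, for example write-to-temp then rename.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record persisted alongside a managed hook body.
interface StampedHook {
  bytes: Buffer; // exactly what will be written to disk
  trustedHash: string; // sha256 hex of those same bytes
}

function sha256Hex(bytes: Buffer): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Illustrative renderer: replaces {{key}} placeholders; unknown keys render as empty strings.
function render(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match: string, key: string) => vars[key] ?? "");
}

// Render first, then hash the final bytes, so body and stamp cannot disagree.
function renderAndStamp(template: string, vars: Record<string, string>): StampedHook {
  const bytes = Buffer.from(render(template, vars), "utf8");
  return { bytes, trustedHash: sha256Hex(bytes) };
}

// Drift check: the on-disk body still matches the hash stamped for it.
function isCurrent(stamp: StampedHook, onDisk: Buffer): boolean {
  return sha256Hex(onDisk) === stamp.trustedHash;
}
```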

vercel Bot deployed to Preview May 25, 2026 22:22

vercel Bot deployed to Preview May 25, 2026 22:28

JKHeadley wants to merge 3 commits into main from echo/response-review-stack-spec