From 61a03eba77a46ff3342f46507550c84e2384787f Mon Sep 17 00:00:00 2001
From: "Instar Agent (echo)" <echo@instar.local>
Date: Fri, 26 Jun 2026 00:22:35 -0700
Subject: [PATCH 1/3] docs(constitution): add umbrella standard 'The User
 Experience Is the Product'

Synthesizes the 2026-06-25 user-reachability postmortem into one top-level
constitutional standard with 7 sub-standards as its teeth. The outward-facing
complement to 'Structure beats Willpower': every guard Instar built points
inward (what the agent emits, sessions alive, not quitting); none guarded the
user's ability to reach/hear/get a coherent answer. When internal caution and
the user's reach conflict, the user wins.

Sub-standard #4 (Guards Degrade, Not Outage) shipped its first teeth tonight
(PRs #1276/#1277/#1279, tone-gate graceful degradation). #1/#2/#3/#5/#6/#7
tracked as the F-series. Operator-ratifiable proposal per 'How a new standard
joins this registry'.

Earned-from: docs/incidents/2026-06-25-user-reachability-postmortem.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 docs/STANDARDS-REGISTRY.md | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/docs/STANDARDS-REGISTRY.md b/docs/STANDARDS-REGISTRY.md
index 26d734b92..5033549ca 100644
--- a/docs/STANDARDS-REGISTRY.md
+++ b/docs/STANDARDS-REGISTRY.md
@@ -433,6 +433,20 @@ The Root says *enforce behavior in structure, not willpower.* This family is **w
 
 ## Interaction — the agent's surface to the user and the world
 
+### The User Experience Is the Product — Reachability, Responsiveness, and Coherence Are Sacred
+**Rule.** The user's ability to **reach** a live agent, **be heard**, and get a **timely, coherent response** is a first-class invariant that **outranks internal caution when the two conflict**. No internal guard, safety net, resource limit, or self-continuity discipline may *silently* degrade the user's channel. When a guard cannot do its job, it must **fail toward the user being served** — preserving safety by a different, cheaper, deterministic mechanism — and surface the degradation **loudly**. A guard may never be the single thing standing between the user and *any* response at all. The whole-experience invariant must have an **owner** (a UX-liveness watchdog), a **metric** (reach-rate and time-to-first-response), and a **hard rule**: when internal safety and the user's reach conflict, *the user wins*.
+**In practice.** This is the outward-facing complement to *Structure beats Willpower*: it says the structure must guard not only "the agent never does the wrong thing" but "the user reliably gets through." It is the umbrella over seven sub-standards, each the structural teeth for one failure mode, several of which already exist as their own articles above:
+  1. **State Convergence** — every declarative desired-state the system records (a placement pin, a lease) has an owning reconciler that drives actual→desired within a bounded time or escalates loudly. Declarative intent with no controller is a wish. *(Failure 1 — the cross-machine move that pinned but never actuated.)*
+  2. **Enforced Termination** — a time/iteration/resource budget is a **structural hard stop enforced by a watchdog OUTSIDE the run**, never the run's own willpower. This is *Structure beats Willpower* applied to the *end* of work, which we had only ever applied to its *middle* — closing the dangerous asymmetry where every pressure says "don't stop early" and none says "stop on time." *(Failure 2 — a 24h run that reached 46h.)*
+  3. **Inbound Delivery Is Sacred** — every inbound user message reaches a live session within a bounded time OR raises a loud failure through a channel proven deliverable; never a silent expiry. Corollary: **a partially-built safety mechanism must fail OPEN (pass through to the working path), never CLOSED (capture-and-drop)** — a half-finished net that intercepts then drops is more dangerous than no net. Sibling of *The Operator Channel Is Sacred* (which governs the inbound consume/pause gate); this governs the inbound holding-queue. *(Failure 3 — the durable queue that ate "why aren't you responding?")*
+  4. **Guards Degrade, Not Outage** — a safety check on the user-facing path must (i) have a fallback engine/path and (ii) distinguish *content-unsafe* (fail closed, hold) from *check-unavailable* (fail toward the user, with safety preserved by a cheaper deterministic check). The guard may never convert its own infra failure into the user's silence. The direct extension of *The Operator Channel Is Sacred* and *No Silent Degradation to Brittle Fallback* to the OUTBOUND tone gate. *(Failure 4 — the tone gate that held every reply when its single engine stalled.)*
+  5. **User-Facing Priority Lane** — when a scarce resource (subprocess slots, LLM quota, CPU) is contended, work on the live user path **preempts** background/housekeeping work; background watchers are load-shed first, never the user's reply. The complement to *Bounded Blast Radius* (which caps the spawn floor): the floor must not treat a user's reply and a coherence-sweep as equal citizens. *(Failure 5 — the fork-bomb floor that starved the user's channel.)*
+  6. **Degradation Is an Event, Not a Footnote** — a coordination layer running in a degraded/fallback mode (a stalled lease tick, a framework gone unavailable) must surface that state where it can reach the user, not re-arm quietly in a log. *(Failure 6 — the silently re-armed lease that made the ownership confusion easy to trigger.)*
+  7. **Blast-Radius Before, Verify After** — any mutating action on shared infrastructure (sessions, routing, ownership) must (i) prefer the reversible option and (ii) verify the user-facing invariants still hold afterward. *Capability ≠ authority* exists; this adds *capability ≠ wisdom — check what you're about to break, and confirm the user can still reach you after.* The behavioral kin of *Live-User-Channel Proof Before Done*, applied to an in-flight infra mutation rather than a feature ship. *(Failure 7 — the force-kill that black-holed inbound messages.)*
+**Earned from.** 2026-06-25 (incident window ~17:48–19:25 PDT): a runaway autonomous run overloaded the mesh, and then **the safety mechanisms themselves became the outage** — the inbound queue ate the user's messages, the tone gate blocked every reply, the fork-bomb floor starved the user's channel, and the "don't stop early" discipline (having no opposite) sheltered the runaway. Laid side by side, the seven failures reveal one blindspot: **every guard Instar had built pointed *inward* (what the agent emits, its sessions staying alive, it not quitting, it working on the right project, the machine not melting down); not one guarded the thing the user actually experiences — "can I reach a live agent, will it hear me, will it answer in reasonable time?"** Each component optimized its *local* safety, fail-closing toward "the agent does nothing wrong"; summed, they produced an agent that was internally pristine and **externally unreachable**. No standard said that when internal caution and the user's reach conflict, the user wins — so in every conflict, the user lost, quietly. The operator: "identify the root Blindspot and Meta issue that allowed this … what constitutional standard are we missing that would have [prevented it] in the first place." Full analysis: `docs/incidents/2026-06-25-user-reachability-postmortem.md`.
+**Traces to the goal.** A coherent, self-evolving agent that the user cannot reliably reach, be heard by, or get a coherent answer from is not coherent *to the only observer that matters* — it is a fortress that is perfectly safe and perfectly silent. Robustness defined only as "the agent never does the wrong thing" is half a definition; the other half is "the user reliably gets through." The genesis taught that for an agent, structure is the only carrier of self across discontinuity — and the user's channel is the one structure whose failure makes every other coherence invisible.
+**Applied through.** Sub-standard #4 (*Guards Degrade, Not Outage*) shipped its first structural teeth on 2026-06-26 — the outbound `MessagingToneGate` now degrades to an in-process deterministic leak floor (clean SENDS, leak HOLDS) on an LLM-backend outage instead of holding every reply, covering both the fast provider-throw and the slow route-budget timeout (PRs #1276 fast-engine-restored, #1277 per-framework breaker isolation, #1279 graceful-degrade; spec `docs/specs/tone-gate-graceful-degradation.md`). The remaining sub-standards (#1, #2, #3, #5, #6, #7) are tracked as the F-series fixes from the postmortem; per *How a new standard joins*, this article is the operator-ratifiable proposal and the honest test is that each tooth is *real*, not that listing it here makes it so.
+
 ### No Manual Work (user *or* agent)
 **Rule.** Capturing context and taking available actions must be automatic. Don't make the user remember Instar's features, and don't rely on the agent remembering to use its own tools.
 **In practice.** No "remember to log it" or "remember to run X" step survives into a design — for anyone. If a behavior depends on someone remembering, it isn't built yet. All user interaction goes through channels; the agent never asks the user to go edit a file.

From bd15f2ea6c6b7bd9150e76734a9edfc835d8325a Mon Sep 17 00:00:00 2001
From: "Instar Agent (echo)" <echo@instar.local>
Date: Fri, 26 Jun 2026 00:28:41 -0700
Subject: [PATCH 2/3] spec(enforced-termination): external hard-stop watchdog
 for autonomous runs (sub-standard #2)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

First concrete tooth-design for the UX-Is-The-Product umbrella standard.
External watchdog that hard-stops a run past its time/iteration budget —
the structural opposite of every 'don't stop early' pressure, which is what
let topic 27515 run 46h on a 24h budget. Grounded in the real autonomous-run
lifecycle: reuses autonomousRunRemainingForTopic(), coordinates with the
AutonomousLivenessReconciler + ResumeQueue so a deliberate kill stays killed,
mirrors the dev-gate/dryRun/guard-posture/audit fleet template.

Tag: pending review-convergence. Earned-from: 2026-06-25 postmortem Failure 2.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 docs/specs/enforced-termination-watchdog.md | 78 +++++++++++++++++++++
 1 file changed, 78 insertions(+)
 create mode 100644 docs/specs/enforced-termination-watchdog.md

diff --git a/docs/specs/enforced-termination-watchdog.md b/docs/specs/enforced-termination-watchdog.md
new file mode 100644
index 000000000..e1ce785da
--- /dev/null
+++ b/docs/specs/enforced-termination-watchdog.md
@@ -0,0 +1,78 @@
+# Enforced Termination Watchdog — an external hard-stop for autonomous runs
+
+**Status:** draft (spec). Tag: pending review-convergence.
+**Constitution:** *The User Experience Is the Product* → sub-standard #2 **Enforced Termination**; an instance of *Structure beats Willpower* applied to the **end** of work; sibling of *An Autonomous Run Must Outlive Its Session* (which keeps a run alive across vessel events — this is its deliberate counterweight: keep a run from outliving its **budget**).
+**Earned from:** 2026-06-25 — an autonomous run (topic 27515) with a hard 24h budget ran ~46h (iteration 216), continuously churning the machine pool and spawning subprocesses. Nothing outside the run forced it to stop. Full context: `docs/incidents/2026-06-25-user-reachability-postmortem.md` (Failure 2).
+
+---
+
+## 1. The problem — enforcement lives inside the run
+
+Today an autonomous run's deadline is enforced in exactly one place: the per-session **Stop hook** (`.claude/skills/autonomous/hooks/autonomous-stop-hook.sh:472-489`), which checks `elapsed >= duration_seconds` at a Claude Code **Stop event** (a turn boundary) and removes the state file. That is *the run's own willpower* — and it has four holes the runaway fell through:
+
+1. **No Stop event ever fires.** A wedged or tight-looping session that never cleanly reaches a turn boundary never runs the check. (This is how 27515 ran 46h.)
+2. **Unbounded runs.** `duration_seconds` absent/0 ⇒ the hook's `[[ $DURATION_SECONDS -gt 0 ]]` guard skips, and `autonomousRunRemainingForTopic()` (`src/core/AutonomousSessions.ts:106-126`) returns `null` — the run has no deadline at all and is invisible to every "how long is left" path.
+3. **Unparseable `started_at`.** The hook fails *toward keep-running* (`autonomous-stop-hook.sh:486-488`).
+4. **No external clock.** Nothing outside the run's process holds the budget. There is no duration-based killer anywhere in `src/monitoring/` (confirmed). `SessionClockReader`/`SessionClock` only *compute and display* `remainingSeconds` — they never gate.
+
+Every structural pressure in Instar points toward **continuing** (the stop-gate, "No context-death self-stops", the completion-discipline that refuses premature exit). **None points toward terminating on time.** The discipline meant to prevent laziness instead sheltered a runaway. This spec adds the missing opposite force, as *structure outside the run*, never the run's own discretion.
+
+## 2. Design — `EnforcedTerminationWatchdog`
+
+A level-triggered monitoring loop (mirrors `AutonomousLivenessReconciler`) that, each tick, finds runs that have **provably overrun** and drives them to a **durable** terminated state. It is the external twin of the in-hook duration check — it does **not** reimplement the math, it reuses `autonomousRunRemainingForTopic()` and the raw frontmatter.
+
+### 2.1 Overrun predicate (per active run)
+
+A run is `overrun` when **any** of:
+- **Time budget exceeded:** `duration_seconds > 0` AND `now - started_at >= duration_seconds + graceSeconds`. (`graceSeconds` default 120s — lets the cooperative hook win the normal case; the watchdog only fires when the hook *didn't*.)
+- **Absolute ceiling (covers the unbounded + unparseable holes):** `now - started_at >= absoluteCeilingSeconds` (default 26h — just past the longest sanctioned run; a hard backstop that fires even when `duration_seconds` is null/0 or `started_at` is malformed-but-old). An unparseable `started_at` is treated as *file mtime* for this ceiling only (fail toward a bounded stop, never toward run-forever — the inverse of the hook's bias, because here the safe direction is termination).
+- **Iteration ceiling (optional):** `iteration >= maxIterations` when configured.
+
+`overrun` is computed from durable state only (the frontmatter + file mtime), so it survives a server restart.
+
+### 2.2 Two-phase confirm (no kill on a single read)
+
+Mirrors `SessionReaper`'s reap-pending pattern. Tick N marks a topic `terminate-pending` (persisted in durable cap state); a kill happens only when tick N+1 **re-confirms** the same topic is still overrun AND still has a live session. This absorbs a clock blip, an in-flight cooperative stop landing between ticks, and a just-completed run.
+
+### 2.3 The durable termination (the reconciler-coordination crux)
+
+A deliberately-terminated run must **stay** terminated against the **two** respawn paths — the `AutonomousLivenessReconciler` and the `ResumeQueueDrainer`. From the existing operator-stop coordination, a durable stop requires **all** of:
+
+1. `stopAutonomousTopic(stateDir, topic, journal)` — emits the `stopped` journal row and **deletes the state file** (`AutonomousSessions.ts:357-365`). Once gone, `listActiveRuns()` yields nothing → not a reconciler candidate. **Load-bearing across restarts.**
+2. `operatorStopRecorder(topic)` (the watchdog is an authorized internal stopper) — records the stop timestamp the reconciler rechecks at criteria-3, at actuation, and post-spawn (`AutonomousLivenessReconciler.ts:373,625,666`). Belt for the window where the file might be re-synced from a peer.
+3. `resumeQueue.cancelByTopic(topic)` — the ResumeQueue is a second respawn path that also honors `operatorStopSince` (`ResumeQueueDrainer.ts:566`).
+4. Clear mid-work, then hard-kill the live session: `sess.endedMidWork = false; state.saveSession(sess)` **then** `sessionManager.killSession(...)` — mirrors `settleKill` (`server.ts:8005-8016`) so the kill is not itself queued for revival.
+
+The **generation guard** (`AutonomousLivenessReconciler.ts:713-719`) already ensures a genuinely-new run on the same topic (newer `started_at`) is *not* blocked by an old termination — so re-launching the topic later is unaffected.
+
+### 2.4 What it must NOT do
+
+- **Never terminate a run still inside its budget** — even one that looks idle. (That is the reaper's job, under its own pressure rules; this watchdog fires *only* on budget overrun.)
+- **Never fight a cooperative stop.** The `graceSeconds` window gives the in-hook check first right of termination; the watchdog is the backstop for when it can't run.
+- **Never silently disable itself.** Registered unconditionally in the guard posture (below).
+
+## 3. User-facing surfacing (the standard's "loud" requirement)
+
+Per sub-standard #6 *Degradation Is an Event* and the umbrella rule that a stop is never silent: every enforced termination posts **one** plain-English notice to the run's report topic — *"I stopped the autonomous run on <topic> — it passed its <N>h budget by <M>. Anything unfinished is in its notes; tell me to relaunch if you want me to continue."* — and an Attention item if the report channel is unavailable. This is the inverse of the 27515 silence, where the overrun was visible to no one.
+
+## 4. Rollout & observability (mirror the fleet template)
+
+- **Dev-gate + dryRun-first:** `monitoring.enforcedTermination.enabled` **omitted** from `ConfigDefaults` ⇒ ships dark fleet-wide, resolves live on a development agent via `resolveDevAgentGate`; `dryRun: cfg.dryRun ?? true` ⇒ on dev it computes overruns and logs `would-terminate` + shadow cap counters but actuates nothing until a deliberate `dryRun:false`.
+- **Guard posture:** expose `guardStatus()` and `guardRegistry.register('monitoring.enforcedTermination.enabled', …)` **unconditionally** (even when disabled) so a silently-off watchdog reads `off-runtime-divergent` on `/guards`, not invisible.
+- **Audit:** append-only `logs/enforced-termination.jsonl` (5MB×1 rotation), one row per transition (`overrun-detected` / `terminate-pending` / `terminated` / `would-terminate` / `false-alarm` / `skipped-grace`), each stamped with topic, started_at, duration_seconds, elapsed, predicate-that-fired.
+- **Bounded actuation:** per-window cap on terminations (a flapping detector gives up LOUDLY via one aggregated Attention item rather than kill-looping — P19), durable cap state (`loadCapState`/`saveCapState`).
+- **Status route:** `GET /autonomous/enforced-termination` → `{ enabled, dryRun, graceSeconds, absoluteCeilingSeconds, lastTickAt, pending:[topics], terminated24h, wouldTerminate24h }` (503 when dark).
+
+## 5. Tests (all three tiers + wiring + both decision sides)
+
+- **Unit:** overrun predicate — within budget (no), past budget+grace (yes), unbounded run past absolute ceiling (yes), unparseable started_at older than ceiling by mtime (yes), iteration ceiling. Two-phase confirm: single overrun tick does NOT kill; two consecutive do. dryRun: computes + logs `would-terminate`, actuates zero.
+- **Durability/wiring:** a terminated topic is gone from `listActiveRuns`; `operatorStopRecorder` called; `resumeQueue.cancelByTopic` called; `endedMidWork` cleared before `killSession`. Integration with a stub reconciler: after termination, the reconciler does NOT respawn (criteria-3 stop recheck honored). A genuinely-new run on the same topic (newer started_at) IS allowed (generation guard).
+- **Integration (HTTP):** `/autonomous/enforced-termination` returns 200 with the feature on (dev), 503 when dark.
+- **E2E:** with the watchdog live (dryRun:false) on a throwaway agent, a seeded run with a 5s budget + a wedged (no-Stop-event) session is killed within ~2 ticks and stays dead; the report topic gets the plain-English notice.
+
+## 6. Open questions (for review-convergence)
+
+1. **`absoluteCeilingSeconds` default** — 26h assumes the longest sanctioned run is ~24h. Should it instead be `max(observed duration_seconds across runs) + margin`, or a config-derived multiple of the run's own budget (e.g. `1.5 × duration_seconds`, falling back to a flat ceiling only for unbounded runs)? The flat 26h is simplest and covers the incident; the per-run multiple is tighter for short runs.
+2. **Grace vs. wedge** — `graceSeconds` (120s) lets the cooperative hook win, but a truly wedged session never fires the hook, so grace just delays the inevitable kill by 2 min. Acceptable, or should a *separately-detected* wedge (no turn boundary in N min past deadline) skip grace entirely?
+3. **Unbounded runs** — should the watchdog instead *refuse to let a run start unbounded* (a `can-start` gate that requires a `duration_seconds`), making the absolute ceiling a backstop rather than the primary path? That is arguably the more structural fix (enforce a budget at creation), with this watchdog as defense-in-depth.
+4. **Interaction with a topic mid-move** — if a run is suspended for a cross-machine move (`move_suspended_at` set), the watchdog must treat it as not-active (no kill) and let the destination re-evaluate. Confirm the predicate excludes `move_suspended_at`/`moved_to` rows.

From 60ad1df55271cf6c13047f99f6248f6cbd24f5bc Mon Sep 17 00:00:00 2001
From: "Instar Agent (echo)" <echo@instar.local>
Date: Fri, 26 Jun 2026 03:54:26 -0700
Subject: [PATCH 3/3] chore: re-trigger CI to read updated PR body (ELI16
 section added)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>