From 5aefa4d9c490396bc08833e17bb43df40a8b06cb Mon Sep 17 00:00:00 2001 From: "Instar Agent (echo)" Date: Wed, 27 May 2026 19:03:04 -0700 Subject: [PATCH] fix(release-readiness): demote eval-failure to housekeeping default (sentinel-trio standard) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ReleaseReadinessSentinel.failLoud() posted a per-stage Attention item — and therefore a per-stage Telegram topic — every time the watchdog's own fetch / analyzer / tick stage broke ("Release-readiness check could not evaluate" with an inscrutable body like "analyze-release returned no report"). That violates the sentinel-trio standard from the 2026-05-22 topic-spam fix: internal- plumbing failures are housekeeping and belong in logs/sentinel-events.jsonl + server.log, not on the user's Telegram surface. - failLoud() audits unconditionally (canonical observability for housekeeping is the audit log) but only postAttention()s when monitoring.releaseReadiness .escalateEvalFailures is explicitly true. Default false. The user-actionable "Release blocked — unreleased work piling up" signal is unaffected — it always posts. - migrateRetireStaleReleaseReadinessEvalFailureAttention strips stale release-readiness-eval-failure-* rows from existing agents' attention-items .json on next update. Atomic, idempotent, preserves legitimate rows. - 6 new ReleaseReadinessSentinel tests cover both halves of the new contract (housekeeping default + escalate opt-in) plus a regression guard that the legitimate "release blocked" signal still posts under default config. - 7 new PostUpdateMigrator tests cover the cleanup migration (missing file, empty array, no-match no-op, selective drop, idempotency, malformed-entry tolerance, unparseable-JSON error reporting). - Post-mortem doc captures the root cause (spec framed loud-attention vs silent-catch as binary, missing the housekeeping path) and tracks follow-ups (sentinel-emit-site lint, SentinelEmitter primitive, spec template "failure-mode emit-site table", /spec-converge cross-reference rule, spec text correction). Origin: 2026-05-27 dogfood feedback on Echo — two "Release-readiness check could not evaluate" topics surfaced within 90 minutes (one fetch-stage, one analyzer-stage). Live cleanup of those two rows was performed via DELETE /attention before this PR. Co-Authored-By: Claude Opus 4.7 (1M context) --- ...7-release-readiness-eval-failure-topics.md | 68 +++++++ src/commands/server.ts | 1 + src/config/ConfigDefaults.ts | 6 + src/core/PostUpdateMigrator.ts | 72 ++++++++ src/core/types.ts | 11 ++ src/monitoring/ReleaseReadinessSentinel.ts | 52 ++++-- ...leaseReadinessEvalFailureAttention.test.ts | 173 ++++++++++++++++++ tests/unit/ReleaseReadinessSentinel.test.ts | 50 ++++- upgrades/NEXT.md | 31 ++++ .../release-readiness-housekeeping-default.md | 45 +++++ 10 files changed, 495 insertions(+), 14 deletions(-) create mode 100644 docs/postmortems/2026-05-27-release-readiness-eval-failure-topics.md create mode 100644 tests/unit/PostUpdateMigrator-retireStaleReleaseReadinessEvalFailureAttention.test.ts create mode 100644 upgrades/NEXT.md create mode 100644 upgrades/side-effects/release-readiness-housekeeping-default.md diff --git a/docs/postmortems/2026-05-27-release-readiness-eval-failure-topics.md b/docs/postmortems/2026-05-27-release-readiness-eval-failure-topics.md new file mode 100644 index 000000000..a21056067 --- /dev/null +++ b/docs/postmortems/2026-05-27-release-readiness-eval-failure-topics.md @@ -0,0 +1,68 @@ +# Post-mortem — Release-readiness eval-failure Telegram topics (2026-05-27) + +## Summary + +A new monitoring sentinel (`ReleaseReadinessSentinel`, shipped over PRs #433 / #442 / #443) emitted a per-stage Attention item — and therefore a new Telegram topic — every time the watchdog's own fetch / analyzer / tick stage broke. Across the v1.3.38 → v1.3.43 dogfood window on Echo, two such topics surfaced ("Release-readiness check could not evaluate"), with bodies that were inscrutable to a user ("analyze-release returned no report"). This pattern was banned six days earlier by the silently-stopped-trio fix (2026-05-22, post-topic-spam flood): internal-plumbing failures belong in the audit log + server log, not on the user's Telegram surface. + +The user caught it. The spec passed conformance. The conformance gate did not see this class of violation. + +## Timeline + +- **2026-05-22** — Silently-stopped-trio fix lands (#334, then wired in #340). Establishes the canonical "Sentinel Notifications" pattern: housekeeping by default → `logs/sentinel-events.jsonl` + `server.log`, Telegram escalation off by default, coalesced into ONE consolidated message in the existing system topic when opted in. Codified in agent `CLAUDE.md` and `docs/specs/silently-stopped-trio.md`. +- **2026-05-26..27** — `RELEASE-READINESS-VISIBILITY-SPEC.md` converges and lands as #433/#442/#443. §4.2.4 says the spec is "near-silent" (✓), and §4.2 explicitly says **any evaluation failure raises a low-priority Attention item — a silent catch is forbidden**. The two-option framing (loud-attention vs silent-catch) skipped over the housekeeping path the trio standard establishes. No cross-reference to `silently-stopped-trio.md`. +- **2026-05-27 (Echo dogfood window)** — Echo enabled the sentinel. Several ticks ran. The 23:54Z tick fetched canonical and failed (`canonical ref unreachable`); the 01:25Z tick reached the analyzer and got back no report. Each emitted a new Telegram topic via the Attention queue's "create-a-topic-per-item" design. +- **2026-05-27 18:30 PT** — User: "These topics keep popping up in Instar agents which goes directly against instar standards: they produce topic clutter; the messages are completely unhelpful." +- **2026-05-27 18:30..18:46 PT** — Diagnosis → branch `echo/release-readiness-housekeeping` → fix + tests + migrator + side-effects artifact + this post-mortem. +- **2026-05-27 18:35 PT** — Two stale items live-cleaned on Echo via `DELETE /attention/release-readiness-eval-failure-{fetch,analyzer}` (soft-delete; topics closed). + +## Root cause + +A spec-time framing error. The spec author treated the choice as binary: +1. **Loud signal** → post to Attention queue (creates Telegram topic). +2. **Silent catch** → eat the error → recreate the very bug §3 fixes. + +The trio standard establishes a third path: +3. **Housekeeping** → write to `logs/sentinel-events.jsonl` + `server.log` + emit an in-process event. Fully observable for diagnostics, never a user-facing topic. Optional, coalesced, single-hub-topic escalation behind a config flag. + +For evaluator-self-failures (the watchdog's own fetch / analyzer / tick stages), path 3 is the correct fit — they are internal plumbing the user can't act on. Path 1 was the wrong choice but was actively defended by the spec text. Path 2 was never on the table. + +## Contributing factors + +1. **No conformance check for sentinel emit-sites.** The Self-Hosting conformance gate exercises many checks (near-silent, 3-tier testing, migration parity, structure-over-willpower, no-manual-work). It does NOT, today, flag "this new `*Sentinel.ts` calls `postAttention` directly without classifying the emit-site against the silently-stopped-trio housekeeping/escalation taxonomy." +2. **No cross-spec consistency requirement.** A spec referencing the trio standard's pattern was not required. The spec mentioned "near-silent" but didn't cite the trio doc as a peer authority. +3. **No structural primitive.** SocketDisconnectSentinel / ActiveWorkSilenceSentinel implement the housekeeping pattern by hand. There is no shared `SentinelEmitter` primitive that bakes in the housekeeping default + escalation gate. Each new sentinel re-derives (or fails to re-derive) the pattern from prose. +4. **Dogfood-to-ship caught it — at the topic-clutter cost.** The "Echo dogfoods first" gate worked: the issue was caught by a real user before the sentinel shipped on default. But the catch came AFTER the user saw two topics, not before. Dogfood-as-only-safety-net is a smell — design-time review should have caught this. +5. **Spec language reinforced the bug.** "A silent catch is forbidden" framed loud-Attention as the only acceptable alternative. Housekeeping is not silent — it's persistent, structured, queryable observability — but the spec used "silent" pejoratively without distinguishing from "audited but not chat-surfacing." + +## What we're changing + +### Immediate (this PR) + +- `ReleaseReadinessSentinel.failLoud()` demoted to audit-only by default; opt-in via `monitoring.releaseReadiness.escalateEvalFailures`. +- `migrateRetireStaleReleaseReadinessEvalFailureAttention()` cleans up stale rows on existing agents. +- Spec text (next slice) — see "Follow-ups" below. + +### Follow-ups (tracked as separate work) + +1. **Sentinel-emit-site lint.** A pre-commit / CI lint that scans `src/monitoring/**/*Sentinel*.ts` for direct `postAttention(` calls and flags any that aren't either: + - Behind a config flag of the shape `*TelegramEscalation` / `escalate*Failures` / `*ChatEscalation`, OR + - Annotated `// @user-actionable-attention-ok — ` in the same expression. + This is the structural equivalent of the trio standard. Implements "structure > willpower" for the housekeeping taxonomy. + +2. **Sentinel emitter primitive.** Extract a small `SentinelEmitter` class with two methods: + - `recordHousekeeping(event, payload)` → audit + event (no user-facing emit by default) + - `escalate(item)` → routes to Attention iff the per-sentinel escalation flag is on, with built-in coalescing per the trio standard. + New sentinels use the primitive. Existing housekeeping-pattern sentinels (`SocketDisconnectSentinel`, `ActiveWorkSilenceSentinel`) migrate at leisure. Spec-time discussion becomes "which emit-sites are housekeeping vs user-actionable," not "do we postAttention." + +3. **Spec template update.** Any spec introducing a sentinel must include a "Failure-mode emit-site table" classifying each error path as (a) user-actionable Attention, (b) housekeeping audit-only, (c) opt-in escalation. The /spec-converge conformance pass requires this section. + +4. **Cross-reference rule.** `/spec-converge` flags any spec touching `src/monitoring/` that does NOT cite `docs/specs/silently-stopped-trio.md`. Mechanical, easy. + +5. **Spec text fix on `RELEASE-READINESS-VISIBILITY-SPEC.md`.** Replace the §4.2 "fail-loud Attention" language with the housekeeping default + escalation flag pattern; cite the trio standard. A follow-up PR (the spec is converged, the runtime behaviour now contradicts it — the doc must match the code). + +## Lessons + +- **Two coexisting standards is one standard not yet generalized.** When a class of failure (silently-stopped trio) gets a careful design and a separate class (release-readiness eval) reinvents a worse version of it, that's not two design problems — that's the trio standard wanting to be extracted into a primitive. Do the primitive. +- **"Fail-loud" is not a synonym for "Telegram topic."** Loud means observable and surfaced where the next operator looks. For internal-plumbing failures, that's `logs/sentinel-events.jsonl` and `server.log`. For user-actionable failures, it's the Attention queue. The spec should classify each emit-site explicitly. +- **Dogfood-to-ship works but is the last line of defense.** Catches at design time are cheaper than catches at dogfood time. Conformance checks are how we move catches earlier without slowing review. +- **A bad analogy in a spec writes itself into every implementation.** "A silent catch is forbidden" was true but framed the choice wrongly. Better: "Every failure must be audited; user-facing emission is a separate decision." Words matter; choose them so they don't preclude the right answer. diff --git a/src/commands/server.ts b/src/commands/server.ts index 8594753c9..b03d1e481 100644 --- a/src/commands/server.ts +++ b/src/commands/server.ts @@ -8147,6 +8147,7 @@ export async function startServer(options: StartOptions): Promise { backlogAgeDaysHigh: rrCfg.backlogAgeDaysHigh, hysteresisHours: rrCfg.hysteresisHours, staleEpisodeTtlDays: rrCfg.staleEpisodeTtlDays, + escalateEvalFailures: rrCfg.escalateEvalFailures, }); console.log(pc.green(' ReleaseReadinessSentinel enabled (release-hygiene watchdog — job-driven)')); } else { diff --git a/src/config/ConfigDefaults.ts b/src/config/ConfigDefaults.ts index 12f4cc860..32a9b0d41 100644 --- a/src/config/ConfigDefaults.ts +++ b/src/config/ConfigDefaults.ts @@ -88,6 +88,12 @@ const SHARED_DEFAULTS: Record = { hysteresisHours: 12, staleEpisodeTtlDays: 30, fetchTimeoutMs: 30_000, + // Evaluator-self-failures (fetch / analyzer / top-level tick) are + // HOUSEKEEPING by default — they write to the audit log + server.log + // but do not post a per-stage Attention item / Telegram topic. The + // user-actionable "release blocked" signal is unaffected. Set true to + // surface catastrophic watchdog failures in chat. Sentinel-trio standard. + escalateEvalFailures: false, }, // Master gate for Telegram delivery of silently-stopped-sentinel // escalations. Default false → sentinel notices are housekeeping and stay diff --git a/src/core/PostUpdateMigrator.ts b/src/core/PostUpdateMigrator.ts index bba611321..be728ef32 100644 --- a/src/core/PostUpdateMigrator.ts +++ b/src/core/PostUpdateMigrator.ts @@ -229,10 +229,82 @@ export class PostUpdateMigrator { this.migrateBootWrapperAbiCheck(result); this.migrateStaleLifelineSignal(result); this.migrateThreadlineConversationStore(result); + this.migrateRetireStaleReleaseReadinessEvalFailureAttention(result); return result; } + /** + * Retire stale `release-readiness-eval-failure-*` attention items left behind + * by the pre-housekeeping watchdog. From v1.3.43 down, ReleaseReadinessSentinel + * posted an Attention item — and therefore a new Telegram topic — every time + * the watchdog's own fetch / analyzer / tick stage broke. That violated the + * sentinel-trio standard (post-2026-05-22 topic-spam fix): internal-plumbing + * failures are housekeeping and belong in logs/sentinel-events.jsonl + + * server.log, not on the user's Telegram surface. + * + * The code-level fix demotes those emissions to audit-only (gated behind + * `monitoring.releaseReadiness.escalateEvalFailures`, default false). This + * migration cleans up the stragglers already on-disk so the topics don't + * keep haunting the topic list after update. + * + * Behaviour: + * - Reads .instar/state/attention-items.json. If absent, skip. + * - For every item whose id starts with `release-readiness-eval-failure-`: + * drop it from the items array. (The Telegram topic itself is left as-is; + * it was either /done'd by the user already, or will be unreferenced. We + * don't synchronously call Telegram from PostUpdateMigrator — the + * adapter isn't constructed at this point in startup.) + * - Atomic write (tmp + rename) so a crash mid-migration can't corrupt + * attention-items.json. + * - Idempotent: a second run finds zero matches and no-ops. + * + * Origin: 2026-05-27 dogfood feedback on Echo — repeated + * "Release-readiness check could not evaluate" topics violating the user's + * "no topic clutter for housekeeping" standard. + */ + private migrateRetireStaleReleaseReadinessEvalFailureAttention(result: MigrationResult): void { + const attentionPath = path.join(this.config.stateDir, 'state', 'attention-items.json'); + if (!fs.existsSync(attentionPath)) { + result.skipped.push('retire-stale-release-readiness-eval-failure-attention: no attention-items.json'); + return; + } + + let parsed: { items?: Array<{ id?: string }> }; + try { + parsed = JSON.parse(fs.readFileSync(attentionPath, 'utf-8')) as { items?: Array<{ id?: string }> }; + } catch (err) { + result.errors.push(`retire-stale-release-readiness-eval-failure-attention read: ${err instanceof Error ? err.message : String(err)}`); + return; + } + + if (!Array.isArray(parsed.items) || parsed.items.length === 0) { + result.skipped.push('retire-stale-release-readiness-eval-failure-attention: empty attention items'); + return; + } + + const before = parsed.items.length; + const filtered = parsed.items.filter((it) => { + const id = typeof it?.id === 'string' ? it.id : ''; + return !id.startsWith('release-readiness-eval-failure-'); + }); + const dropped = before - filtered.length; + if (dropped === 0) { + result.skipped.push('retire-stale-release-readiness-eval-failure-attention: none on disk'); + return; + } + + parsed.items = filtered; + try { + const tmpPath = `${attentionPath}.${process.pid}.tmp`; + fs.writeFileSync(tmpPath, JSON.stringify(parsed, null, 2)); + fs.renameSync(tmpPath, attentionPath); + result.upgraded.push(`retire-stale-release-readiness-eval-failure-attention: dropped ${dropped} stale item(s)`); + } catch (err) { + result.errors.push(`retire-stale-release-readiness-eval-failure-attention write: ${err instanceof Error ? err.message : String(err)}`); + } + } + /** * Regenerate the boot wrapper when it predates the ABI-aware node * self-heal (recurring-SQLite-bane fix). diff --git a/src/core/types.ts b/src/core/types.ts index ec7c2dd30..c2206dd0d 100644 --- a/src/core/types.ts +++ b/src/core/types.ts @@ -2769,6 +2769,17 @@ export interface MonitoringConfig { canonicalRemote?: string; /** Override the instar repo path to analyze (default: the agent home). */ repoPath?: string; + /** + * When true, evaluator-self-failures (fetch / analyzer / top-level tick + * stages of the watchdog itself) post a LOW-priority Attention item — and + * therefore a Telegram topic — in addition to the audit-log entry. Default + * false: per the sentinel-trio standard ("Sentinel Notifications" in the + * agent CLAUDE.md, post-2026-05-22 topic-spam fix), internal-plumbing + * failures are housekeeping and stay in logs/sentinel-events.jsonl + + * server.log. The user-actionable "release blocked — unreleased work + * piling up" signal always posts regardless of this flag. Flip on only if + * you also want catastrophic-failure surfacing in chat. */ + escalateEvalFailures?: boolean; }; /** * Master gate for Telegram delivery of silently-stopped-sentinel escalations diff --git a/src/monitoring/ReleaseReadinessSentinel.ts b/src/monitoring/ReleaseReadinessSentinel.ts index a3c9c0c03..7f4bffa06 100644 --- a/src/monitoring/ReleaseReadinessSentinel.ts +++ b/src/monitoring/ReleaseReadinessSentinel.ts @@ -20,9 +20,17 @@ * item per stall episode above it, keyed on the OLDEST unreleased commit * SHA (stable across ticks — not a resettable per-tick id), priority scaled * by backlog age, 12h hysteresis on re-raise after an auto-resolve. - * - Fail-loud: any evaluation failure (fetch error, analyzer error) raises a - * low-priority Attention item — never a silent catch (that would re-create - * the exact bug this fixes). + * - Fail-loud: any evaluation failure (fetch error, analyzer error, top-level + * tick error) writes a structured audit entry (sentinel-events.jsonl) and a + * dedup-keyed `eval-failed` emit — never a silent catch (that would re-create + * the exact bug this fixes). User-facing Telegram escalation of these + * evaluator-self-failures is HOUSEKEEPING by default, gated behind + * `escalateEvalFailures` (config: `monitoring.releaseReadiness.escalateEvalFailures`, + * default false), per the sentinel-trio standard ("Sentinel Notifications" + * in CLAUDE.md, post-2026-05-22 topic-spam fix). The audit log + server.log + * are the canonical observability surface; only the user-actionable + * "release blocked / unreleased work piling up" signal posts to Attention + * by default — that one is genuinely actionable. * - Lifecycle owner: detect → surface → auto-resolve → reap, with * resolveEpisodesInRange consulted by the publish-finalize path. * - Repo-gated: needs an analyzable instar git repo (dev/maintainer env). On @@ -120,6 +128,16 @@ export interface ReleaseReadinessSentinelConfig { backlogAgeDaysHigh?: number; hysteresisHours?: number; staleEpisodeTtlDays?: number; + /** + * When true, evaluator-self-failures (fetch / analyzer / top-level tick stages) + * post a low-priority Attention item in addition to the audit log. Default + * false: housekeeping per the sentinel-trio standard — the audit log + * (logs/sentinel-events.jsonl) + server.log are the canonical observability + * surface for internal-plumbing failures, so the user is not spammed with a + * Telegram topic per stage that breaks. The user-actionable "release blocked" + * signal is unaffected by this flag and always posts to Attention. + */ + escalateEvalFailures?: boolean; } const DEFAULTS: Required = { @@ -131,6 +149,7 @@ const DEFAULTS: Required = { backlogAgeDaysHigh: 7, hysteresisHours: 12, staleEpisodeTtlDays: 30, + escalateEvalFailures: false, }; const DAY_MS = 24 * 60 * 60 * 1000; @@ -346,17 +365,28 @@ export class ReleaseReadinessSentinel extends EventEmitter { private async failLoud(state: ReadinessState, stage: string, err: unknown): Promise { const key = `failure:${stage}`; + // Always audit — the audit log is the canonical observability surface for + // evaluator-self-failures. Both the dedup-suppressed and the un-suppressed + // paths produce an audit line so frequency is countable from disk. this.deps.audit({ kind: 'release-readiness', event: 'eval-failed', stage, error: String(err) }); if (state.lastFailureKey === key) return; // dedupe per failure episode state.lastFailureKey = key; - await this.deps.postAttention({ - id: `release-readiness-eval-failure-${stage}`, - title: 'Release-readiness check could not evaluate', - summary: `The release-readiness check failed at the "${stage}" stage: ${String(err)}. Last evaluated ${state.lastSignalAt ? new Date(state.lastSignalAt).toISOString() : 'never'}.`, - category: 'degradation', - priority: 'LOW', - }); - state.lastSignalAt = this.deps.now(); + // HOUSEKEEPING by default: do NOT post a per-stage Attention item (which + // would create a per-event Telegram topic — the exact anti-pattern banned + // by the sentinel-trio standard post-2026-05-22 topic-spam fix). The user + // hears about this kind of failure only when escalateEvalFailures is + // explicitly enabled. The audit emission above + the `eval-failed` event + // remain the supported observability handles. + if (this.cfg.escalateEvalFailures) { + await this.deps.postAttention({ + id: `release-readiness-eval-failure-${stage}`, + title: 'Release-readiness check could not evaluate', + summary: `The release-readiness check failed at the "${stage}" stage: ${String(err)}. Last evaluated ${state.lastSignalAt ? new Date(state.lastSignalAt).toISOString() : 'never'}.`, + category: 'degradation', + priority: 'LOW', + }); + state.lastSignalAt = this.deps.now(); + } this.emit('eval-failed', { stage }); } diff --git a/tests/unit/PostUpdateMigrator-retireStaleReleaseReadinessEvalFailureAttention.test.ts b/tests/unit/PostUpdateMigrator-retireStaleReleaseReadinessEvalFailureAttention.test.ts new file mode 100644 index 000000000..5468aa8f0 --- /dev/null +++ b/tests/unit/PostUpdateMigrator-retireStaleReleaseReadinessEvalFailureAttention.test.ts @@ -0,0 +1,173 @@ +/** + * Unit tests for + * PostUpdateMigrator.migrateRetireStaleReleaseReadinessEvalFailureAttention. + * + * Background: from v1.3.43 down, ReleaseReadinessSentinel posted a per-stage + * Attention item (and therefore a per-stage Telegram topic) every time the + * watchdog's own fetch / analyzer / tick stage broke. That violated the + * sentinel-trio standard (post-2026-05-22 topic-spam fix). The code-level fix + * demotes those emissions to audit-only by default; this migration cleans up + * stragglers already on-disk so they don't keep haunting existing agents' + * topic lists after update. + * + * Covers: skips when attention-items.json absent / unreadable / empty; drops + * only ids beginning with `release-readiness-eval-failure-` (leaves the + * legitimate `release-readiness-` user-actionable items alone); atomic + * write; idempotency. + */ + +import { describe, it, expect, beforeEach, afterEach } from 'vitest'; +import fs from 'node:fs'; +import path from 'node:path'; +import os from 'node:os'; +import { PostUpdateMigrator } from '../../src/core/PostUpdateMigrator.js'; +import { SafeFsExecutor } from '../../src/core/SafeFsExecutor.js'; + +interface MigrationResult { + upgraded: string[]; + errors: string[]; + skipped: string[]; +} + +function createTempDir(): string { + return fs.mkdtempSync(path.join(os.tmpdir(), 'instar-retire-rr-eval-failure-')); +} + +function cleanup(dir: string): void { + SafeFsExecutor.safeRmSync(dir, { recursive: true, force: true, operation: 'tests/unit/PostUpdateMigrator-retireStaleReleaseReadinessEvalFailureAttention.test.ts:cleanup' }); +} + +function buildMigrator(projectDir: string) { + const stateDir = path.join(projectDir, '.instar'); + fs.mkdirSync(path.join(stateDir, 'state'), { recursive: true }); + const migrator = new PostUpdateMigrator({ + projectDir, + stateDir, + port: 4042, + hasTelegram: false, + projectName: 'test', + }); + const run = (migrator as unknown as { + migrateRetireStaleReleaseReadinessEvalFailureAttention: (result: MigrationResult) => void; + }).migrateRetireStaleReleaseReadinessEvalFailureAttention.bind(migrator); + return { stateDir, run }; +} + +function writeAttention(stateDir: string, items: Array>): string { + const p = path.join(stateDir, 'state', 'attention-items.json'); + fs.writeFileSync(p, JSON.stringify({ items }, null, 2)); + return p; +} + +function readAttention(stateDir: string): { items: Array> } { + return JSON.parse(fs.readFileSync(path.join(stateDir, 'state', 'attention-items.json'), 'utf-8')); +} + +describe('PostUpdateMigrator.migrateRetireStaleReleaseReadinessEvalFailureAttention', () => { + let projectDir: string; + beforeEach(() => { projectDir = createTempDir(); }); + afterEach(() => cleanup(projectDir)); + + it('skips gracefully when attention-items.json does not exist', () => { + const { run } = buildMigrator(projectDir); + const result: MigrationResult = { upgraded: [], errors: [], skipped: [] }; + run(result); + expect(result.errors).toEqual([]); + expect(result.skipped.some((s) => s.includes('no attention-items.json'))).toBe(true); + expect(result.upgraded).toEqual([]); + }); + + it('skips when the items array is empty', () => { + const { stateDir, run } = buildMigrator(projectDir); + writeAttention(stateDir, []); + const result: MigrationResult = { upgraded: [], errors: [], skipped: [] }; + run(result); + expect(result.errors).toEqual([]); + expect(result.skipped.some((s) => s.includes('empty attention items'))).toBe(true); + }); + + it('skips when no eval-failure items exist (idempotent on a clean install)', () => { + const { stateDir, run } = buildMigrator(projectDir); + writeAttention(stateDir, [ + { id: 'release-readiness-abcdef123456', status: 'OPEN', title: 'Release blocked — unreleased work is piling up' }, + { id: 'attn-unrelated-1', status: 'OPEN', title: 'Some other item' }, + ]); + const result: MigrationResult = { upgraded: [], errors: [], skipped: [] }; + run(result); + expect(result.upgraded).toEqual([]); + expect(result.skipped.some((s) => s.includes('none on disk'))).toBe(true); + // Untouched. + const after = readAttention(stateDir); + expect(after.items).toHaveLength(2); + }); + + it('drops only `release-readiness-eval-failure-*` items and preserves the rest', () => { + const { stateDir, run } = buildMigrator(projectDir); + writeAttention(stateDir, [ + { id: 'release-readiness-eval-failure-fetch', status: 'DONE', topicId: 14496, title: 'Release-readiness check could not evaluate' }, + { id: 'release-readiness-eval-failure-analyzer', status: 'DONE', topicId: 14618, title: 'Release-readiness check could not evaluate' }, + { id: 'release-readiness-abcdef123456', status: 'OPEN', title: 'Release blocked — unreleased work is piling up' }, + { id: 'attn-unrelated-1', status: 'OPEN', title: 'Some other item' }, + ]); + + const result: MigrationResult = { upgraded: [], errors: [], skipped: [] }; + run(result); + + expect(result.errors).toEqual([]); + expect(result.upgraded.some((m) => m.includes('dropped 2 stale item(s)'))).toBe(true); + + const after = readAttention(stateDir); + expect(after.items.map((i) => i.id)).toEqual([ + 'release-readiness-abcdef123456', + 'attn-unrelated-1', + ]); + }); + + it('is fully idempotent — a second run reports none on disk and is a no-op', () => { + const { stateDir, run } = buildMigrator(projectDir); + writeAttention(stateDir, [ + { id: 'release-readiness-eval-failure-fetch', status: 'DONE', topicId: 14496 }, + { id: 'attn-unrelated-1', status: 'OPEN' }, + ]); + + const first: MigrationResult = { upgraded: [], errors: [], skipped: [] }; + run(first); + expect(first.upgraded.some((m) => m.includes('dropped 1 stale item(s)'))).toBe(true); + + const second: MigrationResult = { upgraded: [], errors: [], skipped: [] }; + run(second); + expect(second.upgraded).toEqual([]); + expect(second.skipped.some((s) => s.includes('none on disk'))).toBe(true); + }); + + it('tolerates malformed entries (missing id field) without throwing', () => { + const { stateDir, run } = buildMigrator(projectDir); + writeAttention(stateDir, [ + { id: 'release-readiness-eval-failure-tick', status: 'OPEN' }, + { /* no id */ status: 'OPEN', title: 'broken entry' }, + { id: 42 as unknown as string, status: 'OPEN', title: 'wrong type' }, + { id: 'attn-keep', status: 'OPEN' }, + ]); + + const result: MigrationResult = { upgraded: [], errors: [], skipped: [] }; + run(result); + + expect(result.errors).toEqual([]); + expect(result.upgraded.some((m) => m.includes('dropped 1 stale item(s)'))).toBe(true); + + const after = readAttention(stateDir); + // The malformed entries are PRESERVED — the migration only targets the + // matched eval-failure ids, never silently rewrites unrelated items. + expect(after.items).toHaveLength(3); + expect(after.items.find((i) => i.id === 'attn-keep')).toBeDefined(); + }); + + it('records a read error if attention-items.json is unparseable JSON', () => { + const { stateDir, run } = buildMigrator(projectDir); + fs.writeFileSync(path.join(stateDir, 'state', 'attention-items.json'), '{ not json'); + const result: MigrationResult = { upgraded: [], errors: [], skipped: [] }; + run(result); + expect(result.errors.length).toBeGreaterThan(0); + expect(result.errors[0]).toContain('retire-stale-release-readiness-eval-failure-attention read'); + }); +}); diff --git a/tests/unit/ReleaseReadinessSentinel.test.ts b/tests/unit/ReleaseReadinessSentinel.test.ts index 800a2241d..2ecfcabf6 100644 --- a/tests/unit/ReleaseReadinessSentinel.test.ts +++ b/tests/unit/ReleaseReadinessSentinel.test.ts @@ -133,21 +133,65 @@ describe('ReleaseReadinessSentinel', () => { expect(h.resolved[0].reason).toBe('cleared'); }); - it('fail-loud: raises a low-priority signal when the canonical fetch fails (deduped)', async () => { + // ─── fail-loud → housekeeping by default (sentinel-trio standard) ─── + // The original watchdog posted a LOW-priority Attention item — and therefore + // a per-stage Telegram topic — on every evaluator-self-failure. That violated + // the sentinel-trio standard codified after the 2026-05-22 topic-spam flood. + // The new default routes those failures to audit-only (logs/sentinel-events.jsonl + // + server.log + the `eval-failed` event). User-facing escalation is opt-in + // via `escalateEvalFailures`. + + it('fail-loud (housekeeping default): canonical-fetch failure audits but does NOT postAttention', async () => { h.fetchOk = false; + const evalFailed: Array<{ stage: string }> = []; const s = new ReleaseReadinessSentinel(h.deps()); + s.on('eval-failed', (e) => evalFailed.push(e as { stage: string })); + await s.tick(); + await s.tick(); // same failure → dedup keeps the audit but suppresses the event side-effects regardless + expect(h.posted).toHaveLength(0); + expect(h.audits.some((a) => a.event === 'eval-failed' && a.stage === 'fetch')).toBe(true); + // The first call emits eval-failed (event still fires so consumers can wire alerts); + // the second is deduped by lastFailureKey. + expect(evalFailed.map((e) => e.stage)).toEqual(['fetch']); + }); + + it('fail-loud (housekeeping default): analyzer null audits but does NOT postAttention', async () => { + h.analyzer = null; + const s = new ReleaseReadinessSentinel(h.deps()); + await s.tick(); + expect(h.posted).toHaveLength(0); + expect(h.audits.some((a) => a.event === 'eval-failed' && a.stage === 'analyzer')).toBe(true); + }); + + it('fail-loud (housekeeping default): user-actionable "release blocked" signal STILL posts to Attention', async () => { + // The escalateEvalFailures flag only gates evaluator-self-failures. + // The legitimate "unreleased commits piling up" signal must always reach + // the user — that's the whole reason the sentinel exists. + h.oldest = { sha: 'a'.repeat(40), dateMs: h.clock - 3 * DAY }; + const s = new ReleaseReadinessSentinel(h.deps()); // default: escalateEvalFailures=false + await s.tick(); + expect(h.posted).toHaveLength(1); + expect(h.posted[0].title).toContain('Release blocked'); + expect(h.posted[0].priority).toBe('LOW'); + }); + + it('fail-loud (escalateEvalFailures opt-in): canonical-fetch failure DOES postAttention (deduped)', async () => { + h.fetchOk = false; + const s = new ReleaseReadinessSentinel(h.deps(), { escalateEvalFailures: true }); await s.tick(); await s.tick(); // same failure → deduped expect(h.posted).toHaveLength(1); expect(h.posted[0].priority).toBe('LOW'); expect(h.posted[0].title).toContain('could not evaluate'); + expect(h.posted[0].id).toBe('release-readiness-eval-failure-fetch'); }); - it('fail-loud: raises a signal when the analyzer returns null', async () => { + it('fail-loud (escalateEvalFailures opt-in): analyzer null DOES postAttention', async () => { h.analyzer = null; - const s = new ReleaseReadinessSentinel(h.deps()); + const s = new ReleaseReadinessSentinel(h.deps(), { escalateEvalFailures: true }); await s.tick(); expect(h.posted.some((p) => p.title.includes('could not evaluate'))).toBe(true); + expect(h.posted[0].id).toBe('release-readiness-eval-failure-analyzer'); }); it('hysteresis: does not re-raise the same sha within the window after a resolve', async () => { diff --git a/upgrades/NEXT.md b/upgrades/NEXT.md new file mode 100644 index 000000000..aad2735fc --- /dev/null +++ b/upgrades/NEXT.md @@ -0,0 +1,31 @@ +# Instar Upgrade Guide — NEXT + + + +## What Changed + +`ReleaseReadinessSentinel.failLoud()` evaluator-self-failures (fetch / analyzer / top-level tick stages) are HOUSEKEEPING by default — they write to `logs/sentinel-events.jsonl` + `server.log` but no longer post a per-stage Attention item / Telegram topic. The user-actionable "Release blocked — unreleased work piling up" signal is unaffected (it always posts). + +Origin: dogfood feedback on Echo (2026-05-27) — the pre-fix sentinel created a new Telegram topic each time its own fetch or analyzer broke ("Release-readiness check could not evaluate"). That violated the sentinel-trio standard codified after the 2026-05-22 topic-spam flood — internal-plumbing failures belong in logs, not on the user's Telegram surface, especially when the body ("analyze-release returned no report") gives the user nothing actionable. + +The new behaviour matches `monitoring.sentinelTelegramEscalation`: housekeeping default, opt-in via `monitoring.releaseReadiness.escalateEvalFailures` for callers who do want catastrophic-failure escalation surfaced in chat. + +A PostUpdateMigrator step (`migrateRetireStaleReleaseReadinessEvalFailureAttention`) strips stale `release-readiness-eval-failure-*` items from existing agents' `attention-items.json` on update so closed-but-tracked topics don't keep haunting the topic list. + +## What to Tell Your User + +- The "Release-readiness check could not evaluate" topics that kept popping up on every fetch or analyzer hiccup are gone. Those signals were internal plumbing failures masquerading as user-actionable items; they now route to my audit log + server log instead, matching how my session-silence sentinels already work. You'll only see a real Attention item when the watchdog detects something you can act on — finished work piling up unreleased — not when the watchdog itself stumbles. + +## Summary of New Capabilities + +| Capability | How to Use | +|-----------|-----------| +| Quiet release-readiness watchdog by default | Nothing — existing config carries over. The watchdog still audits every tick to `logs/sentinel-events.jsonl`; the user-facing "release blocked" Attention path is unchanged. | +| Opt-in eval-failure escalation | Set `monitoring.releaseReadiness.escalateEvalFailures: true` in `.instar/config.json` if you want catastrophic watchdog failures (fetch / analyzer / tick stages) to surface as a low-priority Attention item again. | +| Stale eval-failure attention auto-retire | Migration runs on update — no manual cleanup needed. Stale `release-readiness-eval-failure-*` entries are dropped from `.instar/state/attention-items.json` idempotently. | + +## Evidence + +- 6 unit tests cover the new fail-loud behaviour (housekeeping default vs `escalateEvalFailures: true`, audit-always invariant, and the regression guard that the legitimate "release blocked" signal still posts under default config). +- 7 unit tests cover `migrateRetireStaleReleaseReadinessEvalFailureAttention` (absent file, empty array, no-match no-op, selective drop, idempotency, malformed-entry tolerance, unparseable JSON). +- Side-effects: `upgrades/side-effects/release-readiness-housekeeping-default.md`. diff --git a/upgrades/side-effects/release-readiness-housekeeping-default.md b/upgrades/side-effects/release-readiness-housekeeping-default.md new file mode 100644 index 000000000..735d4680e --- /dev/null +++ b/upgrades/side-effects/release-readiness-housekeeping-default.md @@ -0,0 +1,45 @@ +# Side-Effects Review — Release-readiness eval-failure → housekeeping default + +**Driver:** dogfood feedback on Echo (2026-05-27). The user observed two consecutive "Release-readiness check could not evaluate" Telegram topics within 90 minutes (one from a fetch-stage failure, one from an analyzer-stage failure) and called it out as anti-Instar topic clutter with an unhelpful body. This is the exact pattern the silently-stopped-trio fix banned after the 2026-05-22 topic-spam flood; the release-readiness sentinel was emitting against it. + +## What changed + +- **`src/monitoring/ReleaseReadinessSentinel.ts`** — `failLoud()` now writes the audit entry unconditionally (the canonical observability surface for housekeeping is the audit log), but only calls `deps.postAttention()` when the new config opt `escalateEvalFailures` is true. Default false. The dedupe path (`state.lastFailureKey === key` → early return) is unchanged. The `eval-failed` event is still emitted on the un-suppressed pass so external consumers can wire their own alerting if they want. +- **`src/monitoring/ReleaseReadinessSentinel.ts`** — `ReleaseReadinessSentinelConfig` gains an `escalateEvalFailures?: boolean` field; `DEFAULTS` sets it to `false`. The doc comment at the top of the file is updated to describe the new behaviour and cite the sentinel-trio standard. +- **`src/core/types.ts`** — `monitoring.releaseReadiness.escalateEvalFailures?: boolean` added with a JSDoc explaining the gate. +- **`src/config/ConfigDefaults.ts`** — `releaseReadiness.escalateEvalFailures: false` added to default config so the field is present in every freshly-init'd agent. +- **`src/commands/server.ts`** — the construction call site passes `rrCfg.escalateEvalFailures` through to the sentinel. +- **`src/core/PostUpdateMigrator.ts`** — new `migrateRetireStaleReleaseReadinessEvalFailureAttention()`. Strips items whose id begins with `release-readiness-eval-failure-` from `.instar/state/attention-items.json` so existing agents don't keep tracking the closed-but-haunting topics. Atomic write (tmp + rename), fully idempotent, leaves legitimate `release-readiness-` user-actionable items + every non-matching item alone. +- **`tests/unit/ReleaseReadinessSentinel.test.ts`** — replaced two pre-existing fail-loud tests with FIVE that pin both halves of the new contract: the housekeeping default (no postAttention; audit-always; event-emits-once-per-dedup-episode), the opt-in path (postAttention on, dedup still works), and the regression guard that the user-actionable "Release blocked — unreleased work is piling up" signal still posts under default config. +- **`tests/unit/PostUpdateMigrator-retireStaleReleaseReadinessEvalFailureAttention.test.ts`** — 7 tests: missing attention-items.json, empty items, no eval-failure rows (idempotent), selective drop preserving siblings, full idempotency on a second run, malformed-entry tolerance (no id / wrong-type id), and unparseable-JSON error reporting. + +## Side-effects analysis + +**Default-change impact.** The visible behavioural change on an existing agent that updates with default config is: when the watchdog's fetch / analyzer / tick stage breaks, the user no longer gets a Telegram topic. They get an audit line in `logs/sentinel-events.jsonl` + a `console.warn` in `server.log` and the `eval-failed` event for any consumer that's wired one. This is exactly the visibility level the sentinel-trio standard establishes for housekeeping signals. The user-actionable "Release blocked" Attention path is untouched. + +**Opt-in restores the old behaviour.** A maintainer who explicitly wants catastrophic watchdog-self-failures in chat sets `monitoring.releaseReadiness.escalateEvalFailures: true` and gets the prior wiring back, including the per-stage Attention id (`release-readiness-eval-failure-fetch` / `…-analyzer` / `…-tick`). + +**Migration scope.** The PostUpdateMigrator step mutates ONLY rows whose id begins with the exact prefix `release-readiness-eval-failure-`. Non-matching rows (`release-readiness-` user-actionable, any unrelated attention id) are preserved byte-for-byte. The migration touches only `attention-items.json`; the Telegram topic itself is left as-is (Echo's two stale topics were already `/done`'d to closed state, and we soft-deleted them via `DELETE /attention/:id` during the live cleanup so the bot won't re-route messages to them). A second run finds nothing to drop and reports `skipped: 'none on disk'`. + +**Rollback.** Reverting this PR re-enables the per-event Telegram topic for evaluator-self-failures. The audit log + state files are forward-compatible (no schema change). A revert would not regenerate dropped attention rows (idempotent migration — that data was already noise). + +## Testing + +`tests/unit/ReleaseReadinessSentinel.test.ts` — 16 tests pass: +- housekeeping default: fetch failure audits + emits `eval-failed`, no postAttention; deduped second tick suppresses the event too +- housekeeping default: analyzer null audits, no postAttention +- housekeeping default: "Release blocked" user-actionable signal STILL posts (regression guard) +- opt-in `escalateEvalFailures: true`: fetch failure posts a deduped Attention item +- opt-in `escalateEvalFailures: true`: analyzer null posts an Attention item with the canonical id +- plus all pre-existing tests for blocked-detection, silent-threshold, priority escalation, hysteresis, resolveEpisodesInRange, TTL reaping, and disabled-state behaviour + +`tests/unit/PostUpdateMigrator-retireStaleReleaseReadinessEvalFailureAttention.test.ts` — 7 tests pass + +`tests/unit/releaseReadinessWiring.test.ts` — 8 tests pass (unaffected, still green) +`tests/integration/release-readiness-routes.test.ts` — 5 tests pass (unaffected, still green) + +Type-check clean (`tsc --noEmit` zero output). + +## Live cleanup performed + +Echo's two stale items were soft-deleted via `DELETE /attention/release-readiness-eval-failure-fetch` and `DELETE /attention/release-readiness-eval-failure-analyzer` against the running server (status → DONE, topics closed). The PostUpdateMigrator step will pick up the on-disk rows on the next startup after this PR ships, completing the cleanup.