@@ -0,0 +1,68 @@
# Post-mortem — Release-readiness eval-failure Telegram topics (2026-05-27)

## Summary

A new monitoring sentinel (`ReleaseReadinessSentinel`, shipped over PRs #433 / #442 / #443) emitted a per-stage Attention item — and therefore a new Telegram topic — every time the watchdog's own fetch / analyzer / tick stage broke. Across the v1.3.38 → v1.3.43 dogfood window on Echo, two such topics surfaced ("Release-readiness check could not evaluate"), with bodies that were inscrutable to a user ("analyze-release returned no report"). This pattern was banned six days earlier by the silently-stopped-trio fix (2026-05-22, post-topic-spam flood): internal-plumbing failures belong in the audit log + server log, not on the user's Telegram surface.

The user caught it. The spec passed conformance. The conformance gate did not see this class of violation.

## Timeline

- **2026-05-22** — Silently-stopped-trio fix lands (#334, then wired in #340). Establishes the canonical "Sentinel Notifications" pattern: housekeeping by default → `logs/sentinel-events.jsonl` + `server.log`, Telegram escalation off by default, coalesced into ONE consolidated message in the existing system topic when opted in. Codified in agent `CLAUDE.md` and `docs/specs/silently-stopped-trio.md`.
- **2026-05-26..27** — `RELEASE-READINESS-VISIBILITY-SPEC.md` converges and lands as #433/#442/#443. §4.2.4 says the spec is "near-silent" (✓), and §4.2 explicitly says **any evaluation failure raises a low-priority Attention item — a silent catch is forbidden**. The two-option framing (loud-attention vs silent-catch) skipped over the housekeeping path the trio standard establishes. No cross-reference to `silently-stopped-trio.md`.
- **2026-05-27 (Echo dogfood window)** — Echo enabled the sentinel. Several ticks ran. The 23:54Z tick fetched canonical and failed (`canonical ref unreachable`); the 01:25Z tick reached the analyzer and got back no report. Each emitted a new Telegram topic via the Attention queue's "create-a-topic-per-item" design.
- **2026-05-27 18:30 PT** — User: "These topics keep popping up in Instar agents which goes directly against instar standards: they produce topic clutter; the messages are completely unhelpful."
- **2026-05-27 18:30..18:46 PT** — Diagnosis → branch `echo/release-readiness-housekeeping` → fix + tests + migrator + side-effects artifact + this post-mortem.
- **2026-05-27 18:35 PT** — Two stale items live-cleaned on Echo via `DELETE /attention/release-readiness-eval-failure-{fetch,analyzer}` (soft-delete; topics closed).

## Root cause

A spec-time framing error. The spec author treated the choice as binary:
1. **Loud signal** → post to Attention queue (creates Telegram topic).
2. **Silent catch** → eat the error → recreate the very bug §3 fixes.

The trio standard establishes a third path:
3. **Housekeeping** → write to `logs/sentinel-events.jsonl` + `server.log` + emit an in-process event. Fully observable for diagnostics, never a user-facing topic. Optional, coalesced, single-hub-topic escalation behind a config flag.

For evaluator-self-failures (the watchdog's own fetch / analyzer / tick stages), path 3 is the correct fit — they are internal plumbing the user can't act on. Path 1 was the wrong choice but was actively defended by the spec text. Path 2 was never on the table.
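
The split between path 1 and path 3 can be written down as an explicit classification. The sketch below is illustrative only: `EmitPath`, `EmitSite`, `classify`, and the `release-blocked` label are hypothetical names, not part of the codebase. Only the stage names and the rule "internal plumbing the user can't act on is housekeeping" come from the text above.

```typescript
// Hypothetical sketch: every name here is an assumption for illustration.
// Path 2 (silent catch) has no value in this type because it is never allowed.
type EmitPath = "attention" | "housekeeping";

interface EmitSite {
  stage: string;       // e.g. "fetch", "analyzer", "tick"
  userCanAct: boolean; // can the user do anything about this failure?
}

// Failures the user can act on go to Attention; everything else is audited only.
function classify(site: EmitSite): EmitPath {
  return site.userCanAct ? "attention" : "housekeeping";
}

const sites: EmitSite[] = [
  { stage: "fetch", userCanAct: false },
  { stage: "analyzer", userCanAct: false },
  { stage: "tick", userCanAct: false },
  { stage: "release-blocked", userCanAct: true },
];

// One row per emit-site, in the shape a spec table might take.
const table = sites.map((s) => `${s.stage}: ${classify(s)}`);
console.log(table.join("\n"));
```

Under this rule the three evaluator stages land on housekeeping and only the release-blocked signal reaches the Attention queue.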

## Contributing factors

1. **No conformance check for sentinel emit-sites.** The Self-Hosting conformance gate exercises many checks (near-silent, 3-tier testing, migration parity, structure-over-willpower, no-manual-work). It does NOT, today, flag "this new `*Sentinel.ts` calls `postAttention` directly without classifying the emit-site against the silently-stopped-trio housekeeping/escalation taxonomy."
2. **No cross-spec consistency requirement.** Specs were not required to reference the trio standard's pattern. The spec mentioned "near-silent" but didn't cite the trio doc as a peer authority.
3. **No structural primitive.** SocketDisconnectSentinel / ActiveWorkSilenceSentinel implement the housekeeping pattern by hand. There is no shared `SentinelEmitter` primitive that bakes in the housekeeping default + escalation gate. Each new sentinel re-derives (or fails to re-derive) the pattern from prose.
4. **Dogfood-to-ship caught it, at the topic-clutter cost.** The "Echo dogfoods first" gate worked: a real user caught the issue before the sentinel shipped enabled by default. But the catch came AFTER the user saw two topics, not before. Relying on dogfood as the only safety net is a smell; design-time review should have caught this.
5. **Spec language reinforced the bug.** "A silent catch is forbidden" framed loud-Attention as the only acceptable alternative. Housekeeping is not silent — it's persistent, structured, queryable observability — but the spec used "silent" pejoratively without distinguishing from "audited but not chat-surfacing."
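
Factor 3 is easier to weigh with a concrete shape in mind. What follows is a minimal sketch of what a shared `SentinelEmitter` might look like, not the planned implementation. The two method names are the ones proposed under Follow-ups; `EmitterDeps`, `AttentionItem`, `flush()`, the synchronous sinks, and the choice to audit every escalation request are all assumptions made for illustration.

```typescript
// Hypothetical sketch. Only `recordHousekeeping` and `escalate` are named in
// this post-mortem; all other types, fields, and methods are assumptions.
interface AttentionItem { id: string; title: string; summary: string }

interface EmitterDeps {
  audit: (entry: Record<string, unknown>) => void; // stand-in for the audit log
  postAttention: (item: AttentionItem) => void;    // stand-in for the user-facing surface
}

class SentinelEmitter {
  private readonly name: string;
  private readonly deps: EmitterDeps;
  private readonly escalationEnabled: boolean;
  private pending: AttentionItem[] = [];

  // Escalation is off unless the caller opts in: housekeeping is the default.
  constructor(name: string, deps: EmitterDeps, escalationEnabled = false) {
    this.name = name;
    this.deps = deps;
    this.escalationEnabled = escalationEnabled;
  }

  // Always audited, never user-facing.
  recordHousekeeping(event: string, payload: Record<string, unknown>): void {
    this.deps.audit({ sentinel: this.name, event, ...payload });
  }

  // Audits the request, then queues the item only when the flag is on.
  escalate(item: AttentionItem): void {
    this.recordHousekeeping("escalation-requested", { id: item.id });
    if (this.escalationEnabled) this.pending.push(item);
  }

  // Coalesces everything queued into ONE consolidated item.
  flush(): void {
    if (this.pending.length === 0) return;
    const summary = this.pending.map((i) => `${i.title}: ${i.summary}`).join("\n");
    this.deps.postAttention({
      id: `${this.name}-escalations`,
      title: `${this.name} escalations`,
      summary,
    });
    this.pending = [];
  }
}

const auditLog: Record<string, unknown>[] = [];
const posted: AttentionItem[] = [];
const deps: EmitterDeps = {
  audit: (entry) => { auditLog.push(entry); },
  postAttention: (item) => { posted.push(item); },
};

const quiet = new SentinelEmitter("release-readiness", deps);
quiet.recordHousekeeping("eval-failed", { stage: "fetch" });
quiet.escalate({ id: "fetch", title: "Fetch failed", summary: "canonical ref unreachable" });
quiet.flush(); // flag off: audited, nothing posted

const loud = new SentinelEmitter("release-readiness", deps, true);
loud.escalate({ id: "fetch", title: "Fetch failed", summary: "canonical ref unreachable" });
loud.escalate({ id: "analyzer", title: "Analyzer failed", summary: "no report returned" });
loud.flush(); // flag on: two escalations, one consolidated item
```

With a primitive of this shape, a new sentinel would have to opt in to reach the user. The real coalescing target (the existing system topic) is not modelled here.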

## What we're changing

### Immediate (this PR)

- `ReleaseReadinessSentinel.failLoud()` demoted to audit-only by default; opt-in via `monitoring.releaseReadiness.escalateEvalFailures`.
- `migrateRetireStaleReleaseReadinessEvalFailureAttention()` cleans up stale rows on existing agents.
- Spec text (next slice) — see "Follow-ups" below.

### Follow-ups (tracked as separate work)

1. **Sentinel-emit-site lint.** A pre-commit / CI lint that scans `src/monitoring/**/*Sentinel*.ts` for direct `postAttention(` calls and flags any that aren't either:
   - Behind a config flag of the shape `*TelegramEscalation` / `escalate*Failures` / `*ChatEscalation`, OR
   - Annotated `// @user-actionable-attention-ok — <one-line justification>` in the same expression.

   This is the structural equivalent of the trio standard. Implements "structure > willpower" for the housekeeping taxonomy.

2. **Sentinel emitter primitive.** Extract a small `SentinelEmitter` class with two methods:
   - `recordHousekeeping(event, payload)` → audit + event (no user-facing emit by default)
   - `escalate(item)` → routes to Attention iff the per-sentinel escalation flag is on, with built-in coalescing per the trio standard.

   New sentinels use the primitive. Existing housekeeping-pattern sentinels (`SocketDisconnectSentinel`, `ActiveWorkSilenceSentinel`) migrate at leisure. Spec-time discussion becomes "which emit-sites are housekeeping vs user-actionable," not "do we postAttention."

3. **Spec template update.** Any spec introducing a sentinel must include a "Failure-mode emit-site table" classifying each error path as (a) user-actionable Attention, (b) housekeeping audit-only, (c) opt-in escalation. The /spec-converge conformance pass requires this section.

4. **Cross-reference rule.** `/spec-converge` flags any spec touching `src/monitoring/` that does NOT cite `docs/specs/silently-stopped-trio.md`. Mechanical, easy.

5. **Spec text fix on `RELEASE-READINESS-VISIBILITY-SPEC.md`.** Replace the §4.2 "fail-loud Attention" language with the housekeeping default + escalation flag pattern; cite the trio standard. This lands as a follow-up PR: the spec is converged and the runtime behaviour now contradicts it, so the doc must be brought back in line with the code.
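
Follow-up 1 is mechanical enough to sketch. The version below is a rough line-based heuristic under stated assumptions, not the lint that will ship: a real check would more likely walk the AST. The flag-name shapes and the annotation marker come from follow-up 1; `lintSentinelSource`, `Finding`, and the three-line lookback window are hypothetical.

```typescript
// Hypothetical lint sketch. Flag shapes and the annotation marker come from
// follow-up 1; the function name and the lookback heuristic are assumptions.
const FLAG_SHAPE = /\b(?:\w*TelegramEscalation|escalate\w*Failures|\w*ChatEscalation)\b/;
const ANNOTATION = "@user-actionable-attention-ok";

interface Finding { line: number; text: string }

// Flags each `postAttention(` call with no escalation flag or annotation
// on the same line or within `lookback` preceding lines.
function lintSentinelSource(source: string, lookback = 3): Finding[] {
  const lines = source.split("\n");
  const findings: Finding[] = [];
  lines.forEach((text, i) => {
    if (!text.includes("postAttention(")) return;
    const context = lines.slice(Math.max(0, i - lookback), i + 1);
    const gated = context.some((l) => FLAG_SHAPE.test(l));
    const annotated = context.some((l) => l.includes(ANNOTATION));
    if (!gated && !annotated) findings.push({ line: i + 1, text: text.trim() });
  });
  return findings;
}

const ungated = [
  "await this.deps.postAttention({",
  "  id: 'release-readiness-eval-failure',",
  "});",
].join("\n");

const gated = [
  "if (this.cfg.escalateEvalFailures) {",
  "  await this.deps.postAttention({ id: 'x' });",
  "}",
].join("\n");

const annotated = [
  "// @user-actionable-attention-ok: release blocked is actionable",
  "await this.deps.postAttention({ id: 'release-blocked' });",
].join("\n");

console.log(lintSentinelSource(ungated).length);   // 1
console.log(lintSentinelSource(gated).length);     // 0
console.log(lintSentinelSource(annotated).length); // 0
```

The lookback window is deliberately crude: any nearby mention of a matching flag name counts as gating, so this sketch can miss violations that a syntax-aware lint would catch.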

## Lessons

- **Two coexisting standards is one standard not yet generalized.** When a class of failure (silently-stopped trio) gets a careful design and a separate class (release-readiness eval) reinvents a worse version of it, that's not two design problems — that's the trio standard wanting to be extracted into a primitive. Do the primitive.
- **"Fail-loud" is not a synonym for "Telegram topic."** Loud means observable and surfaced where the next operator looks. For internal-plumbing failures, that's `logs/sentinel-events.jsonl` and `server.log`. For user-actionable failures, it's the Attention queue. The spec should classify each emit-site explicitly.
- **Dogfood-to-ship works but is the last line of defense.** Catches at design time are cheaper than catches at dogfood time. Conformance checks are how we move catches earlier without slowing review.
- **A bad analogy in a spec writes itself into every implementation.** "A silent catch is forbidden" was true but framed the choice wrongly. Better: "Every failure must be audited; user-facing emission is a separate decision." Words matter; choose them so they don't preclude the right answer.
1 change: 1 addition & 0 deletions src/commands/server.ts
@@ -8147,6 +8147,7 @@ export async function startServer(options: StartOptions): Promise<void> {
         backlogAgeDaysHigh: rrCfg.backlogAgeDaysHigh,
         hysteresisHours: rrCfg.hysteresisHours,
         staleEpisodeTtlDays: rrCfg.staleEpisodeTtlDays,
+        escalateEvalFailures: rrCfg.escalateEvalFailures,
       });
       console.log(pc.green(' ReleaseReadinessSentinel enabled (release-hygiene watchdog — job-driven)'));
     } else {
6 changes: 6 additions & 0 deletions src/config/ConfigDefaults.ts
@@ -88,6 +88,12 @@ const SHARED_DEFAULTS: Record<string, unknown> = {
       hysteresisHours: 12,
       staleEpisodeTtlDays: 30,
       fetchTimeoutMs: 30_000,
+      // Evaluator-self-failures (fetch / analyzer / top-level tick) are
+      // HOUSEKEEPING by default — they write to the audit log + server.log
+      // but do not post a per-stage Attention item / Telegram topic. The
+      // user-actionable "release blocked" signal is unaffected. Set true to
+      // surface catastrophic watchdog failures in chat. Sentinel-trio standard.
+      escalateEvalFailures: false,
     },
     // Master gate for Telegram delivery of silently-stopped-sentinel
     // escalations. Default false → sentinel notices are housekeeping and stay
72 changes: 72 additions & 0 deletions src/core/PostUpdateMigrator.ts
@@ -229,10 +229,82 @@ export class PostUpdateMigrator {
     this.migrateBootWrapperAbiCheck(result);
     this.migrateStaleLifelineSignal(result);
     this.migrateThreadlineConversationStore(result);
+    this.migrateRetireStaleReleaseReadinessEvalFailureAttention(result);

     return result;
   }

+  /**
+   * Retire stale `release-readiness-eval-failure-*` attention items left behind
+   * by the pre-housekeeping watchdog. From v1.3.43 down, ReleaseReadinessSentinel
+   * posted an Attention item — and therefore a new Telegram topic — every time
+   * the watchdog's own fetch / analyzer / tick stage broke. That violated the
+   * sentinel-trio standard (post-2026-05-22 topic-spam fix): internal-plumbing
+   * failures are housekeeping and belong in logs/sentinel-events.jsonl +
+   * server.log, not on the user's Telegram surface.
+   *
+   * The code-level fix demotes those emissions to audit-only (gated behind
+   * `monitoring.releaseReadiness.escalateEvalFailures`, default false). This
+   * migration cleans up the stragglers already on-disk so the topics don't
+   * keep haunting the topic list after update.
+   *
+   * Behaviour:
+   *   - Reads .instar/state/attention-items.json. If absent, skip.
+   *   - For every item whose id starts with `release-readiness-eval-failure-`:
+   *     drop it from the items array. (The Telegram topic itself is left as-is;
+   *     it was either /done'd by the user already, or will be unreferenced. We
+   *     don't synchronously call Telegram from PostUpdateMigrator — the
+   *     adapter isn't constructed at this point in startup.)
+   *   - Atomic write (tmp + rename) so a crash mid-migration can't corrupt
+   *     attention-items.json.
+   *   - Idempotent: a second run finds zero matches and no-ops.
+   *
+   * Origin: 2026-05-27 dogfood feedback on Echo — repeated
+   * "Release-readiness check could not evaluate" topics violating the user's
+   * "no topic clutter for housekeeping" standard.
+   */
+  private migrateRetireStaleReleaseReadinessEvalFailureAttention(result: MigrationResult): void {
+    const attentionPath = path.join(this.config.stateDir, 'state', 'attention-items.json');
+    if (!fs.existsSync(attentionPath)) {
+      result.skipped.push('retire-stale-release-readiness-eval-failure-attention: no attention-items.json');
+      return;
+    }
+
+    let parsed: { items?: Array<{ id?: string }> };
+    try {
+      parsed = JSON.parse(fs.readFileSync(attentionPath, 'utf-8')) as { items?: Array<{ id?: string }> };
+    } catch (err) {
+      result.errors.push(`retire-stale-release-readiness-eval-failure-attention read: ${err instanceof Error ? err.message : String(err)}`);
+      return;
+    }
+
+    if (!Array.isArray(parsed.items) || parsed.items.length === 0) {
+      result.skipped.push('retire-stale-release-readiness-eval-failure-attention: empty attention items');
+      return;
+    }
+
+    const before = parsed.items.length;
+    const filtered = parsed.items.filter((it) => {
+      const id = typeof it?.id === 'string' ? it.id : '';
+      return !id.startsWith('release-readiness-eval-failure-');
+    });
+    const dropped = before - filtered.length;
+    if (dropped === 0) {
+      result.skipped.push('retire-stale-release-readiness-eval-failure-attention: none on disk');
+      return;
+    }
+
+    parsed.items = filtered;
+    try {
+      const tmpPath = `${attentionPath}.${process.pid}.tmp`;
+      fs.writeFileSync(tmpPath, JSON.stringify(parsed, null, 2));
+      fs.renameSync(tmpPath, attentionPath);
+      result.upgraded.push(`retire-stale-release-readiness-eval-failure-attention: dropped ${dropped} stale item(s)`);
+    } catch (err) {
+      result.errors.push(`retire-stale-release-readiness-eval-failure-attention write: ${err instanceof Error ? err.message : String(err)}`);
+    }
+  }
+
   /**
    * Regenerate the boot wrapper when it predates the ABI-aware node
    * self-heal (recurring-SQLite-bane fix).
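
The migration's docblock claims the filter step is idempotent. That property can be checked in isolation with a pure function. This is a sketch of the filter logic only, with no file I/O: `retireStale`, `AttentionFile`, and the `release-blocked-abc123` id are hypothetical names used for illustration, and only the id prefix comes from the hunk above.

```typescript
// Hypothetical pure-function sketch of the filter step; not the migrator itself.
interface AttentionFile { items?: Array<{ id?: string }> }

const STALE_PREFIX = "release-readiness-eval-failure-";

// Returns a new file object without the stale items, plus how many were dropped.
function retireStale(file: AttentionFile): { file: AttentionFile; dropped: number } {
  const items = Array.isArray(file.items) ? file.items : [];
  const kept = items.filter(
    (it) => !(typeof it.id === "string" && it.id.startsWith(STALE_PREFIX)),
  );
  return { file: { ...file, items: kept }, dropped: items.length - kept.length };
}

const onDisk: AttentionFile = {
  items: [
    { id: "release-readiness-eval-failure-fetch" },
    { id: "release-readiness-eval-failure-analyzer" },
    { id: "release-blocked-abc123" }, // made-up id for a user-actionable item
    {},                               // item without an id is kept
  ],
};

const first = retireStale(onDisk);
const second = retireStale(first.file); // second run finds nothing to drop
console.log(first.dropped, second.dropped); // prints: 2 0
```

Keeping the filter pure like this would also make it straightforward to unit-test separately from the tmp + rename write.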
11 changes: 11 additions & 0 deletions src/core/types.ts
@@ -2769,6 +2769,17 @@ export interface MonitoringConfig {
     canonicalRemote?: string;
     /** Override the instar repo path to analyze (default: the agent home). */
     repoPath?: string;
+    /**
+     * When true, evaluator-self-failures (fetch / analyzer / top-level tick
+     * stages of the watchdog itself) post a LOW-priority Attention item — and
+     * therefore a Telegram topic — in addition to the audit-log entry. Default
+     * false: per the sentinel-trio standard ("Sentinel Notifications" in the
+     * agent CLAUDE.md, post-2026-05-22 topic-spam fix), internal-plumbing
+     * failures are housekeeping and stay in logs/sentinel-events.jsonl +
+     * server.log. The user-actionable "release blocked — unreleased work
+     * piling up" signal always posts regardless of this flag. Flip on only if
+     * you also want catastrophic-failure surfacing in chat. */
+    escalateEvalFailures?: boolean;
   };
   /**
    * Master gate for Telegram delivery of silently-stopped-sentinel escalations
52 changes: 41 additions & 11 deletions src/monitoring/ReleaseReadinessSentinel.ts
@@ -20,9 +20,17 @@
  *   item per stall episode above it, keyed on the OLDEST unreleased commit
  *   SHA (stable across ticks — not a resettable per-tick id), priority scaled
  *   by backlog age, 12h hysteresis on re-raise after an auto-resolve.
- * - Fail-loud: any evaluation failure (fetch error, analyzer error) raises a
- *   low-priority Attention item — never a silent catch (that would re-create
- *   the exact bug this fixes).
+ * - Fail-loud: any evaluation failure (fetch error, analyzer error, top-level
+ *   tick error) writes a structured audit entry (sentinel-events.jsonl) and a
+ *   dedup-keyed `eval-failed` emit — never a silent catch (that would re-create
+ *   the exact bug this fixes). User-facing Telegram escalation of these
+ *   evaluator-self-failures is HOUSEKEEPING by default, gated behind
+ *   `escalateEvalFailures` (config: `monitoring.releaseReadiness.escalateEvalFailures`,
+ *   default false), per the sentinel-trio standard ("Sentinel Notifications"
+ *   in CLAUDE.md, post-2026-05-22 topic-spam fix). The audit log + server.log
+ *   are the canonical observability surface; only the user-actionable
+ *   "release blocked / unreleased work piling up" signal posts to Attention
+ *   by default — that one is genuinely actionable.
  * - Lifecycle owner: detect → surface → auto-resolve → reap, with
  *   resolveEpisodesInRange consulted by the publish-finalize path.
  * - Repo-gated: needs an analyzable instar git repo (dev/maintainer env). On
@@ -120,6 +128,16 @@ export interface ReleaseReadinessSentinelConfig {
   backlogAgeDaysHigh?: number;
   hysteresisHours?: number;
   staleEpisodeTtlDays?: number;
+  /**
+   * When true, evaluator-self-failures (fetch / analyzer / top-level tick stages)
+   * post a low-priority Attention item in addition to the audit log. Default
+   * false: housekeeping per the sentinel-trio standard — the audit log
+   * (logs/sentinel-events.jsonl) + server.log are the canonical observability
+   * surface for internal-plumbing failures, so the user is not spammed with a
+   * Telegram topic per stage that breaks. The user-actionable "release blocked"
+   * signal is unaffected by this flag and always posts to Attention.
+   */
+  escalateEvalFailures?: boolean;
 }

 const DEFAULTS: Required<ReleaseReadinessSentinelConfig> = {
@@ -131,6 +149,7 @@ const DEFAULTS: Required<ReleaseReadinessSentinelConfig> = {
   backlogAgeDaysHigh: 7,
   hysteresisHours: 12,
   staleEpisodeTtlDays: 30,
+  escalateEvalFailures: false,
 };

 const DAY_MS = 24 * 60 * 60 * 1000;
@@ -346,17 +365,28 @@ export class ReleaseReadinessSentinel extends EventEmitter {

   private async failLoud(state: ReadinessState, stage: string, err: unknown): Promise<void> {
     const key = `failure:${stage}`;
+    // Always audit — the audit log is the canonical observability surface for
+    // evaluator-self-failures. Both the dedup-suppressed and the un-suppressed
+    // paths produce an audit line so frequency is countable from disk.
     this.deps.audit({ kind: 'release-readiness', event: 'eval-failed', stage, error: String(err) });
     if (state.lastFailureKey === key) return; // dedupe per failure episode
     state.lastFailureKey = key;
-    await this.deps.postAttention({
-      id: `release-readiness-eval-failure-${stage}`,
-      title: 'Release-readiness check could not evaluate',
-      summary: `The release-readiness check failed at the "${stage}" stage: ${String(err)}. Last evaluated ${state.lastSignalAt ? new Date(state.lastSignalAt).toISOString() : 'never'}.`,
-      category: 'degradation',
-      priority: 'LOW',
-    });
-    state.lastSignalAt = this.deps.now();
+    // HOUSEKEEPING by default: do NOT post a per-stage Attention item (which
+    // would create a per-event Telegram topic — the exact anti-pattern banned
+    // by the sentinel-trio standard post-2026-05-22 topic-spam fix). The user
+    // hears about this kind of failure only when escalateEvalFailures is
+    // explicitly enabled. The audit emission above + the `eval-failed` event
+    // remain the supported observability handles.
+    if (this.cfg.escalateEvalFailures) {
+      await this.deps.postAttention({
+        id: `release-readiness-eval-failure-${stage}`,
+        title: 'Release-readiness check could not evaluate',
+        summary: `The release-readiness check failed at the "${stage}" stage: ${String(err)}. Last evaluated ${state.lastSignalAt ? new Date(state.lastSignalAt).toISOString() : 'never'}.`,
+        category: 'degradation',
+        priority: 'LOW',
+      });
+      state.lastSignalAt = this.deps.now();
+    }
     this.emit('eval-failed', { stage });
   }
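
The ordering in the new `failLoud()` (audit first, then dedupe, then the gated post, then the event) can be exercised with a small stand-in. This is a simplified, synchronous model written for illustration, not the sentinel's actual test suite: `makeFailLoud` and its in-memory arrays are hypothetical, and the real method is async and uses injected `deps`.

```typescript
// Simplified synchronous stand-in for failLoud(); `makeFailLoud` and the
// in-memory arrays are assumptions, only the ordering mirrors the hunk above.
interface FailState { lastFailureKey?: string }

function makeFailLoud(escalateEvalFailures: boolean) {
  const audited: string[] = [];
  const posted: string[] = [];
  const events: string[] = [];
  const state: FailState = {};

  function failLoud(stage: string, err: unknown): void {
    audited.push(`${stage}: ${String(err)}`);  // always audit, even when deduped
    const key = `failure:${stage}`;
    if (state.lastFailureKey === key) return;  // dedupe per failure episode
    state.lastFailureKey = key;
    if (escalateEvalFailures) {
      posted.push(`release-readiness-eval-failure-${stage}`); // opt-in only
    }
    events.push(stage);                        // stands in for the `eval-failed` event
  }

  return { failLoud, audited, posted, events };
}

// Default (housekeeping): every failure is audited, nothing is posted.
const quietRun = makeFailLoud(false);
quietRun.failLoud("fetch", "canonical ref unreachable");
quietRun.failLoud("fetch", "canonical ref unreachable");
quietRun.failLoud("analyzer", "analyze-release returned no report");

// Opt-in: the same sequence posts once per distinct failure episode.
const loudRun = makeFailLoud(true);
loudRun.failLoud("fetch", "canonical ref unreachable");
loudRun.failLoud("fetch", "canonical ref unreachable");
loudRun.failLoud("analyzer", "analyze-release returned no report");

console.log(quietRun.audited.length, quietRun.posted.length, quietRun.events.length); // prints: 3 0 2
console.log(loudRun.audited.length, loudRun.posted.length, loudRun.events.length);   // prints: 3 2 2
```

The repeated `fetch` failure shows up in the audit count but not in the event or post counts, which is the "countable from disk" property the inline comment describes.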
