fix(release-readiness): demote eval-failure to housekeeping default (sentinel-trio standard)#459
Open
JKHeadley wants to merge 1 commit into
…sentinel-trio standard)
ReleaseReadinessSentinel.failLoud() posted a per-stage Attention item — and
therefore a per-stage Telegram topic — every time the watchdog's own fetch /
analyzer / tick stage broke ("Release-readiness check could not evaluate" with
an inscrutable body like "analyze-release returned no report"). That violates
the sentinel-trio standard from the 2026-05-22 topic-spam fix: internal-
plumbing failures are housekeeping and belong in logs/sentinel-events.jsonl
+ server.log, not on the user's Telegram surface.
- failLoud() always writes the audit entry (the audit log is the canonical
observability for housekeeping) but calls postAttention() only when
monitoring.releaseReadiness.escalateEvalFailures is explicitly true.
The default is false. The user-actionable
"Release blocked — unreleased work piling up" signal is unaffected — it
always posts.
- migrateRetireStaleReleaseReadinessEvalFailureAttention strips stale
release-readiness-eval-failure-* rows from existing agents'
attention-items.json on the next update. The write is atomic and idempotent,
and legitimate rows are preserved.
- 6 new ReleaseReadinessSentinel tests cover both halves of the new contract
(housekeeping default + escalate opt-in) plus a regression guard that the
legitimate "release blocked" signal still posts under default config.
- 7 new PostUpdateMigrator tests cover the cleanup migration (missing file,
empty array, no-match no-op, selective drop, idempotency, malformed-entry
tolerance, unparseable-JSON error reporting).
- Post-mortem doc captures the root cause (spec framed loud-attention vs
silent-catch as binary, missing the housekeeping path) and tracks
follow-ups (sentinel-emit-site lint, SentinelEmitter primitive, spec
template "failure-mode emit-site table", /spec-converge cross-reference
rule, spec text correction).
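
The new failLoud() contract from the first bullet can be sketched roughly as
follows. This is a minimal illustration, not the code in this PR: only the
names failLoud, postAttention, the release-readiness-eval-failure- id prefix,
and the escalateEvalFailures key come from the description above. The Stage
type, the SentinelDeps interface, the per-stage id suffix, and the function
signature are assumptions made for the example.

```typescript
// Hypothetical shapes for illustration only.
type Stage = "fetch" | "analyzer" | "tick";

interface ReleaseReadinessConfig {
  // Mirrors monitoring.releaseReadiness.escalateEvalFailures; absent means false.
  escalateEvalFailures?: boolean;
}

interface SentinelDeps {
  audit: (event: { kind: string; stage: Stage; reason: string }) => void;
  postAttention: (item: { id: string; title: string; body: string }) => void;
}

function failLoud(
  stage: Stage,
  reason: string,
  config: ReleaseReadinessConfig,
  deps: SentinelDeps,
): void {
  // Housekeeping path: the audit entry is written unconditionally.
  deps.audit({ kind: "release-readiness-eval-failure", stage, reason });

  // Escalation path: only an explicit `true` opts in.
  if (config.escalateEvalFailures !== true) return;

  deps.postAttention({
    id: `release-readiness-eval-failure-${stage}`, // assumed id layout
    title: "Release-readiness check could not evaluate",
    body: reason,
  });
}

// Usage: one failure under the default config, one with the opt-in.
const audited: string[] = [];
const posted: string[] = [];
const deps: SentinelDeps = {
  audit: (event) => { audited.push(event.stage); },
  postAttention: (item) => { posted.push(item.id); },
};

failLoud("fetch", "analyze-release returned no report", {}, deps);
failLoud("analyzer", "analyze-release returned no report", { escalateEvalFailures: true }, deps);
```

Comparing against `true` with strict equality rather than testing truthiness
keeps a missing or malformed config value on the quiet housekeeping path.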
Origin: 2026-05-27 dogfood feedback on Echo — two "Release-readiness check
could not evaluate" topics surfaced within 90 minutes (one fetch-stage, one
analyzer-stage). Live cleanup of those two rows was performed via DELETE
/attention before this PR.
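
For other agents, the cleanup migration does the equivalent of that manual
DELETE. A rough sketch of the filtering step is below, assuming
attention-items.json holds a JSON array of rows that each carry a string id;
that file shape, the MigrationResult type, the function names, and the
"release-blocked" example id are hypothetical. The real migration also writes
the file atomically, which this sketch leaves to the caller (for instance by
writing a temp file and renaming it over the original).

```typescript
// Prefix taken from the PR description; everything else is illustrative.
const STALE_PREFIX = "release-readiness-eval-failure-";

interface MigrationResult {
  changed: boolean;
  removed: number;
  output?: string; // new file contents, present only when changed
  error?: string;  // set when the input could not be parsed
}

function isStale(row: unknown): boolean {
  // Malformed entries (non-objects, missing or non-string ids) are never dropped.
  if (typeof row !== "object" || row === null) return false;
  const id = (row as { id?: unknown }).id;
  return typeof id === "string" && id.startsWith(STALE_PREFIX);
}

function retireStaleEvalFailureRows(raw: string): MigrationResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    // Report the problem instead of overwriting a file we cannot read.
    return { changed: false, removed: 0, error: `unparseable JSON: ${(err as Error).message}` };
  }
  if (!Array.isArray(parsed)) return { changed: false, removed: 0 };

  const kept = parsed.filter((row) => !isStale(row));
  const removed = parsed.length - kept.length;
  if (removed === 0) return { changed: false, removed: 0 }; // no-match is a no-op

  return { changed: true, removed, output: JSON.stringify(kept, null, 2) };
}

// Usage: the second run over the already-cleaned output changes nothing.
const before = JSON.stringify([
  { id: "release-readiness-eval-failure-fetch" },
  { id: "release-blocked" },
  "malformed-entry",
]);
const firstRun = retireStaleEvalFailureRows(before);
const secondRun = retireStaleEvalFailureRows(firstRun.output ?? before);
const badInput = retireStaleEvalFailureRows("{not json");
```

Returning `changed: false` when nothing matches lets the caller skip the write
entirely, which is one simple way to make the migration idempotent.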
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- Demote ReleaseReadinessSentinel.failLoud() evaluator-self-failures (fetch / analyzer / top-level tick stages) to audit-only by default, gated behind a new monitoring.releaseReadiness.escalateEvalFailures opt-in. This brings the sentinel into compliance with the silently-stopped-trio standard (post-2026-05-22 topic-spam fix): internal-plumbing failures belong in logs/sentinel-events.jsonl + server.log, not on the user's Telegram surface. The user-actionable "Release blocked" signal (unreleased work is piling up) is unaffected and still posts to Attention.
- Add migrateRetireStaleReleaseReadinessEvalFailureAttention in PostUpdateMigrator to strip stale release-readiness-eval-failure-* rows from existing agents' .instar/state/attention-items.json on update. The write is atomic and idempotent, and legitimate user-actionable rows are preserved.
- Add a post-mortem doc (docs/postmortems/2026-05-27-release-readiness-eval-failure-topics.md) tracing the spec-time framing error ("loud-attention vs silent-catch" as a binary, missing the housekeeping path) and listing the structural follow-ups (sentinel-emit-site lint, SentinelEmitter primitive, spec-template "failure-mode emit-site table", /spec-converge cross-reference rule, spec text correction PR).

Origin: dogfood feedback on Echo (2026-05-27). The user observed two "Release-readiness check could not evaluate" topics within 90 minutes (one fetch-stage, one analyzer-stage) whose bodies were inscrutable ("analyze-release returned no report") and called them out as anti-Instar topic clutter. Echo's two stale rows were live-cleaned via DELETE /attention before this PR.

Test plan
- tests/unit/ReleaseReadinessSentinel.test.ts: 16 tests, including the 6 new fail-loud cases (housekeeping default + escalate opt-in + user-actionable regression guard)
- tests/unit/PostUpdateMigrator-retireStaleReleaseReadinessEvalFailureAttention.test.ts: 7 tests
- tests/unit/releaseReadinessWiring.test.ts: 8 tests pass unchanged
- tests/integration/release-readiness-routes.test.ts: 5 tests pass unchanged
- tests/e2e/release-readiness-live.test.ts: 2 tests pass unchanged
- tsc --noEmit clean

🤖 Generated with Claude Code