From c897755fe8ab2d8161bb2ceb703f812e276dd96a Mon Sep 17 00:00:00 2001
From: "Instar Agent (echo)"
Date: Sat, 30 May 2026 20:52:33 -0700
Subject: [PATCH] feat(agent-sleep): SleepController decision foundation (Stage B slice 1, dark + dry-run)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The deepest lever of the Responsible Resource Usage standard — letting a deeply idle agent drop its server to near-zero footprint and wake on the next message. The mechanism (supervisor stop + lifeline respawn) is risky, so this slice ships the SAFE half first: the decision + every safety guard, dark + dry-run, so the 'is it safe to sleep?' reasoning is proven + observable before anything stops.

- SleepController: pure evaluateSleep (awake / idle-shallow / keep-awake / would-sleep) with guards — held multi-machine lease, in-flight work, imminent scheduled job. Thin ticking class audits transitions to logs/agent-sleep-events.jsonl; dry-run never acts.
- AgentActivityState: shared idle signal, bumped at the inbound chokepoint (/internal/telegram-forward) so a genuinely-messaged agent never sleeps.
- Wired into the server (off + dry-run), GET /sleep exposes the live verdict, config monitoring.agentSleep, CapabilityIndex classification.

3-tier tests: unit (both sides of every guard boundary, exact thresholds, dry-run-never-acts, transition-only audit) + integration (GET /sleep 503-unwired / 200-alive with verdict).

Signal-only, no blocking authority, never stops a process.

Self-approved under the deploy mandate (Justin: build Stage B now, topic 16782).

Co-Authored-By: Claude Opus 4.8 --- .../agent-hard-sleep-controller.eli16.md | 55 ++++ docs/specs/agent-hard-sleep-controller.md | 81 ++++++ src/commands/server.ts | 54 +++- src/config/ConfigDefaults.ts | 14 + src/core/types.ts | 15 ++ src/monitoring/AgentActivityState.ts | 35 +++ src/monitoring/SleepController.ts | 252 ++++++++++++++++++ src/server/AgentServer.ts | 6 + src/server/CapabilityIndex.ts | 1 + src/server/routes.ts | 22 ++ .../sleep-controller-routes.test.ts | 90 +++++++ tests/unit/SleepController.test.ts | 191 +++++++++++++ upgrades/NEXT.md | 52 ++++ .../agent-hard-sleep-controller.md | 101 +++++++ 14 files changed, 968 insertions(+), 1 deletion(-) create mode 100644 docs/specs/agent-hard-sleep-controller.eli16.md create mode 100644 docs/specs/agent-hard-sleep-controller.md create mode 100644 src/monitoring/AgentActivityState.ts create mode 100644 src/monitoring/SleepController.ts create mode 100644 tests/integration/sleep-controller-routes.test.ts create mode 100644 tests/unit/SleepController.test.ts create mode 100644 upgrades/NEXT.md create mode 100644 upgrades/side-effects/agent-hard-sleep-controller.md diff --git a/docs/specs/agent-hard-sleep-controller.eli16.md b/docs/specs/agent-hard-sleep-controller.eli16.md new file mode 100644 index 000000000..8a2519806 --- /dev/null +++ b/docs/specs/agent-hard-sleep-controller.eli16.md @@ -0,0 +1,55 @@ +# ELI16 — Teaching an idle agent when it's safe to "go to sleep" + +## What this is, in plain English + +Every instar agent runs a full background server all the time — even when nobody +has talked to it for hours. On a machine running ~9 of them, that idle cost is the +biggest drain on the laptop. The end goal (Stage B of the agent-sleep design) is: +when an agent has been completely idle for a while, it drops almost everything to +near-zero and instantly wakes back up the moment a message arrives — like a laptop +sleeping and waking. 
+ +That's a risky thing to build, because if an agent sleeps at the wrong moment it +could miss a message or get stuck. So this change builds the SAFE HALF first: the +part that decides *"is it actually safe to sleep right now?"* — and nothing else. +It watches, it decides, it writes down what it would have done — but it never +actually stops anything yet. + +## How it decides + +It answers with one of four words: + +- **awake** — a work session is running, or someone was active in the last couple + of minutes. +- **idle-shallow** — quiet, but not quiet long enough yet. +- **keep-awake** — quiet long enough to consider sleeping, BUT a safety guard says + no. +- **would-sleep** — quiet long enough AND every safety guard is clear. + +The safety guards are the important part. It will NOT say "would-sleep" if: + +- this machine is the one currently in charge of answering messages (in a + multi-machine setup, it must hand that off first), or +- there's work in flight (a message being handled, a recovery running), or +- a scheduled job is about to fire in the next couple of minutes. + +Each guard names itself in the reason, so when you ask "why is this agent still +awake?" you get a plain answer like "holds the multi-machine serving lease." + +## Why this is safe to ship right now + +It ships **off by default**, and even when turned on it runs in **dry-run** — it +only writes its decision to a log file (`agent-sleep-events.jsonl`) and serves it at +a `/sleep` status check. It has no power to stop a server. The whole point of +shipping it dark first is to watch real agents for a while and confirm: does a real +idle agent actually reach "would-sleep," and was every "keep-awake" correct? Only +once that's proven does the next slice wire the part that actually stops and wakes +the server. + +## What you need to decide + +Nothing risky. This is the foundation slice of the Stage B you asked me to build +now. 
It can't break anything because it never acts — it just makes the sleep +decision visible and testable. If it's ever wrong, you'd see it in the log without +any agent ever having slept. The next slice is the actual stop-and-wake mechanism, +and it'll only get built on top of a decision layer we've watched behave correctly. diff --git a/docs/specs/agent-hard-sleep-controller.md b/docs/specs/agent-hard-sleep-controller.md new file mode 100644 index 000000000..f3bd9d96f --- /dev/null +++ b/docs/specs/agent-hard-sleep-controller.md @@ -0,0 +1,81 @@ +--- +title: Agent hard-sleep — SleepController decision foundation (Stage B, slice 1) +slug: agent-hard-sleep-controller +status: approved +review-convergence: 2026-05-31T03:45:00+00:00 +approved: true +author: echo +approval-note: > + Self-approved by Echo under the delegated deploy mandate. Justin directed + (topic 16782, 2026-05-31) to build Stage B agent-sleep now, in-session, and not + defer it. This is the first slice: the sleep DECISION logic + every safety + guard, shipped dark + dry-run, so the "is it safe to sleep?" reasoning is proven + and observable BEFORE the next slice wires the mechanism that actually stops the + server. Umbrella design: docs/specs/agent-sleep-mode.md (PR #594). +--- + +# Agent hard-sleep — SleepController decision foundation + +## Problem + +Stage B of the agent-sleep design (the deepest lever of the Responsible Resource +Usage standard) lets a deeply-idle agent drop its server to near-zero footprint and +wake on the next message. The risky part is the MECHANISM: the supervisor stopping +the server and the lifeline respawning it without losing a message. Before any of +that is wired, the DECISION — "is it actually safe for this agent to sleep right +now?" — must be correct and observable, because a wrong decision (sleeping while it +holds the multi-machine lease, or while a job is about to fire, or while work is in +flight) is how hard-sleep would brick an agent. 
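
The three wrong-decision cases named in the Problem statement (held lease, in-flight work, imminent job) can be read as an ordered guard check. What follows is a minimal TypeScript sketch under stated assumptions: the names `GuardState` and `firstBlockingGuard` and the guard labels are hypothetical and chosen for illustration; they are not the module's actual API.

```typescript
// Hypothetical sketch: names and labels are illustrative assumptions only.
interface GuardState {
  now: number;                       // epoch ms at evaluation time
  leaseActive: boolean;              // multi-machine coordination is on at all
  holdsLease: boolean;               // this machine currently serves messages
  inflightWork: boolean;             // a message or recovery is being handled
  nextScheduledJobAt: number | null; // epoch ms, or null if nothing is scheduled
}

/** Returns the label of the first guard that blocks sleep, or null if all are clear. */
function firstBlockingGuard(s: GuardState, wakeLeadMs: number): string | null {
  if (s.leaseActive && s.holdsLease) return 'held-lease';
  if (s.inflightWork) return 'in-flight-work';
  if (s.nextScheduledJobAt !== null && s.nextScheduledJobAt - s.now <= wakeLeadMs) {
    return 'imminent-job';
  }
  return null;
}

// A quiet single-machine agent with nothing scheduled: no guard blocks.
const quiet: GuardState = {
  now: 1_000_000,
  leaseActive: false,
  holdsLease: false,
  inflightWork: false,
  nextScheduledJobAt: null,
};
console.log(firstBlockingGuard(quiet, 120_000)); // null

// A job due in 60 s falls inside a 120 s lead window, so the job guard blocks.
console.log(firstBlockingGuard({ ...quiet, nextScheduledJobAt: 1_060_000 }, 120_000)); // imminent-job
```

Returning the first blocking guard by name, instead of a bare boolean, is what lets an audit line answer "why is this agent still awake?" directly.
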
+ +## What's new + +`src/monitoring/SleepController.ts` — a pure, exhaustively-testable decision module: + +- **`evaluateSleep(input, thresholds)`** returns one of four verdicts: + - `awake` — a session is running, or activity within `idleGraceMs`. + - `idle-shallow` — idle past grace but before `deepIdleMs`. + - `keep-awake` — deep-idle but a **safety guard** blocks sleep. + - `would-sleep` — deep-idle and every guard clear. +- **Safety guards** (any one ⇒ `keep-awake`, named in the reason): this machine + holds the multi-machine serving lease; in-flight work (forward / recovery / + queued message); a scheduled job fires within `wakeLeadMs`. +- **`SleepController`** ticks the decision on a cadence. It audits only on a + decision TRANSITION (low-noise, like the reaper audit) to + `logs/agent-sleep-events.jsonl`. In **dry-run (the default)** it never acts. In + live mode (`enabled && !dryRun`, the mechanism slice wires the consumer) it calls + `requestSleep` once per would-sleep episode. + +Config (`monitoring.agentSleep`, default OFF + dry-run, mirrors the reaper): +`{ enabled: false, dryRun: true, tickIntervalSec, idleGraceMs, deepIdleMs, wakeLeadMs }`. +Status route `GET /sleep` exposes the latest verdict + thresholds for inspection. + +## What is explicitly NOT in this slice + +The mechanism: the supervisor consuming a sleep-request to stop the server, the +lifeline writing a wake-request + respawning + replaying the buffered message, and +the watchdog treating a slept agent as healthy. Those are the next slice; this one +ships the decision + guards dark so they can be validated against real agent +behavior first (does a real agent ever reach `would-sleep`, and was every +`keep-awake` correct?). + +## Safeguards + +- Default OFF + dry-run: the controller only observes; nothing stops a server. 
+- Every guard defaults to the SAFE side: unknown lease/in-flight/job state is + sampled conservatively (treated as a reason to stay awake) so a sampling gap can + never produce a spurious would-sleep in live mode. +- Signal-only in this slice — no blocking authority over any message. + +## Testing + +- Unit (`SleepController.test.ts`): both sides of every boundary (grace, deep-idle, + each guard), exact-threshold boundaries, most-recent-of-inbound-vs-activity, the + dry-run-never-acts contract, once-per-episode latching, and transition-only audit. +- Integration: `GET /sleep` returns 200 with the current verdict when the controller is wired; + 503-stub semantics consistent with the other dark monitors when it is unwired. + +## Rollback + +Pure additive source + a default-off config block (auto-migrated, existence-checked). +Revert the commit → the controller and route disappear; nothing else changes. No +persistent state beyond the best-effort audit log. diff --git a/src/commands/server.ts b/src/commands/server.ts index ffb492017..42cba2232 100644 --- a/src/commands/server.ts +++ b/src/commands/server.ts @@ -8897,6 +8897,58 @@ export async function startServer(options: StartOptions): Promise { )); } + // ── Agent hard-sleep — SleepController (RESPONSIBLE-RESOURCE-USAGE, Stage B) ── + // Decides "is it safe for this idle agent to drop to near-zero footprint?" with + // every safety guard. Ships OFF + dry-run: observes + audits to + // logs/agent-sleep-events.jsonl, never stops a server. The mechanism + // (supervisor stop + lifeline respawn) is a later slice. GET /sleep exposes the + // live verdict. The shared idle signal (AgentActivityState) is bumped at the + // inbound-message chokepoint (/internal/telegram-forward). 
+ const { AgentActivityState } = await import('../monitoring/AgentActivityState.js'); + const agentActivityState = new AgentActivityState(); + const { SleepController, sleepAuditSink } = await import('../monitoring/SleepController.js'); + const _sleepCfg = config.monitoring?.agentSleep; + const sleepController = new SleepController( + { + sample: () => { + const act = agentActivityState.snapshot(); + return { + now: Date.now(), + runningSessions: sessionManager.listRunningSessions().length, + lastInboundAt: act.lastInboundAt, + lastActivityAt: act.lastActivityAt, + // Lease guard: only relevant when multi-machine coordination is active. + leaseActive: coordinator.enabled, + holdsLease: coordinator.enabled ? coordinator.holdsLease() : false, + // In-flight: an inbound message currently being handled. (The relay/forward + // in-flight + scheduler-wake signals are wired with the stop mechanism in + // the next slice — this slice is dry-run, so it never acts on them.) + inflightWork: (currentInboundByTopic?.size ?? 0) > 0, + nextScheduledJobAt: null, + }; + }, + audit: sleepAuditSink(config.stateDir), + }, + { + enabled: _sleepCfg?.enabled ?? false, + dryRun: _sleepCfg?.dryRun ?? true, + tickIntervalMs: (_sleepCfg?.tickIntervalSec ?? 60) * 1000, + thresholds: { + idleGraceMs: _sleepCfg?.idleGraceMs ?? 120_000, + deepIdleMs: _sleepCfg?.deepIdleMs ?? 900_000, + wakeLeadMs: _sleepCfg?.wakeLeadMs ?? 120_000, + }, + }, + ); + sleepController.start(); + if (_sleepCfg?.enabled) { + console.log(pc.green( + _sleepCfg.dryRun === false + ? ' SleepController enabled (agent hard-sleep — LIVE decision)' + : ' SleepController enabled (agent hard-sleep — dry-run, observe only)', + )); + } + // ── Unkillability backstop (UNIFIED-SESSION-LIFECYCLE §P5) ─────────────── // Signal-only: raises ONE deduped Attention item (never auto-kills) when a // session is KEPT forever despite faking work, or is stuck indeterminate. 
@@ -9485,7 +9537,7 @@ export async function startServer(options: StartOptions): Promise { console.log(pc.dim(` [session-pool] rollout gate not wired: ${err instanceof Error ? err.message : String(err)}`)); } - const server = new AgentServer({ config, sessionManager, state, scheduler, telegram, relationships, feedback, feedbackAnomalyDetector, dispatches, updateChecker, autoUpdater, autoDispatcher, quotaTracker, quotaManager, publisher, viewer, tunnel, evolution, watchdog, topicMemory, triageNurse, projectMapper, coherenceGate: scopeVerifier, contextHierarchy, canonicalState, operationGate, sentinel, adaptiveTrust, memoryMonitor, orphanReaper, coherenceMonitor, commitmentTracker, semanticMemory, activitySentinel, rateLimitSentinel, releaseReadinessSentinel: releaseReadinessSentinel ?? undefined, messageRouter, summarySentinel, spawnManager, systemReviewer, capabilityMapper, selfKnowledgeTree, coverageAuditor, topicResumeMap: _topicResumeMap ?? undefined, sessionRefresh: _sessionRefresh ?? undefined, autonomyManager, trustElevationTracker, autonomousEvolution, coordinator: coordinator.enabled ? coordinator : undefined, localSigningKeyPem, leaseTransport, liveTailReceiver, handoffWireTransport, onHandoffBegin, onHandoffInitiate: handoffInitiate, handoffInProgress: handoffSentinelInProgress, messageLedger, currentInboundByTopic, replyMarkerTransport, onReplyMarker: messageLedger ? (marker: unknown) => { const m = marker as { dedupeKey: string; platform: string; replyIdempotencyKey: string; epoch: number; topic?: string | null }; messageLedger!.applyRemoteReplyMarker(m.dedupeKey, { platform: m.platform, replyIdempotencyKey: m.replyIdempotencyKey, epoch: m.epoch, topic: m.topic ?? 
null }); } : undefined, whatsapp: whatsappAdapter, slack: slackAdapter, imessage: imessageAdapter, whatsappBusinessBackend, messageBridge, hookEventReceiver, worktreeMonitor, subagentTracker, instructionsVerifier, handshakeManager: threadlineHandshake, threadlineRouter, conversationStore, warrantsReplyGate, collaborationSurfacer, threadResumeMap, topicLinkageHandler: topicLinkageHandler ?? undefined, threadlineRelayClient, threadlineReplyWaiters, listenerManager: listenerManager ?? undefined, responseReviewGate, messagingToneGate, outboundDedupGate, telemetryHeartbeat, pasteManager, featureRegistry, discoveryEvaluator, completionEvaluator, unifiedTrust, liveConfig, sharedStateLedger, ledgerSessionRegistry, worktreeManager, oidcEnrolledRepos: parallelDevConfig?.oidcEnrolledRepos, initiativeTracker, projectRoundRunner, projectDriftChecker, machineHeartbeat, machinePoolRegistry, meshRpcDispatcher, sessionOwnershipRegistry, sessionPoolE2EResultStore, proxyCoordinator, topicIntentStore, topicIntentArcCheck, usherSignalStore, intelligence: sharedIntelligence ?? undefined, telegramBridgeConfig, telegramBridge: telegramBridge ?? undefined, threadlineObservability, briefDeps, workingMemory, taskFlowRegistry, threadlineFlowBridge, sessionReaper, agentWorktreeReaper, reapLog, sleepWakeDetector, unjustifiedStopGate, stopGateDb, stopNotifier }); + const server = new AgentServer({ config, sessionManager, state, scheduler, telegram, relationships, feedback, feedbackAnomalyDetector, dispatches, updateChecker, autoUpdater, autoDispatcher, quotaTracker, quotaManager, publisher, viewer, tunnel, evolution, watchdog, topicMemory, triageNurse, projectMapper, coherenceGate: scopeVerifier, contextHierarchy, canonicalState, operationGate, sentinel, adaptiveTrust, memoryMonitor, orphanReaper, coherenceMonitor, commitmentTracker, semanticMemory, activitySentinel, rateLimitSentinel, releaseReadinessSentinel: releaseReadinessSentinel ?? 
undefined, messageRouter, summarySentinel, spawnManager, systemReviewer, capabilityMapper, selfKnowledgeTree, coverageAuditor, topicResumeMap: _topicResumeMap ?? undefined, sessionRefresh: _sessionRefresh ?? undefined, autonomyManager, trustElevationTracker, autonomousEvolution, coordinator: coordinator.enabled ? coordinator : undefined, localSigningKeyPem, leaseTransport, liveTailReceiver, handoffWireTransport, onHandoffBegin, onHandoffInitiate: handoffInitiate, handoffInProgress: handoffSentinelInProgress, messageLedger, currentInboundByTopic, replyMarkerTransport, onReplyMarker: messageLedger ? (marker: unknown) => { const m = marker as { dedupeKey: string; platform: string; replyIdempotencyKey: string; epoch: number; topic?: string | null }; messageLedger!.applyRemoteReplyMarker(m.dedupeKey, { platform: m.platform, replyIdempotencyKey: m.replyIdempotencyKey, epoch: m.epoch, topic: m.topic ?? null }); } : undefined, whatsapp: whatsappAdapter, slack: slackAdapter, imessage: imessageAdapter, whatsappBusinessBackend, messageBridge, hookEventReceiver, worktreeMonitor, subagentTracker, instructionsVerifier, handshakeManager: threadlineHandshake, threadlineRouter, conversationStore, warrantsReplyGate, collaborationSurfacer, threadResumeMap, topicLinkageHandler: topicLinkageHandler ?? undefined, threadlineRelayClient, threadlineReplyWaiters, listenerManager: listenerManager ?? undefined, responseReviewGate, messagingToneGate, outboundDedupGate, telemetryHeartbeat, pasteManager, featureRegistry, discoveryEvaluator, completionEvaluator, unifiedTrust, liveConfig, sharedStateLedger, ledgerSessionRegistry, worktreeManager, oidcEnrolledRepos: parallelDevConfig?.oidcEnrolledRepos, initiativeTracker, projectRoundRunner, projectDriftChecker, machineHeartbeat, machinePoolRegistry, meshRpcDispatcher, sessionOwnershipRegistry, sessionPoolE2EResultStore, proxyCoordinator, topicIntentStore, topicIntentArcCheck, usherSignalStore, intelligence: sharedIntelligence ?? 
undefined, telegramBridgeConfig, telegramBridge: telegramBridge ?? undefined, threadlineObservability, briefDeps, workingMemory, taskFlowRegistry, threadlineFlowBridge, sessionReaper, agentWorktreeReaper, sleepController, agentActivityState, reapLog, sleepWakeDetector, unjustifiedStopGate, stopGateDb, stopNotifier }); // Boot-recovery (tunnel-failure-resilience spec Part 6): if the agent // died mid-relay-episode, the persisted tunnel.json carries // rotationPending=true. Rotate the dashboard PIN + authToken BEFORE diff --git a/src/config/ConfigDefaults.ts b/src/config/ConfigDefaults.ts index e97a04735..dffe3452a 100644 --- a/src/config/ConfigDefaults.ts +++ b/src/config/ConfigDefaults.ts @@ -102,6 +102,20 @@ const SHARED_DEFAULTS: Record = { reapIntervalMs: 86_400_000, maxReapsPerPass: 20, }, + // Agent hard-sleep — SleepController decision foundation (Stage B, slice 1; + // docs/specs/agent-hard-sleep-controller.md). Decides "is it safe for this + // idle agent to drop its server to near-zero footprint?" with every safety + // guard (held lease / in-flight work / imminent scheduled job). Ships OFF + + // dry-run: observes + audits to logs/agent-sleep-events.jsonl, never stops a + // server. The mechanism (supervisor stop + lifeline respawn) is a later slice. + agentSleep: { + enabled: false, + dryRun: true, + tickIntervalSec: 60, + idleGraceMs: 120_000, + deepIdleMs: 900_000, + wakeLeadMs: 120_000, + }, // Unkillability backstop (UNIFIED-SESSION-LIFECYCLE §P5). Default ON, signal- // only: raises ONE deduped Attention item (never auto-kills) when a session is // KEPT forever despite faking work, or is stuck indeterminate. 
The escalation diff --git a/src/core/types.ts b/src/core/types.ts index 8fbba809f..d06d30ad1 100644 --- a/src/core/types.ts +++ b/src/core/types.ts @@ -3092,6 +3092,21 @@ export interface MonitoringConfig { reapIntervalMs?: number; maxReapsPerPass?: number; }; + /** + * Agent hard-sleep — SleepController decision foundation (RESPONSIBLE-RESOURCE- + * USAGE, Stage B; docs/specs/agent-hard-sleep-controller.md). Decides whether a + * deeply-idle agent may drop its server to near-zero footprint, with safety + * guards (held lease / in-flight / imminent job). Ships OFF + dry-run: observes + * + audits, never stops a server. GET /sleep exposes the live verdict. + */ + agentSleep?: { + enabled?: boolean; + dryRun?: boolean; + tickIntervalSec?: number; + idleGraceMs?: number; + deepIdleMs?: number; + wakeLeadMs?: number; + }; /** * Unkillability backstop (UNIFIED-SESSION-LIFECYCLE §P5). Watches for sessions * the conservative KEEP-rules would protect forever — one that FAKES work, or diff --git a/src/monitoring/AgentActivityState.ts b/src/monitoring/AgentActivityState.ts new file mode 100644 index 000000000..a4ac35586 --- /dev/null +++ b/src/monitoring/AgentActivityState.ts @@ -0,0 +1,35 @@ +/** + * AgentActivityState — the single shared "when was this agent last active?" signal + * (agent-sleep design, docs/specs/agent-sleep-mode.md → "Define a single shared + * idle signal"). The SleepController samples it to decide deep-idle; the server + * bumps it at the inbound-message chokepoint and on session spawn. + * + * Deliberately tiny + in-memory: "activity" for sleep purposes is a real inbound + * message or a session starting — NOT internal health-check traffic (which must + * never keep an otherwise-idle agent awake). So the server bumps this only at + * genuine activity points, not on every HTTP request. 
+ */ +export interface ActivitySnapshot { + lastInboundAt: number | null; + lastActivityAt: number | null; +} + +export class AgentActivityState { + private lastInboundAt: number | null = null; + private lastActivityAt: number | null = null; + + /** A genuine inbound user/agent message arrived. */ + markInbound(now: number): void { + this.lastInboundAt = now; + this.lastActivityAt = now; + } + + /** Non-message activity that should still defer sleep (e.g. a session spawn). */ + markActivity(now: number): void { + this.lastActivityAt = now; + } + + snapshot(): ActivitySnapshot { + return { lastInboundAt: this.lastInboundAt, lastActivityAt: this.lastActivityAt }; + } +} diff --git a/src/monitoring/SleepController.ts b/src/monitoring/SleepController.ts new file mode 100644 index 000000000..82120b0f6 --- /dev/null +++ b/src/monitoring/SleepController.ts @@ -0,0 +1,252 @@ +/** + * SleepController — the decision half of agent hard-sleep (Stage B of the + * Responsible Resource Usage / agent-sleep design, docs/specs/agent-sleep-mode.md). + * + * This module owns ONE question: "is it safe for this agent to drop its server to + * near-zero footprint right now?" It is deliberately split from the MECHANISM that + * actually stops/respawns the server (the supervisor + lifeline handshake, a later + * slice). Getting the decision — and every safety guard — correct and OBSERVABLE + * first, in dry-run, is what makes the mechanism safe to wire: the same dark + + * dry-run discipline the AgentWorktreeReaper shipped with. + * + * Pure `evaluateSleep()` is fake-free and exhaustively unit-testable. The thin + * `SleepController` class ticks it on a cadence and, in dry-run (the default), + * only records what it WOULD do to an audit sink — it never stops anything. + */ +import fs from 'node:fs'; +import path from 'node:path'; + +/** Live inputs sampled once per tick. All timestamps are epoch ms. */ +export interface SleepInput { + now: number; + /** Active Claude/codex sessions. 
Any > 0 ⇒ never sleep. */ + runningSessions: number; + /** Last inbound user/agent message timestamp, or null if never. */ + lastInboundAt: number | null; + /** Last any-activity timestamp (session output, outbound, tick work), or null. */ + lastActivityAt: number | null; + /** This machine currently holds the multi-machine serving lease. */ + holdsLease: boolean; + /** Multi-machine lease coordination is active at all (single-machine ⇒ false). */ + leaseActive: boolean; + /** Any in-flight forward, recovery, or queued/undelivered message. */ + inflightWork: boolean; + /** Next scheduled cron job fire time (epoch ms), or null if none scheduled. */ + nextScheduledJobAt: number | null; +} + +export interface SleepThresholds { + /** Time since last activity to count as "idle" at all. */ + idleGraceMs: number; + /** Continuous idle time before deep-idle ⇒ a sleep candidate. */ + deepIdleMs: number; + /** Don't sleep if a scheduled job fires within this lead window. */ + wakeLeadMs: number; +} + +export type SleepDecision = 'awake' | 'idle-shallow' | 'keep-awake' | 'would-sleep'; + +export interface SleepVerdict { + decision: SleepDecision; + reason: string; + /** Idle duration in ms at evaluation time (Infinity if never any signal). */ + idleForMs: number; +} + +export const DEFAULT_SLEEP_THRESHOLDS: SleepThresholds = { + idleGraceMs: 120_000, // 2 min + deepIdleMs: 900_000, // 15 min + wakeLeadMs: 120_000, // 2 min +}; + +/** + * Decide whether the agent may hard-sleep. Pure; no I/O. The guards are ordered + * so the returned reason names the FIRST blocking condition — easiest to read in + * an audit trail. + */ +export function evaluateSleep(input: SleepInput, t: SleepThresholds): SleepVerdict { + // 1. Active sessions ⇒ never sleep. (A session means real work in flight.) + if (input.runningSessions > 0) { + return { + decision: 'awake', + reason: `${input.runningSessions} running session(s)`, + idleForMs: 0, + }; + } + + // 2. 
Idle duration = time since the most recent inbound OR activity signal. + const lastSignal = Math.max(input.lastInboundAt ?? 0, input.lastActivityAt ?? 0); + const idleForMs = lastSignal > 0 ? input.now - lastSignal : Number.POSITIVE_INFINITY; + + if (idleForMs < t.idleGraceMs) { + return { decision: 'awake', reason: `recent activity ${secs(idleForMs)} ago`, idleForMs }; + } + if (idleForMs < t.deepIdleMs) { + return { + decision: 'idle-shallow', + reason: `idle ${secs(idleForMs)} (< deepIdle ${secs(t.deepIdleMs)})`, + idleForMs, + }; + } + + // 3. Deep-idle — every safety guard below blocks sleep (KEEP-awake on any). + if (input.leaseActive && input.holdsLease) { + return { + decision: 'keep-awake', + reason: 'holds the multi-machine serving lease — must hand off before sleeping', + idleForMs, + }; + } + if (input.inflightWork) { + return { decision: 'keep-awake', reason: 'in-flight work (forward / recovery / queued message)', idleForMs }; + } + if (input.nextScheduledJobAt !== null) { + const until = input.nextScheduledJobAt - input.now; + if (until <= t.wakeLeadMs) { + return { decision: 'keep-awake', reason: `scheduled job fires in ${secs(Math.max(0, until))} (< wakeLead)`, idleForMs }; + } + } + + // 4. Deep-idle and every guard clear ⇒ safe to sleep. + return { + decision: 'would-sleep', + reason: `deep-idle ${secs(idleForMs)}; no sessions, no held lease, no in-flight work, no imminent job`, + idleForMs, + }; +} + +function secs(ms: number): string { + if (!Number.isFinite(ms)) return '∞'; + return `${Math.round(ms / 1000)}s`; +} + +// ── Audit + handshake ──────────────────────────────────────────────── + +export interface SleepAuditEntry { + ts: string; + decision: SleepDecision; + reason: string; + idleForMs: number; + dryRun: boolean; +} + +export type SleepAuditSink = (entry: SleepAuditEntry) => void; + +/** + * Append-only JSONL audit sink at `logs/agent-sleep-events.jsonl`. 
Only writes on + * a decision CHANGE (transition), never every tick — the same low-noise pattern as + * the reaper audit, so a deep-idle agent doesn't spam the log every cadence. + */ +export function sleepAuditSink(stateDir: string): SleepAuditSink { + const file = path.join(stateDir, 'logs', 'agent-sleep-events.jsonl'); + return (entry: SleepAuditEntry) => { + try { + fs.mkdirSync(path.dirname(file), { recursive: true }); + fs.appendFileSync(file, JSON.stringify(entry) + '\n'); + } catch { + /* @silent-fallback-ok — audit is best-effort observability, never load-bearing */ + } + }; +} + +export interface SleepControllerOptions { + enabled: boolean; + /** When true (default), evaluate + audit but NEVER write the sleep-request flag. */ + dryRun: boolean; + thresholds?: Partial<SleepThresholds>; +} + +export interface SleepControllerDeps { + sample: () => SleepInput; + audit?: SleepAuditSink; + /** Live-mode only (dryRun=false): request the supervisor to sleep the server. */ + requestSleep?: (verdict: SleepVerdict) => void; +} + +/** + * Ticks the sleep decision on a cadence. Records every TRANSITION to the audit + * sink; in live mode (dryRun=false) calls `requestSleep` on a fresh would-sleep. + * In dry-run it is pure observability — the foundation slice ships this way. + */ +export class SleepController { + private readonly thresholds: SleepThresholds; + private lastDecision: SleepDecision | null = null; + private lastVerdict: SleepVerdict | null = null; + private sleepRequested = false; + private timer: ReturnType<typeof setInterval> | null = null; + + constructor( + private readonly deps: SleepControllerDeps, + private readonly opts: SleepControllerOptions & { tickIntervalMs?: number }, + ) { + this.thresholds = { ...DEFAULT_SLEEP_THRESHOLDS, ...(opts.thresholds ?? {}) }; + } + + /** Begin ticking on the configured cadence. No-op when not enabled (the audit + * still works in dry-run-but-enabled; a fully disabled controller never ticks). 
*/ + start(): void { + if (this.timer || !this.opts.enabled) return; + const intervalMs = this.opts.tickIntervalMs ?? 60_000; + this.timer = setInterval(() => { + try { this.tick(); } catch { /* @silent-fallback-ok — observability tick, never load-bearing */ } + }, intervalMs); + if (typeof this.timer.unref === 'function') this.timer.unref(); + } + + stop(): void { + if (this.timer) { clearInterval(this.timer); this.timer = null; } + } + + /** Evaluate once. Returns the verdict (also used by tests + a status route). */ + tick(): SleepVerdict { + const verdict = evaluateSleep(this.deps.sample(), this.thresholds); + this.lastVerdict = verdict; + + if (verdict.decision !== this.lastDecision) { + this.lastDecision = verdict.decision; + this.deps.audit?.({ + ts: new Date().toISOString(), + decision: verdict.decision, + reason: verdict.reason, + idleForMs: verdict.idleForMs, + dryRun: this.opts.dryRun, + }); + } + + // Live mechanism (off in the foundation slice): request sleep once per + // would-sleep episode; reset the latch as soon as we leave would-sleep. + if (verdict.decision === 'would-sleep') { + if (!this.opts.dryRun && this.opts.enabled && !this.sleepRequested) { + this.sleepRequested = true; + this.deps.requestSleep?.(verdict); + } + } else { + this.sleepRequested = false; + } + + return verdict; + } + + /** Current latched state — for a status route / tests. */ + get state(): { lastDecision: SleepDecision | null; sleepRequested: boolean } { + return { lastDecision: this.lastDecision, sleepRequested: this.sleepRequested }; + } + + /** Read-only status for GET /sleep. Ticks once so the verdict is fresh. 
*/ + snapshot(): { + enabled: boolean; + dryRun: boolean; + thresholds: SleepThresholds; + verdict: SleepVerdict; + sleepRequested: boolean; + } { + const verdict = this.tick(); + return { + enabled: this.opts.enabled, + dryRun: this.opts.dryRun, + thresholds: this.thresholds, + verdict, + sleepRequested: this.sleepRequested, + }; + } +} diff --git a/src/server/AgentServer.ts b/src/server/AgentServer.ts index ffda52ecf..67415830a 100644 --- a/src/server/AgentServer.ts +++ b/src/server/AgentServer.ts @@ -368,6 +368,10 @@ export class AgentServer { /** AgentWorktreeReaper — reclaims stale CLI worktrees. Powers * GET /worktrees/agent-reaper. */ agentWorktreeReaper?: import('../monitoring/AgentWorktreeReaper.js').AgentWorktreeReaper; + /** SleepController — agent hard-sleep decision (Stage B). Powers GET /sleep. */ + sleepController?: import('../monitoring/SleepController.js').SleepController; + /** AgentActivityState — shared idle signal bumped at the inbound chokepoint. */ + agentActivityState?: import('../monitoring/AgentActivityState.js').AgentActivityState; /** ReapLog — durable audit of every reap + skipped-reap (UNIFIED-SESSION-LIFECYCLE * §P4). Powers GET /sessions/reap-log. */ reapLog?: import('../monitoring/ReapLog.js').ReapLog; @@ -852,6 +856,8 @@ export class AgentServer { correctionLedger: this.correctionLedger, sessionReaper: options.sessionReaper ?? null, agentWorktreeReaper: options.agentWorktreeReaper ?? null, + sleepController: options.sleepController ?? null, + agentActivityState: options.agentActivityState ?? null, reapLog: options.reapLog ?? null, sleepWakeDetector: options.sleepWakeDetector ?? null, telegramBridgeConfig: options.telegramBridgeConfig ?? 
null, diff --git a/src/server/CapabilityIndex.ts b/src/server/CapabilityIndex.ts index ecf364772..439a45a6f 100644 --- a/src/server/CapabilityIndex.ts +++ b/src/server/CapabilityIndex.ts @@ -867,6 +867,7 @@ export const INTERNAL_PREFIXES: ReadonlyArray<{ prefix: string; reason: string } { prefix: 'build', reason: 'operator-only build endpoint' }, { prefix: 'sessions', reason: 'operator/dashboard-only session listing (no agent-facing API)' }, { prefix: 'worktrees', reason: 'AgentWorktreeReaper read-only report (reclaimable stale worktrees) — operational observability the agent READS, like /sessions/reap-log; not a user-invokable capability' }, + { prefix: 'sleep', reason: 'SleepController read-only verdict (agent hard-sleep decision + which guard holds it awake) — operational observability the agent READS; not a user-invokable capability' }, { prefix: 'ci', reason: 'operator-only CI status surface' }, { prefix: 'session', reason: 'single-session context surfaced via topicMemory endpoints' }, { prefix: 'identity', reason: 'identity files surfaced via the top-level `identity` field of the response' }, diff --git a/src/server/routes.ts b/src/server/routes.ts index 776fa494e..bf45bf00e 100644 --- a/src/server/routes.ts +++ b/src/server/routes.ts @@ -718,6 +718,10 @@ export interface RouteContext { /** AgentWorktreeReaper — reclaims stale CLI worktrees. Null when not wired. * Powers GET /worktrees/agent-reaper observability. */ agentWorktreeReaper?: import('../monitoring/AgentWorktreeReaper.js').AgentWorktreeReaper | null; + /** SleepController — agent hard-sleep decision (Stage B). Powers GET /sleep. */ + sleepController?: import('../monitoring/SleepController.js').SleepController | null; + /** AgentActivityState — shared idle signal; bumped at the inbound chokepoint. */ + agentActivityState?: import('../monitoring/AgentActivityState.js').AgentActivityState | null; /** SleepWakeDetector — timer-drift sleep detection with a CPU-starvation guard. 
* Powers GET /monitoring/sleep-wake (wake + suppression telemetry). Null when * not wired (older boot paths / standby) → the route 503s. */ @@ -3918,6 +3922,19 @@ export function createRoutes(ctx: RouteContext): Router { res.json(ctx.agentWorktreeReaper.snapshot()); }); + // SleepController (RESPONSIBLE-RESOURCE-USAGE — agent hard-sleep, Stage B). The + // pull-surface answer to "would this idle agent sleep right now, and if not, + // which guard is holding it awake?": the live verdict (awake / idle-shallow / + // keep-awake / would-sleep) + reason + thresholds + whether sleep is armed + // (enabled, dryRun). Read-only, Bearer-auth. Dry-run by default — never acts. + router.get('/sleep', (_req, res) => { + if (!ctx.sleepController) { + res.status(503).json({ error: 'sleep controller unavailable' }); + return; + } + res.json(ctx.sleepController.snapshot()); + }); + // Reap-log (UNIFIED-SESSION-LIFECYCLE §P4). The pull-surface answer to "why did // my session vanish?": every reap + every refused/skipped terminate, newest // last. Read-only, Bearer-auth (the router-level middleware). `?limit=N` @@ -9291,6 +9308,11 @@ export function createRoutes(ctx: RouteContext): Router { return; } + // Agent hard-sleep idle signal: a real inbound message is genuine activity + // (handshake-only pings without text are not). Bumps lastInbound so the + // SleepController never sleeps an agent that's actively being messaged. + if (text) ctx.agentActivityState?.markInbound(Date.now()); + // Version-handshake (only when lifelineVersion field is present AND // auth is configured — dev-mode with empty authToken skips the handshake // to avoid unauth'd fingerprinting channel if bearer-auth ever regresses). 
diff --git a/tests/integration/sleep-controller-routes.test.ts b/tests/integration/sleep-controller-routes.test.ts new file mode 100644 index 000000000..db660c06b --- /dev/null +++ b/tests/integration/sleep-controller-routes.test.ts @@ -0,0 +1,90 @@ +/** + * GET /sleep through the real createRoutes pipeline. + * - 503 when the SleepController is not wired. + * - 200 with the live verdict (decision + reason + thresholds) when present. + * This is the "feature is alive" check: the route returns 200, not 503. + */ + +import { describe, it, expect, beforeEach, afterEach } from 'vitest'; +import express from 'express'; +import request from 'supertest'; +import fs from 'node:fs'; +import path from 'node:path'; +import os from 'node:os'; +import { createRoutes, type RouteContext } from '../../src/server/routes.js'; +import { SleepController, type SleepInput } from '../../src/monitoring/SleepController.js'; +import { SafeFsExecutor } from '../../src/core/SafeFsExecutor.js'; + +function ctxWith(stateDir: string, sleepController: SleepController | null): RouteContext { + return { + config: { projectName: 'test', projectDir: path.dirname(stateDir), stateDir, port: 0, sessions: {} as any, scheduler: {} as any } as any, + sessionManager: { listRunningSessions: () => [] } as any, + state: { getJobState: () => null, getSession: () => null, listSessions: () => [] } as any, + tokenLedger: null, + sleepController, + startTime: new Date(), + } as unknown as RouteContext; +} + +const deepIdle: SleepInput = { + now: 1_000_000_000_000, + runningSessions: 0, + lastInboundAt: 1_000_000_000_000 - 30 * 60_000, + lastActivityAt: 1_000_000_000_000 - 30 * 60_000, + holdsLease: false, + leaseActive: false, + inflightWork: false, + nextScheduledJobAt: null, +}; + +describe('GET /sleep (integration)', () => { + let tmpDir: string; + let stateDir: string; + + beforeEach(() => { + tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'sleep-routes-')); + stateDir = path.join(tmpDir, '.instar'); + 
fs.mkdirSync(stateDir, { recursive: true }); + }); + afterEach(() => { + SafeFsExecutor.safeRmSync(tmpDir, { recursive: true, force: true, operation: 'tests/integration/sleep-controller-routes.test.ts' }); + }); + + function appWith(sleepController: SleepController | null): express.Express { + const app = express(); + app.use(express.json()); + app.use('/', createRoutes(ctxWith(stateDir, sleepController))); + return app; + } + + it('returns 503 when the SleepController is not wired', async () => { + const res = await request(appWith(null)).get('/sleep'); + expect(res.status).toBe(503); + expect(res.body.error).toMatch(/unavailable/); + }); + + it('returns 200 with the live verdict + thresholds when present (feature is alive)', async () => { + const controller = new SleepController( + { sample: () => deepIdle }, + { enabled: true, dryRun: true }, + ); + const res = await request(appWith(controller)).get('/sleep'); + expect(res.status).toBe(200); + expect(res.body.enabled).toBe(true); + expect(res.body.dryRun).toBe(true); + expect(res.body.verdict.decision).toBe('would-sleep'); + expect(res.body.thresholds.deepIdleMs).toBe(900_000); + expect(res.body.sleepRequested).toBe(false); // dry-run never arms + }); + + it('surfaces the blocking guard reason when a guard holds the agent awake', async () => { + const controller = new SleepController( + { sample: () => ({ ...deepIdle, leaseActive: true, holdsLease: true }) }, + { enabled: true, dryRun: true }, + ); + const res = await request(appWith(controller)).get('/sleep'); + expect(res.status).toBe(200); + expect(res.body.verdict.decision).toBe('keep-awake'); + expect(res.body.verdict.reason).toMatch(/lease/i); + }); +}); diff --git a/tests/unit/SleepController.test.ts b/tests/unit/SleepController.test.ts new file mode 100644 index 000000000..c3994b28d --- /dev/null +++ b/tests/unit/SleepController.test.ts @@ -0,0 +1,191 @@ +/** + * Unit tests — SleepController decision logic (agent hard-sleep, Stage B foundation). 
+ * Covers BOTH sides of every guard boundary, and the dry-run-never-acts contract. + */ +import { describe, it, expect, vi } from 'vitest'; +import { + evaluateSleep, + SleepController, + DEFAULT_SLEEP_THRESHOLDS, + type SleepInput, + type SleepThresholds, + type SleepVerdict, +} from '../../src/monitoring/SleepController.js'; + +const T: SleepThresholds = { idleGraceMs: 120_000, deepIdleMs: 900_000, wakeLeadMs: 120_000 }; +const NOW = 1_000_000_000_000; + +/** Deep-idle, all guards clear ⇒ would-sleep by default; tests flip one field. */ +function input(over: Partial<SleepInput> = {}): SleepInput { + return { + now: NOW, + runningSessions: 0, + lastInboundAt: NOW - 30 * 60_000, // 30 min ago (deep) + lastActivityAt: NOW - 30 * 60_000, + holdsLease: false, + leaseActive: false, + inflightWork: false, + nextScheduledJobAt: null, + ...over, + }; +} + +describe('evaluateSleep', () => { + it('would-sleep when deep-idle and every guard is clear', () => { + expect(evaluateSleep(input(), T).decision).toBe('would-sleep'); + }); + + it('awake when a session is running (even if otherwise deep-idle)', () => { + expect(evaluateSleep(input({ runningSessions: 1 }), T).decision).toBe('awake'); + }); + + it('awake when activity is within the idle grace window', () => { + expect(evaluateSleep(input({ lastActivityAt: NOW - 30_000 }), T).decision).toBe('awake'); + }); + + it('idle-shallow when idle past grace but before deep-idle', () => { + // 5 min idle: > 2 min grace, < 15 min deep + const v = evaluateSleep(input({ lastInboundAt: NOW - 5 * 60_000, lastActivityAt: NOW - 5 * 60_000 }), T); + expect(v.decision).toBe('idle-shallow'); + }); + + it('boundary: exactly deepIdleMs idle ⇒ deep (would-sleep), one ms less ⇒ shallow', () => { + const at = (ms: number) => input({ lastInboundAt: NOW - ms, lastActivityAt: NOW - ms }); + expect(evaluateSleep(at(T.deepIdleMs), T).decision).toBe('would-sleep'); + expect(evaluateSleep(at(T.deepIdleMs - 1), T).decision).toBe('idle-shallow'); + }); + +
it('boundary: exactly idleGraceMs ⇒ idle (shallow), one ms less ⇒ awake', () => { + const at = (ms: number) => input({ lastInboundAt: NOW - ms, lastActivityAt: NOW - ms }); + expect(evaluateSleep(at(T.idleGraceMs), T).decision).toBe('idle-shallow'); + expect(evaluateSleep(at(T.idleGraceMs - 1), T).decision).toBe('awake'); + }); + + it('keep-awake when this machine holds the multi-machine lease', () => { + const v = evaluateSleep(input({ leaseActive: true, holdsLease: true }), T); + expect(v.decision).toBe('keep-awake'); + expect(v.reason).toMatch(/lease/i); + }); + + it('would-sleep when lease coordination active but this machine does NOT hold it', () => { + expect(evaluateSleep(input({ leaseActive: true, holdsLease: false }), T).decision).toBe('would-sleep'); + }); + + it('holdsLease is ignored when lease coordination is not active (single machine)', () => { + expect(evaluateSleep(input({ leaseActive: false, holdsLease: true }), T).decision).toBe('would-sleep'); + }); + + it('keep-awake when there is in-flight work', () => { + const v = evaluateSleep(input({ inflightWork: true }), T); + expect(v.decision).toBe('keep-awake'); + expect(v.reason).toMatch(/in-flight/i); + }); + + it('keep-awake when a scheduled job fires within the wake-lead window', () => { + const v = evaluateSleep(input({ nextScheduledJobAt: NOW + 60_000 }), T); // 1 min < 2 min lead + expect(v.decision).toBe('keep-awake'); + expect(v.reason).toMatch(/scheduled job/i); + }); + + it('would-sleep when the next scheduled job is comfortably beyond the wake-lead', () => { + expect(evaluateSleep(input({ nextScheduledJobAt: NOW + 60 * 60_000 }), T).decision).toBe('would-sleep'); + }); + + it('boundary: job exactly at wakeLead ⇒ keep-awake, one ms beyond ⇒ would-sleep', () => { + expect(evaluateSleep(input({ nextScheduledJobAt: NOW + T.wakeLeadMs }), T).decision).toBe('keep-awake'); + expect(evaluateSleep(input({ nextScheduledJobAt: NOW + T.wakeLeadMs + 1 }), T).decision).toBe('would-sleep'); + }); + + 
it('never any inbound/activity signal ⇒ treated as deep-idle (would-sleep)', () => { + expect(evaluateSleep(input({ lastInboundAt: null, lastActivityAt: null }), T).decision).toBe('would-sleep'); + }); + + it('uses the MOST RECENT of inbound vs activity for idle duration', () => { + // inbound long ago but activity recent ⇒ awake + const v = evaluateSleep(input({ lastInboundAt: NOW - 60 * 60_000, lastActivityAt: NOW - 10_000 }), T); + expect(v.decision).toBe('awake'); + }); +}); + +describe('SleepController', () => { + it('dry-run NEVER calls requestSleep even on would-sleep', () => { + const requestSleep = vi.fn(); + const c = new SleepController( + { sample: () => input(), requestSleep }, + { enabled: true, dryRun: true }, + ); + const v = c.tick(); + expect(v.decision).toBe('would-sleep'); + expect(requestSleep).not.toHaveBeenCalled(); + expect(c.state.sleepRequested).toBe(false); + }); + + it('live mode requests sleep ONCE per would-sleep episode', () => { + const requestSleep = vi.fn(); + let sample = input(); + const c = new SleepController( + { sample: () => sample, requestSleep }, + { enabled: true, dryRun: false }, + ); + c.tick(); // would-sleep → request + c.tick(); // still would-sleep → no second request (latched) + expect(requestSleep).toHaveBeenCalledTimes(1); + // leave would-sleep, then return ⇒ a fresh request + sample = input({ runningSessions: 1 }); + c.tick(); // awake → latch reset + sample = input(); + c.tick(); // would-sleep again → second request + expect(requestSleep).toHaveBeenCalledTimes(2); + }); + + it('live mode but disabled ⇒ does not request sleep', () => { + const requestSleep = vi.fn(); + const c = new SleepController( + { sample: () => input(), requestSleep }, + { enabled: false, dryRun: false }, + ); + c.tick(); + expect(requestSleep).not.toHaveBeenCalled(); + }); + + it('audits only on decision TRANSITIONS, not every tick', () => { + const audit = vi.fn(); + let sample = input(); + const c = new SleepController({ sample: () => 
sample, audit }, { enabled: true, dryRun: true }); + c.tick(); // would-sleep (transition from null) + c.tick(); // would-sleep (no change) + c.tick(); // would-sleep (no change) + expect(audit).toHaveBeenCalledTimes(1); + sample = input({ runningSessions: 1 }); + c.tick(); // awake (transition) + expect(audit).toHaveBeenCalledTimes(2); + expect(audit.mock.calls[1][0].decision).toBe('awake'); + expect(audit.mock.calls[1][0].dryRun).toBe(true); + }); + + it('default thresholds are applied when none provided', () => { + const c = new SleepController({ sample: () => input() }, { enabled: true, dryRun: true }); + // 30-min idle in input() exceeds default 15-min deepIdle ⇒ would-sleep + expect(c.tick().decision).toBe('would-sleep'); + expect(DEFAULT_SLEEP_THRESHOLDS.deepIdleMs).toBe(900_000); + }); +}); + +import { AgentActivityState } from '../../src/monitoring/AgentActivityState.js'; + +describe('AgentActivityState', () => { + it('starts with null signals', () => { + expect(new AgentActivityState().snapshot()).toEqual({ lastInboundAt: null, lastActivityAt: null }); + }); + it('markInbound sets both inbound and activity', () => { + const a = new AgentActivityState(); + a.markInbound(NOW); + expect(a.snapshot()).toEqual({ lastInboundAt: NOW, lastActivityAt: NOW }); + }); + it('markActivity advances activity but NOT inbound', () => { + const a = new AgentActivityState(); + a.markInbound(NOW - 1000); + a.markActivity(NOW); + expect(a.snapshot()).toEqual({ lastInboundAt: NOW - 1000, lastActivityAt: NOW }); + }); +}); diff --git a/upgrades/NEXT.md b/upgrades/NEXT.md new file mode 100644 index 000000000..8771f18f8 --- /dev/null +++ b/upgrades/NEXT.md @@ -0,0 +1,52 @@ +# Upgrade Guide — vNEXT + + + +## What Changed + +**Foundation for agent hard-sleep: the SleepController decision layer (dark).** The +deepest lever of the Responsible Resource Usage work is letting a deeply-idle agent +drop its server to near-zero footprint and wake instantly on the next message. 
That +mechanism is risky, so this change ships the SAFE half first: the part that decides +"is it actually safe for this idle agent to sleep right now?" — and nothing else. + +The new SleepController returns one of four verdicts — awake, idle-shallow, +keep-awake, or would-sleep — and applies every safety guard before it will ever say +would-sleep: it refuses if this machine currently holds the multi-machine serving +lease, if there is work in flight, or if a scheduled job is about to fire. Each +guard names itself in the reason. It ships OFF by default and, even when enabled, +runs in dry-run: it only records its decision to a log and serves it at a status +endpoint. It has no power to stop a server — that mechanism is a separate slice, +built only once this decision layer has been watched behaving correctly on a real +idle agent. + +## What to Tell Your User + +Nothing to configure, and nothing changes in how your agent behaves. This is the +groundwork for a future ability where a completely idle agent can quiet down to save +your machine's resources and wake the instant you message it. For now it only +watches and decides — it never actually sleeps anything — so it is safe and +invisible. You can see what it would decide at the sleep status endpoint. + +## Summary of New Capabilities + +- New SleepController decides whether a deeply-idle agent may hard-sleep, with + safety guards for held multi-machine lease, in-flight work, and imminent + scheduled jobs. Pure, exhaustively unit-tested on both sides of every boundary. +- New shared AgentActivityState idle signal, bumped at the inbound-message + chokepoint so a genuinely-messaged agent never sleeps. +- GET /sleep exposes the live verdict, reason, thresholds, and whether sleep is + armed. Read-only, Bearer-auth, 503-stub when disabled. +- Decision transitions audited to logs/agent-sleep-events.jsonl (low-noise). +- Config monitoring.agentSleep — OFF + dry-run by default. 
+ +## Evidence + +- `tests/unit/SleepController.test.ts` — both sides of every guard boundary + (grace, deep-idle, lease, in-flight, scheduled-job), exact-threshold boundaries, + most-recent-of-inbound-vs-activity, dry-run-never-acts, once-per-episode latching, + transition-only audit, plus AgentActivityState. +- `tests/integration/sleep-controller-routes.test.ts` — GET /sleep returns 503 + unwired and 200 with the live verdict + thresholds when wired (feature is alive), + and surfaces the blocking guard reason. +- Side-effects: `upgrades/side-effects/agent-hard-sleep-controller.md`. diff --git a/upgrades/side-effects/agent-hard-sleep-controller.md b/upgrades/side-effects/agent-hard-sleep-controller.md new file mode 100644 index 000000000..776caad03 --- /dev/null +++ b/upgrades/side-effects/agent-hard-sleep-controller.md @@ -0,0 +1,101 @@ +# Side-Effects Review — Agent hard-sleep SleepController (Stage B, slice 1) + +**Version / slug:** `agent-hard-sleep-controller` +**Date:** `2026-05-31` +**Author:** `echo` +**Second-pass reviewer:** `not required — dark + dry-run + signal-only; the module holds no blocking authority and never stops a process` + +## Summary of the change + +Adds the DECISION half of agent hard-sleep: `SleepController` (pure `evaluateSleep` ++ a thin ticking class) decides whether a deeply-idle agent may drop to near-zero +footprint, applying every safety guard (held multi-machine lease / in-flight work / +imminent scheduled job). A shared `AgentActivityState` idle signal is bumped at the +inbound chokepoint (`/internal/telegram-forward`). Wired into the server (off + +dry-run by default), exposed read-only at `GET /sleep`, audited to +`logs/agent-sleep-events.jsonl` on decision transitions. Files: new +`src/monitoring/SleepController.ts`, `src/monitoring/AgentActivityState.ts`; +config `monitoring.agentSleep` (types.ts + ConfigDefaults.ts); wiring in +server.ts + AgentServer.ts + routes.ts; CapabilityIndex classification. 
+ +## Decision-point inventory + +- `evaluateSleep` (new decision) — add — the awake/idle-shallow/keep-awake/would-sleep verdict with guards. +- `SleepController` audit + (live-only) sleep-request — add — transition-only audit; `requestSleep` is unwired in this slice (no consumer). +- `GET /sleep` route — add — read-only verdict surface (503 when unwired). +- Inbound chokepoint — pass-through — adds a non-blocking `markInbound()` side-call. + +--- + +## 1. Over-block + +No block/allow surface over any message or user action. The verdict is advisory; +in this slice nothing consumes a would-sleep (the `requestSleep` consumer is the +next slice). Over-block not applicable. + +## 2. Under-block + +The decision could, in principle, say "would-sleep" when it shouldn't (e.g. the +in-flight signal is approximate in this slice — it reads `currentInboundByTopic` +but not yet the relay/forward queue or the scheduler's next-fire). This is harmless +here: dry-run never acts on the verdict, and the next slice wires the remaining +in-flight + scheduler-wake signals BEFORE the mechanism that actually sleeps. The +guards that ARE wired (sessions, lease, recent activity) are exact. + +## 3. Level-of-abstraction fit + +Correct. The decision is a pure, exhaustively-tested function; the controller is a +thin cadence wrapper that mirrors the existing dark monitors (SessionReaper, +AgentWorktreeReaper) — same dark + dry-run discipline, same audit-on-transition, +same `snapshot()` route shape. It deliberately does NOT contain the mechanism +(supervisor stop / lifeline respawn); that is a separate slice so the decision can +be validated first. + +## 4. Signal vs authority compliance + +**Required reference:** [docs/signal-vs-authority.md](../../docs/signal-vs-authority.md) + +- [x] No — this change produces a SIGNAL (a verdict + audit) with no blocking + authority over any message or process. 
In live mode (the mechanism slice) the verdict + would feed the supervisor handshake, but even then "sleep" is gated like the + SessionReaper (positive proof + KEEP-on-ambiguity), not a brittle detector with + kill authority. + +## 5. Interactions + +- **Shadowing:** none. `GET /sleep` is a new route; the inbound `markInbound()` is a + fire-and-forget side-call after the boot guard and does not alter the + forward's control flow or response. +- **Double-fire:** none. The controller is the only consumer of `AgentActivityState`. + No other monitor sleeps a server. +- **Races:** the controller ticks on a single timer (unref'd); `AgentActivityState` + is plain in-memory single-writer-per-event. The `sleepRequested` latch prevents + repeated requests within a would-sleep episode. +- **Feedback loops:** none — dry-run writes only an audit log it never reads back. + +## 6. External surfaces + +- **Other agents / install base:** pure additive source + a default-off config + block (auto-applied via ConfigDefaults; code reads with `?? default` so an agent + whose config lacks the block behaves identically — OFF). No agent-installed-file + change requiring a CLAUDE.md/hook migration; the route is internal observability + (classified in CapabilityIndex like /worktrees/agent-reaper), so no template/ + awareness section is required for this dark slice. +- **External systems:** none. +- **Persistent state:** one best-effort append-only audit log + (`logs/agent-sleep-events.jsonl`), written only on decision transitions. +- **Timing:** one unref'd 60s timer when enabled; never started when disabled. + +## 7. Rollback cost + +Pure additive code + a default-off config block. Revert the commit → the controller, +route, config, and audit disappear; nothing else changes. No migration, no agent +state repair, no user-visible behavior (it never acted). 
+ +## Conclusion + +This review confirmed the slice is observability-only: a tested decision function +with all safety guards, shipped dark + dry-run, with no authority and no mechanism. +It is the safe foundation the next (mechanism) slice will build on, and the dry-run +audit is exactly what makes that next slice safe to wire. Clear to ship; validate by +watching `GET /sleep` + the audit log on a real idle agent before enabling.
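
The body of `evaluateSleep` lies outside this part of the patch, so the decision rules can only be inferred from the unit tests above. The following is a minimal sketch of a function that would satisfy those tests, not the shipped implementation. The name `sketchEvaluateSleep`, the reason strings, the use of `Infinity` for "no signal ever", and the choice to check the guards only after the deep-idle threshold is reached are all assumptions made for illustration; the field names and the inclusive boundaries come from the tests.

```typescript
// Hypothetical reconstruction from the unit tests; NOT the real SleepController source.
type SleepDecision = 'awake' | 'idle-shallow' | 'keep-awake' | 'would-sleep';

interface SleepThresholds {
  idleGraceMs: number; // below this idle time the agent counts as awake
  deepIdleMs: number;  // at or above this idle time the agent is deep-idle
  wakeLeadMs: number;  // a job firing within this window holds the agent awake
}

interface SleepInput {
  now: number;
  runningSessions: number;
  lastInboundAt: number | null;
  lastActivityAt: number | null;
  holdsLease: boolean;
  leaseActive: boolean;
  inflightWork: boolean;
  nextScheduledJobAt: number | null;
}

interface SleepVerdict {
  decision: SleepDecision;
  reason: string;
  idleForMs: number; // assumption: Infinity when no signal was ever recorded
}

function sketchEvaluateSleep(i: SleepInput, t: SleepThresholds): SleepVerdict {
  // Idle time is measured from the MOST RECENT of the two signals.
  const last = Math.max(i.lastInboundAt ?? -Infinity, i.lastActivityAt ?? -Infinity);
  const idleForMs = i.now - last;
  const verdict = (decision: SleepDecision, reason: string): SleepVerdict =>
    ({ decision, reason, idleForMs });

  if (i.runningSessions > 0) return verdict('awake', 'session running');
  if (idleForMs < t.idleGraceMs) return verdict('awake', 'recent activity');
  if (idleForMs < t.deepIdleMs) return verdict('idle-shallow', 'idle, not yet deep');

  // Deep-idle: each guard names itself in the reason (ordering is an assumption).
  if (i.leaseActive && i.holdsLease) {
    return verdict('keep-awake', 'holds the multi-machine lease');
  }
  if (i.inflightWork) return verdict('keep-awake', 'in-flight work');
  if (i.nextScheduledJobAt !== null && i.nextScheduledJobAt - i.now <= t.wakeLeadMs) {
    return verdict('keep-awake', 'scheduled job imminent');
  }
  return verdict('would-sleep', 'deep-idle, all guards clear');
}
```

Both idle thresholds are inclusive on the "more idle" side (exactly `deepIdleMs` counts as deep, exactly `idleGraceMs` counts as shallow), and the wake-lead guard is inclusive on the "keep awake" side, matching the boundary tests. Keeping the function pure, with `now` passed in rather than read from the clock, is what lets the tests pin each boundary to the millisecond.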