feat(agent-sleep): SleepController decision foundation (Stage B slice 1, dark + dry-run) #599
Merged
… 1, dark + dry-run) The deepest lever of the Responsible Resource Usage standard — letting a deeply idle agent drop its server to near-zero footprint and wake on the next message. The mechanism (supervisor stop + lifeline respawn) is risky, so this slice ships the SAFE half first: the decision + every safety guard, dark + dry-run, so the 'is it safe to sleep?' reasoning is proven + observable before anything stops. - SleepController: pure evaluateSleep (awake / idle-shallow / keep-awake / would-sleep) with guards — held multi-machine lease, in-flight work, imminent scheduled job. Thin ticking class audits transitions to logs/agent-sleep-events.jsonl; dry-run never acts. - AgentActivityState: shared idle signal, bumped at the inbound chokepoint (/internal/telegram-forward) so a genuinely-messaged agent never sleeps. - Wired into the server (off + dry-run), GET /sleep exposes the live verdict, config monitoring.agentSleep, CapabilityIndex classification. 3-tier tests: unit (both sides of every guard boundary, exact thresholds, dry-run-never-acts, transition-only audit) + integration (GET /sleep 503-unwired / 200-alive with verdict). Signal-only, no blocking authority, never stops a process. Self-approved under the deploy mandate (Justin: build Stage B now, topic 16782). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
d5305da to c897755
What & why
The deepest lever of the Responsible Resource Usage standard is agent hard-sleep: letting a deeply idle agent drop its server, the dominant idle cost on a multi-agent box, to a near-zero footprint and wake instantly on the next message. The mechanism (the supervisor stopping the server and the lifeline respawning it without losing a message) is risky, so this slice ships the safe half first: the sleep decision and every safety guard, dark + dry-run, so the "is it safe to sleep?" reasoning is proven and observable before anything ever stops a server. This is the same dark + dry-run discipline used for the SessionReaper / AgentWorktreeReaper.
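To make the "is it safe to sleep?" reasoning concrete, here is a minimal sketch of what a pure decision function with self-naming guards could look like. Only the four verdict strings and the three guards come from this PR; the input and threshold field names, the guard ordering, and the inclusive/exclusive boundary choices are assumptions made for illustration and may differ from the merged code.

```typescript
// Illustrative sketch only. Field names, guard order and boundary
// inclusivity are assumptions, not the merged SleepController code.
type Verdict = "awake" | "idle-shallow" | "keep-awake" | "would-sleep";

interface SleepInputs {
  nowMs: number;
  lastActivityMs: number; // most recent of inbound message vs. activity bump
  holdsServingLease: boolean; // this machine holds the multi-machine lease
  inFlightWork: number; // count of in-flight work items
  nextScheduledJobMs: number | null; // absolute time of the next job, if any
}

interface SleepThresholds {
  graceMs: number; // idle for less than this: simply awake
  deepIdleMs: number; // idle for at least this: sleep is considered
  wakeLeadMs: number; // a job due within this window blocks sleep
}

interface SleepDecision {
  verdict: Verdict;
  reason: string;
}

// Pure: no clock reads, no I/O. Everything arrives through the arguments.
function evaluateSleep(i: SleepInputs, t: SleepThresholds): SleepDecision {
  const idleMs = i.nowMs - i.lastActivityMs;
  if (idleMs < t.graceMs) {
    return { verdict: "awake", reason: "recent activity" };
  }
  if (idleMs < t.deepIdleMs) {
    return { verdict: "idle-shallow", reason: "idle, below deep-idle threshold" };
  }
  // Deeply idle from here on. Each guard blocks sleep and names itself.
  if (i.holdsServingLease) {
    return { verdict: "keep-awake", reason: "holds multi-machine serving lease" };
  }
  if (i.inFlightWork > 0) {
    return { verdict: "keep-awake", reason: "in-flight work" };
  }
  if (i.nextScheduledJobMs !== null && i.nextScheduledJobMs - i.nowMs <= t.wakeLeadMs) {
    return { verdict: "keep-awake", reason: "scheduled job within wake-lead window" };
  }
  return { verdict: "would-sleep", reason: "deeply idle, no guard tripped" };
}
```

Keeping the function pure is what makes "both sides of every guard boundary" cheap to test: each case is a plain object in and a verdict out, with no timers or servers involved.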
What's in this slice
- SleepController: pure evaluateSleep() returns one of four verdicts: awake, idle-shallow, keep-awake, would-sleep. Every safety guard blocks sleep and names itself in the reason: this machine holds the multi-machine serving lease; in-flight work; a scheduled job within the wake-lead window. A thin ticking class audits decision transitions to logs/agent-sleep-events.jsonl (low-noise) and, in live mode only, requests sleep once per episode. Dry-run never acts.
- AgentActivityState: the shared idle signal the umbrella design calls for, bumped at the inbound-message chokepoint (/internal/telegram-forward) so a genuinely-messaged agent never sleeps. (Health-check traffic deliberately does NOT count as activity.)
- GET /sleep exposes the live verdict + reason + thresholds + whether sleep is armed; monitoring.agentSleep config; CapabilityIndex classification.
What is explicitly NOT in this slice
The mechanism: the supervisor consuming a sleep-request to stop the server, the lifeline wake-respawn + buffered-message replay, the watchdog treating a slept agent as healthy, and the remaining in-flight/scheduler-wake signal wiring. Those are the next slice, built only once this decision layer has been watched behaving correctly on a real idle agent (does it ever reach would-sleep, was every keep-awake correct?).
Tests (3-tier)
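Three of the behaviors the unit tier pins down (dry-run-never-acts, once-per-episode latching, transition-only audit) can be illustrated with a small sketch of a ticking wrapper. The class name, constructor shape, and injected callbacks are hypothetical; in particular the audit sink here is just a function, whereas the real controller appends to logs/agent-sleep-events.jsonl.

```typescript
// Illustrative sketch only. SleepTicker and its callbacks are hypothetical
// names, not the merged SleepController API.
type Verdict = "awake" | "idle-shallow" | "keep-awake" | "would-sleep";

interface SleepDecision {
  verdict: Verdict;
  reason: string;
}

class SleepTicker {
  private lastVerdict: Verdict | null = null;
  private requestedThisEpisode = false;
  private readonly evaluate: () => SleepDecision;
  private readonly audit: (jsonLine: string) => void;
  private readonly requestSleep: () => void;
  private readonly dryRun: boolean;

  constructor(
    evaluate: () => SleepDecision,
    audit: (jsonLine: string) => void,
    requestSleep: () => void,
    dryRun: boolean,
  ) {
    this.evaluate = evaluate;
    this.audit = audit;
    this.requestSleep = requestSleep;
    this.dryRun = dryRun;
  }

  tick(): SleepDecision {
    const d = this.evaluate();
    // Transition-only audit: one line when the verdict changes, none otherwise.
    if (d.verdict !== this.lastVerdict) {
      this.audit(JSON.stringify({ from: this.lastVerdict, to: d.verdict, reason: d.reason }));
      this.lastVerdict = d.verdict;
    }
    if (d.verdict !== "would-sleep") {
      // Leaving would-sleep ends the episode and re-arms the latch.
      this.requestedThisEpisode = false;
    } else if (!this.dryRun && !this.requestedThisEpisode) {
      // Live mode only, and at most once per would-sleep episode.
      this.requestedThisEpisode = true;
      this.requestSleep();
    }
    return d;
  }
}
```

Injecting the evaluator, audit sink, and sleep request as plain functions keeps the wrapper testable without timers or a filesystem; a dry-run instance can be ticked any number of times and the request callback should never fire.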
- Unit (SleepController.test.ts, 23): both sides of every guard boundary, exact-threshold boundaries (deepIdle/grace/wakeLead), most-recent-of-inbound-vs-activity, dry-run-never-acts, once-per-episode latching, transition-only audit, + AgentActivityState.
- Integration (sleep-controller-routes.test.ts, 3): GET /sleep returns 503 unwired and 200 with the live verdict + thresholds when wired (feature is alive), and surfaces the blocking guard reason.
Safety / process
- docs/specs/agent-hard-sleep-controller.md (converged + approved: true) + ELI16 sibling. Side-effects: upgrades/side-effects/agent-hard-sleep-controller.md.
- docs/specs/agent-sleep-mode.md (docs(spec): Agent Sleep Mode design (Level 3 — draft for review) #594).

🤖 Generated with Claude Code