feat(agent-sleep): hard-sleep stop+wake mechanism (Stage B slice 2, dark + dry-run) by JKHeadley · Pull Request #603 · JKHeadley/instar

JKHeadley · 2026-05-31T04:40:50Z

What & why

Stage B's mechanism — the part that ACTS on the slice-1 SleepController decision. When sleep is enabled and an agent is deeply idle + safe, its heavy server is stopped to save the machine's resources and respawned the instant a message arrives. Ships dark + dry-run (monitoring.agentSleep, off by default; the live requestSleep flag is only written when enabled && !dryRun). Builds on merged #599.
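The gating rule above can be sketched as a small pure function. This is only an illustration: `AgentSleepConfig`, `SleepAction` and `resolveSleepAction` are hypothetical names, and the idea that dry-run mode logs the verdict instead of writing the flag is an assumption; the PR only states that the live flag is written when `enabled && !dryRun`.

```typescript
// Hypothetical shape of the monitoring.agentSleep config. Only the two
// fields named in this PR are modelled; everything else is assumed.
interface AgentSleepConfig {
  enabled: boolean; // off by default, so the feature ships dark
  dryRun: boolean;  // while true, no live sleep-request is written
}

type SleepAction = "ignore" | "log-only" | "write-sleep-request";

// Decide what to do with a SleepController verdict.
// The live flag is only written when enabled && !dryRun.
function resolveSleepAction(
  cfg: AgentSleepConfig,
  verdictIsSleep: boolean,
): SleepAction {
  if (!cfg.enabled || !verdictIsSleep) return "ignore";
  // Assumption: dry-run records the verdict without acting on it.
  return cfg.dryRun ? "log-only" : "write-sleep-request";
}
```

With both defaults in place (`enabled: false`), every verdict resolves to `"ignore"`, which is what keeps the dark code inert.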

The handshake (reuses the proven restart lifecycle)

Sleep: live SleepController writes state/sleep-requested.json → ServerSupervisor.checkSleepRequest() stops the server tmux session, sets slept, writes state/slept-marker.json.
Suppress auto-respawn: the health loop short-circuits at the top — if (this.slept) { checkWakeRequest(); return; } — so a slept server isn't treated as crashed. This is the only loop-flow change, and it's a no-op until a live sleep-request lands.
Wake: an inbound message → TelegramLifeline.requestWakeIfSlept() (at the top of processUpdate, ungated) writes state/wake-requested.json → checkWakeRequest() respawns via spawnServer(). The held message replays through the existing forward-retry queue once healthy (zero loss).
Brick defense: a slept-marker keeps a rebooted (or fleet-watchdog-bounced) supervisor asleep; wakeFromSleep() is the operator escape hatch wired into /lifeline restart + /reset.
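As a rough sketch of the four steps above, the handshake can be modelled as a small state machine. This is not the actual ServerSupervisor code: `SupervisorSketch` is a hypothetical class, the `Map` stands in for the JSON files under state/, the counters stand in for stopping the tmux session and for spawnServer(), and having checkWakeRequest() reuse wakeFromSleep() is an assumption.

```typescript
// In-memory stand-in for the state/ directory (assumption: the real code
// reads and writes JSON files with these names).
type Flags = Map<string, string>;

class SupervisorSketch {
  slept: boolean;
  spawnCount = 0; // stand-in for spawnServer() calls
  stopCount = 0;  // stand-in for stopping the server tmux session
  private readonly flags: Flags;

  constructor(flags: Flags) {
    this.flags = flags;
    // Brick defense: a rebooted supervisor stays asleep if the marker exists.
    this.slept = flags.has("slept-marker.json");
  }

  // One pass of the health loop.
  healthTick(serverHealthy: boolean): void {
    // The only loop-flow change: a slept server is not treated as crashed.
    if (this.slept) {
      this.checkWakeRequest();
      return;
    }
    if (this.checkSleepRequest()) return;
    if (!serverHealthy) this.spawnServer(); // existing crash recovery, simplified
  }

  checkSleepRequest(): boolean {
    // Map.delete returns true only if the flag was present.
    if (!this.flags.delete("sleep-requested.json")) return false;
    this.stopCount += 1;
    this.slept = true;
    this.flags.set("slept-marker.json", "{}");
    return true;
  }

  checkWakeRequest(): void {
    if (!this.slept) return; // wake-when-not-slept is a no-op
    if (!this.flags.delete("wake-requested.json")) return;
    this.wakeFromSleep();
    this.spawnServer();
  }

  // Operator escape hatch: clears the slept state and the marker.
  wakeFromSleep(): void {
    this.slept = false;
    this.flags.delete("slept-marker.json");
  }

  spawnServer(): void {
    this.spawnCount += 1;
  }
}
```

In this model an unhealthy tick while slept does nothing until a wake-request appears, which mirrors the suppress-auto-respawn step.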

⚠️ Adversarial second-pass review caught a critical brick — fixed

The review found the wake-trigger was originally inside forwardToServer(), which is gated on supervisor.healthy — but a slept server is not healthy, so an inbound message would queue and never write the wake flag → the server would never wake. Fixed: moved requestWakeIfSlept() to the top of processUpdate(), before any health gate. The review also found /lifeline restart didn't clear the slept-marker (manual recovery also bricked) — fixed via wakeFromSleep(). Both fixes are tested. The reviewer confirmed the dark code is inert by default.
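The ordering fix described above can be illustrated with a minimal sketch. It assumes a simplified lifeline with injected dependencies: `LifelineSketch`, `LifelineDeps` and `flushRetryQueue` are hypothetical names, and the real TelegramLifeline and its forward-retry queue are certainly more involved.

```typescript
// Hypothetical dependencies, injected so the ordering can be shown in isolation.
interface LifelineDeps {
  sleptMarkerExists: () => boolean; // stand-in for checking state/slept-marker.json
  writeWakeRequest: () => void;     // stand-in for writing state/wake-requested.json
  supervisorHealthy: () => boolean;
  deliver: (text: string) => void;  // stand-in for forwarding to the server
}

class LifelineSketch {
  readonly retryQueue: string[] = [];
  private readonly deps: LifelineDeps;

  constructor(deps: LifelineDeps) {
    this.deps = deps;
  }

  processUpdate(text: string): void {
    // Ungated and first: a slept server is unhealthy, so this must run
    // before any health check or the wake flag is never written.
    this.requestWakeIfSlept();
    this.forwardToServer(text);
  }

  requestWakeIfSlept(): void {
    if (this.deps.sleptMarkerExists()) this.deps.writeWakeRequest();
  }

  forwardToServer(text: string): void {
    if (!this.deps.supervisorHealthy()) {
      this.retryQueue.push(text); // held until the server is back
      return;
    }
    this.deps.deliver(text);
  }

  // Replay held messages once the server is healthy again.
  flushRetryQueue(): void {
    while (this.deps.supervisorHealthy() && this.retryQueue.length > 0) {
      const next = this.retryQueue.shift();
      if (next !== undefined) this.deps.deliver(next);
    }
  }
}
```

Had requestWakeIfSlept() been called from inside forwardToServer() after the health gate, the early return would have skipped it for every message that reaches a slept server, which is the brick the review caught.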

Tests (unit + regression)

ServerSupervisor-sleep-wake.test.ts — sleep stops + marks + enters slept; no-request no-op; expired-request ignored; wake respawns + clears; wake-when-not-slept no-op; idempotent re-sleep; boot-marker signal; wakeFromSleep clears slept+marker (escape hatch).
agentSleepWake.test.ts — marker→wake-request; no-marker→no-op.
SleepController.test.ts — sleepRequestWriter writes the TTL-stamped flag.
Regression: ServerSupervisor-handshake / supervisor-health-check / supervisor-cpu-starvation stay green — existing crash-recovery is byte-identical when slept===false.
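The "TTL-stamped flag" and "expired-request ignored" cases can be pictured with the sketch below. The field names `requestedAt` and `ttlMs`, and both function names, are assumptions made for illustration; the PR does not show the actual payload of state/sleep-requested.json.

```typescript
// Hypothetical payload for the sleep-request flag.
interface SleepRequest {
  requestedAt: number; // epoch milliseconds when the request was written
  ttlMs: number;       // how long the request stays valid
}

// Serialize a request stamped with the current time and a TTL.
function makeSleepRequest(nowMs: number, ttlMs: number): string {
  const request: SleepRequest = { requestedAt: nowMs, ttlMs };
  return JSON.stringify(request);
}

// A request is honoured only if it parses and has not outlived its TTL.
// Malformed content is treated the same as an expired request.
function isSleepRequestLive(raw: string, nowMs: number): boolean {
  try {
    const parsed = JSON.parse(raw) as Partial<SleepRequest>;
    if (typeof parsed.requestedAt !== "number") return false;
    if (typeof parsed.ttlMs !== "number") return false;
    return nowMs - parsed.requestedAt <= parsed.ttlMs;
  } catch {
    return false;
  }
}
```

Stamping the flag means a stale request left behind by an earlier run cannot put a now-busy server to sleep.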

Safety / process

Dark + dry-run + additive. No new config/route/agent-installed-file. Revert = the handlers + the slept short-circuit disappear; supervisor/lifeline behave as before.
Spec: docs/specs/agent-hard-sleep-mechanism.md (converged + approved: true) + ELI16. Side-effects: upgrades/side-effects/agent-hard-sleep-mechanism.md.
⚠️ Self-approved under the delegated deploy mandate (Justin: build Stage B now, topic 16782). The enablement — not this dark ship — is the reviewed gate: turn it on first on a test agent with Justin watching. A few enablement-gated refinements (scheduler-wake, lifeline-serves-health-while-asleep) are tracked.

🤖 Generated with Claude Code

…ark + dry-run)

Acts on the slice-1 SleepController verdict: in live mode it writes state/sleep-requested.json; the ServerSupervisor stops the server tmux session + enters 'slept' (the health loop short-circuits, suppressing auto-respawn) and only watches for state/wake-requested.json, which the lifeline writes on the next inbound message → supervisor respawns + the existing forward-retry queue replays the buffered message. A slept-marker keeps a rebooted/watchdog-bounced supervisor asleep.

- ServerSupervisor: checkSleepRequest/checkWakeRequest, slept short-circuit (the only loop-flow change; no-op until a live sleep-request lands), boot-marker stay-asleep, wakeFromSleep() escape hatch.
- SleepController.sleepRequestWriter (live-mode flag, TTL-stamped).
- TelegramLifeline: requestWakeIfSlept() at the TOP of processUpdate (ungated) via the pure agentSleepWake helper; /lifeline restart + reset clear slept state.

⚠️ Adversarial 2nd-pass review CAUGHT A CRITICAL BRICK (wake-trigger was gated on supervisor.healthy → a slept server is unhealthy → inbound queued, never woke) + a broken manual escape hatch (/lifeline restart didn't clear the marker). BOTH fixed + tested before commit. Dark code verified inert.

50 tests incl. regression on the existing supervisor suites (green).

Self-approved under the deploy mandate; ships dark, enablement is the reviewed gate (topic 16782).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

vercel · 2026-05-31T04:40:52Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
instar	Ready	Preview, Comment	May 31, 2026 4:40am

vercel Bot deployed to Preview May 31, 2026 04:40

JKHeadley merged commit 588f9ab into main May 31, 2026
18 checks passed

JKHeadley merged 1 commit into main from echo/agent-hard-sleep-mech
