fix: stop Windows Max-plan drain loop by thedotmack · Pull Request #2083 · thedotmack/claude-mem

thedotmack · 2026-04-21T03:02:45Z

Summary

Discord report: a Windows 11 user's Max-plan usage was being consumed in the background, with "continuous failed hooks" and a loop that only stopped after uninstalling claude-mem.
Investigation traced it to a regression chain introduced with the v12.3.3 RestartGuard. Full write-up in INVESTIGATION-windows-max-plan-drain.md; phased implementation plan in PLAN-windows-max-plan-drain-fix.md.
Three targeted fixes, each on one root-cause leg:
1. SessionRoutes restart-trip (SessionRoutes.ts:318-340): when the guard tripped, this code path left pending messages in 'pending' state, and the next daemon startup's processPendingQueues(50) replayed them. It now calls markAllSessionMessagesAbandoned + removeSessionImmediate + broadcastSessionCompleted inline. This mirrors the private WorkerService.terminateSession; the pattern is copied from the sibling SDK-terminated branch at SessionRoutes.ts:123.
2. RestartGuard — recordSuccess() no longer clears the window on a single success; it requires 5 consecutive successes before the decay path becomes eligible. Adds a terminal lifetime cap of 50 total restarts that never resets. Public API unchanged (recordRestart(): boolean, recordSuccess(): void, existing getters preserved; added totalRestarts, lifetimeCap).
3. unrecoverablePatterns — extracted into src/services/worker/unrecoverable-patterns.ts (testable in isolation); added 'OAuth token expired', 'token has been revoked', 'Unauthorized', 'OpenRouter API error: 401', 'OpenRouter API error: 403'. Deliberately not adding bare '401' — too broad.
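A minimal sketch of what the extracted matcher might look like, assuming a plain case-sensitive substring match over a list of strings (the greptile review describes isUnrecoverableError as a substring match). The constant name UNRECOVERABLE_PATTERNS is hypothetical, and the pre-existing patterns moved over from worker-service.ts are omitted; only the five patterns added in this PR appear.

```typescript
// Hypothetical sketch of src/services/worker/unrecoverable-patterns.ts.
// Pre-existing patterns extracted from worker-service.ts are omitted here.
export const UNRECOVERABLE_PATTERNS: readonly string[] = [
  // Auth material is no longer valid, so retrying cannot succeed.
  'OAuth token expired',
  'token has been revoked',
  'Unauthorized',
  'OpenRouter API error: 401',
  'OpenRouter API error: 403',
  // Bare '401' is intentionally absent: it would also match request IDs
  // and other unrelated numbers embedded in error text.
];

// Assumed semantics: case-sensitive substring match against the error message.
export function isUnrecoverableError(message: string): boolean {
  return UNRECOVERABLE_PATTERNS.some((pattern) => message.includes(pattern));
}
```

Because every entry is a raw substring, each new pattern widens what terminates a session; that is why the scope of individual entries matters in the reviews that follow.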

Why the loop drained Max-plan specifically

The Claude Agent SDK subprocess inherits CLAUDE_CODE_OAUTH_TOKEN from the parent Claude Code session (EnvManager.ts:255), so every worker-driven Claude call is billed against the user's subscription. Under the 91.6% MCP-loopback failure rate recorded in memory (obs 71051), the occasional success kept resetting the old guard's window, so it never tripped; processPendingQueues kept re-feeding the loop on every daemon start; and nothing in the unrecoverable list recognized OAuth expiry. The fix closes each of these gaps.

Test plan

bun test tests/worker/agents/fallback-error-handler.test.ts — must stay green (adjacent logic, 25 tests passing locally after change).
Unit tests for RestartGuard (window cap, consecutive-success requirement, lifetime-cap terminality).
Unit tests for isUnrecoverableError against each new pattern; negative case for bare '401' in an unrelated string.
Manual reproduction on Windows: induce SDK failure, confirm (a) pending messages are abandoned after guard trip, (b) daemon restart does not replay them, (c) expired OAuth token aborts cleanly.
Confirm bun run build succeeds (pre-existing TS errors on unrelated lines are unaffected).

🤖 Generated with Claude Code

…guard trip, tighten RestartGuard, catch OAuth expiry

Discord report: Windows 11 user's Max-plan usage was being consumed in the background. Root cause is a regression introduced alongside the v12.3.3 RestartGuard:
(1) when the guard trips in the HTTP-layer crash-recovery path, pending messages were left in 'pending' state and auto-replayed by processPendingQueues on the next daemon start;
(2) any single successful message cleared the restart-window decay, so a 9-of-10 failure regime never tripped the guard;
(3) OAuth-token expiry errors are not in the unrecoverable list and loop forever on Max-plan subscriptions.

SessionRoutes.ts: on restart-guard trip, abandon pending messages and remove the session (mirrors WorkerService.terminateSession, which is private). Pattern copied from the sibling block at SessionRoutes.ts:123.

RestartGuard.ts: require 5 consecutive recordSuccess() calls before the window-decay path becomes eligible; add a terminal lifetime cap of 50 total restarts that never resets. Public API unchanged.

worker-service.ts: extract unrecoverablePatterns into a testable helper module and add OAuth/OpenRouter-auth patterns: 'OAuth token expired', 'token has been revoked', 'Unauthorized', 'OpenRouter API error: 401', 'OpenRouter API error: 403'. Bare '401' deliberately not added (too broad).

Investigation and phased plan docs included for reviewers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-04-21T03:03:00Z

Summary by CodeRabbit

Bug Fixes
- Resolved a Windows 11 regression introduced in v12.3.3 that caused indefinite crash loops and recovery cycles
- Improved restart guard logic with lifetime caps to prevent excessive restart attempts
- Enhanced error classification to properly identify unrecoverable errors
- Improved session cleanup and pending message management during crash recovery scenarios

Walkthrough

Implements fixes for a Windows regression by extracting unrecoverable error patterns into a dedicated module, enhancing RestartGuard with lifetime caps and improved decay behavior, and ensuring pending messages are abandoned when restart guard trips during crash recovery.

Changes

Cohort / File(s)	Summary
Documentation `INVESTIGATION-windows-max-plan-drain.md`, `PLAN-windows-max-plan-drain-fix.md`	Added investigation report detailing Windows v12.3.3+ regression root causes (auth/token handling, pending session replay, restart decay) and comprehensive implementation plan with phased fixes.
Unrecoverable Error Patterns `src/services/worker/unrecoverable-patterns.ts`, `src/services/worker-service.ts`	Extracted inline pattern matching into new reusable module `isUnrecoverableError()` with expanded OAuth/auth-expiry patterns; refactored worker-service to import and use the centralized matcher.
RestartGuard Enhancements `src/services/worker/RestartGuard.ts`	Added lifetime restart cap (`ABSOLUTE_LIFETIME_RESTART_CAP`), consecutive-success decay eligibility tracking, and total-restarts observability getters; decay now requires both consecutive successes and elapsed time since last success.
Session Crash Recovery `src/services/worker/http/routes/SessionRoutes.ts`	Introduced `handleRestartGuardTripped()` helper to abandon all pending messages via store, remove session, broadcast completion, and log terminal state; extended restart-guard metadata logging with `totalRestarts` and `lifetimeCap`.

Sequence Diagram

sequenceDiagram
    participant Session as Session<br/>(Generator)
    participant RestartGuard as RestartGuard
    participant CrashRecovery as Crash Recovery<br/>Handler
    participant PendingStore as Pending Message<br/>Store
    participant SessionMgmt as Session<br/>Management

    Note over Session,SessionMgmt: Crash occurs during processing

    Session->>RestartGuard: recordRestart()
    RestartGuard->>RestartGuard: Increment totalRestarts<br/>Check lifetime cap
    alt Lifetime Cap Exceeded
        RestartGuard-->>Session: false (block restart)
        Session->>CrashRecovery: Crash recovery triggered
    else Within Cap
        RestartGuard-->>Session: true (allow restart)
    end

    CrashRecovery->>PendingStore: Mark all pending<br/>messages as abandoned
    PendingStore-->>CrashRecovery: Confirmed
    CrashRecovery->>SessionMgmt: Remove session<br/>from memory
    SessionMgmt-->>CrashRecovery: Removed
    CrashRecovery->>SessionMgmt: Broadcast session<br/>completion
    Note over PendingStore: Prevents respawn/replay<br/>of stale messages

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Generator restart guard strands pending messages with no recovery #2053: Directly addresses the Windows max-plan drain regression; the RestartGuard lifetime cap and SessionRoutes pending message abandonment implement the proposed fixes.

Possibly related PRs

fix: session lifecycle guards to prevent runaway API spend (#1590) #1693: Modifies SessionRoutes to abort sessions and mark pending messages as abandoned when lifecycle events trip (same crash recovery pattern).
feat: drain orphaned pending messages on SIGTERM session completion #1567: Abandons pending session messages proactively to prevent orphaned work during session termination (aligned design pattern).
fix: patch 7 critical bugs for v10.6.3 #1518: Touches worker unrecoverable-error pattern handling and spawn-related logic (overlapping error classification scope).

Poem

A Windows hop through guarded gates,
Where pending threads won't seal their fates—
Decay and caps now keep things tight,
No infinite drain in the dead of night! 🐰

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the primary fix: stopping a Windows Max-plan drain loop caused by a restart/replay regression.
Description check	✅ Passed	The description is detailed and directly related to the changeset, explaining the Discord report, investigation findings, three targeted fixes, root causes, and a comprehensive test plan.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.



claude · 2026-04-21T03:04:25Z

Code Review: fix: stop Windows Max-plan drain loop

Overview

This PR correctly identifies and fixes a real-money regression: a Windows user's Max-plan subscription was drained in the background by a crash loop that the RestartGuard couldn't stop. The root-cause analysis in INVESTIGATION-windows-max-plan-drain.md is thorough and the three-legged fix is well-scoped.

What's done well

SessionRoutes fix (Phase 1) is exactly right. The old code explicitly acknowledged leaving messages in 'pending' state in the log action string — this PR closes that gap cleanly by mirroring the WorkerService.terminateSession() three-step pattern inline.
handleRestartGuardTripped extraction makes the formerly inlined guard-trip logic unit-testable and removes the pre-existing split-brain between the two crash-recovery paths.
Extracting unrecoverable-patterns.ts is a good isolation move — the matcher is now testable without importing the full WorkerService.
Consecutive-success requirement in RestartGuard directly addresses the 91.6% failure rate regime. A single fluky success resetting the window was the exact mechanism that let the loop run indefinitely.
Absolute lifetime cap is a prudent backstop. The investigation is right that the windowed guard alone is insufficient.

Issues

1. No tests shipped (blocking for me)

The Phase 4 test plan is detailed and the PR checklist is unchecked. Three test files were planned:

tests/worker/RestartGuard.test.ts
tests/worker/worker-service-unrecoverable.test.ts
tests/worker/http/SessionRoutes.restart-trip.test.ts

None appear in the diff. Given that the investigation notes "no existing tests for RestartGuard or unrecoverablePatterns", and this is a safety-critical guard against billing drain, shipping without coverage is risky. The RestartGuard logic is purely algorithmic and easy to unit-test with a mocked clock.

2. `'Unauthorized'` is too broad

// src/services/worker/unrecoverable-patterns.ts
'Unauthorized',

The PR justifiably avoids bare '401' (it would match request IDs), but 'Unauthorized' is a common English word that appears in many application-level authorization errors that have nothing to do with OAuth expiry:

"Unauthorized: user lacks permission to access this resource"
"Unauthorized access attempt blocked"
Any HTTP middleware or proxy that formats its own 403 message

The investigation references '401 Unauthorized' in fallback-error-handler.test.ts:75-77 as the terminal pattern — consider using '401 Unauthorized' (the standard HTTP status line) instead of bare 'Unauthorized'. That's still specific enough to catch OAuth errors while avoiding false positives on authorization-related application errors.
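To make the trade-off concrete, here is a small comparison of the two candidate patterns under the same assumed case-sensitive substring match. The sample messages are invented for illustration, not taken from real logs.

```typescript
// Candidate patterns under discussion.
const BROAD = 'Unauthorized';
const SCOPED = '401 Unauthorized';

// Assumed matcher semantics: plain case-sensitive substring check.
function matches(pattern: string, message: string): boolean {
  return message.includes(pattern);
}

// Illustrative messages only.
const oauthFailure = 'HTTP 401 Unauthorized: OAuth token expired';
const appError = 'Unauthorized access attempt blocked by MCP tool';

console.log(matches(BROAD, oauthFailure), matches(SCOPED, oauthFailure)); // true true
console.log(matches(BROAD, appError), matches(SCOPED, appError));         // true false
```

The scoped pattern still matches a standard 401 status line relayed from a third-party API through an MCP tool, so it narrows the false-positive surface rather than eliminating it.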

3. Markdown investigation/plan files in repo root

INVESTIGATION-windows-max-plan-drain.md and PLAN-windows-max-plan-drain-fix.md are committed to the repo root. These are internal working documents. The repo already has a private/context/ tree (referenced in CLAUDE.md for the exit-codes doc). These would be better placed in private/context/ or docs/ so the root stays clean for standard project files, or removed entirely if the investigation doc is the authoritative record.

4. Minor: optional-chaining guard in `handleRestartGuardTripped`

private handleRestartGuardTripped(
  sessionDbId: number,
  pendingCount: number,
  session: { abortController: AbortController; restartGuard?: RestartGuard }
): void {
  const restartGuard = session.restartGuard; // may be undefined
  ...
  restartsInWindow: restartGuard?.restartsInWindow,  // logs undefined silently

The parameter type marks restartGuard as optional, so the logger will silently emit undefined fields when called without one. In practice the only call site (SessionRoutes.ts:362) is guarded by if (!restartAllowed) which implies the guard was used — but the helper's signature doesn't enforce that. Either make restartGuard required in the parameter type (and add a null-check at the call site) or add an explicit log note when it's missing. Low-impact but worth tightening given this is diagnostic data for an active incident.
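One way to tighten the signature, sketched as a standalone function rather than the real private method. GuardStats, PendingStore, SessionManagerLike, Broadcaster and the injected log callback are hypothetical stand-ins that model only the calls named in this PR, and markAllSessionMessagesAbandoned is assumed to return the abandoned count. With restartGuard required, a call site without a guard fails to compile instead of logging undefined fields.

```typescript
// Hypothetical stand-ins for the real collaborators.
interface GuardStats {
  readonly restartsInWindow: number;
  readonly totalRestarts: number;
  readonly lifetimeCap: number;
}
interface PendingStore {
  markAllSessionMessagesAbandoned(sessionDbId: number): number; // assumed: returns count
}
interface SessionManagerLike {
  removeSessionImmediate(sessionDbId: number): void;
}
interface Broadcaster {
  broadcastSessionCompleted(sessionDbId: number): void;
}
type Logger = (message: string, meta: Record<string, unknown>) => void;

// restartGuard is required, so the diagnostic fields are always populated.
function handleRestartGuardTripped(
  sessionDbId: number,
  pendingCount: number,
  session: { abortController: { abort(): void }; restartGuard: GuardStats },
  deps: { store: PendingStore; sessions: SessionManagerLike; events: Broadcaster; log: Logger },
): void {
  session.abortController.abort(); // stop the generator first
  // Abandon instead of leaving 'pending', so nothing is replayed on next start.
  const abandoned = deps.store.markAllSessionMessagesAbandoned(sessionDbId);
  deps.sessions.removeSessionImmediate(sessionDbId);
  deps.events.broadcastSessionCompleted(sessionDbId); // terminal event for consumers
  deps.log('Restart guard tripped; session terminated', {
    sessionDbId,
    pendingCount,
    abandoned,
    restartsInWindow: session.restartGuard.restartsInWindow,
    totalRestarts: session.restartGuard.totalRestarts,
    lifetimeCap: session.restartGuard.lifetimeCap,
  });
}
```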

Summary

The fix logic itself is correct: all three legs of the drain loop are addressed, and the approach matches the existing patterns in the codebase. The main blocker is the absence of the planned tests, particularly for RestartGuard (algorithmic, trivial to test with a fake clock) and the new isUnrecoverableError matcher. I would suggest holding the merge until those land, given the blast radius of the bug being fixed.

🤖 Generated with Claude Code

greptile-apps · 2026-04-21T03:05:50Z

Greptile Summary

This PR closes a Windows Max-plan drain loop by addressing three root causes: (1) the restart guard's trip branch in SessionRoutes now marks pending messages as abandoned and removes the session, preventing processPendingQueues from replaying them on next daemon startup; (2) RestartGuard now requires 5 consecutive successes before decay is eligible and adds a hard 50-restart lifetime cap; (3) OAuth/OpenRouter auth-failure patterns are added to the unrecoverable-error list and extracted into a testable module.

The bare 'Unauthorized' substring in unrecoverable-patterns.ts is too broad — it could match error text from MCP tools or external HTTP APIs invoked during a session, permanently terminating a healthy session that could have recovered. The already-added 'OAuth token expired' and 'token has been revoked' patterns should cover the Max-plan scenario without requiring this catch-all.

Confidence Score: 4/5

Safe to merge after addressing the 'Unauthorized' pattern scope — all other changes are well-targeted and correct.

There is one P1 finding: the bare 'Unauthorized' substring in the unrecoverable-patterns list is too broad and could prematurely kill sessions due to transient 401 errors from MCP-called APIs. All other changes (SessionRoutes guard-trip cleanup, RestartGuard tightening, module extraction) are logically sound and directly address the described drain loop.

src/services/worker/unrecoverable-patterns.ts — the 'Unauthorized' pattern needs scoping before merge.

Important Files Changed

Filename	Overview
src/services/worker/unrecoverable-patterns.ts	New file extracting error patterns from worker-service.ts; adds OAuth/OpenRouter patterns. The bare 'Unauthorized' substring is too broad and risks killing sessions on transient 3rd-party 401 errors.
src/services/worker/RestartGuard.ts	Adds consecutive-success requirement (5) for decay eligibility and a 50-restart absolute lifetime cap. Logic is sound; minor off-by-one in documentation vs. implementation (50 allowed, 51st blocked).
src/services/worker/http/routes/SessionRoutes.ts	New handleRestartGuardTripped() correctly aborts the controller, marks messages abandoned, removes the session, and broadcasts completion — closing the pending-message replay loop on guard trip.
src/services/worker-service.ts	Refactored to import isUnrecoverableError from the new unrecoverable-patterns module; no behavioral change in this file's logic.
INVESTIGATION-windows-max-plan-drain.md	Documentation file; records root-cause analysis of the Windows Max-plan drain loop. No code changes.
PLAN-windows-max-plan-drain-fix.md	Documentation file; phased implementation plan for the drain-loop fix. No code changes.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Generator exits with pending work] --> B{isUnrecoverableError?}
    B -- Yes --> C[terminateSession\nmark abandoned + remove]
    B -- No --> D[restartGuard.recordRestart]
    D --> E{Windowed cap\nor lifetime cap exceeded?}
    E -- No --> F[Restart generator\nwith backoff]
    F --> A
    E -- Yes --> G[handleRestartGuardTripped\nSessionRoutes path]
    G --> H[abortController.abort]
    H --> I[markAllSessionMessagesAbandoned]
    I --> J[removeSessionImmediate]
    J --> K[broadcastSessionCompleted]
    K --> L[STOP — messages not replayed\non next daemon startup]

Reviews (1): Last reviewed commit: "fix: stop Windows Max-plan drain loop — ..."

greptile-apps · 2026-04-21T03:05:57Z

+  // is no longer valid.
+  'OAuth token expired',
+  'token has been revoked',
+  'Unauthorized',


'Unauthorized' pattern too broad — may prematurely kill healthy sessions

'Unauthorized' is a plain-English word that can appear in error messages from MCP tools, external HTTP APIs called through MCP, database permission errors, or any third-party library. Because isUnrecoverableError is a substring match, a transient "Unauthorized" from an MCP-called REST endpoint would permanently terminate the session and abandon all pending messages — the same effect as an expired OAuth token.

The PR description explicitly rejected bare '401' as "too broad," but 'Unauthorized' carries the same risk. The OAuth-specific patterns already added ('OAuth token expired', 'token has been revoked') should be sufficient to catch Max-plan token expiry. If the Claude SDK specifically emits an "Unauthorized" string for expired tokens, a more scoped prefix (e.g., 'claude: Unauthorized' or 'SDK error: Unauthorized') would be safer.

greptile-apps · 2026-04-21T03:05:58Z

+    if (this.totalRestartsAllTime > ABSOLUTE_LIFETIME_RESTART_CAP) {
+      return false;
+    }


Lifetime-cap check is off-by-one relative to the stated limit

totalRestartsAllTime is incremented before the > ABSOLUTE_LIFETIME_RESTART_CAP check, so restart #50 sets the counter to 50, 50 > 50 is false, and the restart is allowed. The cap therefore permits exactly 50 restarts (not "trips at 50"). This is consistent with the PR description ("50 total restarts"), but a brief comment clarifying that > CAP means "CAP restarts are allowed, CAP+1 is the first blocked one" would prevent future confusion.
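The boundary can be checked in isolation. This sketch models only the increment-then-check-with-`>` logic from the diff (the windowed limit is left out, so the real recordRestart() may return false earlier), with the cap set to the PR's stated value of 50. The factory name makeLifetimeCapCheck is hypothetical.

```typescript
const ABSOLUTE_LIFETIME_RESTART_CAP = 50;

// Lifetime-cap portion of recordRestart() only: increment first, then block
// once the counter exceeds the cap. CAP restarts pass; CAP + 1 is the first blocked.
function makeLifetimeCapCheck(cap: number): () => boolean {
  let totalRestartsAllTime = 0;
  return () => {
    totalRestartsAllTime += 1;
    if (totalRestartsAllTime > cap) {
      return false; // terminal: the counter never resets
    }
    return true;
  };
}

const recordRestart = makeLifetimeCapCheck(ABSOLUTE_LIFETIME_RESTART_CAP);
const results = Array.from({ length: 60 }, () => recordRestart());

console.log(results.filter(Boolean).length); // 50
console.log(results.indexOf(false) + 1);     // 51
```

Switching the comparison to `>=` would make the 50th restart the first blocked one; whichever rule is chosen, the plan's behavior checks need to state the same boundary.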

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/services/worker-service.ts (1)

816-827: ⚠️ Potential issue | 🟠 Major

Emit the session-complete event from terminateSession() too.

This branch now logs the richer guard state, but its terminal path still funnels through this.terminateSession(...), which only abandons messages and removes the session. runFallbackForTerminatedSession() and SessionRoutes.handleRestartGuardTripped() already call broadcastSessionCompleted(), so worker-service guard trips and unrecoverable-error shutdowns still miss that terminal event.

🔧 Proposed fix

  private terminateSession(sessionDbId: number, reason: string): void {
    const pendingStore = this.sessionManager.getPendingMessageStore();
    const abandoned = pendingStore.markAllSessionMessagesAbandoned(sessionDbId);

    logger.info('SYSTEM', 'Session terminated', {
      sessionId: sessionDbId,
      reason,
      abandonedMessages: abandoned
    });

    // removeSessionImmediate fires onSessionDeletedCallback → broadcastProcessingStatus()
    this.sessionManager.removeSessionImmediate(sessionDbId);
+   this.sessionEventBroadcaster.broadcastSessionCompleted(sessionDbId);
  }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/services/worker-service.ts` around lines 816 - 827, The restart-guard
branch ends by calling this.terminateSession(session.sessionDbId,
'max_restarts_exceeded') but terminateSession does not emit the session-complete
event, so consumers miss the terminal event; update terminateSession(sessionId,
reason) to also call broadcastSessionCompleted(sessionId, { reason }) (or the
equivalent session-complete emitter used by
runFallbackForTerminatedSession()/SessionRoutes.handleRestartGuardTripped) after
it finishes abandoning messages/removing the session, ensuring all terminal
paths (restart guard trips and unrecoverable shutdowns) emit the same
session-complete event.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@PLAN-windows-max-plan-drain-fix.md`:
- Around line 385-390: Update Phase 5 verification to look in the new module
instead of the old file: replace the grep that targets
src/services/worker-service.ts with a search against
src/services/worker/unrecoverable-patterns.ts (e.g., grep -n "'OAuth token
expired'" src/services/worker/unrecoverable-patterns.ts) so the
OAuth/unrecoverable matcher check points to the extracted symbol in
unrecoverable-patterns.ts.
- Around line 241-245: The documentation and test plan disagree about the
lifetime-cap boundary for restarts (one place says the 50th restart should be
blocked, another says 50 are allowed and the 51st blocks); decide on the single
canonical behavior (either "max 50 allowed, 51st blocked" or "50th blocked") and
update both the test expectation in RestartGuard.test.ts (referenced as the
Phase 4 test) and the descriptive lines in this doc (the two bullets describing
49/50/51 behavior and the later lines 326-327) so they match the chosen rule and
the term "lifetime-cap" is used consistently throughout.

In `@src/services/worker/RestartGuard.ts`:
- Around line 31-50: In recordRestart(), after you set
this.consecutiveSuccessCount = 0 (which breaks the success streak), also clear
the decay flag by setting this.decayEligible = false so a broken streak cannot
later trigger the decay path; update the recordRestart() function (referencing
recordRestart(), consecutiveSuccessCount, decayEligible, restartTimestamps,
lastSuccessfulProcessing) to flip decayEligible off immediately when a restart
breaks the streak.



ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 135d28be-068a-4f92-82a3-e7a6b7bff82b

📥 Commits

Reviewing files that changed from the base of the PR and between 49ab404 and a172c6e.

📒 Files selected for processing (6)

INVESTIGATION-windows-max-plan-drain.md
PLAN-windows-max-plan-drain-fix.md
src/services/worker-service.ts
src/services/worker/RestartGuard.ts
src/services/worker/http/routes/SessionRoutes.ts
src/services/worker/unrecoverable-patterns.ts

coderabbitai · 2026-04-21T03:10:01Z

+- `bun test tests/worker/RestartGuard.test.ts` (new file — see Phase 4).
+- Behavior check: 49 failed restarts → still allowed; 50th → blocked; 51st →
+  blocked (lifetime cap persists).
+- Behavior check: 4 successes then restart → full window still counted.
+  5th success then restart 5 min later → window cleared.


⚠️ Potential issue | 🟡 Minor

Reconcile the lifetime-cap expectation.

This section says the 50th restart should already be blocked, but Lines 326-327 later say 50 restarts are still allowed and only the 51st blocks. Please pick one so the tests and the implementation don't drift apart.


coderabbitai · 2026-04-21T03:10:01Z

+1. `bun run build` — builds cleanly (no TS errors).
+2. `bun test` — full suite green.
+3. `grep -rn 'Messages remain in pending state' src/` — no matches (the phrase
+   is gone from the codebase).
+4. `grep -n "'OAuth token expired'" src/services/worker-service.ts` — matches.
+5. `grep -n 'ABSOLUTE_LIFETIME_RESTART_CAP\|REQUIRED_CONSECUTIVE_SUCCESSES_FOR_DECAY' src/services/worker/RestartGuard.ts`


⚠️ Potential issue | 🟡 Minor

Phase 5 still points verification at the old file.

The OAuth/unrecoverable matcher was extracted to src/services/worker/unrecoverable-patterns.ts, so this grep will now fail and send follow-up edits to the wrong place.


coderabbitai · 2026-04-21T03:10:01Z

  recordRestart(): boolean {
+    this.totalRestartsAllTime += 1;
+    this.consecutiveSuccessCount = 0; // streak broken by any restart
+
+    // Terminal: lifetime cap reached — never resets, even if successes follow.
+    if (this.totalRestartsAllTime > ABSOLUTE_LIFETIME_RESTART_CAP) {
+      return false;
+    }
+
    const now = Date.now();

-    // Decay: clear history only after real success + 5min of uninterrupted success
-    if (this.lastSuccessfulProcessing !== null
+    // Decay: only fires if we accumulated the required consecutive successes
+    // AND 5min has elapsed since the last success. One-off successes cannot
+    // clear the windowed-restart history.
+    if (this.decayEligible
+        && this.lastSuccessfulProcessing !== null
        && now - this.lastSuccessfulProcessing >= DECAY_AFTER_SUCCESS_MS) {
      this.restartTimestamps = [];
      this.lastSuccessfulProcessing = null;
+      this.decayEligible = false;


⚠️ Potential issue | 🟠 Major

Clear decayEligible when a restart breaks the success streak.

recordRestart() zeroes consecutiveSuccessCount, but it leaves decayEligible armed. After 5 successes, one restart, and then a 5-minute gap, the next restart will still wipe restartTimestamps even though the streak was already broken. That reopens the slow-drip path this change is trying to close.

🔧 Proposed fix

  recordRestart(): boolean {
    this.totalRestartsAllTime += 1;
    this.consecutiveSuccessCount = 0; // streak broken by any restart
+   this.decayEligible = false;

    // Terminal: lifetime cap reached — never resets, even if successes follow.
    if (this.totalRestartsAllTime > ABSOLUTE_LIFETIME_RESTART_CAP) {
      return false;
    }
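A self-contained sketch of the guard with the streak fix applied and an injectable clock, which also illustrates the fake-clock testing approach suggested in the Claude review. The values 5, 50 and 5 minutes come from this PR; RESTART_WINDOW_MS, MAX_RESTARTS_IN_WINDOW and the class name RestartGuardSketch are placeholders, not the real implementation. One ordering detail matters: as excerpted, the proposed patch clears decayEligible at the top of recordRestart(), before the decay check further down runs. If the decay check stays where the diff shows it, that would also stop the intended case ("5th success then restart 5 min later → window cleared") from ever firing. The sketch therefore evaluates decay with the eligibility earned before this restart, then disarms it.

```typescript
const REQUIRED_CONSECUTIVE_SUCCESSES_FOR_DECAY = 5;
const ABSOLUTE_LIFETIME_RESTART_CAP = 50;
const DECAY_AFTER_SUCCESS_MS = 5 * 60 * 1000;
const RESTART_WINDOW_MS = 60 * 60 * 1000; // placeholder value
const MAX_RESTARTS_IN_WINDOW = 10;        // placeholder value

class RestartGuardSketch {
  private restartTimestamps: number[] = [];
  private lastSuccessfulProcessing: number | null = null;
  private consecutiveSuccessCount = 0;
  private decayEligible = false;
  private totalRestartsAllTime = 0;
  private readonly now: () => number;

  // Injectable clock so tests can advance time without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  recordRestart(): boolean {
    this.totalRestartsAllTime += 1;
    // Terminal: CAP restarts are allowed, CAP + 1 is the first blocked.
    if (this.totalRestartsAllTime > ABSOLUTE_LIFETIME_RESTART_CAP) {
      return false;
    }
    const now = this.now();
    // Decay uses eligibility earned *before* this restart.
    if (this.decayEligible
        && this.lastSuccessfulProcessing !== null
        && now - this.lastSuccessfulProcessing >= DECAY_AFTER_SUCCESS_MS) {
      this.restartTimestamps = [];
      this.lastSuccessfulProcessing = null;
    }
    // This restart breaks the streak and disarms decay, so a later
    // restart cannot reuse a stale streak.
    this.consecutiveSuccessCount = 0;
    this.decayEligible = false;

    this.restartTimestamps = this.restartTimestamps.filter((t) => now - t < RESTART_WINDOW_MS);
    this.restartTimestamps.push(now);
    return this.restartTimestamps.length <= MAX_RESTARTS_IN_WINDOW;
  }

  recordSuccess(): void {
    this.consecutiveSuccessCount += 1;
    this.lastSuccessfulProcessing = this.now();
    if (this.consecutiveSuccessCount >= REQUIRED_CONSECUTIVE_SUCCESSES_FOR_DECAY) {
      this.decayEligible = true;
    }
  }

  get restartsInWindow(): number {
    return this.restartTimestamps.length;
  }

  get totalRestarts(): number {
    return this.totalRestartsAllTime;
  }
}
```

With a fake clock, CodeRabbit's scenario (5 successes, one restart, a 5-minute gap, another restart) leaves both restarts counted, while a streak followed directly by a restart 5 minutes later still clears the window.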


greptile-apps Bot reviewed Apr 21, 2026

coderabbitai Bot requested changes Apr 21, 2026

thedotmack closed this Apr 22, 2026

Dee-0503 mentioned this pull request Apr 24, 2026

chore: merge upstream thedotmack/claude-mem v12.3.9 Dee-0503/claude-mem#16

Merged

4 tasks

thedotmack wants to merge 1 commit into main from investigate/windows-infinite-loop-usage-drain