fix(client): self-healing for permanently stuck expired shape handles#4087
Open
KyleAMathews wants to merge 5 commits into main
When stale cache retries exhaust (3 attempts), clear the expired entry from localStorage and retry once without the `expired_handle` param. Since handles are never reused (SPEC.md S0), the fresh response gets a new handle and bypasses stale detection. This prevents shapes from being permanently unloadable when a proxy strips cache-buster query params.

Also documents the server handle uniqueness guarantee (S0) in the spec, updates the loop-back table for the new self-healing path, and resets the recovery guard on up-to-date so self-healing remains available for long-lived streams.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Codecov Report

✅ All modified and coverable lines are covered by tests.

Coverage diff, main vs. #4087:

| | main | #4087 | +/- |
|---|---|---|---|
| Coverage | 84.90% | 75.75% | -9.16% |
| Files | 39 | 11 | -28 |
| Lines | 2869 | 693 | -2176 |
| Branches | 609 | 174 | -435 |
| Hits | 2436 | 525 | -1911 |
| Misses | 431 | 167 | -264 |
| Partials | 2 | 1 | -1 |
…test The test waited for fast-loop detection to error, but the exponential backoff (100ms-5s across 5 detections) takes longer than the timeout in CI. Simplified to verify self-healing fires and the entry is cleared — the fast-loop error path is already tested in stream.test.ts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The #expiredShapeRecoveryKey guard was only cleared in #onMessages when an up-to-date batch arrived. The 204 backward-compatibility path transitions directly to LiveState without going through #onMessages (empty body → batch.length === 0 → early return), leaving the guard stuck. This prevented a second self-healing cycle on the same stream instance. Clear the guard in #onInitialResponse when the response transitions directly to live (action=accepted, state=live), covering the 204 path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…test The caughtError===null assertion was environment-sensitive: the fast-loop detector's 500ms window can catch more requests on slower machines, firing a 502 that's orthogonal to the recovery guard bug being tested. The precise signal is selfHealCount===2: if the guard is stuck, the code throws 502 *before* incrementing, so selfHealCount stays at 1. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Expired shape handle entries in localStorage can get permanently stuck, preventing data from ever loading for affected shapes. This adds a self-healing retry mechanism that clears the poisoned entry and retries once, allowing automatic recovery even when a proxy strips cache-buster query parameters.
Based on #4085 by @evan-liveflow — refined with additional hardening from code review.
Root Cause
When a shape gets a 409 (handle rotation), the client stores the old handle in `localStorage['electric_expired_shapes']`. On future requests, if a response contains that handle, the client treats it as a stale cached response and retries up to 3 times with cache-buster params.

The problem: if a proxy (e.g., phoenix_sync) strips query parameters, the cache busters are ineffective. All 3 retries fail, `FetchError(502)` is thrown to `onError`, and if `onError` doesn't retry, the stream dies. The expired entry persists in localStorage, so the next session hits the same wall, permanently.

Since the server never reuses handles (now documented as SPEC.md S0), the expired entry becomes a false positive once the caching layer clears, but the client has no way to discover this.
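The failure mode above can be reproduced with a small model. This is only a sketch under stated assumptions, not the client's actual code: `makeCachingProxy`, `fetchShape`, the `cb` cache-buster param, and the `/v1/shape` path are hypothetical names chosen for illustration. It shows why cache busters do nothing when the proxy drops the query string before the cache lookup.

```typescript
// Hypothetical model of the failure; none of these names are the Electric client's API.
type ShapeResponse = { handle: string }

interface CachingProxy {
  seed(key: string, res: ShapeResponse): void
  fetch(url: string): ShapeResponse
}

// A cache keyed by URL. With `stripQuery` set, the query string is dropped
// before the lookup, so a cache-buster param never changes the cache key.
function makeCachingProxy(
  stripQuery: boolean,
  origin: () => ShapeResponse
): CachingProxy {
  const cache: Record<string, ShapeResponse> = {}
  return {
    seed(key, res) {
      cache[key] = res
    },
    fetch(url) {
      const key = stripQuery ? url.replace(/\?.*$/, ``) : url
      const hit = cache[key]
      if (hit) return hit
      const fresh = origin()
      cache[key] = fresh
      return fresh
    },
  }
}

const MAX_STALE_RETRIES = 3 // "retries up to 3 times" in the description

// One initial request plus up to three cache-busted retries. A response whose
// handle is in the expired list is treated as a stale cached response.
function fetchShape(
  proxy: CachingProxy,
  expiredHandles: string[]
): ShapeResponse {
  for (let attempt = 0; attempt <= MAX_STALE_RETRIES; attempt++) {
    const url = attempt === 0 ? `/v1/shape` : `/v1/shape?cb=${attempt}`
    const res = proxy.fetch(url)
    if (expiredHandles.indexOf(res.handle) === -1) return res
  }
  throw new Error(`502: stale cache retries exhausted`)
}
```

With a query-preserving proxy the first cache-busted retry misses the cache and reaches the origin. With a query-stripping proxy every retry maps to the same cache key, so all attempts return the expired handle and the call throws.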
Approach
After stale cache retries exhaust (3 attempts), the client now:

1. Clears the expired entry from localStorage
2. Retries once without the `expired_handle` param. Since handles are never reused, the fresh response will have a new handle and won't trigger stale detection
3. Guards the retry with `#expiredShapeRecoveryKey` (once per shape key, reset on up-to-date)

Key Invariants

- … (`#expiredShapeRecoveryKey` guard)

Non-goals

- … `onError` contract: the fix works regardless of what the user's `onError` callback does

Verification

Files changed

- `src/client.ts`: … `#onInitialResponse`, recovery key cleared on up-to-date, updated catch block comment
- `test/expired-shapes-cache.test.ts`
- `SPEC.md`
- `.changeset/fix-expired-shapes-self-healing.md`

Based on #4085
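For readers who want the self-healing path from the Approach section in one place, here is a rough sketch. It is an illustration under assumptions, not the code in `src/client.ts`: the class `SelfHealingStream`, the `Fetcher` type, the `selfHealCount` counter, and the plain object standing in for `localStorage['electric_expired_shapes']` are all hypothetical, and the private `recoveryKey` field only plays the role of `#expiredShapeRecoveryKey`.

```typescript
// Hypothetical sketch; names and shapes are illustrative, not the client's API.
type ShapeResponse = { handle: string }

// `expiredHandle` models the `expired_handle` request param; undefined means omitted.
type Fetcher = (expiredHandle: string | undefined) => ShapeResponse

const MAX_STALE_RETRIES = 3

class SelfHealingStream {
  // Plays the role of #expiredShapeRecoveryKey: the shape key already healed once.
  private recoveryKey: string | null = null
  private expired: Record<string, string> // shape key -> expired handle
  private fetcher: Fetcher
  selfHealCount = 0

  constructor(expired: Record<string, string>, fetcher: Fetcher) {
    this.expired = expired
    this.fetcher = fetcher
  }

  request(shapeKey: string): ShapeResponse {
    const expiredHandle = this.expired[shapeKey]
    if (expiredHandle === undefined) return this.fetcher(undefined)

    // Normal stale detection: one request plus up to three retries.
    for (let attempt = 0; attempt <= MAX_STALE_RETRIES; attempt++) {
      const res = this.fetcher(expiredHandle)
      if (res.handle !== expiredHandle) return res
    }

    // Retries exhausted. The guard allows one recovery per shape key; a second
    // exhaustion before up-to-date throws, and it does so before the counter moves.
    if (this.recoveryKey === shapeKey) {
      throw new Error(`502: stale cache retries exhausted`)
    }
    this.recoveryKey = shapeKey
    this.selfHealCount++

    // Self-heal: clear the stored entry, then retry once without expired_handle.
    // No entry remains to compare against, so the response is accepted.
    delete this.expired[shapeKey]
    return this.fetcher(undefined)
  }

  // Called when the stream reaches up-to-date (or transitions directly to live),
  // so a later stuck entry on the same stream instance can be healed again.
  onUpToDate(): void {
    this.recoveryKey = null
  }
}
```

The guard check sits before the counter increment on purpose: a stuck guard then shows up as a count that stays at 1, which matches the signal the last commit message describes for the test.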