Allow expired cache shapes to attempt self-healing once after maxRetries exceeded #4085

Closed — evanob wants to merge 1 commit into electric-sql:main from liveflow-io:stale-cache

Conversation

Contributor

@evanob evanob commented Apr 2, 2026

Opening as a draft PR rather than an issue because I want to share a diff that worked for us. Claude suggested also implementing a TTL on the expired shapes in localStorage, but I haven't tried that yet.

The problem is that we sometimes see warnings like the one below. We don't have a CDN, and the warnings persist even with Disable Cache enabled in the network tab, so it's not browser cache either.

[image: screenshot of the warnings]

We would see this for one user in a single org, but not for the same user in a different org or for another user in the same org. When it happened it wouldn't affect all shapes, but it would consistently affect the same ones. It might eventually recover, but there was no telling how long or how many refreshes that would take — I now believe older keys are eventually evicted from localStorage on an LRU basis.

When we patch @electric-sql/client with the change below, it recovers much more quickly.

One more potentially important factor that may have contributed to the original problem and/or the lack of automatic recovery: we unintentionally returned void for non-HTTP errors in onError, which meant they would never retry. After deploying the above patch and changing that handler to return {}, we see self-healing kick in pretty quickly.
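For illustration, here is a minimal sketch of the handler shape described above. The types and the `status` check are hypothetical stand-ins, not the real @electric-sql/client definitions; the point is only the contract the comment relies on — returning an options object asks the stream to retry, while a void return leaves it dead:

```typescript
// Hedged sketch (hypothetical types, not the real client's): an
// onError handler where non-HTTP errors return {} to request a retry
// with unchanged options. Previously we fell through with void here,
// so network-level errors were terminal.
interface HttpishError extends Error {
  status?: number // present only on HTTP errors in this sketch
}

type RetryOptions = Record<string, unknown>

function onError(error: HttpishError): RetryOptions | void {
  if (typeof error.status === 'number') {
    // HTTP errors: log and give up; the application decides what next.
    console.error(`shape request failed with ${error.status}`, error.message)
    return // void → no retry
  }
  // Non-HTTP (network/parse) errors: returning an empty object tells
  // the stream to retry with its existing params.
  return {}
}
```

The key design point is that "retry" is signalled by the return value, so a forgotten `return` silently becomes "stop forever".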

[image: screenshot of the onError handler]

@netlify

netlify bot commented Apr 2, 2026

Deploy Preview for electric-next ready!

🔨 Latest commit: 16f9ccf
🔍 Latest deploy log: https://app.netlify.com/projects/electric-next/deploys/69ced842f0267200089611ed
😎 Deploy Preview: https://deploy-preview-4085--electric-next.netlify.app

Contributor Author

evanob commented Apr 8, 2026

closing in favour of #4087

@evanob evanob closed this Apr 8, 2026
KyleAMathews added a commit that referenced this pull request Apr 10, 2026 (…#4087):

## Summary

Expired shape handle entries in localStorage can get permanently stuck,
preventing data from ever loading for affected shapes. This adds a
self-healing retry mechanism that clears the poisoned entry and retries
once, allowing automatic recovery even when a proxy strips cache-buster
query parameters.

Based on #4085 by @evan-liveflow — refined with additional hardening
from code review.

## Root Cause

When a shape gets a 409 (handle rotation), the client stores the old
handle in `localStorage['electric_expired_shapes']`. On future requests,
if a response contains that handle, the client treats it as a stale
cached response and retries up to 3 times with cache-buster params.

The problem: if a proxy (e.g., phoenix_sync) strips query parameters,
the cache busters are ineffective. All 3 retries fail, `FetchError(502)`
is thrown to `onError`, and if `onError` doesn't retry, the stream dies.
The expired entry persists in localStorage, so the next session hits the
same wall — permanently.
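
To make this failure mode concrete, here is an illustration (not Electric code — the URL and function are invented for the example) of why cache busters are powerless behind a query-stripping proxy: every busted URL collapses to the same upstream cache key.

```typescript
// Illustration only: a proxy that strips query strings maps every
// cache-busted request onto the same cache key, so all three retries
// fetch the identical stale response.
function proxyCacheKey(requestUrl: string): string {
  const url = new URL(requestUrl)
  return url.origin + url.pathname // query string discarded by the proxy
}

const retries = [1, 2, 3].map(
  (n) => `https://api.example.com/v1/shape?handle=old&cache_buster=${n}`
)
const keys = new Set(retries.map(proxyCacheKey))
console.log(keys.size) // 1 — every "busted" retry hits the same entry
```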

Since the server never reuses handles (now documented as **SPEC.md
S0**), the expired entry becomes a false positive once the caching layer
clears — but the client has no way to discover this.

## Approach

After stale cache retries exhaust (3 attempts), the client now:

1. **Always clears the expired entry** from localStorage — if cache
busters didn't work, keeping the entry only poisons future sessions
2. **Attempts one self-healing retry** — resets the stream and retries
without the `expired_handle` param. Since handles are never reused, the
fresh response will have a new handle and won't trigger stale detection
3. **Guards against infinite loops** via `#expiredShapeRecoveryKey`
(once per shape key, reset on up-to-date)

```typescript
if (transition.exceededMaxRetries) {
  if (shapeKey) {
    expiredShapesCache.delete(shapeKey)       // always clear
    if (this.#expiredShapeRecoveryKey !== shapeKey) {
      this.#expiredShapeRecoveryKey = shapeKey // remember we tried
      this.#reset()                            // fresh start
      throw new StaleCacheError(...)           // caught internally → retry
    }
  }
  throw new FetchError(502, ...)               // truly give up
}
```

### Key Invariants

- **S0**: Server handles are unique and never reused (phash2 +
microsecond timestamp, SQLite UNIQUE INDEX, ETS insert_new)
- Self-healing fires at most once per shape per retry cycle
(`#expiredShapeRecoveryKey` guard)
- Guard resets on up-to-date, so long-lived streams can self-heal again
if CDN misbehaves later
- Expired entry is cleared on every exhaustion, regardless of whether
self-healing fires
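
The guard invariants above can be sketched as a tiny standalone model (simplified — not the actual `client.ts` implementation, and `RecoveryGuard` is a name invented for this sketch):

```typescript
// Simplified model of the #expiredShapeRecoveryKey guard: recovery
// fires at most once per shape key, and the guard resets when the
// stream reaches up-to-date, so long-lived streams can heal again if
// the CDN misbehaves later.
class RecoveryGuard {
  private attemptedKey: string | null = null

  shouldAttemptRecovery(shapeKey: string): boolean {
    if (this.attemptedKey === shapeKey) return false // already tried this key
    this.attemptedKey = shapeKey // remember the attempt
    return true
  }

  onUpToDate(): void {
    this.attemptedKey = null // allow a future self-healing cycle
  }
}

const guard = new RecoveryGuard()
console.log(guard.shouldAttemptRecovery('shape-a')) // true: first attempt
console.log(guard.shouldAttemptRecovery('shape-a')) // false: guard holds
guard.onUpToDate()
console.log(guard.shouldAttemptRecovery('shape-a')) // true: reset on up-to-date
```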

### Non-goals

- TTL on expired cache entries — the self-healing mechanism handles the
failure mode without added complexity
- Changing `onError` contract — the fix works regardless of what the
user's `onError` callback does

## Verification

```bash
cd packages/typescript-client
pnpm vitest run --config vitest.unit.config.ts
# 312 tests pass
pnpm exec tsc --noEmit
# Clean
```

## Files changed

| File | Change |
|------|--------|
| `src/client.ts` | Self-healing logic in `#onInitialResponse`, recovery key cleared on up-to-date, updated catch block comment |
| `test/expired-shapes-cache.test.ts` | Updated 2 existing tests for self-healing flow, added test for CDN-always-stale scenario |
| `SPEC.md` | Added S0 (handle uniqueness guarantee), updated L3 loop-back entry and guard table |
| `.changeset/fix-expired-shapes-self-healing.md` | Changeset for patch release |

---
Based on #4085

---------

Co-authored-by: Evan O'Brien <evan@liveflow.io>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>