
feat: use shard sync for gained shards during node recovery epoch change #3464

Open
halfprice wants to merge 2 commits into main from zhewu/node_recovery_across_epoch_change


halfprice (Collaborator) commented Jun 12, 2026

Description

When a node in RecoveryInProgress processes an epoch change, it currently force-sets all owned shards (including newly gained ones) to Active and fills them via per-blob erasure recovery, even though on an incremental transition the previous owner is known and bulk shard sync is far cheaper.

This PR adds a dedicated epoch-change path for recovering nodes:

  • Gained shards are filled via shard sync from their previous owners instead of per-blob recovery. The catch-up path (previous shard assignment unknown) is unchanged.
  • Node recovery waits for ongoing shard syncs before each scan pass and before completing, using a watch counter of running sync tasks (abort/panic-safe; a terminally failed sync does not block recovery).
  • Single epoch_sync_done attestation: while recovering, only the recovery task attests, and only after both blob recovery and all shard syncs finish. Shard sync's own attestation is suppressed in that state.
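
The first item above, choosing how each owned shard gets filled, can be sketched as a small planning function. This is an illustrative sketch only, not the code in this PR: `ShardId`, `NodeId`, `FillPlan`, and `plan_epoch_change` are hypothetical names, and the previous-owner map is assumed to be available only on an incremental transition.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical identifier types; the real types are not shown in this PR description.
type ShardId = u16;
type NodeId = u32;

/// How a shard owned in the new epoch is filled while the node is recovering.
#[derive(Debug, PartialEq, Eq)]
enum FillPlan {
    /// Shard was already owned: the ongoing blob recovery covers it.
    AlreadyOwned,
    /// Gained shard with a known previous owner: bulk shard sync from that owner.
    ShardSync { from: NodeId },
    /// Previous assignment unknown (catch-up path): per-blob recovery, as before.
    BlobRecovery,
}

/// Plans the fill mechanism for every shard owned after the epoch change.
/// `previous_owners` is `None` when the previous epoch's assignment is unknown.
fn plan_epoch_change(
    owned_before: &BTreeSet<ShardId>,
    owned_now: &BTreeSet<ShardId>,
    previous_owners: Option<&BTreeMap<ShardId, NodeId>>,
) -> BTreeMap<ShardId, FillPlan> {
    owned_now
        .iter()
        .map(|&shard| {
            let plan = if owned_before.contains(&shard) {
                FillPlan::AlreadyOwned
            } else {
                match previous_owners.and_then(|owners| owners.get(&shard)) {
                    Some(&from) => FillPlan::ShardSync { from },
                    None => FillPlan::BlobRecovery,
                }
            };
            (shard, plan)
        })
        .collect()
}
```

Shards that appear in `owned_before` but not in `owned_now` are simply absent from the plan; locking outgoing shards is outside the scope of this sketch.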

A follow-up PR will make the recovery task survive epoch changes without restarting (avoiding the full blob-info re-scan).
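
The single-attestation rule described in the list above can be read as a pure predicate. The sketch below is an assumption-laden illustration: `NodeStatus`, `Attester`, and `may_attest_epoch_sync_done` are hypothetical names, and the behaviour for a node that is not recovering is a guess that this PR description does not specify.

```rust
/// Hypothetical node states; only RecoveryInProgress and Active are named in the PR.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum NodeStatus {
    Active,
    RecoveryInProgress,
}

/// Hypothetical tag for which component is asking to attest.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Attester {
    ShardSync,
    RecoveryTask,
}

/// Returns whether `attester` may send the epoch_sync_done attestation right now.
fn may_attest_epoch_sync_done(
    status: NodeStatus,
    attester: Attester,
    blob_recovery_done: bool,
    running_shard_syncs: usize,
) -> bool {
    match (status, attester) {
        // While recovering, shard sync's own attestation is suppressed.
        (NodeStatus::RecoveryInProgress, Attester::ShardSync) => false,
        // The recovery task attests only once blob recovery and all shard syncs are done.
        (NodeStatus::RecoveryInProgress, Attester::RecoveryTask) => {
            blob_recovery_done && running_shard_syncs == 0
        }
        // Assumption: outside recovery, shard sync attests once no sync is running.
        (NodeStatus::Active, Attester::ShardSync) => running_shard_syncs == 0,
        // Assumption: there is no recovery task to attest when the node is Active.
        (NodeStatus::Active, Attester::RecoveryTask) => false,
    }
}
```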

Test plan

  • New simtest test_node_recovery_across_epoch_change_with_shard_gain: a node gains shards at an epoch change processed while recovering; asserts the gained shards are filled by shard sync, recovery starts no blob syncs while shard syncs run, and the node ends Active with all shards Ready. Passes on seeds 1 and 2.
  • Regression simtests pass: test_long_node_recovery, test_recovery_in_progress_with_node_restart, test_lagging_node_recovery.

When a node in RecoveryInProgress processes an epoch change, it
previously force-set all owned shards (including newly gained ones) to
Active and filled them via per-blob erasure recovery, even though on an
incremental transition the previous owner is known and bulk shard sync
is far cheaper. Worse, every recovery blob sync decodes slivers for all
shards owned at the latest epoch, so gained shards were redundantly
erasure-decoded while also being recoverable from their previous owner.

This change adds a dedicated epoch-change path for recovering nodes:

- Newly gained shards are created and filled using shard sync from
  their previous owners; outgoing shards are locked as usual. The
  catch-up recovery path (where the previous epoch's shard assignment
  is unknown) is unchanged.
- Node recovery waits for all ongoing shard syncs to finish before
  scanning for blobs to recover (and again before completing), since a
  gained shard is missing all blobs and shard sync fills it with the
  cheapest mechanism. The wait is based on a watch counter of running
  sync tasks maintained by an RAII guard, so aborted or terminally
  failed syncs do not block recovery; a terminally failed sync leaves
  its shard in ActiveSync and the blobs are recovered through the
  regular blob recovery path.
- While the node is recovering, the epoch sync done attestation is
  owned solely by the recovery task, which attests only once both blob
  recovery and all shard syncs are complete; shard sync's own
  attestation is suppressed in that state.
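
The RAII-guarded counter of running sync tasks can be sketched with standard-library primitives. This is a stand-in under stated assumptions, not the PR's implementation: the commit message describes a watch counter, whereas this sketch uses `Mutex` and `Condvar` to stay dependency-free, and `RunningSyncs`, `SyncGuard`, `start`, and `wait_until_idle` are hypothetical names.

```rust
use std::sync::{Arc, Condvar, Mutex, MutexGuard};

/// Hypothetical counter of running shard-sync tasks (std-only stand-in for a watch channel).
#[derive(Clone, Default)]
struct RunningSyncs {
    inner: Arc<(Mutex<usize>, Condvar)>,
}

/// RAII guard: while it is alive, one sync task counts as running.
struct SyncGuard {
    inner: Arc<(Mutex<usize>, Condvar)>,
}

/// Locks the counter, tolerating a poisoned mutex so a panic elsewhere cannot wedge recovery.
fn lock_count(mutex: &Mutex<usize>) -> MutexGuard<'_, usize> {
    mutex.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
}

impl RunningSyncs {
    /// Registers a running sync task; the count drops again when the guard is dropped.
    fn start(&self) -> SyncGuard {
        *lock_count(&self.inner.0) += 1;
        SyncGuard { inner: Arc::clone(&self.inner) }
    }

    /// Current number of running sync tasks.
    fn count(&self) -> usize {
        *lock_count(&self.inner.0)
    }

    /// Blocks until no sync task is running.
    fn wait_until_idle(&self) {
        let (mutex, condvar) = &*self.inner;
        let mut running = lock_count(mutex);
        while *running > 0 {
            running = condvar
                .wait(running)
                .unwrap_or_else(|poisoned| poisoned.into_inner());
        }
    }
}

impl Drop for SyncGuard {
    /// Runs on normal completion, early return, and unwinding alike.
    fn drop(&mut self) {
        let (mutex, condvar) = &*self.inner;
        *lock_count(mutex) -= 1;
        condvar.notify_all();
    }
}
```

Because the decrement lives in `Drop`, the counter falls whenever a sync task ends, whether it finished, was dropped early, or unwound from a panic. That is the property that lets the recovery task wait on the counter without being blocked by a failed sync.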

A new simtest crashes a node into RecoveryInProgress, holds its
recovery task with a fail point, stakes additional weight so the node
gains shards at an epoch change processed while recovering, and
verifies that the gained shards are filled by shard sync, that recovery
starts no blob syncs while shard syncs run, and that the node ends up
Active with all shards ready.

halfprice force-pushed the zhewu/node_recovery_across_epoch_change branch from a522003 to 37d0723 on June 12, 2026 08:17