feat: use shard sync for gained shards during node recovery epoch change#3464
Open
halfprice wants to merge 2 commits into
Open
feat: use shard sync for gained shards during node recovery epoch change#3464halfprice wants to merge 2 commits into
halfprice wants to merge 2 commits into
Conversation
When a node in RecoveryInProgress processes an epoch change, it previously force-set all owned shards (including newly gained ones) to Active and filled them via per-blob erasure recovery, even though on an incremental transition the previous owner is known and bulk shard sync is far cheaper. Worse, every recovery blob sync decodes slivers for all shards owned at the latest epoch, so gained shards were redundantly erasure-decoded while also being recoverable from their previous owner. This change adds a dedicated epoch-change path for recovering nodes: - Newly gained shards are created and filled using shard sync from their previous owners; outgoing shards are locked as usual. The catch-up recovery path (where the previous epoch's shard assignment is unknown) is unchanged. - Node recovery waits for all ongoing shard syncs to finish before scanning for blobs to recover (and again before completing), since a gained shard is missing all blobs and shard sync fills it with the cheapest mechanism. The wait is based on a watch counter of running sync tasks maintained by an RAII guard, so aborted or terminally failed syncs do not block recovery; a terminally failed sync leaves its shard in ActiveSync and the blobs are recovered through the regular blob recovery path. - While the node is recovering, the epoch sync done attestation is owned solely by the recovery task, which attests only once both blob recovery and all shard syncs are complete; shard sync's own attestation is suppressed in that state. A new simtest crashes a node into RecoveryInProgress, holds its recovery task with a fail point, stakes additional weight so the node gains shards at an epoch change processed while recovering, and verifies that the gained shards are filled by shard sync, that recovery starts no blob syncs while shard syncs run, and that the node ends up Active with all shards ready.
a522003 to
37d0723
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
When a node in
RecoveryInProgressprocesses an epoch change, it force-sets all owned shards (including newly gained ones) toActiveand fills them via per-blob erasure recovery — even though on an incremental transition the previous owner is known and bulk shard sync is far cheaper.This PR adds a dedicated epoch-change path for recovering nodes:
epoch_sync_doneattestation: while recovering, only the recovery task attests, and only after both blob recovery and all shard syncs finish. Shard sync's own attestation is suppressed in that state.A follow-up PR will make the recovery task survive epoch changes without restarting (avoiding the full blob-info re-scan).
Test plan
test_node_recovery_across_epoch_change_with_shard_gain: a node gains shards at an epoch change processed while recovering; asserts the gained shards are filled by shard sync, recovery starts no blob syncs while shard syncs run, and the node endsActivewith all shardsReady. Passes on seeds 1 and 2.test_long_node_recovery,test_recovery_in_progress_with_node_restart,test_lagging_node_recovery.