feat: create blob info snapshot checkpoints at the epoch boundary #3468

Draft
shuowang12 wants to merge 1 commit into releases/walrus-v1.50.0-release from shuo/blob-info-checkpoint-testnet-v1.50.0

Conversation

@shuowang12

Collaborator

When enabled via the new `blob_info_snapshot` config section (off by default), the storage node creates a RocksDB checkpoint of its database at the epoch boundary, directly after garbage-collection phase 1 and before any further events are processed. It then removes older checkpoints so that at most one exists at a time. On startup, the node finishes any cleanup that was interrupted by a crash, so a stale checkpoint cannot pin deleted SST files for a full epoch.
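The rotation described above can be sketched with the standard library alone. This is a minimal illustration and not the code in this PR: the `epoch-<n>` directory naming, the function names, and the `create` closure (which stands in for the actual RocksDB checkpoint call) are all assumptions made for the example.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Parses the epoch out of a hypothetical `epoch-<n>` directory name.
fn parse_epoch(path: &Path) -> Option<u64> {
    path.file_name()?.to_str()?.strip_prefix("epoch-")?.parse().ok()
}

/// Lists checkpoint directories under `root`, sorted by ascending epoch.
/// A missing `root` is treated as "no checkpoints".
fn list_checkpoints(root: &Path) -> io::Result<Vec<(u64, PathBuf)>> {
    let mut found = Vec::new();
    if !root.is_dir() {
        return Ok(found);
    }
    for entry in fs::read_dir(root)? {
        let path = entry?.path();
        if path.is_dir() {
            if let Some(epoch) = parse_epoch(&path) {
                found.push((epoch, path));
            }
        }
    }
    found.sort();
    Ok(found)
}

/// Deletes every checkpoint except the one with the highest epoch and
/// returns how many were removed. Calling this at startup finishes a
/// cleanup that a crash interrupted.
fn prune_all_but_newest(root: &Path) -> io::Result<usize> {
    let mut checkpoints = list_checkpoints(root)?;
    checkpoints.pop(); // keep the newest checkpoint
    for (_, path) in &checkpoints {
        fs::remove_dir_all(path)?;
    }
    Ok(checkpoints.len())
}

/// Creates the checkpoint for `epoch` via `create`, then prunes older ones.
/// In a real node, `create` would invoke RocksDB's checkpoint facility;
/// here it is an assumption-labelled placeholder supplied by the caller.
fn rotate_checkpoint<F>(root: &Path, epoch: u64, create: F) -> io::Result<PathBuf>
where
    F: FnOnce(&Path) -> io::Result<()>,
{
    fs::create_dir_all(root)?;
    let target = root.join(format!("epoch-{}", epoch));
    create(&target)?;
    prune_all_but_newest(root)?;
    Ok(target)
}
```

In this sketch the new checkpoint is created before the old ones are deleted, so a crash between the two steps leaves an extra directory behind instead of leaving none. Running `prune_all_but_newest` at startup then removes the leftover.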

At that boundary point, the blob info tables are identical across all honest nodes, so the checkpoints make it possible to verify blob info determinism across nodes offline. Checkpoint errors are logged and counted but never fail epoch processing.
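As a rough illustration of what an offline determinism check could look like, the sketch below digests a table of key/value pairs in sorted key order and lists the keys on which two nodes disagree. It assumes the blob info table has already been read out of each node's checkpoint into an in-memory map; the `Table` alias, the function names, and the use of FNV-1a are choices made for this example, not part of the PR. FNV-1a is not cryptographic, so a real tool would likely use a stronger hash.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory view of one node's blob info table.
type Table = BTreeMap<Vec<u8>, Vec<u8>>;

const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Feeds `bytes` into a running 64-bit FNV-1a hash.
fn fnv1a(mut hash: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Digests the table in sorted key order (the BTreeMap iteration order).
/// Length prefixes keep ("ab", "c") distinct from ("a", "bc").
fn table_digest(table: &Table) -> u64 {
    let mut hash = FNV_OFFSET;
    for (key, value) in table {
        hash = fnv1a(hash, &(key.len() as u64).to_le_bytes());
        hash = fnv1a(hash, key);
        hash = fnv1a(hash, &(value.len() as u64).to_le_bytes());
        hash = fnv1a(hash, value);
    }
    hash
}

/// Returns the sorted keys whose values differ between the two tables,
/// including keys present in only one of them.
fn differing_keys(a: &Table, b: &Table) -> Vec<Vec<u8>> {
    let mut keys: Vec<Vec<u8>> = a
        .iter()
        .filter(|(k, v)| b.get(*k) != Some(*v))
        .map(|(k, _)| k.clone())
        .collect();
    keys.extend(b.keys().filter(|k| !a.contains_key(*k)).cloned());
    keys.sort();
    keys
}
```

Comparing one digest per node is cheap when the tables agree. `differing_keys` is only needed to locate the divergence once two digests differ.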

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@github-actions

Contributor

Warning: This PR modifies one of the example config files. Please consider the
following:

  • Make sure the changes are backwards compatible with the current configuration.
  • Make sure any added parameters follow the conventions of the existing parameters; in
    particular, durations should take seconds or milliseconds using the naming convention
    _secs or _millis, respectively.
  • If there are added optional parameter sections, it should be possible to specify them
    partially. A useful pattern there is to implement Default for the struct and derive
    #[serde(default)] on it, see BlobRecoveryConfig as an example.
  • You may need to update the documentation to reflect the changes.
