feat: [Do not merge] blob info checkpoint creation by shuowang12 · Pull Request #3460 · MystenLabs/walrus

shuowang12 · 2026-06-11T17:35:29Z

Description

Deploying a custom branch to walrus-private-testnet

This draft PR is for deployment to the PTN storage nodes as part of the blob info snapshot work. The branch is main plus one isolated, config-gated feature: when enabled, a node creates a RocksDB checkpoint of its database at the epoch boundary (after GC phase 1, the point at which the blob info tables are deterministic across nodes) and deletes the previous checkpoint. I will then run the offline db-tool against each node's checkpoint to verify that the serialized blob info snapshots are byte-identical across nodes.

Rollout plan, in stages:

1. Deploy the binary to all nodes via the standard PTN deploy workflow (no wipe). The feature is off by default, so this step changes no behavior.
2. Enable the feature on 1–2 canary nodes via the PTN config-update workflow (blob_info_snapshot.enabled: true with a host limit), and watch them across one or two epoch boundaries.
3. Enable it fleet-wide once the canary nodes look clean.

Expected impact:

No changes to event processing logic, the blob lifecycle, or APIs. Checkpoint errors are only logged and counted; they cannot fail epoch processing.
When enabled: event processing pauses for the duration of the checkpoint (expected to take seconds) once per 2-hour epoch, and each node retains one checkpoint (built from hard links, so the incremental disk usage is small).
Verification runs are read-only against the checkpoint directory, never the live DB.
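A minimal sketch of the log-and-count behavior, assuming the checkpoint step is a fallible closure invoked on the epoch-change path. CheckpointMetrics, checkpoint_best_effort, and process_epoch_change are hypothetical names, and eprintln! stands in for the node's real logging and metrics.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the node's checkpoint error counter metric.
pub struct CheckpointMetrics {
    pub errors: AtomicU64,
}

/// Runs the checkpoint step; any error is logged and counted, never returned.
pub fn checkpoint_best_effort<F>(metrics: &CheckpointMetrics, create_checkpoint: F)
where
    F: FnOnce() -> Result<(), String>,
{
    if let Err(error) = create_checkpoint() {
        // eprintln! keeps the sketch std-only; a node would use its logger.
        eprintln!("failed to create blob info checkpoint: {}", error);
        metrics.errors.fetch_add(1, Ordering::Relaxed);
    }
}

/// Hypothetical epoch-change step showing where the checkpoint sits.
pub fn process_epoch_change<F>(
    metrics: &CheckpointMetrics,
    snapshot_enabled: bool,
    create_checkpoint: F,
) -> Result<(), String>
where
    F: FnOnce() -> Result<(), String>,
{
    // ... GC phase 1 would run here and settle the blob info tables ...
    if snapshot_enabled {
        checkpoint_best_effort(metrics, create_checkpoint);
    }
    // The checkpoint outcome never changes the epoch-change result.
    Ok(())
}
```

The design choice is that the checkpoint result never propagates out of the epoch-change path; a failure is visible only in the counter and the logs.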

Test plan

How did you test the new or updated feature?

Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required. For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates. (Add release notes after the colon for each item.)

Storage node:
Aggregator:
Publisher:
CLI:

github-actions · 2026-06-11T18:57:33Z

Warning: This PR modifies one of the example config files. Please consider the following:

Make sure the changes are backwards compatible with the current configuration.
Make sure any added parameters follow the conventions of the existing parameters; in particular, durations should take seconds or milliseconds using the naming convention _secs or _millis, respectively.
If there are added optional parameter sections, it should be possible to specify them partially. A useful pattern there is to implement Default for the struct and annotate it with #[serde(default)]; see BlobRecoveryConfig as an example.
You may need to update the documentation to reflect the changes.
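The Default-plus-#[serde(default)] pattern can be sketched with the standard library alone. The struct name BlobInfoSnapshotConfig, the checkpoint_timeout_secs field, and its 60-second default are hypothetical; only the enabled key appears in this PR. The serde attributes are described in a comment because the sketch avoids external crates, and struct update syntax stands in for what serde does when a YAML section omits fields.

```rust
use std::time::Duration;

/// Hypothetical config section for the checkpoint feature. With serde, the
/// real struct would also carry #[derive(Deserialize)] and #[serde(default)],
/// so a YAML section that sets only `enabled` still deserializes.
#[derive(Debug, Clone, PartialEq)]
pub struct BlobInfoSnapshotConfig {
    /// Off by default, so deploying the binary alone changes no behavior.
    pub enabled: bool,
    /// Hypothetical field illustrating the `_secs` naming convention.
    pub checkpoint_timeout_secs: u64,
}

impl Default for BlobInfoSnapshotConfig {
    fn default() -> Self {
        Self {
            enabled: false,
            checkpoint_timeout_secs: 60,
        }
    }
}

impl BlobInfoSnapshotConfig {
    /// Converts the raw `_secs` value into a Duration at the point of use.
    pub fn checkpoint_timeout(&self) -> Duration {
        Duration::from_secs(self.checkpoint_timeout_secs)
    }
}
```

A partially specified section behaves like `BlobInfoSnapshotConfig { enabled: true, ..Default::default() }`: the operator sets only the key they care about and every other field keeps its default.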

Add a versioned, deterministic serialization format for the blob info tables that are identical across nodes after GC phase 1 at the epoch boundary: per_object_blob_info, per_object_pooled_blob_info, and storage_pool_info. The aggregate_blob_info table is excluded by design: it contains node-local state (is_metadata_stored) and entries whose deletion timing depends on background GC phase 2, so it is not deterministic across nodes and is instead reconstructed during recovery.

The format is self-delimiting (single-pass write and read), carries the epoch and exact event-stream cursor in its header, reserves chunking fields for snapshots exceeding the maximum blob size, and ends with an xxhash64 checksum. A golden-byte test pins the v1 serialization so that any byte-level change (which is consensus-critical, since all nodes must produce bit-identical snapshots) fails CI until the format version is bumped.

This is the first step towards storage node recovery for storage pools (WAL-1185): pool membership cannot be recovered through event replay, so recovering nodes will bootstrap from a quorum-certified snapshot blob and replay events forward.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
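The properties above (versioned header with epoch and cursor, reserved chunk fields, length-prefixed entries, trailing checksum) can be illustrated with a small std-only sketch. This is not the PR's v1 layout: the field order and widths are assumptions, the cursor is simplified to a u64, and FNV-1a is swapped in for xxhash64 so the example needs no external crates.

```rust
use std::collections::BTreeMap;

pub const FORMAT_VERSION: u32 = 1; // hypothetical layout, not the real v1
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// One column family. BTreeMap iterates in key order, like a RocksDB
/// iterator, which keeps the byte output deterministic.
pub type Table = BTreeMap<Vec<u8>, Vec<u8>>;

#[derive(Debug, PartialEq)]
pub struct Snapshot {
    pub epoch: u64,
    pub cursor: u64, // simplified; the real cursor type differs
    pub tables: Vec<Table>,
}

/// Incremental FNV-1a, a std-only stand-in for xxhash64.
fn fnv1a(mut hash: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

fn put_bytes(out: &mut Vec<u8>, bytes: &[u8]) {
    // The length prefix makes each key and value self-delimiting.
    out.extend_from_slice(&(bytes.len() as u32).to_le_bytes());
    out.extend_from_slice(bytes);
}

pub fn serialize(s: &Snapshot) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&FORMAT_VERSION.to_le_bytes());
    out.extend_from_slice(&s.epoch.to_le_bytes());
    out.extend_from_slice(&s.cursor.to_le_bytes());
    out.extend_from_slice(&0u32.to_le_bytes()); // reserved: chunk index
    out.extend_from_slice(&1u32.to_le_bytes()); // reserved: chunk count
    out.extend_from_slice(&(s.tables.len() as u32).to_le_bytes());
    for table in &s.tables {
        out.extend_from_slice(&(table.len() as u64).to_le_bytes());
        for (key, value) in table {
            put_bytes(&mut out, key);
            put_bytes(&mut out, value);
        }
    }
    let checksum = fnv1a(FNV_OFFSET, &out);
    out.extend_from_slice(&checksum.to_le_bytes());
    out
}

/// Single-pass reader: hashes bytes as it consumes them.
struct Reader<'a> { buf: &'a [u8], pos: usize, hash: u64 }

impl<'a> Reader<'a> {
    fn take(&mut self, n: usize) -> Result<&'a [u8], String> {
        let buf = self.buf;
        let end = self.pos.checked_add(n).filter(|&e| e <= buf.len()).ok_or("truncated snapshot")?;
        let slice = &buf[self.pos..end];
        self.pos = end;
        self.hash = fnv1a(self.hash, slice);
        Ok(slice)
    }
    fn array<const N: usize>(&mut self) -> Result<[u8; N], String> {
        let mut out = [0u8; N];
        out.copy_from_slice(self.take(N)?);
        Ok(out)
    }
    fn u32(&mut self) -> Result<u32, String> { Ok(u32::from_le_bytes(self.array()?)) }
    fn u64(&mut self) -> Result<u64, String> { Ok(u64::from_le_bytes(self.array()?)) }
    fn bytes(&mut self) -> Result<Vec<u8>, String> {
        let len = self.u32()? as usize;
        Ok(self.take(len)?.to_vec())
    }
}

pub fn deserialize(buf: &[u8]) -> Result<Snapshot, String> {
    let mut r = Reader { buf, pos: 0, hash: FNV_OFFSET };
    if r.u32()? != FORMAT_VERSION {
        return Err("unsupported format version".to_string());
    }
    let epoch = r.u64()?;
    let cursor = r.u64()?;
    let _chunk_index = r.u32()?;
    let _chunk_count = r.u32()?;
    let mut tables = Vec::new();
    for _ in 0..r.u32()? {
        let mut table = Table::new();
        for _ in 0..r.u64()? {
            let key = r.bytes()?;
            table.insert(key, r.bytes()?);
        }
        tables.push(table);
    }
    let computed = r.hash; // hash of everything before the trailer
    let stored = r.u64()?;
    if computed != stored || r.pos != buf.len() {
        return Err("checksum mismatch or trailing bytes".to_string());
    }
    Ok(Snapshot { epoch, cursor, tables })
}
```

Because tables iterate in key order and every integer has a fixed width and byte order, two nodes holding the same table contents produce the same bytes and therefore the same checksum. A golden-byte test in this style would assert the exact `serialize` output for a fixed input.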

Add `walrus-node db-tool bench-blob-info-snapshot`, which serializes the per_object_blob_info, per_object_pooled_blob_info, and storage_pool_info column families of an existing node database into a snapshot file and reports: entry counts, snapshot size and bytes per entry, serialize + write + sync duration, read + deserialize duration, bulk-load duration into a scratch database (using the same key/value encodings as a real node), and zstd compression ratios at configurable levels.

The database is opened read-only with only the three relevant column families, so the benchmark can run against a stopped node's database or a copy of a running node's database. This provides production-scale measurements for the snapshot design (size, serialization cost, compression benefit) before any node deployment.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
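A sketch of how the serialize + write + sync timing and the bytes-per-entry figure fit together, assuming the serializer is passed in as a closure. BenchReport and bench_write are hypothetical names; the real tool also measures read, bulk-load, and zstd compression, which are omitted here.

```rust
use std::fs::File;
use std::io::Write;
use std::path::Path;
use std::time::{Duration, Instant};

/// Hypothetical subset of the benchmark's report.
#[derive(Debug)]
pub struct BenchReport {
    pub entries: u64,
    pub snapshot_bytes: u64,
    pub serialize_write_sync: Duration,
}

impl BenchReport {
    pub fn bytes_per_entry(&self) -> f64 {
        if self.entries == 0 {
            0.0
        } else {
            self.snapshot_bytes as f64 / self.entries as f64
        }
    }
}

/// Times serialization plus a durable write of the result to `path`.
pub fn bench_write<F>(path: &Path, entries: u64, serialize: F) -> std::io::Result<BenchReport>
where
    F: FnOnce() -> Vec<u8>,
{
    let start = Instant::now();
    let bytes = serialize();
    let mut file = File::create(path)?;
    file.write_all(&bytes)?;
    // Include fsync so the number reflects the cost of a durable write.
    file.sync_all()?;
    Ok(BenchReport {
        entries,
        snapshot_bytes: bytes.len() as u64,
        serialize_write_sync: start.elapsed(),
    })
}
```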

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The xxhash64 trailer is a deterministic fingerprint of the snapshotted table contents, so printing it lets operators compare snapshots taken at the same epoch boundary across nodes with a single line of output.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
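Comparing digests across nodes amounts to grouping the reported values for one epoch and checking that there is exactly one group. A minimal sketch, assuming the digests have already been collected from each node's tool output into NodeReport values (a hypothetical type); parsing the tool output is out of scope.

```rust
use std::collections::BTreeMap;

/// Hypothetical: one node's result from running the db-tool on its checkpoint.
pub struct NodeReport {
    pub node: String,
    pub epoch: u64,
    pub digest: u64,
}

/// Groups node names by digest, considering only reports for `epoch`, since
/// digests are only comparable for snapshots taken at the same boundary.
pub fn group_by_digest(reports: &[NodeReport], epoch: u64) -> BTreeMap<u64, Vec<String>> {
    let mut groups: BTreeMap<u64, Vec<String>> = BTreeMap::new();
    for report in reports.iter().filter(|r| r.epoch == epoch) {
        groups.entry(report.digest).or_default().push(report.node.clone());
    }
    groups
}

/// True when every node that reported for the epoch produced the same digest.
pub fn all_nodes_agree(groups: &BTreeMap<u64, Vec<String>>) -> bool {
    groups.len() == 1
}
```

When the check fails, the nodes in the minority group are the ones whose checkpoints deserve a closer look.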

When enabled (default off), the node creates a RocksDB checkpoint of its database at the epoch boundary, directly after GC phase 1 has settled the blob info tables and before any further events are processed, and removes older checkpoints so that at most one exists at a time. Startup finishes any cleanup that a crash interrupted, so a stale checkpoint cannot pin deleted SST files' disk space for a full epoch.

The node does not serialize anything itself: operators verify snapshot determinism offline by running `walrus-node db-tool bench-blob-info-snapshot --db-path <checkpoint>` on each node and comparing the reported digests for the same epoch. Checkpoint creation duration is reported as a metric; failures increment an error counter and never fail epoch processing.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
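The at-most-one-checkpoint rule and the startup cleanup can be sketched with std::fs, assuming checkpoints live as checkpoint_epoch_<N> directories under a single root. The `create` closure stands in for RocksDB checkpoint creation, and all function names are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

const PREFIX: &str = "checkpoint_epoch_";

/// Lists checkpoint directories under `root` together with their epochs.
fn checkpoint_dirs(root: &Path) -> io::Result<Vec<(u64, PathBuf)>> {
    let mut dirs = Vec::new();
    for entry in fs::read_dir(root)? {
        let entry = entry?;
        if !entry.file_type()?.is_dir() {
            continue;
        }
        let name = entry.file_name();
        let epoch = name
            .to_str()
            .and_then(|n| n.strip_prefix(PREFIX))
            .and_then(|s| s.parse::<u64>().ok());
        if let Some(epoch) = epoch {
            dirs.push((epoch, entry.path()));
        }
    }
    Ok(dirs)
}

/// Removes every checkpoint except the one for `keep_epoch`, if any.
pub fn remove_checkpoints_except(root: &Path, keep_epoch: Option<u64>) -> io::Result<()> {
    for (epoch, path) in checkpoint_dirs(root)? {
        if Some(epoch) != keep_epoch {
            fs::remove_dir_all(&path)?;
        }
    }
    Ok(())
}

/// Creates the new checkpoint first, then deletes older ones.
pub fn create_checkpoint<F>(root: &Path, epoch: u64, create: F) -> io::Result<PathBuf>
where
    F: FnOnce(&Path) -> io::Result<()>,
{
    let path = root.join(format!("{}{}", PREFIX, epoch));
    create(&path)?;
    remove_checkpoints_except(root, Some(epoch))?;
    Ok(path)
}

/// At startup, finish cleanup a crash may have interrupted: keep only the newest.
pub fn startup_cleanup(root: &Path) -> io::Result<()> {
    let newest = checkpoint_dirs(root)?.into_iter().map(|(epoch, _)| epoch).max();
    remove_checkpoints_except(root, newest)
}
```

Older checkpoints are removed only after the new one succeeds, so a failed checkpoint leaves the previous one in place instead of leaving the node with none.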

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…tency check

The checkpoint takes seconds (memtable flush plus hard links), while the consistency check's background scan reads the whole table and takes minutes or longer. Running the checkpoint first keeps the scan's disk traffic from stretching the inline checkpoint duration; determinism is unaffected because both capture their state while event processing is blocked.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
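The ordering can be sketched as a sequence of steps on the epoch-change path, assuming the consistency scan captures its view of the table before it is spawned. The step labels and on_epoch_boundary are illustrative, not node functions.

```rust
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Step log shared with the background task; entries are illustrative labels.
pub type StepLog = Arc<Mutex<Vec<&'static str>>>;

fn record(log: &StepLog, step: &'static str) {
    log.lock().unwrap().push(step);
}

/// Hypothetical epoch-boundary handler. Everything before "resume events"
/// runs while event processing is blocked, so the checkpoint and the scan's
/// captured state reflect the same point.
pub fn on_epoch_boundary(log: &StepLog) -> JoinHandle<()> {
    record(log, "gc phase 1");
    // Inline and fast, so it runs before the scan competes for disk bandwidth.
    record(log, "checkpoint");
    // Assumption: the scan pins its view of the table before being spawned.
    record(log, "capture scan state");
    let background = Arc::clone(log);
    // The long full-table scan runs off the critical path.
    let handle = thread::spawn(move || record(&background, "consistency scan"));
    record(log, "resume events");
    handle
}
```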

Parse the epoch out of a checkpoint_epoch_<N> input directory and embed it in the snapshot header, so digests only match across snapshots taken at the same epoch boundary. The event cursor deliberately stays at its default: the cursor stored in the database is not deterministic at the checkpoint instant because event completion is marked by a background task.

Also compute the event-reprocessing guard before execute_epoch_change spawns the finisher task, which marks the event complete in the background and could otherwise misclassify normal processing as reprocessing, skipping the checkpoint and consistency check.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
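Parsing the epoch from the directory name is a prefix strip plus an integer parse. A minimal sketch, assuming the directory basename is exactly checkpoint_epoch_<N>; epoch_from_checkpoint_dir is a hypothetical name.

```rust
use std::path::Path;

/// Returns N for a path whose final component is `checkpoint_epoch_<N>`.
pub fn epoch_from_checkpoint_dir(path: &Path) -> Option<u64> {
    path.file_name()?
        .to_str()?
        .strip_prefix("checkpoint_epoch_")?
        .parse()
        .ok()
}
```

Returning None for anything that does not match leaves it to the caller to decide how to handle an input path that is not a checkpoint directory.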

shuowang12 changed the title ~~[Do not merge] feat:blob info checkpoint creation~~ feat: [Do not merge] blob info checkpoint creation Jun 11, 2026

shuowang12 and others added 8 commits June 11, 2026 20:39

docs: clarify valid database inputs for the snapshot benchmark

d8216bb

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

chore: regenerate the example node config

b229d8b

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

shuowang12 force-pushed the shuo/blob-info-checkpoint-minimal branch from d05dd43 to ba872e5 June 12, 2026 03:39

shuowang12 wants to merge 8 commits into main from shuo/blob-info-checkpoint-minimal
