feat: serialize blob info snapshot in-process at the epoch boundary #3480

Open

shuowang12 wants to merge 1 commit into main from shuo/blob-info-snapshot-minimal-main

shuowang12 (Collaborator) commented Jun 23, 2026

Description

Adds opt-in serialization of the blob info snapshot at the epoch boundary, behind blob_info_snapshot.enabled (off by default).
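A minimal sketch of how such an opt-in flag might be modeled, assuming a config struct named `BlobInfoSnapshotConfig` with a hypothetical `output_dir` field (only `enabled` comes from the PR). In the node, the struct would presumably also derive serde's `Deserialize` with `#[serde(default)]`; that is omitted here to keep the sketch dependency-free.

```rust
use std::path::PathBuf;

/// Hypothetical shape of the `blob_info_snapshot` config section.
/// Only `enabled` is named in the PR; `output_dir` is illustrative.
#[derive(Debug, Clone, PartialEq)]
pub struct BlobInfoSnapshotConfig {
    /// Master switch; off by default so existing nodes are unaffected.
    pub enabled: bool,
    /// Hypothetical directory where snapshot files are written.
    pub output_dir: PathBuf,
}

impl Default for BlobInfoSnapshotConfig {
    fn default() -> Self {
        Self {
            enabled: false,
            output_dir: PathBuf::from("blob_info_snapshots"),
        }
    }
}

/// Returns true if the epoch-boundary hook should serialize a snapshot.
/// A missing config section behaves like the default (disabled).
pub fn should_snapshot(config: Option<&BlobInfoSnapshotConfig>) -> bool {
    config.map_or(false, |c| c.enabled)
}
```

Keeping the default `false` means a node that never mentions the section in its config file behaves exactly as before.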

When enabled, the storage node serializes the three blob-info column families (per_object_blob_info, storage_pool_info, per_object_pooled_blob_info) to a local file once per epoch and reports the serialization duration, size, and content digest. The node does not yet read snapshots back or use them in any way.
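A minimal sketch of measuring the three reported metrics around a serialization call. `SnapshotReport` and `serialize_with_report` are hypothetical names, and FNV-1a is used only as a dependency-free stand-in digest: the PR does not say which digest it computes, and a digest intended for cross-node comparison would normally be a cryptographic hash such as SHA-256.

```rust
use std::time::{Duration, Instant};

/// Hypothetical container for the per-snapshot metrics named in the PR.
#[derive(Debug)]
pub struct SnapshotReport {
    pub duration: Duration,
    pub size_bytes: u64,
    pub digest: u64,
}

/// 64-bit FNV-1a. Stand-in only: NOT collision-resistant or cryptographic.
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Times `serialize`, then records the output size and its digest.
pub fn serialize_with_report<F>(serialize: F) -> (Vec<u8>, SnapshotReport)
where
    F: FnOnce() -> Vec<u8>,
{
    let start = Instant::now();
    let bytes = serialize();
    let report = SnapshotReport {
        duration: start.elapsed(),
        size_bytes: bytes.len() as u64,
        digest: fnv1a64(&bytes),
    };
    (bytes, report)
}
```

Since the encoding is deterministic, two nodes at the same post-GC point should report the same digest, which is what makes the digest useful as a cross-node metric.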

The serialization runs after GC phase 1 settles the tables and before execute_epoch_change, while event processing is blocked, so the serialized state is the deterministic post-GC point. If the node crashes before serialization completes, the handler is replayed and retries snapshot creation; if the snapshot file already exists, it short-circuits.
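One way to get this crash-safe, idempotent behavior is to write to a temporary file, fsync it, and atomically rename it into place, so the final path only ever holds a complete snapshot and the existence check is a reliable short-circuit. This is a sketch under that assumption: `write_snapshot_once`, `WriteOutcome`, and the `.tmp` naming are hypothetical, and the PR may persist the file differently.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical result of one attempt to persist a snapshot.
#[derive(Debug, PartialEq, Eq)]
pub enum WriteOutcome {
    Written,
    AlreadyExists,
}

/// Writes `bytes` to `final_path` so that a crash never leaves a partial
/// file under the final name.
pub fn write_snapshot_once(final_path: &Path, bytes: &[u8]) -> io::Result<WriteOutcome> {
    // Replay after a crash: a completed snapshot is never rewritten.
    if final_path.exists() {
        return Ok(WriteOutcome::AlreadyExists);
    }
    let dir = match final_path.parent() {
        Some(p) if !p.as_os_str().is_empty() => p,
        _ => Path::new("."),
    };
    fs::create_dir_all(dir)?;
    let tmp_path = final_path.with_extension("tmp");
    {
        // File::create truncates a stale temp file left by an earlier crash.
        let mut tmp = File::create(&tmp_path)?;
        tmp.write_all(bytes)?;
        tmp.sync_all()?; // make the contents durable before publishing the name
    }
    // On POSIX file systems, rename within one directory is atomic.
    fs::rename(&tmp_path, final_path)?;
    // Persist the directory entry so the rename survives power loss (Unix only).
    if cfg!(unix) {
        File::open(dir)?.sync_all()?;
    }
    Ok(WriteOutcome::Written)
}
```

Because an interrupted attempt can only leave a `.tmp` file behind, which is truncated and rewritten on replay, a half-written snapshot can never be mistaken for a finished one.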

The encoding format is versioned and self-delimiting: magic, version, a BCS header { epoch, event_cursor }, then three count-prefixed sections of length-framed (ObjectID, value) entries in ascending key order.
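To make the layout concrete, here is a sketch of an encoder for the described structure. The magic value `BISN`, the u16 version, the u32 epoch, a single-u64 `event_cursor`, the u64 section counts, and the u32 frame lengths are all assumptions for illustration; the PR's `blob_info_snapshot.rs` defines the real values. The two header integers are written little-endian back to back, which is how BCS serializes a struct of fixed-width integer fields.

```rust
/// Hypothetical constants; not the PR's actual magic or version.
pub const MAGIC: &[u8; 4] = b"BISN";
pub const VERSION: u16 = 1;

/// 32-byte object ID, the width of a Sui `ObjectID`.
pub type ObjectId = [u8; 32];

#[derive(Debug)]
pub enum EncodeError {
    /// Keys must be strictly ascending so the output is canonical.
    KeyOrder { section: usize, index: usize },
}

/// Header fields from the PR; the field widths here are assumptions.
pub struct Header {
    pub epoch: u32,
    pub event_cursor: u64,
}

/// Encodes magic, version, header, then three count-prefixed sections of
/// length-framed (key, value) entries.
pub fn encode(
    header: &Header,
    sections: [&[(ObjectId, Vec<u8>)]; 3],
) -> Result<Vec<u8>, EncodeError> {
    let mut out = Vec::new();
    out.extend_from_slice(MAGIC);
    out.extend_from_slice(&VERSION.to_le_bytes());
    out.extend_from_slice(&header.epoch.to_le_bytes());
    out.extend_from_slice(&header.event_cursor.to_le_bytes());
    for (s, entries) in sections.iter().enumerate() {
        out.extend_from_slice(&(entries.len() as u64).to_le_bytes());
        for (i, (key, value)) in entries.iter().enumerate() {
            if i > 0 && entries[i - 1].0 >= *key {
                return Err(EncodeError::KeyOrder { section: s, index: i });
            }
            // The frame length covers key + value, so a reader can skip entries.
            let frame_len = (key.len() + value.len()) as u32;
            out.extend_from_slice(&frame_len.to_le_bytes());
            out.extend_from_slice(key);
            out.extend_from_slice(value);
        }
    }
    Ok(out)
}
```

Rejecting out-of-order keys at encode time is one way to guarantee the canonical, node-independent bytes that byte-identical cross-node output depends on.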

Test plan

  • Unit: determinism (independent instances serialize identically), golden byte-stability, magic/version prefix, ascending-key enforcement.
  • Offline, on real data (using db-tool on a separate branch): serialized a testnet database (262K entries) and checkpoints from two PTN nodes; entry counts match the source tables, and the output is byte-identical across the two nodes.
  • Decode-encode roundtrip: passes in local experiments.
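The decode-encode roundtrip can be pictured with a matching decoder for a single section, again under hypothetical widths (u64 count, then per entry a u32 frame length covering a 32-byte key plus the value). `encode_section` and `decode_section` are illustrative names, not functions from the PR. The decoder rejects truncated input and non-ascending keys, so a roundtrip that reproduces the original bytes also checks canonical ordering.

```rust
use std::convert::TryInto;

pub type ObjectId = [u8; 32];

/// Encodes one count-prefixed section of length-framed entries.
pub fn encode_section(entries: &[(ObjectId, Vec<u8>)]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(entries.len() as u64).to_le_bytes());
    for (key, value) in entries {
        out.extend_from_slice(&((32 + value.len()) as u32).to_le_bytes());
        out.extend_from_slice(key);
        out.extend_from_slice(value);
    }
    out
}

/// Returns `buf[pos..pos + n]`, or an error if the input is too short.
fn take(buf: &[u8], pos: usize, n: usize) -> Result<&[u8], String> {
    let end = pos.checked_add(n).ok_or("length overflow")?;
    buf.get(pos..end).ok_or_else(|| format!("truncated at byte {pos}"))
}

/// Decodes one section, returning the entries and the number of bytes used.
pub fn decode_section(buf: &[u8]) -> Result<(Vec<(ObjectId, Vec<u8>)>, usize), String> {
    let count = u64::from_le_bytes(take(buf, 0, 8)?.try_into().unwrap());
    let mut pos = 8;
    // Not pre-allocated from `count`, so a corrupt count cannot force a huge allocation.
    let mut entries: Vec<(ObjectId, Vec<u8>)> = Vec::new();
    for i in 0..count {
        let frame_len = u32::from_le_bytes(take(buf, pos, 4)?.try_into().unwrap()) as usize;
        pos += 4;
        if frame_len < 32 {
            return Err(format!("entry {i}: frame shorter than a key"));
        }
        let frame = take(buf, pos, frame_len)?;
        let key: ObjectId = frame[..32].try_into().unwrap();
        if let Some((prev, _)) = entries.last() {
            if *prev >= key {
                return Err(format!("entry {i}: keys not strictly ascending"));
            }
        }
        entries.push((key, frame[32..].to_vec()));
        pos += frame_len;
    }
    Ok((entries, pos))
}
```

Returning the consumed length is what lets a reader walk the three sections back to back without any outer index, which is the point of a self-delimiting format.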

Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.
For each box you select, include information after the relevant heading that describes the impact of your changes that
a user might notice and any actions they must take to implement updates. (Add release notes after the colon for each item)

  • Storage node:
  • Aggregator:
  • Publisher:
  • CLI:

github-actions (Contributor)

Warning: This PR modifies one of the example config files. Please consider the
following:

  • Make sure the changes are backwards compatible with the current configuration.
  • Make sure any added parameters follow the conventions of the existing parameters; in
    particular, durations should take seconds or milliseconds using the naming convention
    _secs or _millis, respectively.
  • If there are added optional parameter sections, it should be possible to specify them
    partially. A useful pattern there is to implement Default for the struct and annotate
    it with #[serde(default)]; see BlobRecoveryConfig as an example.
  • You may need to update the documentation to reflect the changes.

shuowang12 force-pushed the shuo/blob-info-snapshot-minimal-main branch 13 times, most recently from ab81bd4 to 66fa63b on June 26, 2026 20:48
shuowang12 force-pushed the shuo/blob-info-snapshot-minimal-main branch from 66fa63b to fbeb651 on June 26, 2026 22:13
shuowang12 marked this pull request as ready for review June 26, 2026 22:17
shuowang12 (Collaborator, Author)

Some notes to guide the review. cc: @halfprice @sadhansood

The main goal of this PR is to start writing snapshot files on a few mainnet nodes to gather metrics (serialization time, size, and cross-node digest comparison).

I expect the review to focus on safety and durability.
Safety: the node must write snapshot files correctly and durably, and snapshotting must never fail epoch processing or corrupt the existing DB layout.
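A sketch of one way to keep the snapshot step from ever failing epoch processing: treat it as best-effort, log any error, and also catch panics. `run_snapshot_best_effort` and `SnapshotError` are hypothetical names, and `eprintln!` stands in for the node's real logging; this is not necessarily how the PR wires the step in.

```rust
use std::fmt;
use std::panic::{self, UnwindSafe};

/// Hypothetical error type for the snapshot step.
#[derive(Debug)]
pub struct SnapshotError(pub String);

impl fmt::Display for SnapshotError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Runs the optional snapshot step. Errors and panics are logged and
/// swallowed, so the caller can always proceed to the epoch change.
/// Returns whether the snapshot step succeeded.
pub fn run_snapshot_best_effort<F>(epoch: u32, snapshot: F) -> bool
where
    F: FnOnce() -> Result<(), SnapshotError> + UnwindSafe,
{
    match panic::catch_unwind(snapshot) {
        Ok(Ok(())) => true,
        Ok(Err(e)) => {
            eprintln!("epoch {epoch}: blob info snapshot failed: {e}");
            false
        }
        Err(_) => {
            eprintln!("epoch {epoch}: blob info snapshot panicked");
            false
        }
    }
}
```

`catch_unwind` only helps when the binary is built with `panic = "unwind"`; under `panic = "abort"` a panic still terminates the process, so the snapshot code itself should avoid panicking paths.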

Open but not blocking: the encoding format (blob_info_snapshot.rs). I polished it and iterated through several variants, and it is now in good shape; reviews are welcome. It should be ready for future milestones but remains flexible to change, since nothing consumes it yet.
