Skip to content

feat: add cross-node consistency check for pooled per-object blob info#3490

Open
halfprice wants to merge 1 commit into
mainfrom
zhewu/pooled_blob_object_consistency
Open

feat: add cross-node consistency check for pooled per-object blob info#3490
halfprice wants to merge 1 commit into
mainfrom
zhewu/pooled_blob_object_consistency

Conversation

@halfprice

@halfprice halfprice commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator

Summary

Extends the storage-node background consistency check to cover the per-object pooled blob info table, mirroring the existing certified per-object blob digest check. At each epoch change, every node computes an xxhash digest over its certified pooled per-object entries; the simtest harness then asserts all nodes agree, surfacing any cross-node divergence.

Test plan

  • cargo check -p walrus-service — clean.
  • Full msim build of walrus-simtest — clean.
  • MSIM_TEST_SEED=1 cargo simtest test_storage_pool — all 11 storage-pool simtests pass.

Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.
For each box you select, include information after the relevant heading that describes the impact of your changes that
a user might notice and any actions they must take to implement updates. (Add release notes after the colon for each item)

  • Storage node:
  • Aggregator:
  • Publisher:
  • CLI:

Mirror the certified per-object blob digest consistency check for the
per-object pooled blob info table:

- Implement `CertifiedBlobInfoApi` for `PerObjectPooledBlobInfo` so the
  `BlobInfoIter` "certified before epoch" filter works over the pooled
  table. A pooled blob entry is certified iff it is present (deletion
  removes the entry) and its certified epoch is at or before the epoch.
- Add `PerObjectPooledBlobInfoIterator` plus the
  `certified_per_object_pooled_blob_info_iter_before_epoch` builders on
  `BlobInfoTable` and `Storage`.
- Add three pooled consistency-check metrics.
- Make `compose_blob_object_list_digest` generic over any ObjectID-keyed
  table, and add `compose_certified_pooled_object_blob_list_digest` with
  its own `storage_node_certified_pooled_blob_object_digest` fail point.
  Schedule it alongside the existing per-object check at epoch change.
- Wire the new fail point into the simtest `BlobInfoConsistencyCheck` so
  every storage-pool simtest verifies pooled per-object digests match
  across all nodes.
@halfprice

Copy link
Copy Markdown
Collaborator Author

@codex review

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the storage-node epoch-change background consistency checks to also compute and compare a cross-node digest for the per-object pooled blob info table, mirroring the existing per-object blob info digest check. This enables simtests to detect pooled per-object table divergence across nodes.

Changes:

  • Add CertifiedBlobInfoApi support for PerObjectPooledBlobInfo so it can be filtered/scanned consistently with other certified-blob iterators.
  • Add storage/table iterators and background digest computation + metrics for certified pooled per-object entries.
  • Extend the simtest harness to collect and assert cross-node equality for the pooled per-object digest via a new failpoint.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
crates/walrus-simtest/src/test_utils.rs Registers a new failpoint + digest map and adds a pooled per-object digest cross-node consistency assertion.
crates/walrus-service/src/node/storage/blob_info/per_object_pooled_blob_info.rs Implements CertifiedBlobInfoApi for pooled per-object blob info entries.
crates/walrus-service/src/node/storage/blob_info.rs Adds pooled per-object iterator types and a “certified before epoch” iterator for the pooled per-object table.
crates/walrus-service/src/node/storage.rs Exposes a storage-level iterator for certified pooled per-object entries (before epoch).
crates/walrus-service/src/node/metrics.rs Adds Prometheus metrics for pooled per-object digest, errors, and scanned count.
crates/walrus-service/src/node/consistency_check.rs Computes and publishes pooled per-object digest during epoch-change background consistency checks (plus simtest failpoint hook).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

)
.set(blob_object_list_digest as i64);

// No-op out side of simtest.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0f37f32e02

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +578 to +579
let blob_object_list_digest = match compose_blob_object_list_digest(
per_object_pooled_blob_info_iter,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Hash pooled row values in the new digest

This new pooled check uses compose_blob_object_list_digest, but that helper only feeds blob_info.0 (the ObjectID) into the hasher, so two nodes with the same pooled object IDs but different PerObjectPooledBlobInfo values still report identical digests. In scenarios where blob_id, storage_pool_id, or certified_epoch diverges, later GC/aggregate updates can behave differently while the simtest consistency check passes; include a stable serialization of the pooled value, or at least the material fields, in this digest.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the pattern for other consistency checks as well and only the ObjectID set is relevant.

@shuowang12 shuowang12 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for extending it to cover pooled blobs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants