feat: add cross-node consistency check for pooled per-object blob info #3490
Mirror the certified per-object blob digest consistency check for the per-object pooled blob info table:

- Implement `CertifiedBlobInfoApi` for `PerObjectPooledBlobInfo` so the `BlobInfoIter` "certified before epoch" filter works over the pooled table. A pooled blob entry is certified iff it is present (deletion removes the entry) and its certified epoch is at or before the epoch.
- Add `PerObjectPooledBlobInfoIterator` plus the `certified_per_object_pooled_blob_info_iter_before_epoch` builders on `BlobInfoTable` and `Storage`.
- Add three pooled consistency-check metrics.
- Make `compose_blob_object_list_digest` generic over any `ObjectID`-keyed table, and add `compose_certified_pooled_object_blob_list_digest` with its own `storage_node_certified_pooled_blob_object_digest` fail point. Schedule it alongside the existing per-object check at epoch change.
- Wire the new fail point into the simtest `BlobInfoConsistencyCheck` so that every storage-pool simtest verifies pooled per-object digests match across all nodes.
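The certification rule above can be sketched as a predicate plus a filtering iterator. This is a minimal sketch, not the PR's `CertifiedBlobInfoApi` implementation: a `BTreeMap` stands in for the node's sorted table, and `PooledEntry`, `ObjectId`, `Epoch`, `is_certified_at_or_before`, and `certified_before_epoch` are hypothetical simplified names. Modeling the certified epoch as an `Option` (no value until the blob is certified) is also an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical simplified types standing in for the real epoch, `ObjectID`,
/// and `PerObjectPooledBlobInfo` types.
type Epoch = u32;
type ObjectId = [u8; 4];

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PooledEntry {
    /// Assumption: `None` means the blob is registered but not yet certified.
    certified_epoch: Option<Epoch>,
}

/// Certified iff the entry is present (deletion removes it from the table)
/// and it was certified at or before `epoch`.
fn is_certified_at_or_before(entry: Option<&PooledEntry>, epoch: Epoch) -> bool {
    match entry {
        // Absent means deleted (or never registered): not certified.
        None => false,
        Some(e) => matches!(e.certified_epoch, Some(c) if c <= epoch),
    }
}

/// Yields, in key order, the entries certified at or before `epoch`.
fn certified_before_epoch<'a>(
    table: &'a BTreeMap<ObjectId, PooledEntry>,
    epoch: Epoch,
) -> impl Iterator<Item = (&'a ObjectId, &'a PooledEntry)> + 'a {
    table
        .iter()
        .filter(move |&(_, e)| is_certified_at_or_before(Some(e), epoch))
}
```

Because deletion removes the entry, absence alone is enough to report "not certified" in this model; no separate deleted state has to be checked.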
@codex review
Pull request overview
This PR extends the storage-node epoch-change background consistency checks to also compute and compare a cross-node digest for the per-object pooled blob info table, mirroring the existing per-object blob info digest check. This enables simtests to detect pooled per-object table divergence across nodes.
Changes:
- Add `CertifiedBlobInfoApi` support for `PerObjectPooledBlobInfo` so it can be filtered/scanned consistently with other certified-blob iterators.
- Add storage/table iterators and background digest computation + metrics for certified pooled per-object entries.
- Extend the simtest harness to collect and assert cross-node equality for the pooled per-object digest via a new failpoint.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| crates/walrus-simtest/src/test_utils.rs | Registers a new failpoint + digest map and adds a pooled per-object digest cross-node consistency assertion. |
| crates/walrus-service/src/node/storage/blob_info/per_object_pooled_blob_info.rs | Implements CertifiedBlobInfoApi for pooled per-object blob info entries. |
| crates/walrus-service/src/node/storage/blob_info.rs | Adds pooled per-object iterator types and a “certified before epoch” iterator for the pooled per-object table. |
| crates/walrus-service/src/node/storage.rs | Exposes a storage-level iterator for certified pooled per-object entries (before epoch). |
| crates/walrus-service/src/node/metrics.rs | Adds Prometheus metrics for pooled per-object digest, errors, and scanned count. |
| crates/walrus-service/src/node/consistency_check.rs | Computes and publishes pooled per-object digest during epoch-change background consistency checks (plus simtest failpoint hook). |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0f37f32e02
shuowang12 left a comment
LGTM! Thanks for extending it to cover pooled blobs.
Mirror the simtest wiring from #3490 into the Antithesis cross-node invariant observer. The observer scrapes Prometheus metrics from all storage nodes and crashes on cross-node divergence, which Antithesis surfaces as a test failure. Add `walrus_per_object_pooled_blob_info_consistency_check` as a new hard invariant alongside the regular per-object check: same epoch-bucket saturation guard, same no-data silent-regression guard, and crash on mismatched digests across nodes.
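The observer's core comparison can be sketched as follows, assuming each node's scraped metrics have already been reduced to an `(epoch, digest)` pair. `NodeReport`, `InvariantError`, and `check_pooled_digest_invariant` are hypothetical names; grouping reports by epoch is an assumption, the epoch-bucket saturation guard is not modeled, and where the real observer crashes, this sketch returns an error instead.

```rust
use std::collections::BTreeMap;

type Epoch = u32;

/// Hypothetical: one node's scraped pooled per-object digest for an epoch.
#[derive(Debug, Clone)]
struct NodeReport {
    node: String,
    epoch: Epoch,
    digest: u64,
}

#[derive(Debug, PartialEq, Eq)]
enum InvariantError {
    /// No node exposed the metric, which would otherwise let a regression
    /// (for example, the check silently not running) pass unnoticed.
    NoData,
    /// Nodes disagree on the digest for the same epoch.
    Mismatch { epoch: Epoch, nodes: Vec<(String, u64)> },
}

/// Groups reports by epoch and requires all nodes within an epoch to agree.
fn check_pooled_digest_invariant(reports: &[NodeReport]) -> Result<(), InvariantError> {
    if reports.is_empty() {
        return Err(InvariantError::NoData);
    }
    let mut by_epoch: BTreeMap<Epoch, Vec<(String, u64)>> = BTreeMap::new();
    for r in reports {
        by_epoch
            .entry(r.epoch)
            .or_default()
            .push((r.node.clone(), r.digest));
    }
    for (epoch, nodes) in by_epoch {
        // Every group holds at least one report, so indexing is safe.
        let first = nodes[0].1;
        if nodes.iter().any(|(_, d)| *d != first) {
            return Err(InvariantError::Mismatch { epoch, nodes });
        }
    }
    Ok(())
}
```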
Summary
Extends the storage-node background consistency check to cover the per-object pooled blob info table, mirroring the existing certified per-object blob digest check. At each epoch change, every node computes an xxhash digest over its certified pooled per-object entries; the simtest harness then asserts all nodes agree, surfacing any cross-node divergence.
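The per-node digest can be sketched as a fold over `(key, value)` pairs that is generic over the key and value types, in the spirit of making `compose_blob_object_list_digest` generic over `ObjectID`-keyed tables. This is a minimal sketch under stated assumptions: `compose_digest` is a hypothetical name, std's `DefaultHasher` is swapped in for xxhash so the example needs no external crates, and the function also returns a scanned-entry count to mirror the scanned-count metric. Digests only agree across nodes if every node visits entries in the same order, which a table iterated in sorted key order (such as a `BTreeMap`) provides.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Folds `(key, value)` pairs into one digest, in the order they are yielded,
/// and counts the entries scanned. Generic over key and value types so one
/// routine can serve several tables.
fn compose_digest<K: Hash, V: Hash>(entries: impl IntoIterator<Item = (K, V)>) -> (u64, u64) {
    // Stand-in for xxhash: `DefaultHasher::new()` uses fixed keys, so it is
    // deterministic within a build, but its algorithm may change between
    // Rust releases, so it is unsuitable for cross-version digests.
    let mut hasher = DefaultHasher::new();
    let mut scanned = 0u64;
    for (key, value) in entries {
        key.hash(&mut hasher);
        value.hash(&mut hasher);
        scanned += 1;
    }
    (hasher.finish(), scanned)
}
```

In this model, two nodes holding the same certified entries produce the same digest regardless of insertion history, while a differing value changes the digest with overwhelming probability.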
Resolve WAL-1177
Test plan
- `cargo check -p walrus-service`: clean.
- `walrus-simtest`: clean.
- `MSIM_TEST_SEED=1 cargo simtest test_storage_pool`: all 11 storage-pool simtests pass.

Release notes
Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.
For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates. (Add release notes after the colon for each item.)