feat(metrics): expose prefix bloom skip counter for testability by polaz · Pull Request #70 · structured-world/coordinode-lsm-tree

polaz · 2026-03-22T08:21:20Z

Summary

Add prefix_bloom_skips counter to Metrics struct (feature-gated under metrics), incremented when maybe_contains_prefix returns Ok(false) in TreeIter::create_range
Counter covers both single-table and multi-table run paths
Wire Option<Arc<Metrics>> through IterState so Tree::create_prefix tracks skips on the tree's shared metrics instance without allocating for non-prefix iterators
Add two #[cfg(feature = "metrics")] integration tests validating the counter increments and stays zero without an extractor
Note: BlobTree prefix scans do not currently record this metric

Test Plan

cargo test — all tests pass without metrics feature
cargo test --features metrics — all 18 prefix bloom tests pass including 2 metrics tests
cargo clippy --all-features — zero warnings

Closes #59

coderabbitai · 2026-03-22T08:21:26Z

Warning

Rate limit exceeded

@polaz has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 6 minutes and 4 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c8f49e9e-5cf9-4c1d-a3b1-4ccd4082fa91

📥 Commits

Reviewing files that changed from the base of the PR and between 50bfa64 and 46cd2d0.

📒 Files selected for processing (4)

src/metrics.rs
src/range.rs
src/tree/mod.rs
tests/tree_prefix_bloom.rs

📝 Walkthrough

Walkthrough

This change introduces a new metric counter to track prefix bloom filter skips—instances where a prefix bloom filter returns false, indicating a segment lacks the target prefix. The counter is exposed via public API, and the metrics infrastructure is conditionally threaded through the prefix-scan code path via IterState, with validation tests confirming the metric increments correctly during filtered prefix scans.

Changes

Cohort / File(s)	Summary
Metrics Infrastructure `src/metrics.rs`	Added new atomic counter `prefix_bloom_skips` and public accessor method `prefix_bloom_skips()` to track bloom filter rejections during prefix scans.
Iterator State & Range `src/range.rs`	Added optional `metrics: Option<Arc<crate::Metrics>>` field to `IterState` (feature-gated). Increments `prefix_bloom_skips` when `table.maybe_contains_prefix()` returns `Ok(false)` during prefix scan initialization.
Tree Iterator Initialization `src/tree/mod.rs`	Updated `TreeIter` to populate the `metrics` field in `IterState` during prefix-scan creation. Sets `metrics: Some(...)` for prefix scans and `metrics: None` for general range iterators.
Prefix Bloom Tests `tests/tree_prefix_bloom.rs`	Added two metrics-gated test functions: `prefix_bloom_skip_metrics()` validates that bloom skips are counted during prefix scans with extractor boundaries; `prefix_bloom_skip_metrics_zero_without_extractor()` confirms the metric remains zero when no prefix extractor is configured.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

feat: prefix bloom filters for graph key encoding #43: Introduces prefix bloom filtering logic that this PR extends by adding observability metrics to the same code paths (table.maybe_contains_prefix() invocations in src/range.rs).
refactor(metrics): add dedicated range tombstone block counters #36: Extends the Metrics struct with new atomic counters and accessors, following the same pattern as this change's metric addition.

Poem

🐰 A hop through metrics, bright and keen,
Bloom filters skip what shouldn't be seen—
Now counters count each crafty skip,
Tests verify each data-sip,
Transparency blooms in every bit! 🌸

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'feat(metrics): expose prefix bloom skip counter for testability' clearly summarizes the main change—adding a metric counter to make prefix bloom skips observable in tests.
Linked Issues check	✅ Passed	All coding requirements from issue `#59` are met: the prefix_bloom_skips counter is added to Metrics, incremented when maybe_contains_prefix returns Ok(false), and exposed via a public accessor so integration tests can assert the metric.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to issue `#59`: adding the counter, wiring metrics through IterState, and adding feature-gated tests. No unrelated modifications were introduced.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/#59-feat-expose-prefix-bloom-skip-metrics-for-testabil

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-03-22T08:26:39Z

Codecov Report

❌ Patch coverage is 95.23810% with 1 line in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/range.rs	92.85%	1 Missing ⚠️

📢 Thoughts on this report? Let us know!

polaz · 2026-03-22T08:29:26Z

@coderabbitai re-review

coderabbitai · 2026-03-22T08:29:37Z

@polaz Sure, I'll re-review the changes in PR #70 right away!

🧠 Learnings used

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 43
File: tests/tree_prefix_bloom.rs:91-105
Timestamp: 2026-03-21T22:11:09.195Z
Learning: In `structured-world/lsm-tree`, prefix-bloom integration tests in `tests/tree_prefix_bloom.rs` intentionally validate correctness (result counts and key ordering) rather than asserting that the bloom-skip pruning path fired. Bloom-skip metrics/counters are `pub(crate)` and cannot be observed from integration tests. Do not flag these tests for failing to assert on internal pruning behavior in PR `#43` or future PRs.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-21T22:22:16.340Z
Learning: In `structured-world/lsm-tree` (`src/range.rs`), prefix bloom skipping for multi-table runs (L0) is deferred to issue `#58` (opened during PR `#43` review). Extending it requires passing `prefix_hash: Option<u64>` into `RunReader` (`src/run_reader.rs`) and applying `Table::maybe_contains_prefix` per-table inside `RunReader`. Do not flag the absence of multi-table run prefix bloom skipping as a blocking concern in PR `#43` or future PRs until issue `#58` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 43
File: src/tree/mod.rs:1045-1063
Timestamp: 2026-03-21T17:23:51.325Z
Learning: In `structured-world/lsm-tree` (`src/tree/mod.rs` and `src/blob_tree/mod.rs`), the prefix bloom filter boundary guard `extractor.prefixes(prefix_bytes).any(|p| p == prefix_bytes)` is intentionally used instead of a dedicated `is_valid_scan_boundary()` trait method. The `PrefixExtractor` contract requires `prefixes()` to return sub-slices of the input key, so a well-behaved extractor that emits `prefix` for `prefix` itself will also emit it for any key starting with `prefix`. Adding `is_valid_scan_boundary()` to the public trait was deferred to issue `#47` to avoid API complexity for a theoretical misuse case. Do not flag the current predicate as insufficient in PR `#43` or future PRs.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-21T22:22:16.340Z
Learning: In `structured-world/lsm-tree` (`src/range.rs`), prefix bloom skipping for multi-table runs (L0) is deferred to a separate issue (opened during PR `#43` review). Extending it requires passing `prefix_hash: Option<u64>` into `RunReader` (`src/run_reader.rs`) and applying `Table::maybe_contains_prefix` per-table inside `RunReader`. Do not flag the absence of multi-table run prefix bloom skipping as a blocking concern in PR `#43` or future PRs until that issue is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-21T22:21:36.946Z
Learning: In `structured-world/lsm-tree`, prefix bloom filter skipping for multi-table runs (L0, served by `RunReader` in `src/run_reader.rs`) is intentionally deferred from PR `#43` to issue `#57`. The `run.len() == 1` guard in `src/range.rs` is deliberate. Do not flag the absence of multi-table run prefix bloom skipping as a bug in PR `#43` or future PRs until issue `#57` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 43
File: src/range.rs:226-252
Timestamp: 2026-03-21T22:11:05.425Z
Learning: In `structured-world/lsm-tree` (`src/range.rs`), prefix bloom filter skipping intentionally applies **only** to single-table runs (`run.len() == 1`). Multi-table runs (typically L0) are handled by `RunReader`, which merges across all tables in the run and does not support per-table prefix bloom filtering. This is an architectural limitation documented in the code — do not flag the absence of prefix bloom skipping for multi-table runs as a bug or missing optimization in PR `#43` or future PRs.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 21
File: src/table/writer/mod.rs:437-457
Timestamp: 2026-03-20T12:29:43.719Z
Learning: In `structured-world/lsm-tree` (`src/table/writer/mod.rs`), the synthetic weak-tombstone sentinel written for RT-only SSTs (when `item_count == 0` but `range_tombstones` is non-empty) intentionally increments `meta.tombstone_count`, `meta.weak_tombstone_count`, `meta.item_count`, and `meta.key_count`. All on-disk metadata counters must match actual block contents for recovery validation consistency (established in commit 7452427a for item_count/key_count; same reasoning applies to tombstone counters). The resulting +1 inflation on GC heuristics is harmless — it overestimates reclaimable space and triggers slightly earlier checks. Do not flag this as a metadata accuracy bug in PR `#21` or future PRs.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 43
File: src/table/mod.rs:784-800
Timestamp: 2026-03-21T22:11:09.063Z
Learning: In `structured-world/lsm-tree` (`src/table/mod.rs`), `Table::maybe_contains_prefix` intentionally returns `Ok(true)` (conservative fallback) for tables with a partitioned filter (`pinned_filter_index` is `Some`) or a filter TLI (`regions.filter_tli` is `Some`). Iterating all partitions to check a prefix hash would defeat the bloom-skip benefit, because the partition index is keyed by user-key boundaries, not by prefix hash — making it impossible to restrict the check to only prefix-relevant partitions. Do not flag the absence of per-partition prefix-hash checking as a bug in PR `#43` or future PRs; a proper fix would require passing prefix bounds alongside the hash and seeking the partition index, which is deferred to follow-up work.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-21T22:21:36.946Z
Learning: In `structured-world/lsm-tree` (`src/table/mod.rs`), `Table::maybe_contains_prefix` intentionally returns `Ok(true)` (conservative fallback) when `pinned_filter_index.is_some() || regions.filter_tli.is_some()` (partitioned filter or filter TLI present). This guard was added in PR `#43` at line 793. Do not flag the absence of per-partition prefix-hash checking as a bug in PR `#43` or future PRs.

✅ Actions performed

Full review triggered.

Copilot

Pull request overview

This PR adds a new metrics counter to make prefix-bloom segment skipping observable (and therefore testable) from integration tests when the metrics feature is enabled.

Changes:

Add a prefix_bloom_skips counter to Metrics, incremented when maybe_contains_prefix returns Ok(false).
Thread an Arc<Metrics> through IterState so prefix scans can record skip events on the tree’s shared metrics instance.
Add #[cfg(feature = "metrics")] integration tests asserting the counter behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
`src/metrics.rs`	Adds the `prefix_bloom_skips` counter and a public accessor.
`src/range.rs`	Increments the new counter when a segment is skipped via prefix bloom `Ok(false)`.
`src/tree/mod.rs`	Wires `Arc<Metrics>` into iterator state (shared for `create_prefix`).
`tests/tree_prefix_bloom.rs`	Adds `metrics`-gated integration tests for the new counter.

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

polaz · 2026-03-22T09:50:01Z

@coderabbitai re-review

coderabbitai · 2026-03-22T09:51:41Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read the key changed files
cat src/metrics.rs
echo "=== src/range.rs ==="
cat src/range.rs
echo "=== src/tree/mod.rs relevant sections ==="
# Look for create_prefix and create_internal_range_with_prefix_hash
rg -n "create_prefix|create_internal_range|IterState|metrics" src/tree/mod.rs -A 5 -B 2
echo "=== tests/tree_prefix_bloom.rs ==="
cat tests/tree_prefix_bloom.rs

Length of output: 57941

@polaz The implementation looks clean and correct. Here's a full review of all changed files:

`src/metrics.rs`

prefix_bloom_skips: AtomicUsize is correctly placed and scoped pub(crate), consistent with every other field in the struct.
The accessor pub fn prefix_bloom_skips(&self) -> usize uses Relaxed ordering, matching all other accessors — appropriate for a monotonic counter that doesn't need to synchronize with other memory operations.
Both the field and accessor doc comments correctly note the BlobTree limitation: "Note: BlobTree prefix scans do not currently record this metric." — good documentation of the known gap.

`src/range.rs`

The #[cfg(feature = "metrics")] pub(crate) metrics: Option<Arc<crate::Metrics>> field on IterState compiles correctly with self_cell!.
The increment fires only in the Ok(false) arm — exactly right. Ok(true) and Err(_) (conservative include) do not increment.
The double guard #[cfg(feature = "metrics")] if let Some(m) = &lock.metrics is correct: the outer cfg eliminates the code entirely in non-metrics builds; the inner Option check avoids a metrics allocation for non-prefix iterators.

`src/tree/mod.rs`

create_internal_range_with_prefix_hash → metrics: None: correct. This path serves general range scans that should not record prefix bloom skips.
create_prefix → metrics: Some(self.0.metrics.clone()): correct. This is the only path that should record the metric.
The Option<Arc<Metrics>> approach (commit 2) avoiding a clone for non-prefix iterators is a good design decision.

`tests/tree_prefix_bloom.rs`

prefix_bloom_skip_metrics: The 1000-key (500 "aaa:" + 500 "zzz:") single-table setup is well-chosen — it creates a wide key range so that all 24 non-existent prefixes pass the key-range guard but hit the bloom. The table_count() == 1 assertion is important to ensure the single-table fast-path is exercised. The probabilistic rationale in the comment (24 scans at ~1% FPR → P(0 skips) ≈ 10⁻⁴⁸) is sound.
prefix_bloom_skip_metrics_zero_without_extractor: Fully deterministic — without a prefix extractor, prefix_hash is None, so the bloom check is never reached. Counter stays at 0 by construction.
Both tests assert the initial counter value before performing scans — good test hygiene.

No issues found. The PR closes #59 cleanly with minimal, well-scoped changes. The metric is correctly feature-gated, the counter increment is in exactly the right place, and the tests cover both the positive (skips > 0) and negative (no extractor → 0 skips) cases.

[approve]

🧠 Learnings used

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 43
File: tests/tree_prefix_bloom.rs:91-105
Timestamp: 2026-03-21T22:11:09.195Z
Learning: In `structured-world/lsm-tree`, prefix-bloom integration tests in `tests/tree_prefix_bloom.rs` intentionally validate correctness (result counts and key ordering) rather than asserting that the bloom-skip pruning path fired. Bloom-skip metrics/counters are `pub(crate)` and cannot be observed from integration tests. Do not flag these tests for failing to assert on internal pruning behavior in PR `#43` or future PRs.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-22T09:21:07.598Z
Learning: In `structured-world/lsm-tree` (`src/range.rs`), prefix bloom filter skipping for multi-table runs (L0) has been implemented in PR `#64` (closes issue `#57`). The `_ =>` branch in `TreeIter::create_range` now applies a per-table key-range guard followed by a bloom check; 0 survivors skips the run, 1 survivor is demoted to `single_tables` for the range-tombstone table-skip optimization, and 2+ survivors are collected into a new `Run`. Range tombstones are still collected from ALL tables in the run regardless of bloom filtering. Do not flag the absence of multi-table run prefix bloom skipping in this repository — it has been implemented.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 43
File: src/table/mod.rs:784-800
Timestamp: 2026-03-21T22:11:09.063Z
Learning: In `structured-world/lsm-tree` (`src/table/mod.rs`), `Table::maybe_contains_prefix` intentionally returns `Ok(true)` (conservative fallback) for tables with a partitioned filter (`pinned_filter_index` is `Some`) or a filter TLI (`regions.filter_tli` is `Some`). Iterating all partitions to check a prefix hash would defeat the bloom-skip benefit, because the partition index is keyed by user-key boundaries, not by prefix hash — making it impossible to restrict the check to only prefix-relevant partitions. Do not flag the absence of per-partition prefix-hash checking as a bug in PR `#43` or future PRs; a proper fix would require passing prefix bounds alongside the hash and seeking the partition index, which is deferred to follow-up work.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-21T22:21:36.946Z
Learning: In `structured-world/lsm-tree` (`src/table/mod.rs`), `Table::maybe_contains_prefix` intentionally returns `Ok(true)` (conservative fallback) when `pinned_filter_index.is_some() || regions.filter_tli.is_some()` (partitioned filter or filter TLI present). This guard was added in PR `#43` at line 793. Do not flag the absence of per-partition prefix-hash checking as a bug in PR `#43` or future PRs.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:06.289Z
Learning: In `structured-world/lsm-tree`, the switch from `Mutex<IntervalTree>` to `RwLock<IntervalTree>` (or a freeze-on-seal immutable structure) for memtable range-tombstone storage in `src/memtable/mod.rs` is intentionally deferred from PR `#21` to a separate issue (`#30`). Do not flag the existing `Mutex<IntervalTree>` usage in `src/memtable/mod.rs` as a concurrency concern in the context of PR `#21`; it is tracked in issue `#30`.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-22T09:29:48.199Z
Learning: In `structured-world/lsm-tree` (`src/prefix.rs`), `is_valid_scan_boundary(&self, prefix: &[u8]) -> bool` has been added to the `PrefixExtractor` trait in PR `#68` (closes issue `#47`), replacing the previously internal `is_valid_prefix_boundary` name. The method has a default implementation that checks `prefixes(prefix).any(|p| p == prefix)`, correct for sub-slice extractors, and can be overridden for O(1) checks (e.g., `FixedLengthPrefix`) or for extractors whose `prefixes()` does not return sub-slices. `compute_prefix_hash` in `src/prefix.rs` calls `is_valid_scan_boundary` as the sole call site. Do not flag the absence of `is_valid_scan_boundary` or suggest adding it in future PRs — it is already part of the public API.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 28
File: src/tree/mod.rs:913-956
Timestamp: 2026-03-22T03:09:12.274Z
Learning: In `structured-world/lsm-tree` (`src/tree/mod.rs`), the `resolve_merge_get` function's disk-scan loop in step 3 is a flat scan across all levels/runs and does not apply the L0-specific newest-first ordering used by `get_internal_entry_from_tables`. This means it can stop on a base/tombstone boundary from an older L0 run while a newer overlapping L0 run still has visible `MergeOperand` entries. This is a known limitation documented with `// TODO(`#46`)` in the code and is intentionally deferred to issue `#46`. Do not flag the flat L0 scan in `resolve_merge_get` as a merge-semantics bug in PR `#28` or future PRs until issue `#46` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:39.835Z
Learning: In `structured-world/lsm-tree`, the `Memtable` range-tombstone storage uses `Mutex<IntervalTree>` (introduced in PR `#21`). Switching to `RwLock<IntervalTree>` or a freeze-on-seal immutable structure is a deliberate future improvement tracked in issue `#31` — do not flag the `Mutex` usage as a blocking concern in PR `#21` or future PRs that reference this design.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:39.835Z
Learning: In `structured-world/lsm-tree`, the `Memtable` range-tombstone storage uses `Mutex<IntervalTree>` (introduced in PR `#21`). Switching to `RwLock<IntervalTree>` or a freeze-on-seal immutable structure is a deliberate future improvement tracked in a separate issue — do not flag the `Mutex` usage as a blocking concern in PR `#21` or future PRs that reference this design.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 23
File: .github/instructions/rust.instructions.md:0-0
Timestamp: 2026-03-17T00:34:54.145Z
Learning: In `structured-world/lsm-tree`, the lint suppression policy distinguishes between legacy and new code:
- Existing/upstream `#[allow(clippy::...)]` in **unchanged** code: treat as legacy — do **not** flag it in reviews.
- New code introduced in a PR **must** use `#[expect(clippy::...)]` with a justification reason.
This avoids churning upstream code while enforcing the `#[expect]` standard for all new contributions.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 21
File: src/table/writer/mod.rs:437-457
Timestamp: 2026-03-20T12:29:43.719Z
Learning: In `structured-world/lsm-tree` (`src/table/writer/mod.rs`), the synthetic weak-tombstone sentinel written for RT-only SSTs (when `item_count == 0` but `range_tombstones` is non-empty) intentionally increments `meta.tombstone_count`, `meta.weak_tombstone_count`, `meta.item_count`, and `meta.key_count`. All on-disk metadata counters must match actual block contents for recovery validation consistency (established in commit 7452427a for item_count/key_count; same reasoning applies to tombstone counters). The resulting +1 inflation on GC heuristics is harmless — it overestimates reclaimable space and triggers slightly earlier checks. Do not flag this as a metadata accuracy bug in PR `#21` or future PRs.

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Add `prefix_bloom_skips` counter to `Metrics` struct, incremented in `TreeIter::create_range` each time `maybe_contains_prefix` returns `Ok(false)` and a segment is skipped. This allows integration tests to assert that the bloom-skip code path actually fired, catching regressions that silently disable bloom filtering while still returning correct results. - Add `prefix_bloom_skips: AtomicUsize` field to `Metrics` (feature-gated) - Add `pub fn prefix_bloom_skips()` accessor - Wire `Arc<Metrics>` through `IterState` for `Tree::create_prefix` path - Add two `#[cfg(feature = "metrics")]` integration tests: - `prefix_bloom_skip_metrics`: asserts counter > 0 after scanning 24 non-existent prefixes inside a wide key_range - `prefix_bloom_skip_metrics_zero_without_extractor`: asserts counter stays at 0 when no prefix extractor is configured Closes #59

Change `IterState::metrics` from `Arc<Metrics>` to `Option<Arc<Metrics>>` so non-prefix iterators skip the heap allocation. Only `create_prefix` passes `Some(metrics)` — all other range iterators use `None`. Rewrite probabilistic smoke-test rationale in prefix_bloom_skip_metrics.

Clarify that BlobTree prefix scans do not currently record this metric, since only Tree::create_prefix wires the metrics handle.

Clarify IterState.metrics doc to decouple None semantics from prefix usage. Add table_count assertion in prefix_bloom_skip_metrics test to fail early if flush layout changes break the single-table assumption.

Wrap the 24-prefix scan in a retry loop (up to 20 attempts) so the test tolerates rare all-false-positive bloom runs without flaking.

polaz · 2026-03-22T10:46:18Z

@coderabbitai re-review

coderabbitai · 2026-03-22T10:47:34Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
cat src/metrics.rs
echo "=== src/range.rs ==="
cat src/range.rs
echo "=== src/tree/mod.rs (relevant sections) ==="
rg -n "create_prefix|create_internal_range|IterState|metrics" src/tree/mod.rs -A 5 -B 2
echo "=== tests/tree_prefix_bloom.rs ==="
cat tests/tree_prefix_bloom.rs

Length of output: 70317

@polaz The code is unchanged from the previous pass — the implementation is clean and correct. One minor observation on this third read-through:

Single-table vs. multi-table run metric coverage

The prefix_bloom_skips counter is incremented only in the run.len() == 1 branch. The _ => branch (multi-table runs, added in PR #64) also filters individual tables using maybe_contains_prefix, but those per-table rejections are not counted. The doc comment says "segments skipped during Tree::create_prefix scans by prefix bloom filters" — which could reasonably be read as covering multi-table run skips too.

This isn't a correctness bug (the test deliberately enforces table_count() == 1 to exercise the counted path), but if the metric is later used for capacity-planning or performance diagnostics, a user might find it unexpectedly low on L0-heavy workloads. Worth a follow-up issue if not already tracked.

Everything else looks good:

AtomicUsize + Relaxed ordering is consistent with all other metric accessors.
The Option<Arc<Metrics>> approach (set only in create_prefix, None everywhere else) correctly avoids unnecessary clones on non-prefix iterators.
Both tests are well-designed: prefix_bloom_skip_metrics is a robust probabilistic test (retry loop + deterministic setup), and prefix_bloom_skip_metrics_zero_without_extractor is fully deterministic.

[approve]

🧠 Learnings used

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 43
File: tests/tree_prefix_bloom.rs:91-105
Timestamp: 2026-03-21T22:11:09.195Z
Learning: In `structured-world/lsm-tree`, prefix-bloom integration tests in `tests/tree_prefix_bloom.rs` intentionally validate correctness (result counts and key ordering) rather than asserting that the bloom-skip pruning path fired. Bloom-skip metrics/counters are `pub(crate)` and cannot be observed from integration tests. Do not flag these tests for failing to assert on internal pruning behavior in PR `#43` or future PRs.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 21
File: src/table/writer/mod.rs:437-457
Timestamp: 2026-03-20T12:29:43.719Z
Learning: In `structured-world/lsm-tree` (`src/table/writer/mod.rs`), the synthetic weak-tombstone sentinel written for RT-only SSTs (when `item_count == 0` but `range_tombstones` is non-empty) intentionally increments `meta.tombstone_count`, `meta.weak_tombstone_count`, `meta.item_count`, and `meta.key_count`. All on-disk metadata counters must match actual block contents for recovery validation consistency (established in commit 7452427a for item_count/key_count; same reasoning applies to tombstone counters). The resulting +1 inflation on GC heuristics is harmless — it overestimates reclaimable space and triggers slightly earlier checks. Do not flag this as a metadata accuracy bug in PR `#21` or future PRs.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-22T09:21:07.598Z
Learning: In `structured-world/lsm-tree` (`src/range.rs`), prefix bloom filter skipping for multi-table runs (L0) has been implemented in PR `#64` (closes issue `#57`). The `_ =>` branch in `TreeIter::create_range` now applies a per-table key-range guard followed by a bloom check; 0 survivors skips the run, 1 survivor is demoted to `single_tables` for the range-tombstone table-skip optimization, and 2+ survivors are collected into a new `Run`. Range tombstones are still collected from ALL tables in the run regardless of bloom filtering. Do not flag the absence of multi-table run prefix bloom skipping in this repository — it has been implemented.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:06.289Z
Learning: In `structured-world/lsm-tree`, the switch from `Mutex<IntervalTree>` to `RwLock<IntervalTree>` (or a freeze-on-seal immutable structure) for memtable range-tombstone storage in `src/memtable/mod.rs` is intentionally deferred from PR `#21` to a separate issue (`#30`). Do not flag the existing `Mutex<IntervalTree>` usage in `src/memtable/mod.rs` as a concurrency concern in the context of PR `#21`; it is tracked in issue `#30`.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 43
File: src/table/mod.rs:784-800
Timestamp: 2026-03-21T22:11:09.063Z
Learning: In `structured-world/lsm-tree` (`src/table/mod.rs`), `Table::maybe_contains_prefix` intentionally returns `Ok(true)` (conservative fallback) for tables with a partitioned filter (`pinned_filter_index` is `Some`) or a filter TLI (`regions.filter_tli` is `Some`). Iterating all partitions to check a prefix hash would defeat the bloom-skip benefit, because the partition index is keyed by user-key boundaries, not by prefix hash — making it impossible to restrict the check to only prefix-relevant partitions. Do not flag the absence of per-partition prefix-hash checking as a bug in PR `#43` or future PRs; a proper fix would require passing prefix bounds alongside the hash and seeking the partition index, which is deferred to follow-up work.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-21T22:21:36.946Z
Learning: In `structured-world/lsm-tree` (`src/table/mod.rs`), `Table::maybe_contains_prefix` intentionally returns `Ok(true)` (conservative fallback) when `pinned_filter_index.is_some() || regions.filter_tli.is_some()` (partitioned filter or filter TLI present). This guard was added in PR `#43` at line 793. Do not flag the absence of per-partition prefix-hash checking as a bug in PR `#43` or future PRs.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 28
File: src/tree/mod.rs:913-956
Timestamp: 2026-03-22T03:09:12.274Z
Learning: In `structured-world/lsm-tree` (`src/tree/mod.rs`), the `resolve_merge_get` function's disk-scan loop in step 3 is a flat scan across all levels/runs and does not apply the L0-specific newest-first ordering used by `get_internal_entry_from_tables`. This means it can stop on a base/tombstone boundary from an older L0 run while a newer overlapping L0 run still has visible `MergeOperand` entries. This is a known limitation documented with `// TODO(`#46`)` in the code and is intentionally deferred to issue `#46`. Do not flag the flat L0 scan in `resolve_merge_get` as a merge-semantics bug in PR `#28` or future PRs until issue `#46` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-22T09:55:39.724Z
Learning: In `structured-world/lsm-tree` (`src/compaction/leveled/mod.rs`), the multi-level compaction path (lines ~749–759) uses a single aggregate key range across all L0+L1 inputs to query L2 overlapping tables. Using per-range L2 overlap queries (one per contiguous L1 range segment) instead would avoid pulling in L2 tables that sit in gaps between disjoint L1 ranges on sparse keyspaces. This improvement is intentionally deferred from PR `#66` to a separate follow-up issue. Do not flag the coarse aggregate L2 range query as a blocking concern in PR `#66` or future PRs until that issue is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:39.835Z
Learning: In `structured-world/lsm-tree`, the `Memtable` range-tombstone storage uses `Mutex<IntervalTree>` (introduced in PR `#21`). Switching to `RwLock<IntervalTree>` or a freeze-on-seal immutable structure is a deliberate future improvement tracked in issue `#31` — do not flag the `Mutex` usage as a blocking concern in PR `#21` or future PRs that reference this design.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:39.835Z
Learning: In `structured-world/lsm-tree`, the `Memtable` range-tombstone storage uses `Mutex<IntervalTree>` (introduced in PR `#21`). Switching to `RwLock<IntervalTree>` or a freeze-on-seal immutable structure is a deliberate future improvement tracked in a separate issue — do not flag the `Mutex` usage as a blocking concern in PR `#21` or future PRs that reference this design.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-22T09:29:48.199Z
Learning: In `structured-world/lsm-tree` (`src/prefix.rs`), `is_valid_scan_boundary(&self, prefix: &[u8]) -> bool` has been added to the `PrefixExtractor` trait in PR `#68` (closes issue `#47`), replacing the previously internal `is_valid_prefix_boundary` name. The method has a default implementation that checks `prefixes(prefix).any(|p| p == prefix)`, correct for sub-slice extractors, and can be overridden for O(1) checks (e.g., `FixedLengthPrefix`) or for extractors whose `prefixes()` does not return sub-slices. `compute_prefix_hash` in `src/prefix.rs` calls `is_valid_scan_boundary` as the sole call site. Do not flag the absence of `is_valid_scan_boundary` or suggest adding it in future PRs — it is already part of the public API.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-22T09:55:39.724Z
Learning: In `structured-world/lsm-tree` (`src/compaction/leveled/mod.rs`), the multi-level compaction path uses a single aggregate key range across all L0+L1 inputs to query L2 overlapping tables. Using per-range L2 overlap queries (one per contiguous L1 range segment) instead would avoid pulling in L2 tables that sit in gaps between disjoint L1 ranges on sparse keyspaces. This improvement is intentionally deferred from PR `#66` to issue `#72`. Do not flag the coarse aggregate L2 range query as a blocking concern in PR `#66` or future PRs until issue `#72` is addressed.

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Single-table run path is the only instrumented path; multi-table runs and BlobTree prefix scans are not yet counted.

Extend prefix_bloom_skips counter to the multi-table run filter in TreeIter::create_range, not just the single-table path. Update docs to reflect both paths are now instrumented (BlobTree excluded).

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings March 22, 2026 08:21

Copilot started reviewing on behalf of polaz March 22, 2026 08:24 View session

Copilot AI reviewed Mar 22, 2026

View reviewed changes

Comment thread src/tree/mod.rs

Comment thread tests/tree_prefix_bloom.rs Outdated

polaz requested a review from Copilot March 22, 2026 08:48

Copilot started reviewing on behalf of polaz March 22, 2026 08:51 View session

Copilot AI reviewed Mar 22, 2026

View reviewed changes

Comment thread src/metrics.rs Outdated

polaz requested a review from Copilot March 22, 2026 09:04

Copilot started reviewing on behalf of polaz March 22, 2026 09:07 View session

Copilot AI reviewed Mar 22, 2026

View reviewed changes

Comment thread tests/tree_prefix_bloom.rs

Comment thread src/range.rs Outdated

polaz force-pushed the feat/#59-feat-expose-prefix-bloom-skip-metrics-for-testabil branch from 6e2c01e to d4d541b Compare March 22, 2026 09:38

polaz requested a review from Copilot March 22, 2026 09:49

Copilot started reviewing on behalf of polaz March 22, 2026 09:50 View session

Copilot AI reviewed Mar 22, 2026

View reviewed changes

Comment thread tests/tree_prefix_bloom.rs Outdated

polaz force-pushed the feat/#59-feat-expose-prefix-bloom-skip-metrics-for-testabil branch from d4d541b to d9bf994 Compare March 22, 2026 10:12

polaz added 5 commits March 22, 2026 12:25

docs(metrics): scope prefix_bloom_skips to Tree::create_prefix

c6cb5bc

Clarify that BlobTree prefix scans do not currently record this metric, since only Tree::create_prefix wires the metrics handle.

refactor(metrics): improve IterState doc and add table_count guard

e4ee02f

Clarify IterState.metrics doc to decouple None semantics from prefix usage. Add table_count assertion in prefix_bloom_skip_metrics test to fail early if flush layout changes break the single-table assumption.

test(metrics): retry scan loop in prefix_bloom_skip_metrics

d5127fe

Wrap the 24-prefix scan in a retry loop (up to 20 attempts) so the test tolerates rare all-false-positive bloom runs without flaking.

polaz force-pushed the feat/#59-feat-expose-prefix-bloom-skip-metrics-for-testabil branch from fa9f234 to d5127fe Compare March 22, 2026 10:33

polaz requested a review from Copilot March 22, 2026 10:46

Copilot started reviewing on behalf of polaz March 22, 2026 10:46 View session

Copilot AI reviewed Mar 22, 2026

View reviewed changes

Comment thread src/metrics.rs Outdated

Comment thread tests/tree_prefix_bloom.rs

Comment thread src/range.rs

polaz added 2 commits March 22, 2026 13:08

docs(metrics): clarify prefix_bloom_skips is a lower bound

0c8f717

Single-table run path is the only instrumented path; multi-table runs and BlobTree prefix scans are not yet counted.

feat(metrics): count prefix bloom skips in multi-table run path

46cd2d0

Extend prefix_bloom_skips counter to the multi-table run filter in TreeIter::create_range, not just the single-table path. Update docs to reflect both paths are now instrumented (BlobTree excluded).

polaz requested a review from Copilot March 22, 2026 11:14

Copilot started reviewing on behalf of polaz March 22, 2026 11:15 View session

Copilot AI reviewed Mar 22, 2026

View reviewed changes

polaz merged commit e5d40a0 into main Mar 22, 2026
18 checks passed

polaz deleted the feat/#59-feat-expose-prefix-bloom-skip-metrics-for-testabil branch March 22, 2026 11:22

sw-release-bot Bot mentioned this pull request Mar 22, 2026

chore: release v5.0.0 #60

Closed

Conversation

polaz commented Mar 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test Plan

Uh oh!

coderabbitai Bot commented Mar 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

codecov Bot commented Mar 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

polaz commented Mar 22, 2026

Uh oh!

coderabbitai Bot commented Mar 22, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

polaz commented Mar 22, 2026

Uh oh!

coderabbitai Bot commented Mar 22, 2026

src/metrics.rs

src/range.rs

src/tree/mod.rs

tests/tree_prefix_bloom.rs

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

polaz commented Mar 22, 2026

Uh oh!

coderabbitai Bot commented Mar 22, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

polaz commented Mar 22, 2026 •

edited

Loading

coderabbitai Bot commented Mar 22, 2026 •

edited

Loading

codecov Bot commented Mar 22, 2026 •

edited

Loading

`src/metrics.rs`

`src/range.rs`

`src/tree/mod.rs`

`tests/tree_prefix_bloom.rs`