
fix(test): use shared seqno counter in proptest oracle#97

Merged
polaz merged 10 commits into main from feat/#58-feat-extend-prefix-bloom-skipping-to-multi-table-r on Mar 23, 2026

Conversation

@polaz (Member) commented Mar 22, 2026

Summary

  • Fix proptest oracle to use shared SequenceNumberCounter per API contract (was using independent counter)
  • Add regression test for stale point-read after compact cycles (derived from proptest seed)
  • Fix clippy never_loop lint in oracle's get() method

Technical Details

The proptest used an independent seqno counter (let mut seqno = 1) that did not advance on flush/compact, violating the API contract that data seqnos be drawn from the shared SequenceNumberCounter passed to Config::new. With independent counters, the internal SuperVersion seqnos advance faster than the data seqnos, so get_version_for_snapshot can return a stale SuperVersion whose memtable misses recent inserts.

Root cause: get_version_for_snapshot(S) finds the latest SV with seqno < S. When the internal counter (advanced by flush/compact) outpaces user data seqnos, the returned SV references an old memtable that was rotated away.

Fix: use seqno_counter.next() from the shared counter for all data operations in the proptest, keeping SV seqnos and data seqnos properly interleaved.
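To make the interleaving concrete, here is a small self-contained sketch (toy types, not the crate's API): one atomic counter hands out seqnos to both data writes and SuperVersion advances, and snapshot resolution picks the latest SuperVersion strictly below the snapshot seqno.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Toy stand-in for SequenceNumberCounter: one shared, monotonically
// increasing source for BOTH data seqnos and SuperVersion seqnos.
struct SharedCounter(AtomicU64);

impl SharedCounter {
    fn new() -> Self {
        Self(AtomicU64::new(1))
    }
    fn next(&self) -> u64 {
        self.0.fetch_add(1, Ordering::SeqCst)
    }
}

// Selection rule from the description above: the latest SuperVersion
// whose seqno is strictly below the snapshot seqno.
fn version_for_snapshot(sv_seqnos: &[u64], snapshot: u64) -> Option<u64> {
    sv_seqnos.iter().copied().filter(|&sv| sv < snapshot).max()
}

fn main() {
    let counter = SharedCounter::new();

    let w1 = counter.next(); // data write -> seqno 1
    let sv_a = counter.next(); // flush advances the SuperVersion -> seqno 2
    let w2 = counter.next(); // data write -> seqno 3
    let sv_b = counter.next(); // compact advances the SuperVersion -> seqno 4

    // Because every seqno comes from one counter, the relative order of
    // writes and SuperVersion advances is faithfully recorded:
    assert!(w1 < sv_a && sv_a < w2 && w2 < sv_b);

    // A snapshot taken right after w2 resolves to sv_a, the newest
    // SuperVersion that existed before the snapshot.
    assert_eq!(version_for_snapshot(&[sv_a, sv_b], w2 + 1), Some(sv_a));
}
```

With an independent data counter, sv seqnos race ahead of write seqnos, and the same selection rule lands on a SuperVersion older than the writer intended.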

Note: The bloom skipping feature (src/ changes) was merged via PR #64. This PR now contains only test improvements.

Test Plan

  • Regression test point_read_after_compact_flush_returns_latest_value passes
  • Proptest prop_btreemap_oracle_correctness passes (256 cases)
  • All 468+ library and integration tests pass
  • cargo clippy --tests clean

Closes #58

Copilot AI review requested due to automatic review settings March 22, 2026 17:44
@coderabbitai

coderabbitai Bot commented Mar 22, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Refactors test internals: replaces manual iteration in Oracle::get with an iterator pipeline and revises run_oracle_test to use shared SequenceNumberCounter instances for seqno assignment and visibility tracking. Adds a regression test that verifies point-read correctness after compaction+flush sequences.

Changes

Oracle & test harness (tests/prop_btreemap_oracle.rs):
Rewrote Oracle::get to use range() + take_while + map + next().flatten() instead of an explicit loop; replaced the local seqno logic in run_oracle_test with two SequenceNumberCounters (seqno_counter, visible_seqno) and updated how seqnos are assigned to Insert/Remove/Compact and how the visibility watermark is updated and read.

Regression test (tests/regression_point_read_seqno.rs):
Added a with_post_compact_flush() test that builds a tree with shared SequenceNumberCounters, performs compaction/flush/insert sequences, and asserts that a point read returns the expected value at the visible seqno.
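One plausible shape of the rewritten lookup, as a toy MVCC oracle (illustrative code, not the repository's exact test file): versions are keyed by (key, Reverse(seqno)) so newer versions of a key sort first, and get() is the range/take_while/map/next/flatten pipeline.

```rust
use std::cmp::Reverse;
use std::collections::BTreeMap;

type Seqno = u64;

// Toy MVCC oracle: keyed by (key, Reverse(seqno)) so that within one
// key, newer versions sort first. A `None` value models a tombstone.
// Seqnos are assumed to start at 1.
struct Oracle {
    map: BTreeMap<(Vec<u8>, Reverse<Seqno>), Option<Vec<u8>>>,
}

impl Oracle {
    fn insert(&mut self, key: &[u8], seqno: Seqno, value: Option<Vec<u8>>) {
        self.map.insert((key.to_vec(), Reverse(seqno)), value);
    }

    // Latest value visible at `snapshot` (seqno < snapshot), written as
    // an iterator pipeline instead of an explicit loop.
    fn get(&self, key: &[u8], snapshot: Seqno) -> Option<Vec<u8>> {
        self.map
            // Start at the newest version that is visible to `snapshot`.
            .range((key.to_vec(), Reverse(snapshot.saturating_sub(1)))..)
            // Stop as soon as the iterator crosses into the next key.
            .take_while(|((k, _), _)| k.as_slice() == key)
            .map(|(_, value)| value.clone())
            .next() // Option<Option<Vec<u8>>>
            .flatten() // a tombstone (inner None) reads as "not found"
    }
}

fn main() {
    let mut oracle = Oracle { map: BTreeMap::new() };
    oracle.insert(b"a", 2, Some(b"v2".to_vec()));
    oracle.insert(b"a", 5, Some(b"v5".to_vec()));
    oracle.insert(b"b", 3, None); // tombstone

    assert_eq!(oracle.get(b"a", 4), Some(b"v2".to_vec())); // seqno 5 not visible yet
    assert_eq!(oracle.get(b"a", 6), Some(b"v5".to_vec()));
    assert_eq!(oracle.get(b"a", 2), None); // nothing written below seqno 2
    assert_eq!(oracle.get(b"b", 4), None); // tombstone wins
}
```

Starting the range at Reverse(snapshot - 1) means the very first entry for the key is already the newest visible version, which is what lets next().flatten() replace the loop.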

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


Poem

🐰 A shared counter ticks for tree and test,
Seqnos march onward, leaving doubts to rest.
Flushes and compacts line up in a row,
Point reads find the value they should know,
Hopping on, the rabbit cheers — "Bravo!" 🥕

🚥 Pre-merge checks: ✅ 5 passed

Linked Issues check: ✅ Passed. The PR implements the main coding objectives from issue #58: extending prefix bloom skipping to multi-table runs via RunReader by passing prefix_hash/bloom hints and performing per-table checks lazily during iteration.
Out of Scope Changes check: ✅ Passed. Changes are in-scope: test oracle fixes (seqno handling) and a regression test for point-read correctness are supporting changes; core feature changes (RunReader, src/range.rs, BloomHints) directly support issue #58 objectives.
Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.
Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
Title check: ✅ Passed. The title 'fix(test): use shared seqno counter in proptest oracle' accurately describes the main change: fixing the proptest oracle to use a shared SequenceNumberCounter for proper seqno interleaving between data operations and SuperVersion advances.



Comment @coderabbitai help to get the list of available commands and usage tips.

Copilot AI left a comment

Pull request overview

This PR moves prefix/key bloom filter checks for multi-table runs from eager upfront filtering in TreeIter::create_range to lazy, per-table evaluation inside RunReader, improving first-result latency for partial scans over large runs.

Changes:

  • Add BloomHints and a with_bloom_hints() builder on RunReader to pass prefix/key bloom hashes (and optional metrics) into the run iterator.
  • Implement lazy intermediate-table bloom skipping in RunReader::next() / next_back() via a new bloom_skip() helper.
  • Update TreeIter::create_range to stop pre-filtering multi-table runs and instead attach bloom hints to each RunReader.
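The conservative skip rule these bullets describe can be sketched as follows (hypothetical shapes: the BloomHints fields and the check callback are illustrative, not the crate's exact signatures). A table is skipped only on a definitive bloom negative; a "maybe" answer or a filter I/O error falls back to reading the table.

```rust
// Hints carried into the run iterator; either hash may be absent.
#[derive(Default)]
struct BloomHints {
    prefix_hash: Option<u64>,
    key_hash: Option<u64>,
}

// `check` stands in for the table's bloom lookup; Err models an I/O
// failure while loading the filter block.
fn should_skip(hints: &BloomHints, check: impl Fn(u64) -> Result<bool, ()>) -> bool {
    let Some(hash) = hints.prefix_hash.or(hints.key_hash) else {
        return false; // no hints attached: never skip
    };
    match check(hash) {
        Ok(false) => true,           // definitely absent: skip this table
        Ok(true) | Err(()) => false, // maybe present, or error: read it
    }
}

fn main() {
    let hints = BloomHints { prefix_hash: Some(42), key_hash: None };
    assert!(should_skip(&hints, |_| Ok(false))); // definitive miss
    assert!(!should_skip(&hints, |_| Ok(true))); // possible hit
    assert!(!should_skip(&hints, |_| Err(()))); // I/O error: conservative
    assert!(!should_skip(&BloomHints::default(), |_| Ok(false)));
}
```

The conservative fallback preserves correctness: a bloom filter can false-positive but never false-negative, so only Ok(false) justifies skipping a read.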

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
src/run_reader.rs Introduces BloomHints, attaches hints to RunReader, and performs lazy per-table bloom skipping when advancing across intermediate tables.
src/range.rs Removes eager multi-table run bloom pre-filtering and passes bloom hints into RunReader instances.
src/metrics.rs Updates documentation for prefix_bloom_skips to reflect the new counting location for multi-table runs.

Comment thread src/run_reader.rs Outdated
Comment thread src/run_reader.rs Outdated
Comment thread src/run_reader.rs Outdated
@codecov

codecov Bot commented Mar 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@polaz polaz force-pushed the feat/#58-feat-extend-prefix-bloom-skipping-to-multi-table-r branch from ba66c88 to d07d367 Compare March 22, 2026 18:14
@polaz polaz requested a review from Copilot March 22, 2026 18:16
@polaz
Member Author

polaz commented Mar 22, 2026

@coderabbitai re-review

@coderabbitai

coderabbitai Bot commented Mar 22, 2026

@polaz Sure, I'll re-review the PR now!


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-22T09:21:07.612Z
Learning: In `structured-world/lsm-tree` (`src/range.rs`), prefix bloom filter skipping for multi-table runs (L0) has been implemented in PR `#64` (closes issue `#57`). The `_ =>` branch in `TreeIter::create_range` now applies a per-table key-range guard followed by a bloom check; 0 survivors skips the run, 1 survivor is demoted to `single_tables` for the range-tombstone table-skip optimization, and 2+ survivors are collected into a new `Run`. Range tombstones are still collected from ALL tables in the run regardless of bloom filtering. Do not flag the absence of multi-table run prefix bloom skipping in this repository — it has been implemented.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-22T09:55:39.749Z
Learning: In `structured-world/lsm-tree` (`src/compaction/leveled/mod.rs`), the multi-level compaction path (lines ~749–759) uses a single aggregate key range across all L0+L1 inputs to query L2 overlapping tables. Using per-range L2 overlap queries (one per contiguous L1 range segment) instead would avoid pulling in L2 tables that sit in gaps between disjoint L1 ranges on sparse keyspaces. This improvement is intentionally deferred from PR `#66` to a separate follow-up issue. Do not flag the coarse aggregate L2 range query as a blocking concern in PR `#66` or future PRs until that issue is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-22T09:55:39.749Z
Learning: In `structured-world/lsm-tree` (`src/compaction/leveled/mod.rs`), the multi-level compaction path uses a single aggregate key range across all L0+L1 inputs to query L2 overlapping tables. Using per-range L2 overlap queries (one per contiguous L1 range segment) instead would avoid pulling in L2 tables that sit in gaps between disjoint L1 ranges on sparse keyspaces. This improvement is intentionally deferred from PR `#66` to issue `#72`. Do not flag the coarse aggregate L2 range query as a blocking concern in PR `#66` or future PRs until issue `#72` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 43
File: tests/tree_prefix_bloom.rs:91-105
Timestamp: 2026-03-21T22:11:11.575Z
Learning: In `structured-world/lsm-tree`, prefix-bloom integration tests in `tests/tree_prefix_bloom.rs` intentionally validate correctness (result counts and key ordering) rather than asserting that the bloom-skip pruning path fired. Bloom-skip metrics/counters are `pub(crate)` and cannot be observed from integration tests. Do not flag these tests for failing to assert on internal pruning behavior in PR `#43` or future PRs.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-22T13:03:58.016Z
Learning: In `structured-world/lsm-tree` (`src/table/mod.rs`), `Table::bloom_may_contain_key_hash` returns `Ok(true)` (conservative fallback) for partitioned bloom filters (`pinned_filter_index.is_some()`) and filter TLI (`regions.filter_tli.is_some()`). Extending it to accept a key+seqno for partition-aware seeking is tracked in issue `#83`. Do not flag the conservative fallback in `bloom_may_contain_key_hash` for partitioned/TLI filters as a blocking concern in PR `#69` or future PRs until issue `#83` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 43
File: src/table/mod.rs:784-800
Timestamp: 2026-03-21T22:11:10.140Z
Learning: In `structured-world/lsm-tree` (`src/table/mod.rs`), `Table::maybe_contains_prefix` intentionally returns `Ok(true)` (conservative fallback) for tables with a partitioned filter (`pinned_filter_index` is `Some`) or a filter TLI (`regions.filter_tli` is `Some`). Iterating all partitions to check a prefix hash would defeat the bloom-skip benefit, because the partition index is keyed by user-key boundaries, not by prefix hash — making it impossible to restrict the check to only prefix-relevant partitions. Do not flag the absence of per-partition prefix-hash checking as a bug in PR `#43` or future PRs; a proper fix would require passing prefix bounds alongside the hash and seeking the partition index, which is deferred to follow-up work.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 28
File: src/tree/mod.rs:913-956
Timestamp: 2026-03-22T03:09:14.362Z
Learning: In `structured-world/lsm-tree` (`src/tree/mod.rs`), the `resolve_merge_get` function's disk-scan loop in step 3 is a flat scan across all levels/runs and does not apply the L0-specific newest-first ordering used by `get_internal_entry_from_tables`. This means it can stop on a base/tombstone boundary from an older L0 run while a newer overlapping L0 run still has visible `MergeOperand` entries. This is a known limitation documented with `// TODO(`#46`)` in the code and is intentionally deferred to issue `#46`. Do not flag the flat L0 scan in `resolve_merge_get` as a merge-semantics bug in PR `#28` or future PRs until issue `#46` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-21T22:21:36.968Z
Learning: In `structured-world/lsm-tree` (`src/table/mod.rs`), `Table::maybe_contains_prefix` intentionally returns `Ok(true)` (conservative fallback) when `pinned_filter_index.is_some() || regions.filter_tli.is_some()` (partitioned filter or filter TLI present). This guard was added in PR `#43` at line 793. Do not flag the absence of per-partition prefix-hash checking as a bug in PR `#43` or future PRs.

Learnt from: CR
Repo: structured-world/lsm-tree PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-20T22:03:44.708Z
Learning: Applies to src/range_tombstone_filter.rs : `src/range_tombstone_filter.rs` — MVCC-aware range tombstone filtering for iterators

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 67
File: src/memtable/mod.rs:72-77
Timestamp: 2026-03-22T12:26:47.795Z
Learning: In `structured-world/lsm-tree` (`src/memtable/interval_tree.rs` and `src/memtable/mod.rs`), the memtable `IntervalTree` used for range tombstone containment queries (`is_key_suppressed_by_range_tombstone`, `query_suppression`, `insert_range_tombstone`) still uses lexicographic `Ord` on `UserKey` rather than the pluggable `SharedComparator` introduced in PR `#67`. Threading the comparator into the AVL interval tree is a non-trivial refactor intentionally deferred from PR `#67` and tracked in issue `#81`. Do not flag the absence of comparator-aware RT suppression in the memtable interval tree as a blocking concern in PR `#67` or future PRs until issue `#81` is addressed.
✅ Actions performed

Full review triggered.

@polaz polaz force-pushed the feat/#58-feat-extend-prefix-bloom-skipping-to-multi-table-r branch from d07d367 to c8272e8 Compare March 22, 2026 18:24

@sw-release-bot sw-release-bot Bot left a comment


⚠️ Performance Alert ⚠️

A possible performance regression was detected for benchmark 'lsm-tree db_bench': this commit's result is worse than the previous benchmark result by more than the 1.15 threshold.

Benchmark suite: fillrandom
  Current (42e6ebc):  1,066,814.40 ops/sec
  Previous (ff82717): 1,277,896.90 ops/sec
  Ratio: 1.20

This comment was automatically generated by workflow using github-action-benchmark.

CC: @polaz

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Comment thread src/run_reader.rs Outdated
Comment thread src/run_reader.rs Outdated
Comment thread src/range.rs Outdated
@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
src/run_reader.rs (2)

126-141: Consider pruning the culled boundary tables too.

bloom_skip() only runs in the intermediate-table loops. When RunReader::new() collapses a multi-table run down to one or two overlapping tables, neither boundary is checked, so narrow scans still miss the bloom-pruning fast path entirely. A small precheck when attaching hints, or before constructing RunReader, would recover that optimization for hi - lo <= 1.

Also applies to: 158-173, 198-213

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/run_reader.rs` around lines 126 - 141, The bloom pruning misses single-
or two-table runs because bloom_skip is only used for intermediate tables;
update the path where hints are attached or where RunReader::new collapses runs
(referencing with_bloom_hints, RunReader::new, bloom, bloom_skip, and
BloomHints::should_skip) to perform a quick precheck of the boundary tables when
hi - lo <= 1 and prune any boundary table that BloomHints::should_skip reports
as absent; i.e., before finalizing the run collapse, call
bloom.should_skip(&table) for the lo and hi boundary Table(s) and adjust/remove
them so narrow scans benefit from the same bloom-pruning fast path.
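A minimal sketch of the suggested precheck (illustrative indices, with a skip predicate standing in for BloomHints::should_skip): when the run has collapsed to at most two tables, apply the same bloom test to the boundaries that the intermediate loop would apply.

```rust
// Returns the table indices that survive the precheck. For narrow runs
// (hi - lo <= 1) the boundary tables are bloom-tested up front; wider
// runs are left alone, since next()/next_back() already skip their
// intermediate tables lazily.
fn prune_boundaries(lo: usize, hi: usize, skip: impl Fn(usize) -> bool) -> Vec<usize> {
    if hi - lo <= 1 {
        (lo..=hi).filter(|&i| !skip(i)).collect()
    } else {
        (lo..=hi).collect()
    }
}

fn main() {
    // Two-table run, bloom rejects table 3: only table 4 survives.
    assert_eq!(prune_boundaries(3, 4, |i| i == 3), vec![4]);
    // Single-table run, bloom rejects it: nothing left to read.
    assert!(prune_boundaries(7, 7, |_| true).is_empty());
    // Wide run: untouched, lazy skipping handles the middle.
    assert_eq!(prune_boundaries(0, 4, |_| true).len(), 5);
}
```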

451-477: Make the bloom-miss test input deterministic.

These assertions depend on the current bloom configuration rejecting one hard-coded absent hash. Since the filter can legitimately false-positive, the test is coupled to today's hash/bits-per-key choices. Consider probing candidate hashes during setup until the intermediate tables return Ok(false), then using that value for the skip assertions.

Also applies to: 486-504

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/run_reader.rs` around lines 451 - 477, The test currently uses a single
hard-coded hash from standard_bloom::Builder::get_hash(b"zzz_not_in_any_table")
which can false-positive; instead probe candidate hashes during setup by
generating hashes (e.g., with Builder::get_hash on different byte inputs) and
checking against the intermediate tables' bloom filters until you find one that
definitively returns Ok(false) for those tables, then use that hash in
BloomHints passed to RunReader::with_bloom_hints so the intermediate tables are
reliably bloom-skipped and the assertions remain deterministic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1aa8bb34-e76b-44c3-9467-5cacfb17c5de

📥 Commits

Reviewing files that changed from the base of the PR and between b29c5b6 and c8272e8.

📒 Files selected for processing (3)
  • src/metrics.rs
  • src/range.rs
  • src/run_reader.rs

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread src/run_reader.rs Outdated
Comment thread src/run_reader.rs Outdated
@polaz polaz force-pushed the feat/#58-feat-extend-prefix-bloom-skipping-to-multi-table-r branch from 60ce143 to fcfb7de Compare March 22, 2026 18:54
@polaz polaz requested a review from Copilot March 22, 2026 19:01
@polaz
Member Author

polaz commented Mar 22, 2026

@coderabbitai re-review

@coderabbitai

coderabbitai Bot commented Mar 22, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read the main changed files
cat src/run_reader.rs
echo "=== RANGE.RS ==="
cat src/range.rs
echo "=== METRICS.RS ==="
cat src/metrics.rs

Length of output: 50643


@polaz The implementation looks solid. Here's my full re-review:


src/run_reader.rs

BloomHints::should_skip — Correctly returns false on both filter I/O errors and "maybe" results (conservative fallback). The metric increment for prefix skips on Ok(false) is correct; key-hash skips intentionally don't increment prefix_bloom_skips given the metric's semantics.

with_bloom_hints — Boundary readers are eagerly nulled when bloom-rejected. One minor observation: self.lo and self.hi are not adjusted when the boundary readers are nulled. As a consequence, Phase 2 in next()/next_back() will re-check the same bloom-rejected boundary table as its first "intermediate" candidate. Because bloom filters are deterministic this is always a fast skip (same hash → same Ok(false)) and correctness is unaffected, but it costs one redundant filter read per rejected boundary per scan. You could eliminate it by incrementing self.lo / decrementing self.hi right after nulling the respective reader in with_bloom_hints. Worth a follow-up but not blocking.

Iterator::next / DoubleEndedIterator::next_back — The three-phase loop (drain current reader → bloom-scan intermediates → drain opposite boundary) is correct for all cases: single-table (lo == hi), all-rejected (both boundaries null + all intermediates skipped), and mixed bidirectional use.

The if self.hi == 0 { break; } guard inside Phase 2 of next_back is technically unreachable (the while condition self.lo < self.hi with usize already excludes hi == 0), but it's harmless as a defence against future underflow.

Intermediate tables use table.iter() (full scan) rather than table.range() — correct for a disjoint run where intermediate tables are guaranteed to lie entirely within the query bounds.
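The three-phase structure can be illustrated with a toy forward iterator (illustrative field names, plain Vec tables instead of the crate's readers):

```rust
// Forward pass over a run: Phase 1 drains the currently open reader,
// Phase 2 advances across intermediate tables (skipping bloom-rejected
// ones without ever opening them), Phase 3 drains the hi boundary.
struct RunScan {
    lo_reader: Option<std::vec::IntoIter<u64>>,
    tables: Vec<Vec<u64>>, // intermediate tables between the boundaries
    next_table: usize,
    hi_reader: Option<std::vec::IntoIter<u64>>,
    skip: fn(&[u64]) -> bool, // stand-in for the per-table bloom check
}

impl Iterator for RunScan {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        loop {
            // Phase 1: drain the open reader.
            if let Some(reader) = self.lo_reader.as_mut() {
                if let Some(item) = reader.next() {
                    return Some(item);
                }
                self.lo_reader = None;
            }
            // Phase 2: open the next intermediate table that survives
            // the bloom check; rejected tables cost no reader at all.
            while self.next_table < self.tables.len() {
                let idx = self.next_table;
                self.next_table += 1;
                if !(self.skip)(&self.tables[idx]) {
                    self.lo_reader = Some(self.tables[idx].clone().into_iter());
                    break;
                }
            }
            if self.lo_reader.is_some() {
                continue;
            }
            // Phase 3: fall through to the opposite boundary.
            return self.hi_reader.as_mut().and_then(Iterator::next);
        }
    }
}

fn main() {
    fn rejects_five(t: &[u64]) -> bool {
        t.contains(&5) // pretend the bloom rejects the table holding 5
    }
    let scan = RunScan {
        lo_reader: Some(vec![1, 2].into_iter()),
        tables: vec![vec![3, 4], vec![5]],
        next_table: 0,
        hi_reader: Some(vec![9].into_iter()),
        skip: rejects_five,
    };
    assert_eq!(scan.collect::<Vec<_>>(), vec![1, 2, 3, 4, 9]);
}
```

The same shape mirrored (draining hi first and walking next_table backwards) gives next_back(), which is why the single-table and all-rejected cases fall out of the loop structure rather than needing special handling.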

src/range.rs

The removal of bloom_passes and the switch to bloom_hints.should_skip(table) in the single-table arm is clean. Multi-table runs are now pushed unfiltered and rely on RunReader::with_bloom_hints, which is exactly the intent. The trade-off comment (losing single-table RT-skip demotion for lazy evaluation) is well-documented with a clear rationale.

Range tombstones continue to be collected from all tables in the run regardless of bloom — correct, as they may affect keys in other tables/levels.

src/metrics.rs

Doc comment accurately reflects the new two-path counting (single-table path in create_range + lazy inside RunReader). No logic changes.

Tests

New unit tests cover forward skip, reverse skip, no-hints passthrough, and field-setting. All correctness-oriented as expected — internal skip counters are pub(crate) and not observable from integration tests.


Overall: The design is sound, correctness is maintained, and the lazy-evaluation trade-off is well-reasoned and documented. The redundant boundary bloom check is the only nit worth noting. LGTM otherwise.

[approve]


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-22T09:21:07.612Z
Learning: In `structured-world/lsm-tree` (`src/range.rs`), prefix bloom filter skipping for multi-table runs (L0) has been implemented in PR `#64` (closes issue `#57`). The `_ =>` branch in `TreeIter::create_range` now applies a per-table key-range guard followed by a bloom check; 0 survivors skips the run, 1 survivor is demoted to `single_tables` for the range-tombstone table-skip optimization, and 2+ survivors are collected into a new `Run`. Range tombstones are still collected from ALL tables in the run regardless of bloom filtering. Do not flag the absence of multi-table run prefix bloom skipping in this repository — it has been implemented.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-22T09:55:39.749Z
Learning: In `structured-world/lsm-tree` (`src/compaction/leveled/mod.rs`), the multi-level compaction path (lines ~749–759) uses a single aggregate key range across all L0+L1 inputs to query L2 overlapping tables. Using per-range L2 overlap queries (one per contiguous L1 range segment) instead would avoid pulling in L2 tables that sit in gaps between disjoint L1 ranges on sparse keyspaces. This improvement is intentionally deferred from PR `#66` to a separate follow-up issue. Do not flag the coarse aggregate L2 range query as a blocking concern in PR `#66` or future PRs until that issue is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 67
File: src/memtable/mod.rs:72-77
Timestamp: 2026-03-22T12:26:47.795Z
Learning: In `structured-world/lsm-tree` (`src/memtable/interval_tree.rs` and `src/memtable/mod.rs`), the memtable `IntervalTree` used for range tombstone containment queries (`is_key_suppressed_by_range_tombstone`, `query_suppression`, `insert_range_tombstone`) still uses lexicographic `Ord` on `UserKey` rather than the pluggable `SharedComparator` introduced in PR `#67`. Threading the comparator into the AVL interval tree is a non-trivial refactor intentionally deferred from PR `#67` and tracked in issue `#81`. Do not flag the absence of comparator-aware RT suppression in the memtable interval tree as a blocking concern in PR `#67` or future PRs until issue `#81` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-22T09:55:39.749Z
Learning: In `structured-world/lsm-tree` (`src/compaction/leveled/mod.rs`), the multi-level compaction path uses a single aggregate key range across all L0+L1 inputs to query L2 overlapping tables. Using per-range L2 overlap queries (one per contiguous L1 range segment) instead would avoid pulling in L2 tables that sit in gaps between disjoint L1 ranges on sparse keyspaces. This improvement is intentionally deferred from PR `#66` to issue `#72`. Do not flag the coarse aggregate L2 range query as a blocking concern in PR `#66` or future PRs until issue `#72` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:06.298Z
Learning: In `structured-world/lsm-tree`, the switch from `Mutex<IntervalTree>` to `RwLock<IntervalTree>` (or a freeze-on-seal immutable structure) for memtable range-tombstone storage in `src/memtable/mod.rs` is intentionally deferred from PR `#21` to a separate issue (`#30`). Do not flag the existing `Mutex<IntervalTree>` usage in `src/memtable/mod.rs` as a concurrency concern in the context of PR `#21`; it is tracked in issue `#30`.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 43
File: tests/tree_prefix_bloom.rs:91-105
Timestamp: 2026-03-21T22:11:11.575Z
Learning: In `structured-world/lsm-tree`, prefix-bloom integration tests in `tests/tree_prefix_bloom.rs` intentionally validate correctness (result counts and key ordering) rather than asserting that the bloom-skip pruning path fired. Bloom-skip metrics/counters are `pub(crate)` and cannot be observed from integration tests. Do not flag these tests for failing to assert on internal pruning behavior in PR `#43` or future PRs.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:39.855Z
Learning: In `structured-world/lsm-tree`, the `Memtable` range-tombstone storage uses `Mutex<IntervalTree>` (introduced in PR `#21`). Switching to `RwLock<IntervalTree>` or a freeze-on-seal immutable structure is a deliberate future improvement tracked in a separate issue — do not flag the `Mutex` usage as a blocking concern in PR `#21` or future PRs that reference this design.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:39.855Z
Learning: In `structured-world/lsm-tree`, the `Memtable` range-tombstone storage uses `Mutex<IntervalTree>` (introduced in PR `#21`). Switching to `RwLock<IntervalTree>` or a freeze-on-seal immutable structure is a deliberate future improvement tracked in issue `#31` — do not flag the `Mutex` usage as a blocking concern in PR `#21` or future PRs that reference this design.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-22T13:03:58.016Z
Learning: In `structured-world/lsm-tree` (`src/table/mod.rs`), `Table::bloom_may_contain_key_hash` returns `Ok(true)` (conservative fallback) for partitioned bloom filters (`pinned_filter_index.is_some()`) and filter TLI (`regions.filter_tli.is_some()`). Extending it to accept a key+seqno for partition-aware seeking is tracked in issue `#83`. Do not flag the conservative fallback in `bloom_may_contain_key_hash` for partitioned/TLI filters as a blocking concern in PR `#69` or future PRs until issue `#83` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 28
File: src/tree/mod.rs:913-956
Timestamp: 2026-03-22T03:09:14.362Z
Learning: In `structured-world/lsm-tree` (`src/tree/mod.rs`), the `resolve_merge_get` function's disk-scan loop in step 3 is a flat scan across all levels/runs and does not apply the L0-specific newest-first ordering used by `get_internal_entry_from_tables`. This means it can stop on a base/tombstone boundary from an older L0 run while a newer overlapping L0 run still has visible `MergeOperand` entries. This is a known limitation documented with `// TODO(`#46`)` in the code and is intentionally deferred to issue `#46`. Do not flag the flat L0 scan in `resolve_merge_get` as a merge-semantics bug in PR `#28` or future PRs until issue `#46` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 43
File: src/table/mod.rs:784-800
Timestamp: 2026-03-21T22:11:10.140Z
Learning: In `structured-world/lsm-tree` (`src/table/mod.rs`), `Table::maybe_contains_prefix` intentionally returns `Ok(true)` (conservative fallback) for tables with a partitioned filter (`pinned_filter_index` is `Some`) or a filter TLI (`regions.filter_tli` is `Some`). Iterating all partitions to check a prefix hash would defeat the bloom-skip benefit, because the partition index is keyed by user-key boundaries, not by prefix hash — making it impossible to restrict the check to only prefix-relevant partitions. Do not flag the absence of per-partition prefix-hash checking as a bug in PR `#43` or future PRs; a proper fix would require passing prefix bounds alongside the hash and seeking the partition index, which is deferred to follow-up work.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-21T22:21:36.968Z
Learning: In `structured-world/lsm-tree` (`src/table/mod.rs`), `Table::maybe_contains_prefix` intentionally returns `Ok(true)` (conservative fallback) when `pinned_filter_index.is_some() || regions.filter_tli.is_some()` (partitioned filter or filter TLI present). This guard was added in PR `#43` at line 793. Do not flag the absence of per-partition prefix-hash checking as a bug in PR `#43` or future PRs.

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Comment thread src/run_reader.rs Outdated
Comment thread src/run_reader.rs Outdated
Comment thread src/run_reader.rs Outdated
Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread src/run_reader.rs Outdated
Comment thread src/run_reader.rs Outdated
@polaz force-pushed the feat/#58-feat-extend-prefix-bloom-skipping-to-multi-table-r branch from 53616b0 to 0468c35 on March 22, 2026 19:45
@polaz requested a review from Copilot on March 22, 2026 19:46

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread src/run_reader.rs Outdated
Comment thread src/run_reader.rs Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment thread tests/regression_point_read_seqno.rs Outdated
Comment thread tests/prop_btreemap_oracle.rs Outdated
polaz added 8 commits March 23, 2026 03:35
Pass prefix_hash and key_hash into RunReader via BloomHints so it can
skip tables lazily during iteration instead of upfront pre-filtering.

- Add BloomHints struct with should_skip() method shared by RunReader
  and the single-table path in range.rs
- Boundary tables are bloom-checked during lazy init (ensure_lo/hi);
  rejected boundaries skip reader creation entirely
- Intermediate tables are bloom-checked lazily in next()/next_back()
  when the iterator advances to them
- Remove upfront multi-table filtering loop from range.rs create_range;
  bloom hints are passed to RunReader instead
- 9 unit tests covering forward/reverse/partial/key_hash/3-table paths

Closes #58
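
The skip decision this commit describes can be sketched minimally; only the names `BloomHints` and `should_skip` come from the commit message, while the field layout and the closure standing in for a table's bloom filter are assumptions.

```rust
// Toy model of the BloomHints idea above. Assumption: prefix_hash takes
// priority when both hints are present; a table's bloom filter is
// modeled as any Fn(u64) -> bool answering "may contain this hash?".
#[derive(Clone, Copy)]
struct BloomHints {
    prefix_hash: Option<u64>,
    key_hash: Option<u64>,
}

impl BloomHints {
    /// Skip a table only when a hint exists AND the filter definitively
    /// rejects it; with no hint, never skip (conservative).
    fn should_skip(&self, filter_may_contain: impl Fn(u64) -> bool) -> bool {
        match (self.prefix_hash, self.key_hash) {
            (Some(h), _) | (None, Some(h)) => !filter_may_contain(h),
            (None, None) => false,
        }
    }
}

fn main() {
    let filter = |h: u64| h == 42; // filter "contains" only hash 42

    let hit = BloomHints { prefix_hash: Some(42), key_hash: None };
    let miss = BloomHints { prefix_hash: Some(7), key_hash: None };
    let none = BloomHints { prefix_hash: None, key_hash: None };

    assert!(!hit.should_skip(filter)); // may contain -> must read table
    assert!(miss.should_skip(filter)); // definitive reject -> skip table
    assert!(!none.should_skip(filter)); // no hint -> conservative, no skip
}
```

The important invariant is the asymmetry: a bloom filter can only prove absence, so the absence of a hint must map to "read the table", never to "skip".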
- Restructure next()/next_back() into 3-phase loops: drain current
  reader → advance through bloom-skipped intermediates → fall through
  to opposite boundary. Fixes skipped intermediates when a boundary
  table is bloom-rejected.
- Advance lo/hi index in ensure_lo/hi_initialized when boundary is
  rejected to avoid redundant re-check and double metric counts
- Remove dead hi == 0 guard (while lo < hi guarantees hi >= 1)
- Add metrics test verifying prefix_bloom_skips counter
Pre-existing bug: proptest found a sequence of insert-flush-compact
operations where point read returns an older value instead of the
latest insert. Reproduces on main without bloom changes.

Seed: cc 90710f9669603bd6cb741490355db41b936dd1088e87c45b621c6ec6eeef1ad2
Narrowed down: the bug requires an unflushed memtable entry present
during major_compact. The exact_replay test reproduces the minimal
proptest failing input verbatim.
Bug requires: initial no-op compact on empty tree + insert with
unflushed memtable during triple compact + second compact cycle +
post-compact flush creating L0 table + final memtable inserts.
The proptest and regression test used an independent seqno counter
(`let mut seqno = 1`) that did not advance on flush/compact, violating
the API contract requiring data seqnos from the shared
SequenceNumberCounter passed to Config::new. With independent counters,
internal SuperVersion seqnos advance faster than data seqnos, causing
get_version_for_snapshot to return a stale SuperVersion whose memtable
misses recent inserts.

Root cause: get_version_for_snapshot(S) finds the latest SV with
seqno < S. When the internal counter (advanced by flush/compact)
outpaces user data seqnos, the returned SV references an old memtable
that was rotated away, missing data inserted into the current memtable.

Fix: use seqno_counter.next() from the shared counter for all data
operations, keeping SV seqnos and data seqnos properly interleaved.
Clippy flagged the for-loop-that-always-breaks pattern as never_loop.
Rewrite to take_while + next for the same "first matching element"
semantics.
Use visible_seqno.get() instead of seqno_counter.get() for the read
seqno. The visible_seqno tracks the visibility watermark and won't
drift ahead of what the tree considers readable.
@polaz force-pushed the feat/#58-feat-extend-prefix-bloom-skipping-to-multi-table-r branch from 9bd04ba to 2beca07 on March 23, 2026 01:48
@polaz requested a review from Copilot on March 23, 2026 08:05

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment thread tests/regression_point_read_seqno.rs Outdated
Rename with_post_compact_flush → point_read_after_compact_flush_returns_latest_value
to follow the <what>_<condition>_<expected> pattern.

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment thread tests/prop_btreemap_oracle.rs Outdated
Comment thread tests/regression_point_read_seqno.rs
No-op flush (empty memtable) and DoNothing compaction skip
upgrade_version entirely, so the comment should say "may" not "also".
@polaz changed the title from "feat: lazy per-table bloom skipping inside RunReader" to "fix(test): use shared seqno counter in proptest oracle" on Mar 23, 2026
@polaz requested a review from Copilot on March 23, 2026 08:31

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

@polaz merged commit 1ee0db2 into main on Mar 23, 2026
19 of 20 checks passed
@polaz deleted the feat/#58-feat-extend-prefix-bloom-skipping-to-multi-table-r branch on March 23, 2026 08:38
@sw-release-bot (bot) mentioned this pull request on Mar 23, 2026

Development

Successfully merging this pull request may close these issues.

feat: extend prefix bloom skipping to multi-table runs by passing prefix_hash to RunReader

2 participants