perf: partition-aware bloom filtering for point-read pipeline #102

Merged: polaz merged 15 commits into main from feat/#83-perf-extend-bloommaycontainkeyhash-to-support-part on Mar 23, 2026
Conversation

@polaz polaz commented Mar 22, 2026

Summary

  • Add Table::bloom_may_contain_key(key, key_hash) — seeks the partitioned filter TLI by user key and checks only the matching partition's bloom filter, replacing the conservative Ok(true) fallback
  • Add bloom_key field to IterState, populated by resolve_merge_via_pipeline for single-key point-read pipelines
  • bloom_passes() dispatches to the key-aware method when bloom_key is available, falls back to hash-only path otherwise
  • debug_assert ensures bloom_key is never set without key_hash
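The dispatch described in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the crate's code: `Table`, `IterState`, and `hash` here are hypothetical stand-ins, the "filter" is an exact hash match, and the methods return plain `bool` rather than `crate::Result<bool>`.

```rust
/// FNV-1a, used here only as a stand-in for the crate's bloom hash (assumption).
fn hash(key: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in key {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Hypothetical table: `keys` plays the role of the filter contents.
struct Table {
    partitioned: bool,
    keys: Vec<Vec<u8>>,
}

impl Table {
    /// Hash-only path: cannot pick a partition, so it stays conservative.
    fn bloom_may_contain_key_hash(&self, key_hash: u64) -> bool {
        if self.partitioned {
            return true;
        }
        self.keys.iter().any(|k| hash(k) == key_hash)
    }

    /// Key-aware path: with the key available, a partitioned filter can be checked.
    fn bloom_may_contain_key(&self, key: &[u8], key_hash: u64) -> bool {
        debug_assert_eq!(key_hash, hash(key), "key_hash must be derived from key");
        self.keys.iter().any(|k| hash(k) == key_hash)
    }
}

/// Hypothetical subset of the iterator state.
struct IterState {
    key_hash: Option<u64>,
    bloom_key: Option<Vec<u8>>,
}

/// Uses the key-aware check when `bloom_key` is set, the hash-only check otherwise.
fn bloom_passes(state: &IterState, table: &Table) -> bool {
    debug_assert!(
        state.bloom_key.is_none() || state.key_hash.is_some(),
        "bloom_key must not be set without key_hash"
    );
    match (state.key_hash, state.bloom_key.as_deref()) {
        (Some(h), Some(key)) => table.bloom_may_contain_key(key, h),
        (Some(h), None) => table.bloom_may_contain_key_hash(h),
        (None, _) => true, // no hash available: nothing to pre-filter on
    }
}
```

With a partitioned table that does not hold the probed key, the state carrying `bloom_key` is rejected while the hash-only state still passes conservatively.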

Technical Details

Previously, bloom_may_contain_key_hash returned Ok(true) for partitioned/TLI filter configurations because the partition index is keyed by user key boundaries, not by raw hash — checking by hash alone would require scanning all partitions. The new bloom_may_contain_key method accepts the actual user key, seeks the TLI to the correct partition in O(log P), and queries only that partition's bloom filter. Keys beyond all partition boundaries return Ok(false) (definite miss).
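As a rough sketch of the seek described above, assuming the partition index is a list sorted by each partition's last user key: `Partition`, `PartitionedFilter`, and `hash` are hypothetical names, and each partition stores exact hashes in place of real bloom bits.

```rust
/// FNV-1a, a stand-in for the crate's bloom hash (assumption).
fn hash(key: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in key {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// One filter partition, covering keys up to and including `end_key`.
struct Partition {
    end_key: Vec<u8>,
    hashes: Vec<u64>, // stand-in for the partition's bloom bits
}

/// Partitions sorted by `end_key`, playing the role of the filter TLI.
struct PartitionedFilter {
    partitions: Vec<Partition>,
}

impl PartitionedFilter {
    /// Key-aware check: binary-search for the ceiling partition, O(log P).
    fn may_contain_key(&self, key: &[u8], key_hash: u64) -> bool {
        // Index of the first partition whose end_key is >= key.
        let idx = self
            .partitions
            .partition_point(|p| p.end_key.as_slice() < key);
        match self.partitions.get(idx) {
            Some(p) => p.hashes.contains(&key_hash),
            // Key sorts after every partition boundary: definite miss.
            None => false,
        }
    }

    /// Hash-only check: no key to seek with, so answer conservatively.
    fn may_contain_hash(&self, _key_hash: u64) -> bool {
        true
    }
}
```

The binary search is what keeps the lookup at O(log P); without the user key there is nothing to search on, which is why the hash-only path can only answer conservatively.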

The existing bloom_may_contain_key_hash (hash-only) path is preserved unchanged for callers that don't have the key available (e.g. prefix scans).

pinned_filter_block and pinned_filter_index are mutually exclusive (set at construction time), so the branch order in bloom_may_contain_key is safe.

Slice::from(key) in the merge pipeline copies the key once per resolution (not zero-copy), but the cost is negligible compared to I/O savings.

Known Limitations

  • Only resolve_merge_via_pipeline sets bloom_key — general range scans still use hash-only bloom pre-filtering (which is correct but less effective for partitioned filters)
  • Unpinned filter TLI path falls through to hash-only (consistent with existing unimplemented! for unpinned TLI in Table::get)

Test Plan

  • partitioned_bloom_skip_for_point_reads — verifies bloom filter is queried for non-matching key with partitioned filters (metrics: filter_queries >= 1)
  • partitioned_bloom_skip_beyond_partitions — verifies key beyond all partition boundaries is correctly rejected
  • partitioned_bloom_skip_merge_pipeline — exercises bloom_may_contain_key through the merge pipeline with bracketing distractor keys
  • full_filter_bloom_skip_merge_pipeline — covers the full-filter delegation path through the merge pipeline
  • bloom_may_contain_key_full_filter — unit test: both methods agree for full filters
  • bloom_may_contain_key_partitioned_filter — unit test: contrast assertion proving key-based rejects while hash-only returns conservative Ok(true)
  • All existing tests pass unchanged

Closes #83

Summary by CodeRabbit

  • Performance Improvements

    • Partition-aware bloom checks reduce unnecessary reads by skipping keys outside targeted partitions.
  • New Features

    • Key-aware bloom query path added; iterators now include the bloom key when available to enable more precise partitioned filtering while preserving conservative behavior when partition info is absent.
  • Tests

    • Added unit and integration tests validating full and partitioned bloom behavior across point reads and merge-pipeline scenarios.

Copilot AI review requested due to automatic review settings March 22, 2026 18:55

coderabbitai Bot commented Mar 22, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9d38eabf-72ef-4ab9-bd0d-e197192f09ef

📥 Commits

Reviewing files that changed from the base of the PR and between c622511 and 8634930.

📒 Files selected for processing (5)
  • src/range.rs
  • src/table/mod.rs
  • src/table/tests.rs
  • src/tree/mod.rs
  • tests/partitioned_bloom_skip.rs
✅ Files skipped from review due to trivial changes (2)
  • tests/partitioned_bloom_skip.rs
  • src/table/tests.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/tree/mod.rs
  • src/range.rs

Walkthrough

Adds an optional bloom_key to iterator state and a new Table::bloom_may_contain_key(key, key_hash) API. Iterators and merge paths are adjusted to set/clear bloom_key, enabling partition-aware bloom checks that seek the partition index and probe only the matching partition's filter when applicable.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Iterator state & call sites: src/range.rs, src/tree/mod.rs | Added IterState::bloom_key: Option<UserKey> and threaded it: set for point-read/merge pipeline, cleared for prefix/range iterators. Added debug assertion ensuring bloom_key implies key_hash. |
| Table partition-aware bloom: src/table/mod.rs | Added pub(crate) fn bloom_may_contain_key(&self, key: &[u8], key_hash: u64) -> crate::Result<bool>; seeks partition index with key (using seqno::MAX_SEQNO), materializes the matching partition's filter block when pinned, and calls per-partition maybe_contains_hash. Falls back to hash-only API or conservative Ok(true) where appropriate. |
| Tests (unit & integration): src/table/tests.rs, tests/partitioned_bloom_skip.rs | Added unit tests comparing key-based vs hash-only bloom behavior for full and partitioned filters. Added integration tests exercising partitioned/full bloom skip behavior for point reads and merge pipeline using a test-local SumMerge operator. |

Sequence Diagram(s)

sequenceDiagram
  participant Caller as Iterator/Caller
  participant Table as Table
  participant Index as PartitionIndex
  participant Disk as Disk/IO
  participant Filter as FilterBlock

  Caller->>Table: bloom_may_contain_key(key, key_hash)
  Table->>Table: debug_assert hash == Builder::get_hash(key)
  alt pinned_filter_block present
    Table->>Filter: bloom_may_contain_hash(key_hash)
    Filter-->>Table: bool / Err
    Table-->>Caller: Ok(bool) / Err
  else pinned_filter_index present
    Table->>Index: seek(key, seqno::MAX_SEQNO)
    Index-->>Table: ceiling partition entry / None
    alt ceiling partition found
      Table->>Disk: load partition's filter block bytes
      Disk-->>Filter: filter bytes (uncompressed)
      Table->>Filter: maybe_contains_hash(key_hash)
      Filter-->>Table: bool / Err
      Table-->>Caller: Ok(bool) / Err
    else no ceiling partition
      Table-->>Caller: Ok(false)
    end
  else fallback
    Table->>Table: bloom_may_contain_hash(key_hash)
    Table-->>Caller: Ok(bool) / Err
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I sniffed each partition, hopped where blooms might be,
If none matched my whiskers, I skipped a disk degree.
A careful little rabbit, I saved a costly peep,
I counted hops and carrots, then burrowed back to sleep. 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: adding partition-aware bloom filtering for point-read pipeline, which matches the core objective of extending bloom filter API for partition-aware seeking. |
| Linked Issues check | ✅ Passed | The pull request fully implements issue #83 objectives: adds bloom_may_contain_key API supporting partition-aware seeking, updates resolve_merge_via_pipeline and bloom_passes to use it, preserves conservative fallback for unsupported configs, and includes comprehensive tests validating the new behavior. |
| Out of Scope Changes check | ✅ Passed | All changes align with issue #83 scope: partition-aware bloom filtering API, IterState modifications for point reads, and related integration tests. No unrelated refactoring or feature creep detected. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

@sw-release-bot sw-release-bot Bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'lsm-tree db_bench'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.15.

| Benchmark suite | Current: 0e9a2a3 | Previous: befb450 | Ratio |
|---|---|---|---|
| mergerandom | 564209.4550305106 ops/sec | 718056.7379855699 ops/sec | 1.27 |

This comment was automatically generated by workflow using github-action-benchmark.

CC: @polaz

Copilot AI left a comment

Pull request overview

Improves point-read merge-resolution performance for partitioned bloom filters by adding a key-aware bloom check that can seek the filter partition index, and plumbing the user key through the point-read iterator pipeline so only the relevant bloom partition is queried.

Changes:

  • Add Table::bloom_may_contain_key(key, key_hash) to support partition-aware bloom checks via partition-index seeking.
  • Extend IterState with bloom_key and dispatch bloom_passes() to the key-aware bloom path when available.
  • Add integration tests intended to validate partitioned bloom skipping behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| src/table/mod.rs | Adds key-aware, partition-seeking bloom check for partitioned filters. |
| src/range.rs | Wires optional bloom_key into the bloom pre-filter decision logic. |
| src/tree/mod.rs | Populates IterState.bloom_key for single-key merge-resolution pipeline. |
| tests/partitioned_bloom_skip.rs | Adds tests for partitioned bloom skip behavior. |

Comment thread tests/partitioned_bloom_skip.rs Outdated
Comment thread src/table/mod.rs Outdated
Comment thread tests/partitioned_bloom_skip.rs Outdated
Comment thread tests/partitioned_bloom_skip.rs Outdated
Comment thread tests/partitioned_bloom_skip.rs Outdated
@polaz polaz force-pushed the feat/#83-perf-extend-bloommaycontainkeyhash-to-support-part branch from 40affb7 to 17914cd Compare March 22, 2026 19:48
@polaz polaz requested a review from Copilot March 22, 2026 19:50

codecov Bot commented Mar 22, 2026

Codecov Report

❌ Patch coverage is 93.93939% with 2 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/range.rs | 83.33% | 1 Missing ⚠️ |
| src/table/mod.rs | 95.65% | 1 Missing ⚠️ |

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Comment thread tests/partitioned_bloom_skip.rs Outdated
Comment thread tests/partitioned_bloom_skip.rs Outdated
Comment thread src/tree/mod.rs Outdated
Comment thread src/table/mod.rs
@polaz polaz force-pushed the feat/#83-perf-extend-bloommaycontainkeyhash-to-support-part branch from ca1bb85 to bf28b64 Compare March 22, 2026 20:19
@polaz polaz requested a review from Copilot March 22, 2026 20:23
Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Comment thread tests/partitioned_bloom_skip.rs Outdated
Comment thread tests/partitioned_bloom_skip.rs Outdated
Comment thread tests/partitioned_bloom_skip.rs
Comment thread tests/partitioned_bloom_skip.rs Outdated
Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment thread tests/partitioned_bloom_skip.rs Outdated
Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Comment thread src/table/tests.rs
Comment thread src/table/tests.rs
Comment thread tests/partitioned_bloom_skip.rs Outdated
@polaz polaz force-pushed the feat/#83-perf-extend-bloommaycontainkeyhash-to-support-part branch from 0316119 to 8fdf161 Compare March 22, 2026 21:47
@polaz polaz requested a review from Copilot March 22, 2026 22:06
Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comment thread src/range.rs
Comment thread src/table/mod.rs
@polaz polaz force-pushed the feat/#83-perf-extend-bloommaycontainkeyhash-to-support-part branch from 7de32e8 to 07c814b Compare March 22, 2026 22:29
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/partitioned_bloom_skip.rs`:
- Around line 137-148: The test currently uses a distractor SST key
("zzz_other") that is pruned by table.check_key_range_overlap before
bloom_may_contain_key runs; change the distractor so it passes the range-check
but fails the bloom check (for example insert a key that shares the same
key-range as "counter" but is not the bloom_key—e.g. "counter_other" or another
lexicographically adjacent key that falls within the table's min/max), or
alternatively update the test/comment to state it only verifies merge
correctness (rename test). Update the insert call in the failing test (the
tree.insert that creates the distractor SST) and its comment, and ensure the new
key triggers resolve_merge_via_pipeline while allowing TreeIter::create_range ->
bloom_may_contain_key to execute; references: TreeIter::create_range,
table.check_key_range_overlap, bloom_may_contain_key, and the test's
merge/insert sequence (merge("counter", ...), insert of distractor).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 207b4aa9-7db2-49ed-b0ce-0aa43cf1a184

📥 Commits

Reviewing files that changed from the base of the PR and between 2c7d5dd and 7de32e8.

📒 Files selected for processing (5)
  • src/range.rs
  • src/table/mod.rs
  • src/table/tests.rs
  • src/tree/mod.rs
  • tests/partitioned_bloom_skip.rs

Comment thread tests/partitioned_bloom_skip.rs Outdated
@polaz polaz requested a review from Copilot March 22, 2026 22:45
Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment thread src/tree/mod.rs

polaz commented Mar 23, 2026

@coderabbitai re-review


coderabbitai Bot commented Mar 23, 2026

@polaz: Triggering another full review of PR #102 now.


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-23T00:44:56.025Z
Learning: In `structured-world/lsm-tree`, issue `#83` has been resolved in PR `#102` (`feat/#83-perf-extend-bloommaycontainkeyhash-to-support-part`). `Table::bloom_may_contain_key(key, key_hash)` was added to `src/table/mod.rs`, which seeks the partitioned filter TLI by user key and queries only the matching partition's bloom filter, returning `Ok(false)` for keys beyond all partition boundaries. `IterState` in `src/range.rs` gained `bloom_key: Option<UserKey>`; `bloom_passes()` dispatches to `bloom_may_contain_key` when `bloom_key` is set (single-key pipeline path) and falls back to `bloom_may_contain_key_hash` otherwise (prefix/range scans). The prior conservative `Ok(true)` fallback in `bloom_may_contain_key_hash` for partitioned/TLI filters is preserved for callers without a key. Do not flag the conservative fallback in `bloom_may_contain_key_hash` for partitioned/TLI filters as a blocking concern in any future PR — it is intentional for callers that lack a user key.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-22T13:03:58.016Z
Learning: In `structured-world/lsm-tree` (`src/table/mod.rs`), `Table::bloom_may_contain_key_hash` returns `Ok(true)` (conservative fallback) for partitioned bloom filters (`pinned_filter_index.is_some()`) and filter TLI (`regions.filter_tli.is_some()`). Extending it to accept a key+seqno for partition-aware seeking is tracked in issue `#83`. Do not flag the conservative fallback in `bloom_may_contain_key_hash` for partitioned/TLI filters as a blocking concern in PR `#69` or future PRs until issue `#83` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 43
File: src/table/mod.rs:784-800
Timestamp: 2026-03-21T22:11:10.140Z
Learning: In `structured-world/lsm-tree` (`src/table/mod.rs`), `Table::maybe_contains_prefix` intentionally returns `Ok(true)` (conservative fallback) for tables with a partitioned filter (`pinned_filter_index` is `Some`) or a filter TLI (`regions.filter_tli` is `Some`). Iterating all partitions to check a prefix hash would defeat the bloom-skip benefit, because the partition index is keyed by user-key boundaries, not by prefix hash — making it impossible to restrict the check to only prefix-relevant partitions. Do not flag the absence of per-partition prefix-hash checking as a bug in PR `#43` or future PRs; a proper fix would require passing prefix bounds alongside the hash and seeking the partition index, which is deferred to follow-up work.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-22T09:21:07.612Z
Learning: In `structured-world/lsm-tree` (`src/range.rs`), prefix bloom filter skipping for multi-table runs (L0) has been implemented in PR `#64` (closes issue `#57`). The `_ =>` branch in `TreeIter::create_range` now applies a per-table key-range guard followed by a bloom check; 0 survivors skips the run, 1 survivor is demoted to `single_tables` for the range-tombstone table-skip optimization, and 2+ survivors are collected into a new `Run`. Range tombstones are still collected from ALL tables in the run regardless of bloom filtering. Do not flag the absence of multi-table run prefix bloom skipping in this repository — it has been implemented.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-21T22:21:36.968Z
Learning: In `structured-world/lsm-tree` (`src/table/mod.rs`), `Table::maybe_contains_prefix` intentionally returns `Ok(true)` (conservative fallback) when `pinned_filter_index.is_some() || regions.filter_tli.is_some()` (partitioned filter or filter TLI present). This guard was added in PR `#43` at line 793. Do not flag the absence of per-partition prefix-hash checking as a bug in PR `#43` or future PRs.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 100
File: tests/custom_comparator_compaction.rs:421-447
Timestamp: 2026-03-22T20:07:10.672Z
Learning: In `structured-world/lsm-tree` (`tests/custom_comparator_compaction.rs`), the two range-scan integration tests (`reverse_comparator_range_scan_after_compaction` and the second range-scan test) are intentionally `#[ignore]`d in PR `#100` because `RunReader::new()` still uses lexicographic `range_overlap_indexes` rather than the comparator-aware variant for on-disk range reads. Threading the `SharedComparator` through `RunReader` is a known deferred limitation documented in the PR body and tracked as a follow-up. Do not flag the ignored range-scan tests or the absence of comparator-aware `RunReader` overlap logic as a blocking concern in PR `#100` or future PRs until that follow-up issue is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 43
File: tests/tree_prefix_bloom.rs:91-105
Timestamp: 2026-03-21T22:11:11.575Z
Learning: In `structured-world/lsm-tree`, prefix-bloom integration tests in `tests/tree_prefix_bloom.rs` intentionally validate correctness (result counts and key ordering) rather than asserting that the bloom-skip pruning path fired. Bloom-skip metrics/counters are `pub(crate)` and cannot be observed from integration tests. Do not flag these tests for failing to assert on internal pruning behavior in PR `#43` or future PRs.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 67
File: src/memtable/mod.rs:72-77
Timestamp: 2026-03-22T12:26:47.795Z
Learning: In `structured-world/lsm-tree` (`src/memtable/interval_tree.rs` and `src/memtable/mod.rs`), the memtable `IntervalTree` used for range tombstone containment queries (`is_key_suppressed_by_range_tombstone`, `query_suppression`, `insert_range_tombstone`) still uses lexicographic `Ord` on `UserKey` rather than the pluggable `SharedComparator` introduced in PR `#67`. Threading the comparator into the AVL interval tree is a non-trivial refactor intentionally deferred from PR `#67` and tracked in issue `#81`. Do not flag the absence of comparator-aware RT suppression in the memtable interval tree as a blocking concern in PR `#67` or future PRs until issue `#81` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 108
File: src/compaction/leveled/test.rs:681-727
Timestamp: 2026-03-23T00:40:16.100Z
Learning: In `structured-world/lsm-tree` (`src/compaction/leveled/test.rs`), the regression test `multi_level_sparse_keyspace_data_integrity` (added in PR `#108` for issue `#72`) uses full-span SSTs (writes both "a" and "z" in each flush) rather than genuinely disjoint key ranges. This is a known, accepted coverage gap: at unit-test data volumes the force-trivial-move scoring (99.99) cascades all intermediate levels to Lmax, making it structurally impossible to simultaneously populate L1 and L2 in a unit test. The test verifies data integrity (read-back correctness) only, not the per-range L2 sparse-overlap optimization. Do not flag the full-span flush pattern in this test as a deficiency in PR `#108` or future PRs.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:06.298Z
Learning: In `structured-world/lsm-tree`, the switch from `Mutex<IntervalTree>` to `RwLock<IntervalTree>` (or a freeze-on-seal immutable structure) for memtable range-tombstone storage in `src/memtable/mod.rs` is intentionally deferred from PR `#21` to a separate issue (`#30`). Do not flag the existing `Mutex<IntervalTree>` usage in `src/memtable/mod.rs` as a concurrency concern in the context of PR `#21`; it is tracked in issue `#30`.
✅ Actions performed

Full review triggered.

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment thread tests/partitioned_bloom_skip.rs Outdated
@polaz polaz force-pushed the feat/#83-perf-extend-bloommaycontainkeyhash-to-support-part branch from a76b0cf to 0e9a2a3 Compare March 23, 2026 11:55
@polaz polaz requested a review from Copilot March 23, 2026 12:00
Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment thread src/table/tests.rs Outdated
polaz added 15 commits March 23, 2026 14:38
Add bloom_may_contain_key() which seeks the partitioned filter TLI by
user key to check only the matching partition's bloom — replacing the
conservative Ok(true) fallback that bloom_may_contain_key_hash() uses
for partitioned/TLI filter configurations.

- Add bloom_key field to IterState, set by resolve_merge_via_pipeline
- bloom_passes() dispatches to key-aware method when bloom_key present
- Keys beyond all partition boundaries return Ok(false) (definite miss)
- Full (non-partitioned) filters delegate to existing hash-only path

Closes #83
SeqNo::MAX (u64::MAX) violates the reserved MSB range invariant.
Use crate::seqno::MAX_SEQNO (0x7FFF_FFFF_FFFF_FFFF) which is the
safe upper bound for "latest" seeks.

Also add merge pipeline test that exercises the bloom_may_contain_key
path through resolve_merge_via_pipeline → TreeIter → bloom_passes,
and clarify existing test docs to distinguish Table::get vs pipeline
bloom paths.
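To illustrate the invariant this commit message refers to, here is a small sketch assuming the most significant bit of a sequence number is reserved. The constant value is taken from the message above; `RESERVED_MSB` and `is_valid_seqno` are hypothetical names, not the crate's API.

```rust
type SeqNo = u64;

/// Largest sequence number that leaves the most significant bit clear
/// (value quoted in the commit message).
const MAX_SEQNO: SeqNo = 0x7FFF_FFFF_FFFF_FFFF;

/// Hypothetical mask for the reserved top bit.
const RESERVED_MSB: u64 = 1 << 63;

/// Hypothetical helper: true when `seqno` stays out of the reserved range.
fn is_valid_seqno(seqno: SeqNo) -> bool {
    (seqno & RESERVED_MSB) == 0
}
```

Under this assumption `u64::MAX` sets the reserved bit, while `MAX_SEQNO` (equal to `u64::MAX >> 1`) is the largest value that does not, which makes it the safe choice for a "latest" seek.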
- Eliminate redundant key_slice clone by deriving range from bloom_key
- Replace unwrap_or_default() with .expect() in test merge operator
  for explicit failure on malformed data
- Relax exact filter_queries assertion to >= 1 (counter may vary
  with internal filter probes)
- Add clarifying comment for seek+next None → Ok(false) correctness
- Add full_filter_bloom_skip_merge_pipeline test covering the
  bloom_may_contain_key delegation path for non-partitioned filters
- Correct test docstrings: Table::get path tests don't claim
  bloom_may_contain_key; merge pipeline tests explain the code path
- Document why pipeline tests assert correctness not metrics
  (io_skipped_by_filter is only incremented by Table::get, not
  by bloom_passes in the pipeline path)
Cover full-filter delegation and partitioned-filter seek+reject
paths directly at the Table level, including Ok(false) for keys
beyond all partition boundaries.
Key beyond table key range is rejected by key-range overlap check
before bloom. The bloom_may_contain_key Ok(false) path for keys
beyond partitions is covered by unit test in table::tests.
- Assert bloom_may_contain_key_hash returns Ok(true) conservatively
  for partitioned filters while bloom_may_contain_key returns Ok(false)
  — demonstrates the core behavioral improvement
- Assert both methods agree for full (non-partitioned) filters
- Extract SumMerge to module level to avoid duplication
- Add comment explaining UserKey (Slice) Deref<Target=[u8]> coercion
  in bloom_passes to prevent false "won't compile" review flags
- Document key_hash pre-computation contract in bloom_may_contain_key
  doc comment (same pattern as Table::get)
Replace distractor key "zzz_other" with bracketing keys "aaa"+"zzz"
that make key_range_overlap pass, ensuring bloom_may_contain_key is
the actual filter step in the merge pipeline tests.
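The reason the original distractor never reached the bloom check can be sketched with a simplified key-range guard. This assumes the distractor table held only "zzz_other"; `TableMeta`, `from_keys`, and `key_in_range` are hypothetical stand-ins for the table's min/max key overlap check.

```rust
/// Hypothetical per-table metadata: smallest and largest user key.
struct TableMeta {
    min_key: Vec<u8>,
    max_key: Vec<u8>,
}

impl TableMeta {
    /// Derives the key range from the keys written to the table.
    /// Returns `None` for an empty key list.
    fn from_keys(keys: &[&[u8]]) -> Option<Self> {
        let min_key = keys.iter().min()?.to_vec();
        let max_key = keys.iter().max()?.to_vec();
        Some(Self { min_key, max_key })
    }

    /// Range guard that runs before any bloom check: a key outside
    /// [min_key, max_key] prunes the table without touching the filter.
    fn key_in_range(&self, key: &[u8]) -> bool {
        self.min_key.as_slice() <= key && key <= self.max_key.as_slice()
    }
}
```

A table holding only "zzz_other" has the range ["zzz_other", "zzz_other"], so a lookup for "counter" is pruned by the guard. A table holding "aaa" and "zzz" brackets "counter", so the guard passes and the bloom filter becomes the step that decides.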
Slice::from(&[u8]) copies the key (not zero-copy). Document that
this runs once per merge resolution and is negligible compared to
I/O savings from partition-aware bloom filtering.
…ssertion

- Add debug_assert that bloom_key requires key_hash to be set
- Document that pinned_filter_block and pinned_filter_index are
  mutually exclusive (set at construction time)
- Relax exact io_skipped_by_filter assertion to filter_queries >= 1
  since bloom filters are probabilistic (FPR ~0.8% at 10 bpk)
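The quoted false-positive rate can be reproduced with the textbook bloom filter approximation, FPR ≈ (1 - e^(-k / bpk))^k with k ≈ bpk · ln 2 hash functions. This is a sketch of that standard formula only; the number of hash functions the crate actually uses is an assumption here, and `optimal_hashes` and `bloom_fpr` are hypothetical helper names.

```rust
/// Textbook choice of hash-function count for a given bits-per-key budget.
fn optimal_hashes(bits_per_key: f64) -> u32 {
    (bits_per_key * std::f64::consts::LN_2).round().max(1.0) as u32
}

/// Standard approximation of a bloom filter's false-positive rate.
fn bloom_fpr(bits_per_key: f64, num_hashes: u32) -> f64 {
    let k = f64::from(num_hashes);
    (1.0 - (-k / bits_per_key).exp()).powi(num_hashes as i32)
}
```

For 10 bits per key this gives k = 7 and a rate of roughly 0.0082, in line with the "~0.8%" figure above. Because the rate is small but nonzero, an exact count of filter rejections can vary between runs of a test.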
…ition assertion

- Add debug_assert_eq verifying key_hash matches BloomBuilder::get_hash(key)
  in bloom_may_contain_key to catch misuse in debug builds
- Gate beyond-partitions Ok(false) assertion on pinned_filter_index.is_some()
  since test_with_table runs multiple table configurations (pinned/unpinned)
filter_queries counts lookups (not just rejects), so docstring
should say "filter lookup occurred" not "filter rejects the table".
pinned_filter_index is loaded unconditionally in Table::recover when
filter_tli exists, so the else branch (hash-only fallback) was dead
code in the partitioned filter test.
@polaz polaz force-pushed the feat/#83-perf-extend-bloommaycontainkeyhash-to-support-part branch from 0e9a2a3 to 8634930 Compare March 23, 2026 12:39
@polaz polaz requested a review from Copilot March 23, 2026 12:52
Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

@polaz polaz merged commit 61cf608 into main Mar 23, 2026
20 checks passed
@polaz polaz deleted the feat/#83-perf-extend-bloommaycontainkeyhash-to-support-part branch March 23, 2026 13:09
@sw-release-bot sw-release-bot Bot mentioned this pull request Mar 23, 2026

Development

Successfully merging this pull request may close these issues.

perf: extend bloom_may_contain_key_hash to support partitioned/TLI filter seeking for point reads

2 participants