refactor: unify merge resolution via bloom-filtered iterator pipeline by polaz · Pull Request #69 · structured-world/coordinode-lsm-tree

polaz · 2026-03-22T08:20:03Z

Summary

Replace hand-rolled resolve_merge_get() with resolve_merge_via_pipeline() that reuses Merger → MvccStream on a key..=key range
Add standard bloom pre-filtering (Table::bloom_may_contain_key_hash()) so point reads skip disk tables whose filter rules out the key
Eliminate duplicated operand collection / RT suppression / Indirection logic between point reads and range scans

Net -143 lines — merge resolution now lives in one place (MvccStream).
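
To illustrate the bloom pre-filtering idea from the summary, here is a minimal sketch, not the crate's code. `ToyTable`, `hash_key`, `probe_mask` and `tables_to_read` are hypothetical names, and the 64-bit "filter" is a toy; only the method name `bloom_may_contain_key_hash` comes from the PR. The key is hashed once and the same hash is checked against every table, so tables whose filter rules the key out are never opened.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for an on-disk table: just a 64-bit toy bloom filter.
struct ToyTable {
    bits: u64,
}

/// Hash the key once; the same hash is reused for every table.
fn hash_key(key: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish()
}

/// Derive two probe bits from one 64-bit hash (toy scheme, an assumption).
fn probe_mask(key_hash: u64) -> u64 {
    (1u64 << (key_hash % 64)) | (1u64 << ((key_hash >> 32) % 64))
}

impl ToyTable {
    /// Build the filter from the keys the table holds.
    fn new(keys: &[&[u8]]) -> Self {
        let bits = keys
            .iter()
            .fold(0u64, |acc, k| acc | probe_mask(hash_key(k)));
        Self { bits }
    }

    /// May report a false positive, never a false negative.
    fn bloom_may_contain_key_hash(&self, key_hash: u64) -> bool {
        let mask = probe_mask(key_hash);
        self.bits & mask == mask
    }
}

/// Indices of the tables that still have to be read for `key`.
fn tables_to_read(tables: &[ToyTable], key: &[u8]) -> Vec<usize> {
    let key_hash = hash_key(key);
    tables
        .iter()
        .enumerate()
        .filter(|(_, table)| table.bloom_may_contain_key_hash(key_hash))
        .map(|(index, _)| index)
        .collect()
}
```

A table that contains the key always survives the filter; a table without it is usually, but not always, skipped, which is why the surviving tables must still be read.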

Changes

| File | What |
| --- | --- |
| `table/mod.rs` | Extract `bloom_may_contain_hash()` base, add `bloom_may_contain_key_hash()` |
| `range.rs` | Add `key_hash` to `IterState`, `bloom_passes()` helper for unified prefix+key bloom |
| `tree/mod.rs` | `resolve_merge_via_pipeline()` replaces ~150-line `resolve_merge_get()` |
| `memtable/mod.rs` | Remove unused `get_all_for_key()` and its tests |
| `tests/merge_operator.rs` | Update comments referencing old function name |

Test plan

cargo check — 0 warnings, 0 errors
cargo test — 757 passed, 0 failed
All 44 merge operator tests pass unchanged
Benchmark point-read latency on 100-table tree within 5% of baseline

Closes #46

Summary by CodeRabbit

Improvements
- Enhanced bloom-filter pre-filtering for single- and multi-table scans with optional key-hash checks and consolidated pass/fail logic; prefix-based skip metrics adjusted.
Refactor
- Merge resolution unified into a pipeline-based point-read path; obsolete per-key retrieval pathway removed.
Tests
- Added and updated tests validating prefix/bloom behavior and merge resolution with overlapping/non-matching tables.
Chores
- Added a benchmark for merge point-read performance.

coderabbitai · 2026-03-22T08:20:10Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Walkthrough

Refactors point-read merge resolution to use a bloom-filtered single-key iterator pipeline with optional key-hash prefilter, adds IterState.key_hash, introduces table bloom APIs for raw key hashes, removes Memtable::get_all_for_key, and updates tests and benchmarks to exercise the new pipeline and bloom behavior.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Merge resolution pipeline `src/tree/mod.rs` | Removed `Tree::resolve_merge_get`; added `Tree::resolve_merge_via_pipeline` that builds an inclusive single-key range, sets `IterState.key_hash`, and resolves merges via the iterator/MvccStream pipeline; updated callsites to pass `Option<&Arc<dyn MergeOperator>>`. |
| Iterator & bloom gating `src/range.rs` | Added `IterState.key_hash: Option<u64>`; centralized per-table bloom logic into `bloom_passes(state, table)` and applied bloom pre-filtering for both prefix and key hashes across single- and multi-table scan paths; moved prefix-skip metric into `bloom_passes`. |
| Table bloom APIs `src/table/mod.rs` | Introduced private `bloom_may_contain_hash(...)`, reintroduced `maybe_contains_prefix(...)` as a thin wrapper, and added `pub(crate) bloom_may_contain_key_hash(key_hash: u64) -> Result<bool>` to support precomputed key-hash checks. |
| Memtable removal `src/memtable/mod.rs` | Removed `Memtable::get_all_for_key(...)` and its unit tests; memtable point-collection is now handled via the unified iterator pipeline. |
| Tests & benches `tests/merge_operator.rs`, `tests/tree_prefix_bloom.rs`, `benches/merge_point_read.rs`, `Cargo.toml` | Updated/added tests to validate merge resolution via `resolve_merge_via_pipeline` and prefix-bloom behavior; added Criterion benchmark `merge_point_read` exercising merge-point reads with/without cache. |
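
The consolidated bloom gate described for `src/range.rs` could look roughly like the sketch below. This is an illustration under assumptions, not the PR's code: the `prefix_hash` field, the `ToyTable` type, the exact-set "filter", and the explicit `prefix_skips` counter are all hypothetical, and the real table API returns `Result<bool>` rather than `bool`. Only `IterState.key_hash`, `bloom_passes`, and `bloom_may_contain_hash` are names taken from the thread.

```rust
use std::collections::HashSet;

/// Simplified iterator state. `key_hash` is named in the PR;
/// `prefix_hash` is an assumed field name for the prefix check.
struct IterState {
    prefix_hash: Option<u64>,
    key_hash: Option<u64>,
}

/// Hypothetical table whose "bloom filter" is an exact set of hashes.
struct ToyTable {
    hashes: HashSet<u64>,
}

impl ToyTable {
    /// Shared base check used for both prefix and key hashes.
    fn bloom_may_contain_hash(&self, hash: u64) -> bool {
        self.hashes.contains(&hash)
    }
}

/// A table is read only if every requested hash may be present.
/// The prefix-skip counter is bumped here, in one place.
fn bloom_passes(state: &IterState, table: &ToyTable, prefix_skips: &mut u64) -> bool {
    if let Some(hash) = state.prefix_hash {
        if !table.bloom_may_contain_hash(hash) {
            *prefix_skips += 1;
            return false;
        }
    }
    if let Some(hash) = state.key_hash {
        if !table.bloom_may_contain_hash(hash) {
            return false;
        }
    }
    true
}
```

With both hashes set to `None` the gate always passes, so ordinary range scans are unaffected; a single-key pipeline sets `key_hash` and gets point-read style skipping from the same helper.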

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Tree as Tree
    participant Pipeline as Iterator\n(IterState)
    participant Bloom as BloomFilter
    participant Table as Table/Memtable
    participant Stream as MvccStream\n(Merger)

    Client->>Tree: get(key, seqno)
    Tree->>Pipeline: build single-key range\nset IterState.key_hash = hash(key)
    Pipeline->>Bloom: bloom_may_contain_key_hash(hash)
    Bloom-->>Pipeline: pass / skip
    Pipeline->>Table: open iterators for active/sealed + bloom-passing tables
    Table-->>Stream: emit entries for key
    Stream->>Stream: resolve merges / apply RT & indirection rules
    Stream-->>Tree: resolved value / None / error
    Tree-->>Client: return result
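
The resolution step at the end of the diagram can be sketched for a single key as follows. This is a simplified model under stated assumptions: the `Entry` enum, the `resolve_single_key` function, the additive counter used as the merge operator, and the visibility rule `seqno < read_seqno` are all illustrative choices, and range tombstones and Indirection handling are left out entirely.

```rust
/// Hypothetical entry kinds for one key.
#[derive(Debug, Clone)]
enum Entry {
    Value(i64),
    MergeOperand(i64),
    Tombstone,
}

/// Resolve one key from (seqno, entry) pairs gathered from any number of sources.
/// Assumes a version is visible when its seqno is below `read_seqno`, and that
/// the merge operator is integer addition (both are assumptions of this sketch).
fn resolve_single_key(mut versions: Vec<(u64, Entry)>, read_seqno: u64) -> Option<i64> {
    // Drop versions the snapshot cannot see.
    versions.retain(|(seqno, _)| *seqno < read_seqno);
    // Newest first, regardless of which source each version came from.
    versions.sort_by(|a, b| b.0.cmp(&a.0));

    let mut operands = Vec::new();
    let mut base = None;
    for (_, entry) in versions {
        match entry {
            // Keep collecting operands until a base value or tombstone is hit.
            Entry::MergeOperand(delta) => operands.push(delta),
            Entry::Value(value) => {
                base = Some(value);
                break;
            }
            Entry::Tombstone => break,
        }
    }

    if operands.is_empty() {
        // Plain value, tombstone, or no visible version at all.
        return base;
    }
    // Apply the operands on top of the base (or on 0 if there is none).
    Some(base.unwrap_or(0) + operands.iter().sum::<i64>())
}
```

Because all versions are ordered globally by seqno before resolution starts, an older base value cannot hide newer operands that happen to live in a different table.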

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

perf: lazy iterator pipeline initialization for point-read merge path #84 — touches the same move of point-read merge resolution into a bloom-filtered iterator pipeline and related range/table bloom changes.

Possibly related PRs

feat(metrics): expose prefix bloom skip counter for testability #70 — intersects with prefix-bloom skipping and metric relocation tied to bloom_passes.
feat: merge operators for commutative LSM operations #28 — introduced the original resolve_merge_get path that this PR replaces with the iterator-based pipeline.
feat: extend prefix bloom skipping to multi-table runs #64 — overlaps changes to per-table bloom pre-filtering and iterator creation logic.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: unifying merge resolution through a bloom-filtered iterator pipeline, which is the core refactor across multiple files. |
| Linked Issues check | ✅ Passed | All acceptance criteria from issue `#46` are met: resolve_merge_get removed, point reads use MvccStream pipeline with bloom pre-filtering, all 44 merge tests pass, and benchmarks for latency on deep trees are included. |
| Out of Scope Changes check | ✅ Passed | All changes directly support the merge resolution refactor. Removed get_all_for_key (no longer needed), added bloom APIs, added benchmarks, and updated tests—all align with the issue objectives. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

codecov · 2026-03-22T08:25:33Z

Codecov Report

❌ Patch coverage is 92.75362% with 5 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/range.rs | 84.61% | 4 Missing ⚠️ |
| src/tree/mod.rs | 97.05% | 1 Missing ⚠️ |

Copilot

Pull request overview

This PR refactors point-read merge resolution to reuse the existing Merger → MvccStream iterator pipeline (via a single-key key..=key range), while adding bloom-based pre-filtering so point reads can skip irrelevant SSTs without duplicating merge logic.

Changes:

Replace the hand-rolled point-read merge collector with resolve_merge_via_pipeline() backed by TreeIter/MvccStream.
Add a standard bloom key-check API on Table and thread a key_hash through iterator state to enable bloom skipping for single-key pipelines.
Remove the now-unused memtable helper (get_all_for_key) and its tests; update test comments to reference the new function.
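
Why a `key..=key` range can stand in for a dedicated point lookup is easiest to see on a toy memtable. The sketch below is an assumption-laden illustration: the `Memtable` alias, the `(user key, Reverse(seqno))` ordering, and `versions_of` are hypothetical and not the crate's internal key layout. It shows a single-key inclusive range scan returning every version of that key, newest first, which is the input a merge-resolving stream needs.

```rust
use std::cmp::Reverse;
use std::collections::BTreeMap;

/// Toy memtable ordered by user key, then by descending seqno.
type Memtable = BTreeMap<(Vec<u8>, Reverse<u64>), Vec<u8>>;

/// Collect all versions of `key` with one inclusive range scan.
fn versions_of(memtable: &Memtable, key: &[u8]) -> Vec<(u64, Vec<u8>)> {
    // Reverse(u64::MAX) sorts first and Reverse(0) sorts last for a given key,
    // so this range covers exactly the versions of `key`.
    let lower = (key.to_vec(), Reverse(u64::MAX));
    let upper = (key.to_vec(), Reverse(0));
    memtable
        .range(lower..=upper)
        .map(|((_, Reverse(seqno)), value)| (*seqno, value.clone()))
        .collect()
}
```

The same scan works unchanged for tables on disk, so the point-read path no longer needs a helper like the removed `get_all_for_key`.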

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `src/table/mod.rs` | Factors bloom checking into a hash-based helper and exposes `bloom_may_contain_key()` for point-read pre-filtering. |
| `src/range.rs` | Extends `IterState` with `key_hash` and centralizes bloom filtering via `bloom_passes()` for both prefix and key-based skips. |
| `src/tree/mod.rs` | Switches point-read merge resolution to the iterator pipeline (`resolve_merge_via_pipeline`) and updates merge-operator plumbing accordingly. |
| `src/memtable/mod.rs` | Removes `get_all_for_key()` and its unit tests, which were only used by the old point-read merge path. |
| `tests/merge_operator.rs` | Updates comments referring to the old merge-resolution function name. |

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

polaz · 2026-03-22T09:53:33Z

@coderabbitai re-review

coderabbitai · 2026-03-22T09:53:42Z

@polaz Sure, I'll re-review the PR now!

🧠 Learnings used

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 28
File: src/blob_tree/mod.rs:648-655
Timestamp: 2026-03-21T09:32:08.642Z
Learning: In `structured-world/lsm-tree`, `BlobTree::merge` (in `src/blob_tree/mod.rs`) delegates directly to `self.index.merge(...)`. BlobTree's `get()`/`resolve_key()` does not yet resolve merge operands — it handles Indirection but not merge operand collection. Full BlobTree merge read-path support is a documented known limitation deferred to follow-up work (introduced in PR `#28`). The compaction path already handles merge operands correctly, including Indirection safety via the pending buffer. Do not flag the absence of BlobTree merge read-path resolution as a blocking concern in PR `#28` or future PRs until the follow-up work is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-22T09:21:07.598Z
Learning: In `structured-world/lsm-tree` (`src/range.rs`), prefix bloom filter skipping for multi-table runs (L0) has been implemented in PR `#64` (closes issue `#57`). The `_ =>` branch in `TreeIter::create_range` now applies a per-table key-range guard followed by a bloom check; 0 survivors skips the run, 1 survivor is demoted to `single_tables` for the range-tombstone table-skip optimization, and 2+ survivors are collected into a new `Run`. Range tombstones are still collected from ALL tables in the run regardless of bloom filtering. Do not flag the absence of multi-table run prefix bloom skipping in this repository — it has been implemented.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 28
File: src/tree/mod.rs:913-956
Timestamp: 2026-03-22T03:09:12.274Z
Learning: In `structured-world/lsm-tree` (`src/tree/mod.rs`), the `resolve_merge_get` function's disk-scan loop in step 3 is a flat scan across all levels/runs and does not apply the L0-specific newest-first ordering used by `get_internal_entry_from_tables`. This means it can stop on a base/tombstone boundary from an older L0 run while a newer overlapping L0 run still has visible `MergeOperand` entries. This is a known limitation documented with `// TODO(`#46`)` in the code and is intentionally deferred to issue `#46`. Do not flag the flat L0 scan in `resolve_merge_get` as a merge-semantics bug in PR `#28` or future PRs until issue `#46` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 45
File: tools/db_bench/src/workloads/mergerandom.rs:27-35
Timestamp: 2026-03-21T17:08:52.777Z
Learning: In `structured-world/lsm-tree`, `lsm_tree`'s merge operator API is internal (not public). The `mergerandom` workload in `tools/db_bench/src/workloads/mergerandom.rs` is intentionally an overwrite/compaction stress benchmark rather than a true merge-operand benchmark. True merge-operand benchmarks are planned for a future PR using the CoordiNode posting list API. Do not flag `mergerandom` for lacking merge-operand calls in PR `#45` or future PRs until that API is exposed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 28
File: src/abstract_tree.rs:128-129
Timestamp: 2026-03-22T03:19:31.834Z
Learning: In `structured-world/lsm-tree` (`src/abstract_tree.rs`), the flush path (`AbstractTree::flush`) intentionally does NOT pass range tombstones into `CompactionStream` for RT-boundary-aware merge resolution. Flush is a co-location pass: both KV entries (including merged operands) and RTs are written together into the output SST. RT suppression is enforced exclusively at read time via `is_suppressed_by_range_tombstones` (point reads) and `RangeTombstoneFilter` (range scans). Do not flag the absence of RT propagation into `CompactionStream` during flush as a merge-semantics bug in PR `#28` or future PRs.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 43
File: tests/tree_prefix_bloom.rs:91-105
Timestamp: 2026-03-21T22:11:09.195Z
Learning: In `structured-world/lsm-tree`, prefix-bloom integration tests in `tests/tree_prefix_bloom.rs` intentionally validate correctness (result counts and key ordering) rather than asserting that the bloom-skip pruning path fired. Bloom-skip metrics/counters are `pub(crate)` and cannot be observed from integration tests. Do not flag these tests for failing to assert on internal pruning behavior in PR `#43` or future PRs.

Learnt from: CR
Repo: structured-world/lsm-tree PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-20T22:03:44.692Z
Learning: Applies to src/memtable/interval_tree.rs : `src/memtable/interval_tree.rs` — Interval tree for memtable range tombstone queries

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:06.289Z
Learning: In `structured-world/lsm-tree`, the switch from `Mutex<IntervalTree>` to `RwLock<IntervalTree>` (or a freeze-on-seal immutable structure) for memtable range-tombstone storage in `src/memtable/mod.rs` is intentionally deferred from PR `#21` to a separate issue (`#30`). Do not flag the existing `Mutex<IntervalTree>` usage in `src/memtable/mod.rs` as a concurrency concern in the context of PR `#21`; it is tracked in issue `#30`.

Learnt from: CR
Repo: structured-world/lsm-tree PR: 0
File: .github/instructions/rust.instructions.md:0-0
Timestamp: 2026-03-20T22:04:21.226Z
Learning: Applies to **/*.rs : Flag incorrect merge semantics: tombstones not propagated to lower levels, point deletes applied out of order (Tier 1 — MUST flag)

Learnt from: CR
Repo: structured-world/lsm-tree PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-20T22:03:44.692Z
Learning: Applies to src/range_tombstone_filter.rs : `src/range_tombstone_filter.rs` — MVCC-aware range tombstone filtering for iterators
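
The survivor rule in the learning about multi-table runs above (0 survivors skip the run, 1 survivor is demoted to a single table, 2+ survivors form a new run) can be sketched as a small decision function. `RunPlan` and `plan_run` are hypothetical names, the `keep` predicate stands in for the key-range guard plus bloom check, and the sketch does not model range tombstones, which per the learning are still collected from all tables.

```rust
/// What the iterator setup does with one multi-table run after filtering.
#[derive(Debug, PartialEq)]
enum RunPlan<T> {
    /// No table survived: the whole run is skipped.
    Skip,
    /// Exactly one survivor: treated like a standalone table.
    SingleTable(T),
    /// Two or more survivors: collected into a new, smaller run.
    Run(Vec<T>),
}

/// Apply `keep` (key-range guard plus bloom check in the real code)
/// to every table of a run and decide how to scan what is left.
fn plan_run<T>(tables: Vec<T>, mut keep: impl FnMut(&T) -> bool) -> RunPlan<T> {
    let mut survivors: Vec<T> = tables.into_iter().filter(|t| keep(t)).collect();
    match survivors.len() {
        0 => RunPlan::Skip,
        1 => RunPlan::SingleTable(survivors.remove(0)),
        _ => RunPlan::Run(survivors),
    }
}
```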

✅ Actions performed

Full review triggered.

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

polaz · 2026-03-22T10:46:44Z

@coderabbitai re-review

coderabbitai · 2026-03-22T10:46:59Z

@polaz Sure, I'll re-review the PR now!

🧠 Learnings used

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-22T09:55:39.724Z
Learning: In `structured-world/lsm-tree` (`src/compaction/leveled/mod.rs`), the multi-level compaction path (lines ~749–759) uses a single aggregate key range across all L0+L1 inputs to query L2 overlapping tables. Using per-range L2 overlap queries (one per contiguous L1 range segment) instead would avoid pulling in L2 tables that sit in gaps between disjoint L1 ranges on sparse keyspaces. This improvement is intentionally deferred from PR `#66` to a separate follow-up issue. Do not flag the coarse aggregate L2 range query as a blocking concern in PR `#66` or future PRs until that issue is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-22T09:55:39.724Z
Learning: In `structured-world/lsm-tree` (`src/compaction/leveled/mod.rs`), the multi-level compaction path uses a single aggregate key range across all L0+L1 inputs to query L2 overlapping tables. Using per-range L2 overlap queries (one per contiguous L1 range segment) instead would avoid pulling in L2 tables that sit in gaps between disjoint L1 ranges on sparse keyspaces. This improvement is intentionally deferred from PR `#66` to issue `#72`. Do not flag the coarse aggregate L2 range query as a blocking concern in PR `#66` or future PRs until issue `#72` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:39.835Z
Learning: In `structured-world/lsm-tree`, the `Memtable` range-tombstone storage uses `Mutex<IntervalTree>` (introduced in PR `#21`). Switching to `RwLock<IntervalTree>` or a freeze-on-seal immutable structure is a deliberate future improvement tracked in issue `#31` — do not flag the `Mutex` usage as a blocking concern in PR `#21` or future PRs that reference this design.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:39.835Z
Learning: In `structured-world/lsm-tree`, the `Memtable` range-tombstone storage uses `Mutex<IntervalTree>` (introduced in PR `#21`). Switching to `RwLock<IntervalTree>` or a freeze-on-seal immutable structure is a deliberate future improvement tracked in a separate issue — do not flag the `Mutex` usage as a blocking concern in PR `#21` or future PRs that reference this design.

✅ Actions performed

Full review triggered.

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

polaz · 2026-03-22T11:15:20Z

@coderabbitai re-review

coderabbitai · 2026-03-22T11:15:34Z

@polaz Sure, I'll re-review the PR now!

🧠 Learnings used

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 21
File: src/table/multi_writer.rs:129-206
Timestamp: 2026-03-20T11:14:02.461Z
Learning: In `structured-world/lsm-tree`, the `write_rts_to_writer` function in `src/table/multi_writer.rs` has a known limitation when `clip_range_tombstones=true` (compaction mode): RT fragments that don't intersect any output writer's KV span are silently dropped, which can lose delete-only gaps (e.g. an RT covering `[m,p)` while all output tables cover `[a,l]` and `[q,z]`). This limitation is intentionally deferred to issue `#32` and is documented as a Known Limitation in PR `#21`. Do not flag this as a blocking concern in PR `#21` or any PR until `#32` is addressed.

✅ Actions performed

Full review triggered.

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

polaz · 2026-03-22T11:37:36Z

@coderabbitai re-review

coderabbitai · 2026-03-22T11:37:48Z

@polaz Sure, I'll re-review the PR now!


Learnt from: polaz
Repo: structured-world/lsm-tree PR: 21
File: src/table/multi_writer.rs:129-206
Timestamp: 2026-03-20T11:14:02.461Z
Learning: In `structured-world/lsm-tree`, the `write_rts_to_writer` function in `src/table/multi_writer.rs` has a known limitation when `clip_range_tombstones=true` (compaction mode): RT fragments that don't intersect any output writer's KV span are silently dropped, which can lose delete-only gaps (e.g. an RT covering `[m,p)` while all output tables cover `[a,l]` and `[q,z]`). This limitation is intentionally deferred to issue `#32` and is documented as a Known Limitation in PR `#21`. Do not flag this as a blocking concern in PR `#21` or any PR until `#32` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 28
File: src/abstract_tree.rs:128-129
Timestamp: 2026-03-22T03:19:31.834Z
Learning: In `structured-world/lsm-tree` (`src/abstract_tree.rs`), the flush path (`AbstractTree::flush`) intentionally does NOT pass range tombstones into `CompactionStream` for RT-boundary-aware merge resolution. Flush is a co-location pass: both KV entries (including merged operands) and RTs are written together into the output SST. RT suppression is enforced exclusively at read time via `is_suppressed_by_range_tombstones` (point reads) and `RangeTombstoneFilter` (range scans). Do not flag the absence of RT propagation into `CompactionStream` during flush as a merge-semantics bug in PR `#28` or future PRs.

Learnt from: CR
Repo: structured-world/lsm-tree PR: 0
File: .github/instructions/rust.instructions.md:0-0
Timestamp: 2026-03-20T22:04:21.226Z
Learning: Applies to **/*.rs : Flag incorrect merge semantics: tombstones not propagated to lower levels, point deletes applied out of order (Tier 1 — MUST flag)

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 28
File: src/blob_tree/mod.rs:648-655
Timestamp: 2026-03-21T09:32:08.642Z
Learning: In `structured-world/lsm-tree`, `BlobTree::merge` (in `src/blob_tree/mod.rs`) delegates directly to `self.index.merge(...)`. BlobTree's `get()`/`resolve_key()` does not yet resolve merge operands — it handles Indirection but not merge operand collection. Full BlobTree merge read-path support is a documented known limitation deferred to follow-up work (introduced in PR `#28`). The compaction path already handles merge operands correctly, including Indirection safety via the pending buffer. Do not flag the absence of BlobTree merge read-path resolution as a blocking concern in PR `#28` or future PRs until the follow-up work is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-22T09:21:07.598Z
Learning: In `structured-world/lsm-tree` (`src/range.rs`), prefix bloom filter skipping for multi-table runs (L0) has been implemented in PR `#64` (closes issue `#57`). The `_ =>` branch in `TreeIter::create_range` now applies a per-table key-range guard followed by a bloom check; 0 survivors skips the run, 1 survivor is demoted to `single_tables` for the range-tombstone table-skip optimization, and 2+ survivors are collected into a new `Run`. Range tombstones are still collected from ALL tables in the run regardless of bloom filtering. Do not flag the absence of multi-table run prefix bloom skipping in this repository — it has been implemented.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 43
File: tests/tree_prefix_bloom.rs:91-105
Timestamp: 2026-03-21T22:11:09.195Z
Learning: In `structured-world/lsm-tree`, prefix-bloom integration tests in `tests/tree_prefix_bloom.rs` intentionally validate correctness (result counts and key ordering) rather than asserting that the bloom-skip pruning path fired. Bloom-skip metrics/counters are `pub(crate)` and cannot be observed from integration tests. Do not flag these tests for failing to assert on internal pruning behavior in PR `#43` or future PRs.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 45
File: tools/db_bench/src/workloads/mergerandom.rs:27-35
Timestamp: 2026-03-21T17:08:52.777Z
Learning: In `structured-world/lsm-tree`, `lsm_tree`'s merge operator API is internal (not public). The `mergerandom` workload in `tools/db_bench/src/workloads/mergerandom.rs` is intentionally an overwrite/compaction stress benchmark rather than a true merge-operand benchmark. True merge-operand benchmarks are planned for a future PR using the CoordiNode posting list API. Do not flag `mergerandom` for lacking merge-operand calls in PR `#45` or future PRs until that API is exposed.
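
The survivor handling described in the `src/range.rs` learning above (0 survivors skips the run, 1 survivor is demoted to a single table, 2+ survivors form a new run) can be sketched roughly as follows. This is an illustrative sketch only: `RunPlan` and `plan_run` are hypothetical names, the `passes` predicate stands in for the combined key-range guard and bloom check, and range-tombstone collection (which the learning says still covers all tables) is not modeled.

```rust
/// Hypothetical outcome of filtering one multi-table run (not a crate type).
#[derive(Debug, PartialEq)]
enum RunPlan<T> {
    /// No table survived the guards: the whole run is skipped.
    Skip,
    /// Exactly one survivor: treated as a standalone table.
    Single(T),
    /// Two or more survivors: collected into a new, smaller run.
    Run(Vec<T>),
}

/// Applies `passes` (assumed: key-range guard followed by bloom check)
/// to every table of a run and classifies the result by survivor count.
fn plan_run<T>(tables: Vec<T>, mut passes: impl FnMut(&T) -> bool) -> RunPlan<T> {
    let mut survivors: Vec<T> = tables.into_iter().filter(|t| passes(t)).collect();
    match survivors.len() {
        0 => RunPlan::Skip,
        1 => RunPlan::Single(survivors.remove(0)),
        _ => RunPlan::Run(survivors),
    }
}
```

Returning an enum keeps the three cases explicit at the call site; the real code may well structure this differently.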

✅ Actions performed

Full review triggered.

- Remove stale line number reference from merge_rt test comment
- Reword prefix bloom test docstring to describe correctness validation independent of bloom filter behavior
- Add comment explaining why multi-table runs skip bloom filtering

- Remove brittle line number reference from merge_tombstone test
- Add code comment explaining intentional TreeIter reuse in resolve_merge_via_pipeline (bloom pre-filter + shared logic)

- Broaden multi-table run bloom guard to trigger on key_hash too (was prefix_hash-only, so point-read merges skipped bloom pruning)
- Replace inline prefix bloom check with bloom_passes() for consistent prefix+key bloom handling and metrics
- Soften test comments to not assert definite bloom behavior
- Rename prefix bloom test to describe correctness scenario
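
One way a unified prefix+key check like the `bloom_passes()` helper mentioned above could be shaped is sketched below. Everything here is an assumption made for illustration: `TableFilter` and `bloom_passes_sketch` are hypothetical, exact `HashSet`s stand in for real bloom filters (so there are no false positives), and the actual helper's signature and metrics handling are not shown in this thread.

```rust
use std::collections::HashSet;

/// Hypothetical per-table filter view; exact sets stand in for bloom filters.
struct TableFilter {
    prefix_hashes: HashSet<u64>,
    key_hashes: HashSet<u64>,
}

/// Returns false only when a supplied hash is definitely absent.
/// A `None` hash means "no constraint of this kind", so it never rejects.
fn bloom_passes_sketch(
    filter: &TableFilter,
    prefix_hash: Option<u64>,
    key_hash: Option<u64>,
) -> bool {
    let prefix_ok = prefix_hash.map_or(true, |h| filter.prefix_hashes.contains(&h));
    let key_ok = key_hash.map_or(true, |h| filter.key_hashes.contains(&h));
    prefix_ok && key_ok
}
```

With this shape, a point-read merge would pass only `key_hash` and a prefix scan only `prefix_hash`, so both paths share one guard.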

sw-release-bot

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'lsm-tree db_bench'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.15.

Benchmark suite	Current: `3d97a40`	Previous: `b9c6897`	Ratio
`fillrandom`	`875131.7860956681` ops/sec	`1078840.3879110864` ops/sec	`1.23`
`readrandom`	`550753.7447771127` ops/sec	`655440.3305655628` ops/sec	`1.19`
`overwrite`	`740668.2265781432` ops/sec	`929551.8754039077` ops/sec	`1.26`
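
The ratios in the table can be reproduced with a few lines of arithmetic. The sketch below assumes, based only on the numbers shown, that the ratio is previous throughput divided by current throughput, so values above 1.0 mean the current commit is slower; `BenchRow`, `regression_ratio`, `flagged`, and `alert_rows` are hypothetical names, not part of github-action-benchmark.

```rust
/// One row of the alert table: throughput in ops/sec for two commits.
struct BenchRow {
    name: &'static str,
    current_ops: f64,
    previous_ops: f64,
}

/// Assumed definition: previous / current, so > 1.0 means a slowdown.
fn regression_ratio(row: &BenchRow) -> f64 {
    row.previous_ops / row.current_ops
}

/// Names of all rows whose ratio exceeds `threshold` (1.15 in the alert).
fn flagged(rows: &[BenchRow], threshold: f64) -> Vec<&'static str> {
    rows.iter()
        .filter(|r| regression_ratio(r) > threshold)
        .map(|r| r.name)
        .collect()
}

/// The three rows exactly as reported in the alert above.
fn alert_rows() -> Vec<BenchRow> {
    vec![
        BenchRow { name: "fillrandom", current_ops: 875131.7860956681, previous_ops: 1078840.3879110864 },
        BenchRow { name: "readrandom", current_ops: 550753.7447771127, previous_ops: 655440.3305655628 },
        BenchRow { name: "overwrite", current_ops: 740668.2265781432, previous_ops: 929551.8754039077 },
    ]
}
```

Under that assumption all three workloads exceed the 1.15 threshold, which matches the alert.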

This comment was automatically generated by workflow using github-action-benchmark.

CC: @polaz

polaz · 2026-03-22T13:49:41Z

@coderabbitai re-review

coderabbitai · 2026-03-22T13:49:56Z

@polaz Sure, I'll re-review the PR now!

🧠 Learnings used

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 67
File: src/memtable/mod.rs:72-77
Timestamp: 2026-03-22T12:26:47.795Z
Learning: In `structured-world/lsm-tree` (`src/memtable/interval_tree.rs` and `src/memtable/mod.rs`), the memtable `IntervalTree` used for range tombstone containment queries (`is_key_suppressed_by_range_tombstone`, `query_suppression`, `insert_range_tombstone`) still uses lexicographic `Ord` on `UserKey` rather than the pluggable `SharedComparator` introduced in PR `#67`. Threading the comparator into the AVL interval tree is a non-trivial refactor intentionally deferred from PR `#67` and tracked in issue `#81`. Do not flag the absence of comparator-aware RT suppression in the memtable interval tree as a blocking concern in PR `#67` or future PRs until issue `#81` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-22T13:03:58.016Z
Learning: In `structured-world/lsm-tree` (`src/table/mod.rs`), `Table::bloom_may_contain_key_hash` returns `Ok(true)` (conservative fallback) for partitioned bloom filters (`pinned_filter_index.is_some()`) and filter TLI (`regions.filter_tli.is_some()`). Extending it to accept a key+seqno for partition-aware seeking is tracked in issue `#83`. Do not flag the conservative fallback in `bloom_may_contain_key_hash` for partitioned/TLI filters as a blocking concern in PR `#69` or future PRs until issue `#83` is addressed.
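
The conservative fallback described in the last learning can be illustrated with a small sketch. The types below are hypothetical stand-ins, not the crate's real structures: `FilterLayout` models the three cases named in the learning, an exact `HashSet` replaces the real bloom filter, and the function returns a plain `bool` where the real method returns `Ok(true)` inside a `Result`.

```rust
use std::collections::HashSet;

/// Hypothetical model of how a table's filter is stored.
enum FilterLayout {
    /// A single filter for the whole table; a key hash alone can probe it.
    Full(HashSet<u64>),
    /// Partitioned filter: picking the partition needs the key, not just its hash.
    Partitioned,
    /// Filter reached through a top-level index (TLI); same limitation.
    TopLevelIndex,
}

/// Returns false only when the table definitely does not contain the hash.
/// When the filter cannot be probed with a hash alone, it answers true,
/// which costs a table read but never drops a matching entry.
fn may_contain_key_hash(layout: &FilterLayout, key_hash: u64) -> bool {
    match layout {
        FilterLayout::Full(hashes) => hashes.contains(&key_hash),
        FilterLayout::Partitioned | FilterLayout::TopLevelIndex => true,
    }
}
```

Answering `true` is the safe direction for a pre-filter: a false "maybe" only costs extra I/O, while a false "no" would lose data from the read.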

✅ Actions performed

Full review triggered.

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

coderabbitai

🧹 Nitpick comments (1)

src/tree/mod.rs (1)
776-816: Please attach the deep-tree point-read benchmark before merge.

This rewires the hot merge point-read path, and the PR acceptance criteria still call for staying within roughly 5% latency on a 100-table tree. The semantic coverage looks solid; the missing release signal here is the benchmark data.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tree/mod.rs` around lines 776 - 816, Before merging, add and attach the
deep-tree point-read benchmark results that measure latency on a ~100-table tree
for the new hot-path in resolve_merge_via_pipeline (the TreeIter-based
point-read/merge path). Run the existing benchmark harness (the deep-tree
point-read benchmark used in previous PRs) against the rewritten path
(resolve_merge_via_pipeline) and include the before/after latency numbers and
configuration (table count, data size, seqno, merge_operator used) in the PR so
reviewers can verify you stayed within the ~5% latency budget.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ef11b0e0-045a-49ef-871a-68b9173f069a

📥 Commits

Reviewing files that changed from the base of the PR and between b9c6897 and 56aafdb.

📒 Files selected for processing (6)

src/memtable/mod.rs
src/range.rs
src/table/mod.rs
src/tree/mod.rs
tests/merge_operator.rs
tests/tree_prefix_bloom.rs

💤 Files with no reviewable changes (1)

src/memtable/mod.rs

Criterion benchmark measuring point-read merge resolution latency on trees with 10/50/100 L0 tables, both cached and uncached. Exercises the bloom-filtered iterator pipeline.

polaz · 2026-03-22T14:07:15Z

Benchmark: merge point-read on deep L0

cargo bench --bench merge_point_read — merge resolution via resolve_merge_via_pipeline on trees with bloom-filtered L0 tables.

Setup: 1 base value + N-1 unrelated tables (bloom rejects) + 1 merge operand in memtable. CounterMerge operator. macOS ARM64.

Tables	Uncached (µs)	Cached (µs)
10	2.23	1.33
50	2.33	1.56
100	2.90	1.99

Key observations:

10→100 tables: +30% uncached, +50% cached — bloom pre-filter rejects 99/100 tables, latency does not grow linearly
Sub-3µs for 100-table uncached merge point-read
Cached path is roughly 1.5x to 1.7x faster than uncached across the three table counts (block cache eliminates disk I/O)
Bloom filter effectiveness confirmed: with 100 tables only 1-2 pass the filter
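
The percentages in the observations follow directly from the table. A minimal sketch of that arithmetic, using only the reported numbers (`RESULTS`, `growth_percent`, and `cache_speedup` are names chosen here for illustration):

```rust
/// (table count, uncached µs, cached µs) as reported in the table above.
const RESULTS: [(u32, f64, f64); 3] = [
    (10, 2.23, 1.33),
    (50, 2.33, 1.56),
    (100, 2.90, 1.99),
];

/// Relative latency growth from `base_us` to `current_us`, in percent.
fn growth_percent(base_us: f64, current_us: f64) -> f64 {
    (current_us / base_us - 1.0) * 100.0
}

/// How many times faster the cached path is than the uncached path.
fn cache_speedup(uncached_us: f64, cached_us: f64) -> f64 {
    uncached_us / cached_us
}
```

Going from 10 to 100 tables multiplies the table count by 10, while `growth_percent(2.23, 2.90)` is about 30 and `growth_percent(1.33, 1.99)` is about 50, which is the sub-linear growth the observations describe.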

Benchmark added in commit 4836e82 (benches/merge_point_read.rs).

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

benches/merge_point_read.rs (1)

35-112: Extract shared tree-population logic for cached/uncached cases.

Both branches duplicate insert/flush/merge setup. A small helper (parameterized by cache size) would reduce drift and make benchmark intent easier to maintain.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@benches/merge_point_read.rs` around lines 35 - 112, Duplicate setup for
populating trees (in merge_point_read.rs using variables tree and tree_cached
and their insert/flush/merge flows) should be extracted into a single helper
function (e.g., build_populated_tree or populate_tree) that accepts parameters
for cache size (or Cache::with_capacity_bytes arg), table_count and initial
seqno and returns the opened tree and the final sequence number; replace both
uncached and cached blocks with calls to this helper and use its returned seqno
for the get/merge checks to eliminate duplicated insert/flush/merge logic.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@benches/merge_point_read.rs`:
- Around line 37-47: Multiple calls in the benchmark setup (tempdir(),
Config::new(...) .use_cache(...).with_merge_operator(...).open(), and other
operations referenced in this diff) use unwrap() on fallible filesystem and tree
operations which can panic; change these to fallible handling by returning
Result from the benchmark helper and using the ? operator (or explicit match/if
let error handling) so errors propagate to the harness boundary, and update the
caller to handle the Result; specifically replace tempdir().unwrap(), the
Config::new(...).open().unwrap(), and the other listed unwrap() sites with ? (or
proper error branching) and ensure functions like the setup helper return
Result<..., E> so the harness can handle/report I/O/runtime errors instead of
panicking.
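
The `unwrap()`-to-`?` change requested above can be sketched with standard-library calls only. This is not the benchmark's actual setup code: `populate_scratch_dir` is a hypothetical helper, and plain files stand in for the tree's insert/flush steps so that the example stays self-contained.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical setup helper: creates a scratch directory under `root`
/// and writes `table_count` placeholder files into it.
/// Every fallible step uses `?`, so errors reach the caller instead of panicking.
fn populate_scratch_dir(root: &Path, table_count: usize) -> io::Result<PathBuf> {
    let dir = root.join("merge_point_read_sketch");
    fs::create_dir_all(&dir)?;
    for i in 0..table_count {
        // Stand-in for "insert unrelated keys, then flush one table".
        fs::write(dir.join(format!("table_{i}.dat")), b"placeholder")?;
    }
    Ok(dir)
}
```

The caller at the harness boundary then decides how to report a failure, for example with a single `expect` that names the setup step.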

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9a481a27-0b5f-43e4-b36d-3c7914474811

📥 Commits

Reviewing files that changed from the base of the PR and between 56aafdb and 4836e82.

📒 Files selected for processing (2)

Cargo.toml
benches/merge_point_read.rs

✅ Files skipped from review due to trivial changes (1)

Cargo.toml

Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings March 22, 2026 08:20

Copilot started reviewing on behalf of polaz March 22, 2026 08:23

Copilot AI reviewed Mar 22, 2026

polaz requested a review from Copilot March 22, 2026 08:49

Copilot started reviewing on behalf of polaz March 22, 2026 08:55

Copilot AI reviewed Mar 22, 2026

Comment thread src/table/mod.rs Outdated

Comment thread src/tree/mod.rs Outdated

Comment thread tests/merge_operator.rs Outdated

Comment thread tests/merge_operator.rs

polaz force-pushed the feat/#46-refactor-unify-merge-resolution-via-bloom-filtered branch from cec697b to feb7585 March 22, 2026 09:09

polaz requested a review from Copilot March 22, 2026 09:53

Copilot started reviewing on behalf of polaz March 22, 2026 09:57

Copilot AI reviewed Mar 22, 2026

Comment thread src/range.rs

Comment thread tests/merge_operator.rs Outdated

Comment thread tests/tree_prefix_bloom.rs Outdated

polaz force-pushed the feat/#46-refactor-unify-merge-resolution-via-bloom-filtered branch 2 times, most recently from 58fcb1a to b53720a March 22, 2026 10:33

polaz requested a review from Copilot March 22, 2026 10:46

Copilot started reviewing on behalf of polaz March 22, 2026 10:52

Copilot AI reviewed Mar 22, 2026

Comment thread src/tree/mod.rs

Comment thread tests/merge_operator.rs

polaz requested a review from Copilot March 22, 2026 11:15

Copilot started reviewing on behalf of polaz March 22, 2026 11:15

Copilot AI reviewed Mar 22, 2026

Comment thread src/tree/mod.rs

polaz force-pushed the feat/#46-refactor-unify-merge-resolution-via-bloom-filtered branch from 661ea79 to 17131f7 March 22, 2026 11:37

polaz requested a review from Copilot March 22, 2026 11:37

polaz added 5 commits March 22, 2026 15:07

fix: remove leftover conflict marker in tree_prefix_bloom test

8e6ba0f

docs: remove stale line ref and clarify pipeline design choice

52deaef

- Remove brittle line number reference from merge_tombstone test
- Add code comment explaining intentional TreeIter reuse in resolve_merge_via_pipeline (bloom pre-filter + shared logic)

docs(bloom): note conservative fallback for partitioned/TLI filters

94f767c

polaz force-pushed the feat/#46-refactor-unify-merge-resolution-via-bloom-filtered branch from 2f5304f to 56aafdb March 22, 2026 13:35

sw-release-bot Bot reviewed Mar 22, 2026

polaz requested a review from Copilot March 22, 2026 13:49

Copilot started reviewing on behalf of polaz March 22, 2026 13:49

Copilot AI reviewed Mar 22, 2026

coderabbitai Bot reviewed Mar 22, 2026

polaz mentioned this pull request Mar 22, 2026

perf: lazy iterator pipeline initialization for point-read merge path #84

Closed

5 tasks

perf(bench): add merge point-read latency benchmark for deep L0

4836e82

Criterion benchmark measuring point-read merge resolution latency on trees with 10/50/100 L0 tables, both cached and uncached. Exercises the bloom-filtered iterator pipeline.

coderabbitai Bot reviewed Mar 22, 2026

Comment thread benches/merge_point_read.rs

refactor(bench): extract populate_merge_tree helper

3d97a40

polaz requested a review from Copilot March 22, 2026 15:27

Copilot started reviewing on behalf of polaz March 22, 2026 15:28

Copilot AI reviewed Mar 22, 2026

polaz merged commit 9c4b065 into main Mar 22, 2026
19 of 20 checks passed

polaz deleted the feat/#46-refactor-unify-merge-resolution-via-bloom-filtered branch March 22, 2026 15:37

sw-release-bot Bot mentioned this pull request Mar 22, 2026

chore: release v5.0.0 #60

Closed
