perf: optimize range tombstone lookup in table-skip and point-read#55

Merged
polaz merged 16 commits into main from feat/#27-perf-optimize-range-tombstone-lookup-in-table-skip on Mar 22, 2026

Conversation

@polaz
Member

@polaz polaz commented Mar 21, 2026

Summary

  • Separate KV/RT seqno bounds: new seqno#kv_max table metadata field tracks highest KV-only seqno, enabling table-skip for covering range tombstones co-located in the same SST
  • Binary search for table-skip: partition_point on sorted RT list narrows candidates from O(n) to O(log n + k) per table
  • Binary search for point-read: per-table RT lists sorted on load, partition_point skips RTs with start > key
  • Backward-compatible: old tables without seqno#kv_max fall back to seqno#max
  • Corruption detection: validates seqno#kv_max <= seqno#max on load
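
The table-skip check summarized above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `RangeTombstone` struct, the `table_is_covered` helper, and the visibility rule (`seqno < read_seqno`) are assumptions made for the example; only `partition_point`, `visible_at`, `fully_covers`, and the `rt.seqno > table_kv_seqno` comparison are named in the discussion.

```rust
/// Hypothetical, simplified range tombstone: deletes keys in [start, end) at `seqno`.
#[derive(Debug, Clone)]
pub struct RangeTombstone {
    pub start: Vec<u8>,
    pub end: Vec<u8>, // exclusive upper bound
    pub seqno: u64,
}

impl RangeTombstone {
    /// Assumed visibility rule: a tombstone is visible to a snapshot that is newer than it.
    pub fn visible_at(&self, read_seqno: u64) -> bool {
        self.seqno < read_seqno
    }

    /// True if the table's inclusive key range [table_min, table_max]
    /// lies entirely inside the half-open range [start, end).
    pub fn fully_covers(&self, table_min: &[u8], table_max: &[u8]) -> bool {
        self.start.as_slice() <= table_min && table_max < self.end.as_slice()
    }
}

/// Returns true if some tombstone makes the whole table skippable.
/// `rts` must already be sorted by `start` ascending.
pub fn table_is_covered(
    rts: &[RangeTombstone],
    table_min: &[u8],
    table_max: &[u8],
    table_kv_seqno: u64,
    read_seqno: u64,
) -> bool {
    // Binary search: only tombstones starting at or before table_min can cover the table.
    let candidate_end = rts.partition_point(|rt| rt.start.as_slice() <= table_min);

    // Linear check over the k remaining candidates.
    rts.iter().take(candidate_end).any(|rt| {
        rt.visible_at(read_seqno)
            && rt.fully_covers(table_min, table_max)
            && rt.seqno > table_kv_seqno
    })
}
```

Comparing against the KV-only bound rather than the overall maximum is what lets a tombstone stored in the same table pass the `rt.seqno > table_kv_seqno` test.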

Test plan

  • kv_seqno_excludes_range_tombstone_seqno — verifies metadata separation
  • kv_seqno_equals_overall_when_no_range_tombstones — KV-only table invariant
  • kv_seqno_zero_for_rt_only_table — RT-only sentinel restoration
  • colocated_range_tombstone_suppresses_keys — co-located RT suppression
  • table_skip_rt_start_equals_table_min — binary search boundary case
  • point_read_binary_search_multiple_rts — multi-RT per table
  • Full test suite: 332+ unit + 41 integration passed, 0 failed

Coverage note

Patch coverage 85.71% (6 lines uncovered in src/table/meta.rs):

  • Corruption error path (5 lines): returns error when seqno#kv_max > seqno#max. Cannot be triggered through normal table writer API — defensive guard for on-disk corruption
  • Legacy fallback (1 line): falls back to seqno#max when seqno#kv_max key is absent. Only reachable with tables written before this PR — no way to create such tables via current API
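
The two uncovered paths described above (legacy fallback and corruption guard) can be illustrated with a small sketch. The function name `resolve_kv_seqno`, the `MetaError` type, and the use of `Option<u64>` for an absent `seqno#kv_max` key are assumptions for this example and may not match `src/table/meta.rs`.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum MetaError {
    /// The KV-only bound is larger than the overall bound, which should be impossible.
    KvSeqnoExceedsMax { kv_max: u64, max: u64 },
}

/// Resolves the KV-only seqno bound from parsed metadata.
/// `kv_max` is `None` when the `seqno#kv_max` key is absent (older tables).
pub fn resolve_kv_seqno(kv_max: Option<u64>, max: u64) -> Result<u64, MetaError> {
    match kv_max {
        // Legacy fallback: conservative, because the overall max is never lower
        // than the true KV-only max.
        None => Ok(max),
        // Corruption guard: the KV-only bound can never exceed the overall bound.
        Some(kv) if kv > max => Err(MetaError::KvSeqnoExceedsMax { kv_max: kv, max }),
        Some(kv) => Ok(kv),
    }
}
```

Falling back to `seqno#max` only makes table-skip fire less often for old tables; it cannot make a skip incorrect.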

Closes #27

- Separate KV/RT seqno tracking: new `seqno#kv_max` metadata field
  excludes range tombstone seqnos, enabling table-skip for covering
  RTs stored in the same table (previously impossible because
  `rt.seqno > get_highest_seqno()` was always false for co-located RTs)
- Binary search on sorted RT list for table-skip: partition_point
  narrows candidates to RTs with start <= table_min, reducing
  O(tables × rt_count) to O(tables × (log(rt_count) + k))
- Binary search for point-read suppression: per-table RT lists are
  sorted on load, enabling partition_point to skip RTs with
  start > key
- Backward-compatible: old tables without `seqno#kv_max` fall back
  to `seqno#max` (conservative but correct)
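
For the point-read side, a rough sketch of the same idea follows. The struct layout, the `is_suppressed` helper, and the signature and semantics given to `should_suppress` here are assumptions for illustration; the PR only states that `should_suppress()` is the canonical check and that `partition_point` skips RTs with `start > key`.

```rust
/// Hypothetical, simplified range tombstone covering [start, end) at `seqno`.
#[derive(Debug, Clone)]
pub struct RangeTombstone {
    pub start: Vec<u8>,
    pub end: Vec<u8>, // exclusive upper bound
    pub seqno: u64,
}

impl RangeTombstone {
    /// Assumed semantics: the tombstone hides `key` if it is visible to the reader,
    /// newer than the key's version, and the key falls inside [start, end).
    pub fn should_suppress(&self, key: &[u8], key_seqno: u64, read_seqno: u64) -> bool {
        self.seqno < read_seqno
            && self.seqno > key_seqno
            && self.start.as_slice() <= key
            && key < self.end.as_slice()
    }
}

/// `sorted_rts` must be sorted by `start` ascending (done once on table load).
pub fn is_suppressed(
    sorted_rts: &[RangeTombstone],
    key: &[u8],
    key_seqno: u64,
    read_seqno: u64,
) -> bool {
    // Tombstones that start after `key` can never contain it, so cut them off.
    let end = sorted_rts.partition_point(|rt| rt.start.as_slice() <= key);
    sorted_rts
        .iter()
        .take(end)
        .any(|rt| rt.should_suppress(key, key_seqno, read_seqno))
}
```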

Closes #27
Copilot AI review requested due to automatic review settings March 21, 2026 22:02
@coderabbitai

coderabbitai Bot commented Mar 21, 2026

Warning

Rate limit exceeded

@polaz has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 2 minutes and 5 seconds before requesting another review.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6a11e9fb-2899-407f-9ea8-48658db6d2ec

📥 Commits

Reviewing files that changed from the base of the PR and between 622a8cf and 17887db.

📒 Files selected for processing (6)
  • src/range.rs
  • src/table/meta.rs
  • src/table/mod.rs
  • src/table/writer/mod.rs
  • src/tree/mod.rs
  • tests/range_tombstone.rs
📝 Walkthrough

This change optimizes range-tombstone suppression/lookups by tracking a KV-only highest seqno, deterministically sorting decoded range tombstones by (start asc, seqno desc), and narrowing candidate tombstones with binary-search (partition_point) before visibility/coverage checks in both range-iteration and point-read paths.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Range iteration & point-read logic: src/range.rs, src/tree/mod.rs | Replaced full linear scans of per-table range tombstones with binary-search narrowing via partition_point, then evaluate visibility/coverage on the prefix only. Uses table KV-only seqno for comparisons. |
| Table metadata & parsing: src/table/meta.rs, src/table/mod.rs | Added ParsedMeta.highest_kv_seqno; Table::recover now sorts decoded range tombstones deterministically (start asc, seqno desc). Added Table::get_highest_kv_seqno() to return KV-only upper bound. |
| Writer: metadata persistence: src/table/writer/meta.rs, src/table/writer/mod.rs | Added highest_kv_seqno to Metadata, updated Default, tracked/updated during writes, preserved across RT-only sentinel writes, and persisted as seqno#kv_max in table metadata. |
| Tests: tests/range_tombstone.rs | Added four tests validating separate KV/RT seqno tracking, same-table covering RT suppression, binary-search edge cases (RT start == table min), and point-read suppression with disjoint RTs. |

Sequence Diagram(s)

sequenceDiagram
    participant Writer as Table Writer
    participant Meta as Table Metadata
    participant Table as SST Table
    participant TreeIter as TreeIterator
    participant RangeIter as RangeIterator

    Note over Writer,Meta: Write path
    Writer->>Meta: update highest_kv_seqno (max KV seqnos)
    Writer->>Meta: persist highest_kv_seqno & highest_seqno

    Note over Table: Recover
    Table->>Meta: load highest_kv_seqno
    Table->>Table: sort decoded RTs by (start asc, seqno desc)

    Note over RangeIter,Table: Range-iteration path
    RangeIter->>Table: get_highest_kv_seqno()
    Table-->>RangeIter: kv_only_seqno
    RangeIter->>Table: partition_point(rt.start ≤ table_min)
    RangeIter->>Table: evaluate visibility & coverage on prefix

    Note over TreeIter,Table: Point-read path
    TreeIter->>Table: partition_point(rt.start ≤ key)
    TreeIter->>Table: evaluate rt.should_suppress(...) on prefix

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 Hop-hop, I sorted the stones with care,

partitioned the starts so searches are fair,
KVs keep their own highest crown,
tombstones trimmed so scans slow down,
nibble on bugs — now fast paths are rare!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'perf: optimize range tombstone lookup in table-skip and point-read' directly and accurately summarizes the main change: performance optimization of range tombstone lookups in two hot paths. |
| Linked Issues check | ✅ Passed | All coding requirements from issue #27 are addressed: sorted RT lists by (start asc, seqno desc) for binary-search optimization, partition_point usage in table-skip and point-read paths, separate highest_kv_seqno tracking, backward compatibility, and comprehensive test coverage. |
| Out of Scope Changes check | ✅ Passed | All changes are directly aligned with issue #27 objectives: metadata additions (highest_kv_seqno), RT sorting/binary-search optimization, and test coverage for colocated RTs. No extraneous modifications detected. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/range.rs`:
- Around line 242-268: The partition_point call on all_range_tombstones assumes
it's sorted by rt.start but the list is only sorted later by
all_range_tombstones.sort_by(...); fix by ensuring the slice is sorted before
using partition_point (move the sort_by earlier, before computing
table_min/table_max and candidate_end) or, alternatively, replace the
partition_point logic with a linear scan over all_range_tombstones that checks
rt.start, rt.visible_at(*cutoff), rt.fully_covers(table_min, table_max) and
rt.seqno > table_kv_seqno (using symbols: all_range_tombstones, partition_point,
sort_by, candidate_end, is_covered, table_min/table_max,
table.get_highest_kv_seqno, rt.visible_at, rt.fully_covers) so the decision to
skip a table is based on correct ordering or a full scan.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 77b32b1d-cfd1-4e31-a66a-547c1b0dc9ba

📥 Commits

Reviewing files that changed from the base of the PR and between c1e43d5 and 1c72b5e.

📒 Files selected for processing (7)
  • src/range.rs
  • src/table/meta.rs
  • src/table/mod.rs
  • src/table/writer/meta.rs
  • src/table/writer/mod.rs
  • src/tree/mod.rs
  • tests/range_tombstone.rs

Comment thread src/range.rs

Copilot AI left a comment

Pull request overview

Improves range tombstone (RT) suppression performance for table-skip and point reads by separating KV-only sequence number bounds and using binary search over sorted RT lists.

Changes:

  • Add KV-only max seqno metadata (seqno#kv_max) to enable skipping tables even when covering RTs are co-located in the same SST.
  • Use partition_point to reduce RT candidate scanning for table-skip and point reads.
  • Add targeted tests covering KV/RT seqno separation and binary-search boundary cases.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file
| File | Description |
| --- | --- |
| tests/range_tombstone.rs | Adds tests for KV-only seqno metadata and binary-search-based suppression behavior. |
| src/tree/mod.rs | Uses binary search over per-table sorted RTs for point-read suppression. |
| src/table/writer/mod.rs | Tracks and persists KV-only max seqno as new table metadata. |
| src/table/writer/meta.rs | Extends writer metadata with highest_kv_seqno. |
| src/table/mod.rs | Sorts decoded RTs to support binary-search suppression; exposes get_highest_kv_seqno(). |
| src/table/meta.rs | Parses optional seqno#kv_max field with fallback for older tables. |
| src/range.rs | Uses binary search over RT list to reduce table-skip candidate checks and relies on KV-only seqno bound. |

Comment thread src/table/mod.rs Outdated
Comment thread src/range.rs
Comment thread src/range.rs
Comment thread src/table/meta.rs Outdated
Comment thread src/tree/mod.rs Outdated
- Sort SST-sourced RTs before table-skip loop so partition_point
  operates on sorted data (was using unsorted list — incorrect results)
- Propagate read_u64 error for seqno#kv_max instead of silent fallback
  to surface metadata corruption
- Use should_suppress() in point-read path instead of inlined logic
  to avoid semantic drift from the canonical suppression check
- Update sort comment to reference Ord implementation
@polaz polaz requested a review from Copilot March 21, 2026 22:26

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Comment thread src/range.rs Outdated
Comment thread src/range.rs
Comment thread src/table/mod.rs Outdated
Comment thread src/range.rs Outdated
polaz added 2 commits March 22, 2026 00:43
…r binary search

- Sort by start key explicitly instead of delegating to Ord, so the
  partition_point invariant is enforced locally and independent of
  future Ord changes
- Replace [..candidate_end] slicing with .iter().take(candidate_end)
  to avoid clippy::indexing_slicing suppressions
@polaz polaz requested a review from Copilot March 21, 2026 22:54

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Comment thread src/tree/mod.rs Outdated
Comment thread src/table/mod.rs Outdated
polaz added 2 commits March 22, 2026 01:16
- Sort per-table RTs by (start asc, seqno desc) instead of start-only,
  so suppression checks short-circuit on the highest-seqno RT first
- Align sort order documentation with actual comparator
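
The (start asc, seqno desc) ordering described in this commit can be expressed with a two-level comparator. The sketch below is an illustration under assumptions: the `RangeTombstone` fields and the `sort_range_tombstones` helper are hypothetical, and `sort_unstable_by` is used here for brevity.

```rust
/// Hypothetical, simplified range tombstone for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RangeTombstone {
    pub start: Vec<u8>,
    pub end: Vec<u8>,
    pub seqno: u64,
}

/// Sorts by start key ascending; ties are broken by seqno descending,
/// so the newest tombstone for a given start key is examined first.
pub fn sort_range_tombstones(rts: &mut [RangeTombstone]) {
    rts.sort_unstable_by(|a, b| {
        a.start
            .cmp(&b.start)
            // Note the swapped operands: b vs a yields descending seqno order.
            .then_with(|| b.seqno.cmp(&a.seqno))
    });
}
```

Spelling out the comparator locally, rather than relying on an `Ord` implementation, keeps the `partition_point` precondition (sorted by `start`) visible at the call site.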
@polaz polaz requested a review from Copilot March 21, 2026 23:24

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Comment thread src/range.rs Outdated
@polaz
Member Author

polaz commented Mar 21, 2026

@coderabbitai re-review

@coderabbitai

coderabbitai Bot commented Mar 21, 2026

@polaz Sure, I'll kick off a full re-review of PR #55 now.


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-21T22:22:16.340Z
Learning: In `structured-world/lsm-tree` (`src/range.rs`), prefix bloom skipping for multi-table runs (L0) is deferred to issue `#58` (opened during PR `#43` review). Extending it requires passing `prefix_hash: Option<u64>` into `RunReader` (`src/run_reader.rs`) and applying `Table::maybe_contains_prefix` per-table inside `RunReader`. Do not flag the absence of multi-table run prefix bloom skipping as a blocking concern in PR `#43` or future PRs until issue `#58` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-21T22:21:36.946Z
Learning: In `structured-world/lsm-tree`, prefix bloom filter skipping for multi-table runs (L0, served by `RunReader` in `src/run_reader.rs`) is intentionally deferred from PR `#43` to issue `#57`. The `run.len() == 1` guard in `src/range.rs` is deliberate. Do not flag the absence of multi-table run prefix bloom skipping as a bug in PR `#43` or future PRs until issue `#57` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-21T22:22:16.340Z
Learning: In `structured-world/lsm-tree` (`src/range.rs`), prefix bloom skipping for multi-table runs (L0) is deferred to a separate issue (opened during PR `#43` review). Extending it requires passing `prefix_hash: Option<u64>` into `RunReader` (`src/run_reader.rs`) and applying `Table::maybe_contains_prefix` per-table inside `RunReader`. Do not flag the absence of multi-table run prefix bloom skipping as a blocking concern in PR `#43` or future PRs until that issue is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:06.289Z
Learning: In `structured-world/lsm-tree`, the switch from `Mutex<IntervalTree>` to `RwLock<IntervalTree>` (or a freeze-on-seal immutable structure) for memtable range-tombstone storage in `src/memtable/mod.rs` is intentionally deferred from PR `#21` to a separate issue (`#30`). Do not flag the existing `Mutex<IntervalTree>` usage in `src/memtable/mod.rs` as a concurrency concern in the context of PR `#21`; it is tracked in issue `#30`.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 43
File: src/range.rs:226-252
Timestamp: 2026-03-21T22:11:05.425Z
Learning: In `structured-world/lsm-tree` (`src/range.rs`), prefix bloom filter skipping intentionally applies **only** to single-table runs (`run.len() == 1`). Multi-table runs (typically L0) are handled by `RunReader`, which merges across all tables in the run and does not support per-table prefix bloom filtering. This is an architectural limitation documented in the code — do not flag the absence of prefix bloom skipping for multi-table runs as a bug or missing optimization in PR `#43` or future PRs.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 21
File: src/table/multi_writer.rs:129-206
Timestamp: 2026-03-20T11:14:02.461Z
Learning: In `structured-world/lsm-tree`, the `write_rts_to_writer` function in `src/table/multi_writer.rs` has a known limitation when `clip_range_tombstones=true` (compaction mode): RT fragments that don't intersect any output writer's KV span are silently dropped, which can lose delete-only gaps (e.g. an RT covering `[m,p)` while all output tables cover `[a,l]` and `[q,z]`). This limitation is intentionally deferred to issue `#32` and is documented as a Known Limitation in PR `#21`. Do not flag this as a blocking concern in PR `#21` or any PR until `#32` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:39.835Z
Learning: In `structured-world/lsm-tree`, the `Memtable` range-tombstone storage uses `Mutex<IntervalTree>` (introduced in PR `#21`). Switching to `RwLock<IntervalTree>` or a freeze-on-seal immutable structure is a deliberate future improvement tracked in a separate issue — do not flag the `Mutex` usage as a blocking concern in PR `#21` or future PRs that reference this design.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:39.835Z
Learning: In `structured-world/lsm-tree`, the `Memtable` range-tombstone storage uses `Mutex<IntervalTree>` (introduced in PR `#21`). Switching to `RwLock<IntervalTree>` or a freeze-on-seal immutable structure is a deliberate future improvement tracked in issue `#31` — do not flag the `Mutex` usage as a blocking concern in PR `#21` or future PRs that reference this design.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 21
File: src/table/writer/mod.rs:437-457
Timestamp: 2026-03-20T12:29:43.719Z
Learning: In `structured-world/lsm-tree` (`src/table/writer/mod.rs`), the synthetic weak-tombstone sentinel written for RT-only SSTs (when `item_count == 0` but `range_tombstones` is non-empty) intentionally increments `meta.tombstone_count`, `meta.weak_tombstone_count`, `meta.item_count`, and `meta.key_count`. All on-disk metadata counters must match actual block contents for recovery validation consistency (established in commit 7452427a for item_count/key_count; same reasoning applies to tombstone counters). The resulting +1 inflation on GC heuristics is harmless — it overestimates reclaimable space and triggers slightly earlier checks. Do not flag this as a metadata accuracy bug in PR `#21` or future PRs.

Learnt from: CR
Repo: structured-world/lsm-tree PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-20T22:03:44.692Z
Learning: Applies to src/memtable/interval_tree.rs : `src/memtable/interval_tree.rs` — Interval tree for memtable range tombstone queries
✅ Actions performed

Full review triggered.

@polaz polaz requested a review from Copilot March 21, 2026 23:40

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Comment thread src/table/writer/mod.rs
Comment thread src/table/mod.rs Outdated
Comment thread src/range.rs Outdated
Comment thread src/range.rs
Comment thread src/table/meta.rs
polaz added 3 commits March 22, 2026 02:00
…riant

- Switch to sort_unstable_by for RT sorts (no stability needed)
- Clarify that highest_kv_seqno tracks all data-block item types
  (values, point tombstones, weak tombstones), not just values
- Restore inclusive-max vs half-open documentation in table-skip
- Expand backward-compat fallback rationale in ParsedMeta

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Comment thread src/table/mod.rs Outdated
Comment thread src/table/meta.rs
- Reject tables where highest_kv_seqno exceeds overall max seqno,
  which indicates on-disk metadata corruption
- Reword seqno-desc tiebreaker documentation to match actual behavior

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Comment thread src/range.rs
Comment thread tests/range_tombstone.rs Outdated
polaz added 2 commits March 22, 2026 02:48
… scope

Test verifies logical suppression (which works regardless of whether
table-skip fires), not the table-skip optimization specifically.
@codecov

codecov Bot commented Mar 22, 2026

Codecov Report

❌ Patch coverage is 98.24561% with 1 line in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/table/meta.rs | 96.42% | 1 Missing ⚠️ |

…riants

- Tables without RTs have highest_kv_seqno == highest_seqno
- RT-only tables have highest_kv_seqno == 0 (sentinel restored)
- Assert highest_kv_seqno <= highest_seqno for all table types
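
The three invariants listed in this commit follow from tracking the two bounds separately on the write path. The sketch below is a simplified model only: `SeqnoTracker` and its methods are hypothetical names, and the RT-only sentinel write and restore performed by the real writer are not modeled.

```rust
/// Hypothetical tracker for the two seqno bounds kept in table metadata.
#[derive(Debug, Default)]
pub struct SeqnoTracker {
    /// Highest seqno over everything in the table (KV items and range tombstones).
    pub highest_seqno: u64,
    /// Highest seqno over data-block items only (values, point and weak tombstones).
    pub highest_kv_seqno: u64,
}

impl SeqnoTracker {
    /// A data-block item raises both bounds.
    pub fn observe_kv(&mut self, seqno: u64) {
        self.highest_kv_seqno = self.highest_kv_seqno.max(seqno);
        self.highest_seqno = self.highest_seqno.max(seqno);
    }

    /// A range tombstone raises only the overall bound.
    pub fn observe_range_tombstone(&mut self, seqno: u64) {
        self.highest_seqno = self.highest_seqno.max(seqno);
    }
}
```

Because every KV observation also updates the overall bound, `highest_kv_seqno <= highest_seqno` holds by construction in this model.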

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Comment thread src/table/writer/mod.rs Outdated
@polaz polaz requested a review from Copilot March 22, 2026 01:19
@polaz
Member Author

polaz commented Mar 22, 2026

@coderabbitai re-review

@coderabbitai

coderabbitai Bot commented Mar 22, 2026

@polaz Sure, I'll kick off a full re-review of PR #55 now.


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/lsm-tree PR: 43
File: tests/tree_prefix_bloom.rs:91-105
Timestamp: 2026-03-21T22:11:09.195Z
Learning: In `structured-world/lsm-tree`, prefix-bloom integration tests in `tests/tree_prefix_bloom.rs` intentionally validate correctness (result counts and key ordering) rather than asserting that the bloom-skip pruning path fired. Bloom-skip metrics/counters are `pub(crate)` and cannot be observed from integration tests. Do not flag these tests for failing to assert on internal pruning behavior in PR `#43` or future PRs.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-21T22:21:36.946Z
Learning: In `structured-world/lsm-tree` (`src/table/mod.rs`), `Table::maybe_contains_prefix` intentionally returns `Ok(true)` (conservative fallback) when `pinned_filter_index.is_some() || regions.filter_tli.is_some()` (partitioned filter or filter TLI present). This guard was added in PR `#43` at line 793. Do not flag the absence of per-partition prefix-hash checking as a bug in PR `#43` or future PRs.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 21
File: src/table/writer/mod.rs:394-463
Timestamp: 2026-03-20T20:05:18.309Z
Learning: In `structured-world/lsm-tree` (`src/table/writer/mod.rs`), the `key#min`/`key#max` widening for RT bounds in `Writer::finish()` is intentionally limited to the `item_count == 0` (RT-only) branch. For mixed KV+RT tables, `MultiWriter::write_rts_to_writer` (lines 167–199 of `src/table/multi_writer.rs`) already widens `meta.first_key`/`meta.last_key` conservatively in flush mode (`clip=false`). In compaction mode (`clip=true`), RTs are intersected with the KV key range so no widening is needed. Do not flag the absence of unconditional widening in `Writer::finish()` as a bug in PR `#21` or future PRs.
✅ Actions performed

Full review triggered.


Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.

…function

Extract highest_kv_seqno corruption check into validated_kv_seqno()
with unit tests for valid, equal, zero, and exceeds-max cases.
@polaz polaz merged commit 0485672 into main Mar 22, 2026
15 checks passed
@polaz polaz deleted the feat/#27-perf-optimize-range-tombstone-lookup-in-table-skip branch March 22, 2026 01:41
@sw-release-bot sw-release-bot Bot mentioned this pull request Mar 22, 2026
polaz added a commit that referenced this pull request Apr 9, 2026
- Fix test postcondition: assert_eq!(Some(0), level_table_count(0)) instead
  of unwrap_or(0) — properly distinguishes "L0 exists and is empty" from
  unexpected None (Copilot #54, CodeRabbit #56)
- Fix ZstdDictionary::new() doc: handle stores full 64-bit xxh3 hash
  internally; id() returns lower 32 bits on demand (CodeRabbit #55)

Development

Successfully merging this pull request may close these issues.

perf: optimize range tombstone lookup in table-skip and point-read paths

2 participants