perf: optimize range tombstone lookup in table-skip and point-read by polaz · Pull Request #55 · structured-world/coordinode-lsm-tree

polaz · 2026-03-21T22:02:35Z

Summary

Separate KV/RT seqno bounds: new seqno#kv_max table metadata field tracks highest KV-only seqno, enabling table-skip for covering range tombstones co-located in the same SST
Binary search for table-skip: partition_point on sorted RT list narrows candidates from O(n) to O(log n + k) per table
Binary search for point-read: per-table RT lists sorted on load, partition_point skips RTs with start > key
Backward-compatible: old tables without seqno#kv_max fall back to seqno#max
Corruption detection: validates seqno#kv_max <= seqno#max on load
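
A minimal sketch of the table-skip check the summary describes, not the PR's actual code. `RangeTombstone`, `can_skip_table`, and the bodies of `visible_at` and `fully_covers` are simplified stand-ins: the review comments mention `rt.visible_at` and `rt.fully_covers`, but their real signatures are assumed here, with tombstones modeled as half-open `[start, end)` and visibility as `seqno < read_seqno`.

```rust
/// Hypothetical, simplified range tombstone: deletes keys in [start, end) at `seqno`.
#[derive(Debug, Clone)]
struct RangeTombstone {
    start: Vec<u8>,
    end: Vec<u8>, // exclusive
    seqno: u64,
}

impl RangeTombstone {
    /// Assumed visibility rule: the tombstone is visible to reads at `read_seqno`.
    fn visible_at(&self, read_seqno: u64) -> bool {
        self.seqno < read_seqno
    }

    /// True if the inclusive table key range [min, max] lies inside [start, end).
    fn fully_covers(&self, min: &[u8], max: &[u8]) -> bool {
        self.start.as_slice() <= min && max < self.end.as_slice()
    }
}

/// Returns true if some visible tombstone covers the whole table and is newer
/// than every KV item in it. `sorted_rts` must be sorted by `start` ascending.
fn can_skip_table(
    sorted_rts: &[RangeTombstone],
    table_min: &[u8],
    table_max: &[u8],
    table_kv_seqno: u64, // KV-only bound (seqno#kv_max), not the overall max
    read_seqno: u64,
) -> bool {
    // Only tombstones with start <= table_min can cover the table.
    let candidate_end = sorted_rts.partition_point(|rt| rt.start.as_slice() <= table_min);

    sorted_rts.iter().take(candidate_end).any(|rt| {
        rt.visible_at(read_seqno)
            && rt.fully_covers(table_min, table_max)
            && rt.seqno > table_kv_seqno
    })
}
```

With a co-located RT at seqno 10 and KV items up to seqno 7, comparing against the KV-only bound lets the skip fire. Comparing against an overall max of 10 would not, since `10 > 10` is false.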

Test plan

kv_seqno_excludes_range_tombstone_seqno — verifies metadata separation
kv_seqno_equals_overall_when_no_range_tombstones — KV-only table invariant
kv_seqno_zero_for_rt_only_table — RT-only sentinel restoration
colocated_range_tombstone_suppresses_keys — co-located RT suppression
table_skip_rt_start_equals_table_min — binary search boundary case
point_read_binary_search_multiple_rts — multi-RT per table
Full test suite: 332+ unit + 41 integration passed, 0 failed

Coverage note

Patch coverage 85.71% (6 lines uncovered in src/table/meta.rs):

Corruption error path (5 lines): returns error when seqno#kv_max > seqno#max. Cannot be triggered through normal table writer API — defensive guard for on-disk corruption
Legacy fallback (1 line): falls back to seqno#max when seqno#kv_max key is absent. Only reachable with tables written before this PR — no way to create such tables via current API
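
The two uncovered paths can be sketched as follows. This is an illustration under assumptions, not the code in `src/table/meta.rs`: the metadata is modeled as a plain `HashMap`, and `load_kv_seqno` and `MetaError` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
enum MetaError {
    MissingMax,
    KvSeqnoExceedsMax { kv_max: u64, max: u64 },
}

/// Resolves the KV-only seqno bound from table metadata.
fn load_kv_seqno(meta: &HashMap<&str, u64>) -> Result<u64, MetaError> {
    let max = meta
        .get("seqno#max")
        .copied()
        .ok_or(MetaError::MissingMax)?;

    match meta.get("seqno#kv_max").copied() {
        // Legacy table: key absent, fall back to the overall max.
        // Conservative: table-skip never fires for co-located RTs, reads stay correct.
        None => Ok(max),
        // Normal case: the KV-only bound can never exceed the overall bound.
        Some(kv_max) if kv_max <= max => Ok(kv_max),
        // Defensive guard: only reachable with corrupted on-disk metadata.
        Some(kv_max) => Err(MetaError::KvSeqnoExceedsMax { kv_max, max }),
    }
}
```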

Closes #27

- Separate KV/RT seqno tracking: new `seqno#kv_max` metadata field excludes range tombstone seqnos, enabling table-skip for covering RTs stored in the same table (previously impossible because `rt.seqno > get_highest_seqno()` was always false for co-located RTs)
- Binary search on sorted RT list for table-skip: partition_point narrows candidates to RTs with start <= table_min, reducing O(tables × rt_count) to O(tables × (log(rt_count) + k))
- Binary search for point-read suppression: per-table RT lists are sorted on load, enabling partition_point to skip RTs with start > key
- Backward-compatible: old tables without `seqno#kv_max` fall back to `seqno#max` (conservative but correct)

Closes #27


coderabbitai · 2026-03-21T22:02:48Z

📥 Commits

Reviewing files that changed from the base of the PR and between 622a8cf and 17887db.

📒 Files selected for processing (6)

src/range.rs
src/table/meta.rs
src/table/mod.rs
src/table/writer/mod.rs
src/tree/mod.rs
tests/range_tombstone.rs

Walkthrough

This change optimizes range-tombstone suppression/lookups by tracking a KV-only highest seqno, deterministically sorting decoded range tombstones by (start asc, seqno desc), and narrowing candidate tombstones with binary-search (partition_point) before visibility/coverage checks in both range-iteration and point-read paths.

Changes

Cohort / File(s)	Summary
Range iteration & point-read logic `src/range.rs`, `src/tree/mod.rs`	Replaced full linear scans of per-table range tombstones with binary-search narrowing via `partition_point`, then evaluate visibility/coverage on the prefix only. Uses table KV-only seqno for comparisons.
Table metadata & parsing `src/table/meta.rs`, `src/table/mod.rs`	Added `ParsedMeta.highest_kv_seqno`; `Table::recover` now sorts decoded range tombstones deterministically (start asc, seqno desc). Added `Table::get_highest_kv_seqno()` to return KV-only upper bound.
Writer: metadata persistence `src/table/writer/meta.rs`, `src/table/writer/mod.rs`	Added `highest_kv_seqno` to `Metadata`, updated Default, tracked/updated during writes, preserved across RT-only sentinel writes, and persisted as `seqno#kv_max` in table metadata.
Tests `tests/range_tombstone.rs`	Added four tests validating separate KV/RT seqno tracking, same-table covering RT suppression, binary-search edge cases (RT start == table min), and point-read suppression with disjoint RTs.

Sequence Diagram(s)

sequenceDiagram
    participant Writer as Table Writer
    participant Meta as Table Metadata
    participant Table as SST Table
    participant TreeIter as TreeIterator
    participant RangeIter as RangeIterator

    Note over Writer,Meta: Write path
    Writer->>Meta: update highest_kv_seqno (max KV seqnos)
    Writer->>Meta: persist highest_kv_seqno & highest_seqno

    Note over Table: Recover
    Table->>Meta: load highest_kv_seqno
    Table->>Table: sort decoded RTs by (start asc, seqno desc)

    Note over RangeIter,Table: Range-iteration path
    RangeIter->>Table: get_highest_kv_seqno()
    Table-->>RangeIter: kv_only_seqno
    RangeIter->>Table: partition_point(rt.start ≤ table_min)
    RangeIter->>Table: evaluate visibility & coverage on prefix

    Note over TreeIter,Table: Point-read path
    TreeIter->>Table: partition_point(rt.start ≤ key)
    TreeIter->>Table: evaluate rt.should_suppress(...) on prefix
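
The point-read branch of the diagram can be sketched roughly as below. The `should_suppress` name comes from the PR, but its signature and body here are assumptions, as are `RangeTombstone` and `is_key_suppressed`. Tombstones are modeled as half-open `[start, end)`.

```rust
/// Hypothetical, simplified range tombstone: deletes keys in [start, end) at `seqno`.
#[derive(Debug, Clone)]
struct RangeTombstone {
    start: Vec<u8>,
    end: Vec<u8>, // exclusive
    seqno: u64,
}

impl RangeTombstone {
    /// Assumed suppression rule: the tombstone is visible to the read, is newer
    /// than the key's version, and contains the key.
    fn should_suppress(&self, key: &[u8], key_seqno: u64, read_seqno: u64) -> bool {
        self.seqno < read_seqno
            && key_seqno < self.seqno
            && self.start.as_slice() <= key
            && key < self.end.as_slice()
    }
}

/// Checks one table's tombstones for a point read.
/// `sorted_rts` must be sorted by `start` ascending.
fn is_key_suppressed(
    sorted_rts: &[RangeTombstone],
    key: &[u8],
    key_seqno: u64,
    read_seqno: u64,
) -> bool {
    // Tombstones with start > key cannot contain the key, so stop before them.
    let candidate_end = sorted_rts.partition_point(|rt| rt.start.as_slice() <= key);

    sorted_rts
        .iter()
        .take(candidate_end)
        .any(|rt| rt.should_suppress(key, key_seqno, read_seqno))
}
```

The prefix may still contain tombstones that end before the key, so the full suppression check runs on every candidate. The binary search only trims the tail.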

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

refactor(metrics): add dedicated range tombstone block counters #36: Modifies Table::recover and touches range-tombstone block loading; intersects with the new deterministic sorting and decode ordering.
feat: add range tombstones (delete_range / delete_prefix) #21: Prior work proposing separate KV/RT seqno handling and range-tombstone table-skip improvements; foundational to the KV-only seqno changes.
feat(range-tombstone): add structured error context to decode_range_tombstones #37: Changes in range-tombstone handling within src/table/mod.rs that may conflict with or complement the new RT sorting and recovery logic.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'perf: optimize range tombstone lookup in table-skip and point-read' directly and accurately summarizes the main change: performance optimization of range tombstone lookups in two hot paths.
Linked Issues check	✅ Passed	All coding requirements from issue `#27` are addressed: sorted RT lists by (start asc, seqno desc) for binary-search optimization, partition_point usage in table-skip and point-read paths, separate highest_kv_seqno tracking, backward compatibility, and comprehensive test coverage.
Out of Scope Changes check	✅ Passed	All changes are directly aligned with issue `#27` objectives: metadata additions (highest_kv_seqno), RT sorting/binary-search optimization, and test coverage for colocated RTs. No extraneous modifications detected.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/range.rs`:
- Around line 242-268: The partition_point call on all_range_tombstones assumes
it's sorted by rt.start but the list is only sorted later by
all_range_tombstones.sort_by(...); fix by ensuring the slice is sorted before
using partition_point (move the sort_by earlier, before computing
table_min/table_max and candidate_end) or, alternatively, replace the
partition_point logic with a linear scan over all_range_tombstones that checks
rt.start, rt.visible_at(*cutoff), rt.fully_covers(table_min, table_max) and
rt.seqno > table_kv_seqno (using symbols: all_range_tombstones, partition_point,
sort_by, candidate_end, is_covered, table_min/table_max,
table.get_highest_kv_seqno, rt.visible_at, rt.fully_covers) so the decision to
skip a table is based on correct ordering or a full scan.
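
To illustrate the precondition this comment is about: `partition_point` only gives a meaningful answer when the slice is already partitioned by the predicate, which sorting by start key guarantees. The sketch below uses bare start keys and hypothetical helper names (`is_sorted_by_start`, `candidate_end`); it is not the code in `src/range.rs`.

```rust
/// Returns true if the start keys are in non-decreasing order.
fn is_sorted_by_start(starts: &[Vec<u8>]) -> bool {
    starts.windows(2).all(|w| w[0] <= w[1])
}

/// Sorts the start keys, then returns how many of them are <= `table_min`.
/// Sorting first is what makes the binary search valid.
fn candidate_end(starts: &mut [Vec<u8>], table_min: &[u8]) -> usize {
    starts.sort_unstable();
    debug_assert!(is_sorted_by_start(starts));
    starts.partition_point(|s| s.as_slice() <= table_min)
}
```

In a loop over many tables the sort would be done once up front, as the comment suggests, rather than per call as in this sketch.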

📥 Commits

Reviewing files that changed from the base of the PR and between c1e43d5 and 1c72b5e.

📒 Files selected for processing (7)

src/range.rs
src/table/meta.rs
src/table/mod.rs
src/table/writer/meta.rs
src/table/writer/mod.rs
src/tree/mod.rs
tests/range_tombstone.rs

Copilot

Pull request overview

Improves range tombstone (RT) suppression performance for table-skip and point reads by separating KV-only sequence number bounds and using binary search over sorted RT lists.

Changes:

Add KV-only max seqno metadata (seqno#kv_max) to enable skipping tables even when covering RTs are co-located in the same SST.
Use partition_point to reduce RT candidate scanning for table-skip and point reads.
Add targeted tests covering KV/RT seqno separation and binary-search boundary cases.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

File	Description
tests/range_tombstone.rs	Adds tests for KV-only seqno metadata and binary-search-based suppression behavior.
src/tree/mod.rs	Uses binary search over per-table sorted RTs for point-read suppression.
src/table/writer/mod.rs	Tracks and persists KV-only max seqno as new table metadata.
src/table/writer/meta.rs	Extends writer metadata with `highest_kv_seqno`.
src/table/mod.rs	Sorts decoded RTs to support binary-search suppression; exposes `get_highest_kv_seqno()`.
src/table/meta.rs	Parses optional `seqno#kv_max` field with fallback for older tables.
src/range.rs	Uses binary search over RT list to reduce table-skip candidate checks and relies on KV-only seqno bound.

- Sort SST-sourced RTs before table-skip loop so partition_point operates on sorted data (was using unsorted list, which gave incorrect results)
- Propagate read_u64 error for seqno#kv_max instead of silent fallback to surface metadata corruption
- Use should_suppress() in point-read path instead of inlined logic to avoid semantic drift from the canonical suppression check
- Update sort comment to reference Ord implementation

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

…p-in-table-skip

…r binary search

- Sort by start key explicitly instead of delegating to Ord, so the partition_point invariant is enforced locally and independent of future Ord changes
- Replace [..candidate_end] slicing with .iter().take(candidate_end) to avoid clippy::indexing_slicing suppressions

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

…p-in-table-skip

- Sort per-table RTs by (start asc, seqno desc) instead of start-only, so suppression checks short-circuit on the highest-seqno RT first
- Align sort order documentation with actual comparator
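
A small sketch of a comparator with this ordering, assuming a simplified `RangeTombstone` struct and a hypothetical `sort_for_binary_search` helper. It uses `sort_unstable_by`, since equal elements are interchangeable here.

```rust
/// Hypothetical, simplified range tombstone: deletes keys in [start, end) at `seqno`.
#[derive(Debug, Clone, PartialEq, Eq)]
struct RangeTombstone {
    start: Vec<u8>,
    end: Vec<u8>, // exclusive
    seqno: u64,
}

/// Sorts by start key ascending, breaking ties by seqno descending, so that
/// among tombstones sharing a start key the newest one is checked first.
fn sort_for_binary_search(rts: &mut [RangeTombstone]) {
    rts.sort_unstable_by(|a, b| {
        a.start
            .cmp(&b.start)
            // Operands swapped on purpose: higher seqno sorts earlier.
            .then_with(|| b.seqno.cmp(&a.seqno))
    });
}
```

The seqno tiebreaker does not affect the `partition_point` predicate, which depends only on `start`.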

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

polaz · 2026-03-21T23:34:29Z

@coderabbitai re-review

coderabbitai · 2026-03-21T23:34:40Z

@polaz Sure, I'll kick off a full re-review of PR #55 now.

🧠 Learnings used

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-21T22:22:16.340Z
Learning: In `structured-world/lsm-tree` (`src/range.rs`), prefix bloom skipping for multi-table runs (L0) is deferred to issue `#58` (opened during PR `#43` review). Extending it requires passing `prefix_hash: Option<u64>` into `RunReader` (`src/run_reader.rs`) and applying `Table::maybe_contains_prefix` per-table inside `RunReader`. Do not flag the absence of multi-table run prefix bloom skipping as a blocking concern in PR `#43` or future PRs until issue `#58` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-21T22:21:36.946Z
Learning: In `structured-world/lsm-tree`, prefix bloom filter skipping for multi-table runs (L0, served by `RunReader` in `src/run_reader.rs`) is intentionally deferred from PR `#43` to issue `#57`. The `run.len() == 1` guard in `src/range.rs` is deliberate. Do not flag the absence of multi-table run prefix bloom skipping as a bug in PR `#43` or future PRs until issue `#57` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-21T22:22:16.340Z
Learning: In `structured-world/lsm-tree` (`src/range.rs`), prefix bloom skipping for multi-table runs (L0) is deferred to a separate issue (opened during PR `#43` review). Extending it requires passing `prefix_hash: Option<u64>` into `RunReader` (`src/run_reader.rs`) and applying `Table::maybe_contains_prefix` per-table inside `RunReader`. Do not flag the absence of multi-table run prefix bloom skipping as a blocking concern in PR `#43` or future PRs until that issue is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:06.289Z
Learning: In `structured-world/lsm-tree`, the switch from `Mutex<IntervalTree>` to `RwLock<IntervalTree>` (or a freeze-on-seal immutable structure) for memtable range-tombstone storage in `src/memtable/mod.rs` is intentionally deferred from PR `#21` to a separate issue (`#30`). Do not flag the existing `Mutex<IntervalTree>` usage in `src/memtable/mod.rs` as a concurrency concern in the context of PR `#21`; it is tracked in issue `#30`.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 43
File: src/range.rs:226-252
Timestamp: 2026-03-21T22:11:05.425Z
Learning: In `structured-world/lsm-tree` (`src/range.rs`), prefix bloom filter skipping intentionally applies **only** to single-table runs (`run.len() == 1`). Multi-table runs (typically L0) are handled by `RunReader`, which merges across all tables in the run and does not support per-table prefix bloom filtering. This is an architectural limitation documented in the code — do not flag the absence of prefix bloom skipping for multi-table runs as a bug or missing optimization in PR `#43` or future PRs.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 21
File: src/table/multi_writer.rs:129-206
Timestamp: 2026-03-20T11:14:02.461Z
Learning: In `structured-world/lsm-tree`, the `write_rts_to_writer` function in `src/table/multi_writer.rs` has a known limitation when `clip_range_tombstones=true` (compaction mode): RT fragments that don't intersect any output writer's KV span are silently dropped, which can lose delete-only gaps (e.g. an RT covering `[m,p)` while all output tables cover `[a,l]` and `[q,z]`). This limitation is intentionally deferred to issue `#32` and is documented as a Known Limitation in PR `#21`. Do not flag this as a blocking concern in PR `#21` or any PR until `#32` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:39.835Z
Learning: In `structured-world/lsm-tree`, the `Memtable` range-tombstone storage uses `Mutex<IntervalTree>` (introduced in PR `#21`). Switching to `RwLock<IntervalTree>` or a freeze-on-seal immutable structure is a deliberate future improvement tracked in a separate issue — do not flag the `Mutex` usage as a blocking concern in PR `#21` or future PRs that reference this design.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:39.835Z
Learning: In `structured-world/lsm-tree`, the `Memtable` range-tombstone storage uses `Mutex<IntervalTree>` (introduced in PR `#21`). Switching to `RwLock<IntervalTree>` or a freeze-on-seal immutable structure is a deliberate future improvement tracked in issue `#31` — do not flag the `Mutex` usage as a blocking concern in PR `#21` or future PRs that reference this design.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 21
File: src/table/writer/mod.rs:437-457
Timestamp: 2026-03-20T12:29:43.719Z
Learning: In `structured-world/lsm-tree` (`src/table/writer/mod.rs`), the synthetic weak-tombstone sentinel written for RT-only SSTs (when `item_count == 0` but `range_tombstones` is non-empty) intentionally increments `meta.tombstone_count`, `meta.weak_tombstone_count`, `meta.item_count`, and `meta.key_count`. All on-disk metadata counters must match actual block contents for recovery validation consistency (established in commit 7452427a for item_count/key_count; same reasoning applies to tombstone counters). The resulting +1 inflation on GC heuristics is harmless — it overestimates reclaimable space and triggers slightly earlier checks. Do not flag this as a metadata accuracy bug in PR `#21` or future PRs.

Learnt from: CR
Repo: structured-world/lsm-tree PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-20T22:03:44.692Z
Learning: Applies to src/memtable/interval_tree.rs : `src/memtable/interval_tree.rs` — Interval tree for memtable range tombstone queries

✅ Actions performed

Full review triggered.

…path

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

…p-in-table-skip

…riant

- Switch to sort_unstable_by for RT sorts (no stability needed)
- Clarify that highest_kv_seqno tracks all data-block item types (values, point tombstones, weak tombstones), not just values
- Restore inclusive-max vs half-open documentation in table-skip
- Expand backward-compat fallback rationale in ParsedMeta

…p-in-table-skip

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

- Reject tables where highest_kv_seqno exceeds overall max seqno, which indicates on-disk metadata corruption
- Reword seqno-desc tiebreaker documentation to match actual behavior

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

…p-in-table-skip

… scope Test verifies logical suppression (which works regardless of whether table-skip fires), not the table-skip optimization specifically.

codecov · 2026-03-22T00:50:34Z

Codecov Report

❌ Patch coverage is 98.24561% with 1 line in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/table/meta.rs	96.42%	1 Missing ⚠️

…riants

- Tables without RTs have highest_kv_seqno == highest_seqno
- RT-only tables have highest_kv_seqno == 0 (sentinel restored)
- Assert highest_kv_seqno <= highest_seqno for all table types
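
These invariants can be illustrated with a toy model of writer-side tracking. `WriterMeta`, `record_kv`, and `record_range_tombstone` are hypothetical names, and the sketch deliberately ignores the synthetic sentinel item that the real writer emits for RT-only tables.

```rust
/// Hypothetical stand-in for the writer's seqno bookkeeping.
#[derive(Debug, Default)]
struct WriterMeta {
    /// Highest seqno over everything in the table (KV items and range tombstones).
    highest_seqno: u64,
    /// Highest seqno over data-block items only (values, point and weak tombstones).
    highest_kv_seqno: u64,
}

impl WriterMeta {
    /// A data-block item raises both bounds.
    fn record_kv(&mut self, seqno: u64) {
        self.highest_seqno = self.highest_seqno.max(seqno);
        self.highest_kv_seqno = self.highest_kv_seqno.max(seqno);
    }

    /// A range tombstone raises only the overall bound.
    fn record_range_tombstone(&mut self, seqno: u64) {
        self.highest_seqno = self.highest_seqno.max(seqno);
    }
}
```

Because every update to `highest_kv_seqno` also updates `highest_seqno`, the KV-only bound can never exceed the overall one in this model.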

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

polaz · 2026-03-22T01:19:32Z

@coderabbitai re-review

coderabbitai · 2026-03-22T01:19:45Z

@polaz Sure, I'll kick off a full re-review of PR #55 now.

🧠 Learnings used

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 43
File: tests/tree_prefix_bloom.rs:91-105
Timestamp: 2026-03-21T22:11:09.195Z
Learning: In `structured-world/lsm-tree`, prefix-bloom integration tests in `tests/tree_prefix_bloom.rs` intentionally validate correctness (result counts and key ordering) rather than asserting that the bloom-skip pruning path fired. Bloom-skip metrics/counters are `pub(crate)` and cannot be observed from integration tests. Do not flag these tests for failing to assert on internal pruning behavior in PR `#43` or future PRs.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-21T22:21:36.946Z
Learning: In `structured-world/lsm-tree` (`src/table/mod.rs`), `Table::maybe_contains_prefix` intentionally returns `Ok(true)` (conservative fallback) when `pinned_filter_index.is_some() || regions.filter_tli.is_some()` (partitioned filter or filter TLI present). This guard was added in PR `#43` at line 793. Do not flag the absence of per-partition prefix-hash checking as a bug in PR `#43` or future PRs.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 21
File: src/table/writer/mod.rs:394-463
Timestamp: 2026-03-20T20:05:18.309Z
Learning: In `structured-world/lsm-tree` (`src/table/writer/mod.rs`), the `key#min`/`key#max` widening for RT bounds in `Writer::finish()` is intentionally limited to the `item_count == 0` (RT-only) branch. For mixed KV+RT tables, `MultiWriter::write_rts_to_writer` (lines 167–199 of `src/table/multi_writer.rs`) already widens `meta.first_key`/`meta.last_key` conservatively in flush mode (`clip=false`). In compaction mode (`clip=true`), RTs are intersected with the KV key range so no widening is needed. Do not flag the absence of unconditional widening in `Writer::finish()` as a bug in PR `#21` or future PRs.

✅ Actions performed

Full review triggered.

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.

…function Extract highest_kv_seqno corruption check into validated_kv_seqno() with unit tests for valid, equal, zero, and exceeds-max cases.

- Fix test postcondition: assert_eq!(Some(0), level_table_count(0)) instead of unwrap_or(0), which properly distinguishes "L0 exists and is empty" from unexpected None (Copilot #54, CodeRabbit #56)
- Fix ZstdDictionary::new() doc: handle stores full 64-bit xxh3 hash internally; id() returns lower 32 bits on demand (CodeRabbit #55)

Copilot AI review requested due to automatic review settings March 21, 2026 22:02

coderabbitai Bot reviewed Mar 21, 2026

Comment thread src/range.rs