feat(range-tombstone): per-source RT visibility in range/prefix iteration #39

Merged
polaz merged 12 commits into main from feat/#33-feat-per-source-rt-visibility-in-rangeprefix-itera on Mar 21, 2026

@polaz polaz commented Mar 20, 2026

Summary

  • Replace single global read_seqno cutoff in RangeTombstoneFilter / ActiveTombstoneSet with per-RT visibility cutoff
  • Each collected tombstone is paired with its originating source's seqno — ephemeral memtable RTs use index_seqno, all other sources use the outer scan seqno
  • Fixes two classes of bugs when ephemeral index_seqno != outer_seqno:
    • Over-suppress: ephemeral RT visible at outer_seqno but not at eph_seqno incorrectly suppressed base-tree keys
    • Leak: ephemeral RT visible at eph_seqno but not at outer_seqno failed to suppress ephemeral KVs in the merged stream
  • Cross-source dedup merges duplicate RTs by keeping the max cutoff, so an RT remains visible if ANY source's snapshot includes it
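
A minimal sketch of the tagging rule summarized above, not the code in src/range.rs. The `RangeTombstone` fields, the `Source` enum, the `tag` helper, and the visibility rule `rt.seqno < cutoff` are all assumptions made for illustration.

```rust
type SeqNo = u64;

/// Hypothetical stand-in for the crate's range tombstone type.
#[derive(Clone, Debug, PartialEq, Eq)]
struct RangeTombstone {
    start: Vec<u8>, // inclusive
    end: Vec<u8>,   // exclusive
    seqno: SeqNo,
}

impl RangeTombstone {
    /// Assumed rule: an RT is visible to a reader whose cutoff is strictly
    /// greater than the RT's own seqno.
    fn visible_at(&self, cutoff: SeqNo) -> bool {
        self.seqno < cutoff
    }

    /// A key is suppressed only if the RT is visible at `cutoff`, covers the
    /// key, and is newer than the key's version.
    fn suppresses(&self, key: &[u8], key_seqno: SeqNo, cutoff: SeqNo) -> bool {
        self.visible_at(cutoff)
            && self.start.as_slice() <= key
            && key < self.end.as_slice()
            && key_seqno < self.seqno
    }
}

/// Hypothetical label for where a tombstone was collected from.
#[derive(Clone, Copy, Debug)]
enum Source {
    Disk,
    Memtable,
    Ephemeral,
}

/// Pair a tombstone with the cutoff of its originating source:
/// ephemeral RTs get `eph_seqno`, everything else gets `outer_seqno`.
fn tag(
    rt: RangeTombstone,
    source: Source,
    outer_seqno: SeqNo,
    eph_seqno: SeqNo,
) -> (RangeTombstone, SeqNo) {
    let cutoff = match source {
        Source::Ephemeral => eph_seqno,
        Source::Disk | Source::Memtable => outer_seqno,
    };
    (rt, cutoff)
}
```

Under these assumptions, with outer_seqno = 10 and eph_seqno = 5, an ephemeral RT at seqno 7 is tagged with cutoff 5 and no longer suppresses a base-tree key at seqno 3 (the over-suppress case). With eph_seqno = 15, an ephemeral RT at seqno 12 is tagged with cutoff 15 and does suppress an ephemeral key at seqno 11 (the leak case).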

Changed files

| File | Change |
| --- | --- |
| src/active_tombstone_set.rs | activate() takes per-RT cutoff_seqno param; removed global cutoff from struct |
| src/range_tombstone_filter.rs | Internal vectors now Vec<(RangeTombstone, SeqNo)>; constructor no longer takes global read_seqno |
| src/range.rs | Tags each collected RT with source seqno: disk/memtable → outer_seqno, ephemeral → eph_seqno; dedup merges by max cutoff |
| tests/range_tombstone_ephemeral.rs | 7 integration tests covering both failure modes + cross-source dedup |

Test plan

Unit tests (3):

  • forward_mixed_cutoffs_activates_only_visible_rt — ActiveTombstoneSet: only RT with sufficient cutoff is activated; expiry proves invisible RT was skipped
  • reverse_mixed_cutoffs_activates_only_visible_rt — ActiveTombstoneSetReverse: same with reverse expiry
  • mixed_cutoffs_suppress_only_visible_source — RangeTombstoneFilter end-to-end: visible-source RT suppresses, invisible-source RT does not
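
The idea behind the two ActiveTombstoneSet unit tests can be sketched roughly as follows. This is a simplified stand-in, not the crate's type: `ActiveSet`, `expire_until`, the field layout, and the visibility rule `rt.seqno < cutoff_seqno` are assumptions; only the names `activate` and `max_active_seqno` come from the PR text.

```rust
type SeqNo = u64;

#[derive(Clone, Debug)]
struct RangeTombstone {
    start: Vec<u8>, // inclusive
    end: Vec<u8>,   // exclusive
    seqno: SeqNo,
}

/// Simplified, hypothetical forward active tombstone set.
#[derive(Default)]
struct ActiveSet {
    active: Vec<RangeTombstone>,
}

impl ActiveSet {
    /// Activate `rt` only if it is visible at its own cutoff.
    /// Assumed visibility rule: `rt.seqno < cutoff_seqno`.
    fn activate(&mut self, rt: &RangeTombstone, cutoff_seqno: SeqNo) {
        debug_assert!(rt.start < rt.end, "empty or inverted range");
        if rt.seqno < cutoff_seqno {
            self.active.push(rt.clone());
        }
    }

    /// Drop tombstones whose exclusive end is at or before `key`.
    fn expire_until(&mut self, key: &[u8]) {
        self.active.retain(|rt| rt.end.as_slice() > key);
    }

    /// Highest seqno among currently active tombstones, if any.
    fn max_active_seqno(&self) -> Option<SeqNo> {
        self.active.iter().map(|rt| rt.seqno).max()
    }

    fn is_empty(&self) -> bool {
        self.active.is_empty()
    }
}
```

Checking `max_active_seqno()` alone cannot tell one activation from two when the invisible RT has the lower seqno. Expiring past the visible RT's end and asserting that the set is empty can, because a wrongly activated RT with a later end would still be present.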

Integration tests (7):

  • Over-suppress: forward + reverse iteration (eph_seqno < outer_seqno)
  • Over-suppress: bounded range query
  • Over-suppress: prefix query
  • Leak: forward + reverse iteration (eph_seqno > outer_seqno)
  • Normal case: eph_seqno == outer_seqno (no divergence)
  • Base-tree RT uses outer_seqno, unaffected by ephemeral
  • Duplicate RT from two sources (SST + ephemeral) — dedup keeps max cutoff

Suite: full test suite passes (321+ unit + integration + doc-tests, 0 failures)

Closes #33

Summary by CodeRabbit

  • Bug Fixes

    • Fixed range tombstone visibility tracking to respect per-source cutoffs, ensuring correct key suppression across snapshots with different visibility boundaries.
    • Improved handling of ephemeral memtable range tombstones with independent visibility cutoffs.
  • Tests

    • Added comprehensive integration tests for range tombstone visibility under various snapshot cutoff configurations.

…tion

Replace the single global read_seqno cutoff in RangeTombstoneFilter and
ActiveTombstoneSet with a per-RT visibility cutoff.  Each collected
tombstone is now paired with the seqno of its originating source so that
ephemeral memtable RTs use their own index_seqno instead of the outer
scan seqno.

This fixes two classes of bugs when index_seqno != outer_seqno:
- Over-suppress: an ephemeral RT visible at outer_seqno but not at
  eph_seqno would incorrectly suppress base-tree keys.
- Leak: an ephemeral RT visible at eph_seqno but not at outer_seqno
  would fail to suppress ephemeral KVs that entered the merged stream.

- ActiveTombstoneSet::activate() now takes per-RT cutoff_seqno param
- RangeTombstoneFilter::new() accepts Vec<(RangeTombstone, SeqNo)>
- range.rs tags each collected RT with its source's seqno
- Add 6 regression tests covering both failure modes

Closes #33
Copilot AI review requested due to automatic review settings March 20, 2026 22:52

coderabbitai Bot commented Mar 20, 2026

Walkthrough

Range-tombstone visibility was changed from a single global cutoff to per-tombstone cutoffs. Tombstones are now propagated as (RangeTombstone, SeqNo) pairs (source-specific seqnos like ephemeral index_seqno) and activation/visibility checks use each tombstone's associated cutoff.
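
As a rough illustration of the walkthrough, the filter can be pictured as an iterator adapter that carries (tombstone, cutoff) pairs. This sketch uses a linear scan over all tombstones instead of the sorted activation sweep the real filter performs; `RtFilter`, `Entry`, and the visibility rule `rt.seqno < cutoff` are hypothetical.

```rust
type SeqNo = u64;

struct RangeTombstone {
    start: Vec<u8>, // inclusive
    end: Vec<u8>,   // exclusive
    seqno: SeqNo,
}

#[derive(Clone, Debug, PartialEq, Eq)]
struct Entry {
    key: Vec<u8>,
    seqno: SeqNo,
}

/// Hypothetical filter: wraps an entry iterator and drops entries covered by
/// a tombstone that is visible at that tombstone's own cutoff.
struct RtFilter<I> {
    inner: I,
    tombstones: Vec<(RangeTombstone, SeqNo)>,
}

impl<I> RtFilter<I> {
    fn new(inner: I, tombstones: Vec<(RangeTombstone, SeqNo)>) -> Self {
        Self { inner, tombstones }
    }
}

impl<I: Iterator<Item = Entry>> Iterator for RtFilter<I> {
    type Item = Entry;

    fn next(&mut self) -> Option<Entry> {
        let tombstones = &self.tombstones;
        // Return the next entry that no visible tombstone suppresses.
        self.inner.find(|e| {
            !tombstones.iter().any(|(rt, cutoff)| {
                rt.seqno < *cutoff          // visible at its own cutoff (assumed rule)
                    && rt.start <= e.key    // covers the key
                    && e.key < rt.end
                    && e.seqno < rt.seqno   // tombstone is newer than the entry
            })
        })
    }
}
```

Because each pair carries its own cutoff, two tombstones with the same seqno can behave differently in one scan: the one from a source whose cutoff exceeds that seqno suppresses, the other does not.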

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Active tombstone sets: src/active_tombstone_set.rs | Removed stored cutoff_seqno; new() is parameterless. activate(&mut self, rt, cutoff_seqno) now accepts per-call cutoff. initialize_from accepts IntoIterator<Item = (RangeTombstone, SeqNo)>. |
| Range tombstone filter: src/range_tombstone_filter.rs | RangeTombstoneFilter::new(inner, Vec<(RangeTombstone, SeqNo)>) replaces prior signature; stores forward/reverse lists as (RangeTombstone, SeqNo); activation passes per-tombstone cutoff; sorting/dedup adjusted to preserve tombstone identity while merging cutoffs. |
| Tree iterator / collection: src/range.rs | TreeIter::create_range collects tombstones as (RangeTombstone, SeqNo) using source-specific seqnos (SST/sealed/active use scan seqno; ephemeral uses eph_seqno), deduplicates by tombstone identity keeping max cutoff, and applies per-tombstone visibility checks (fast-path and wrapped path updated). |
| Tests, ephemeral RT behavior: tests/range_tombstone_ephemeral.rs | New integration tests exercising per-source RT visibility across forward/reverse/range/prefix iterators, mixed cutoffs, deduplication behavior, and base-tree cutoff semantics. |

Sequence Diagram(s)

sequenceDiagram
    participant TreeIter
    participant Sources as SSTs / Sealed / Active / Ephemeral
    participant RTFilter as RangeTombstoneFilter
    participant ActiveSet as ActiveTombstoneSet
    participant Scanner as Iterator/Scanner

    TreeIter->>Sources: collect RTs with source_seqno
    Sources-->>TreeIter: (RangeTombstone, source_seqno)*
    TreeIter->>RTFilter: new(inner, tombstones: Vec<(RT, SeqNo)>)
    RTFilter->>ActiveSet: activate(rt, cutoff_seqno) for each (rt, cutoff)
    ActiveSet->>ActiveSet: query rt.visible_at(cutoff_seqno)?
    RTFilter-->>Scanner: filter/emit entries using per-tombstone visibility

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

A rabbit pairs stones with seqnos neat and bright,
Each tombstone knows its moment, none steal the light.
Ephemeral whispers guard their own small day,
No leaked keys wander, no rightful keys stray. 🐇✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately summarizes the main change: introducing per-source range tombstone visibility in range/prefix iteration, replacing a single global cutoff with per-tombstone cutoffs tied to originating sources. |
| Linked Issues check | ✅ Passed | The PR fully implements the objectives from issue #33: tags each RT with originating source seqno, updates RangeTombstoneFilter to use per-RT cutoffs instead of global read_seqno, and includes comprehensive regression tests for ephemeral memtable visibility. |
| Out of Scope Changes check | ✅ Passed | All code changes are directly related to the stated objective of per-source RT visibility: modifying active tombstone sets, range tombstone filter, range collection logic, and adding targeted integration tests. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

codecov Bot commented Mar 20, 2026

Codecov Report

❌ Patch coverage is 99.24812% with 1 line in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/range.rs | 95.83% | 1 Missing ⚠️ |

Copilot AI left a comment

Pull request overview

This PR fixes incorrect range tombstone (RT) visibility when an ephemeral memtable overlay uses an index_seqno that differs from the outer scan seqno, by carrying a per-source visibility cutoff alongside each collected RT and applying suppression using that cutoff during iteration.

Changes:

  • Tag collected RTs with a per-source cutoff seqno (ephemeral uses index_seqno, other sources use outer scan seqno) and pass them through range/prefix iteration.
  • Refactor RangeTombstoneFilter and ActiveTombstoneSet to operate on Vec<(RangeTombstone, SeqNo)> (per-RT cutoff) instead of a single global read cutoff.
  • Add regression tests covering both “over-suppress” and “leak” scenarios when index_seqno != outer_seqno.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/range.rs | Collects RTs with per-source cutoffs and wires updated filter into range/prefix iteration. |
| src/range_tombstone_filter.rs | Updates filter to activate tombstones with per-tombstone cutoff seqnos for fwd/rev scans. |
| src/active_tombstone_set.rs | Moves visibility cutoff from struct-global to per-activation parameter. |
| tests/range_tombstone_ephemeral.rs | Adds regression coverage for ephemeral overlay RT visibility divergence cases. |

Comment thread src/range.rs Outdated

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/range.rs`:
- Around line 337-349: The dedup step using dedup_by(|a, b| a.0 == b.0) drops
differing cutoffs for identical RangeTombstone keys, which can discard a visible
copy; change the dedup to preserve the maximum cutoff for equal RTs (i.e., when
merging entries in all_range_tombstones after sort_by(|a,b| a.0.cmp(&b.0)), keep
the entry with the larger cutoff rather than arbitrarily keeping the first), so
the subsequent all(... visible_at(...)) check and RangeTombstoneFilter::new see
the correct cutoff; also add a regression test covering a duplicated RT from two
sources with different cutoffs (persisted vs ephemeral) to ensure the visible
copy is preserved.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 81e0e3e8-b736-4291-93fd-195e6023ca43

📥 Commits

Reviewing files that changed from the base of the PR and between defb0bc and 8ff500e.

📒 Files selected for processing (4)
  • src/active_tombstone_set.rs
  • src/range.rs
  • src/range_tombstone_filter.rs
  • tests/range_tombstone_ephemeral.rs

Comment thread src/range.rs
polaz added 2 commits March 21, 2026 01:06
…ources

When the same RT exists in both a persisted SST (cutoff=outer_seqno)
and the ephemeral memtable (cutoff=eph_seqno), dedup_by on RT-only
discards the copy with the higher cutoff, making the RT invisible
and leaking covered keys.

This test currently FAILS; the fix follows in the next commit.

When the same RT exists in multiple sources with different per-source
cutoffs (e.g., persisted SST with outer_seqno + ephemeral memtable
with eph_seqno), dedup_by on RT-only arbitrarily discarded one copy.
This could drop the copy with the higher cutoff, making the RT
invisible and leaking covered keys.

Now dedup merges duplicates by keeping the max cutoff, so the RT
stays visible if ANY source's snapshot includes it.
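
A minimal sketch of the max-cutoff merge described in this commit message, assuming tombstone identity is the full (start, end, seqno) triple. The `RangeTombstone` struct and the helper name `dedup_keep_max_cutoff` are illustrative, not the code in src/range.rs.

```rust
type SeqNo = u64;

/// Hypothetical stand-in; derives `Ord` so identical tombstones sort adjacently.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct RangeTombstone {
    start: Vec<u8>,
    end: Vec<u8>,
    seqno: SeqNo,
}

/// Sort by tombstone identity, then merge duplicates by keeping the largest
/// cutoff, so the RT stays visible if any source's snapshot includes it.
fn dedup_keep_max_cutoff(
    mut rts: Vec<(RangeTombstone, SeqNo)>,
) -> Vec<(RangeTombstone, SeqNo)> {
    rts.sort_by(|a, b| a.0.cmp(&b.0));
    // `Vec::dedup_by` hands the closure (later, earlier); `earlier` survives
    // when the closure returns true, so the merged cutoff is written there.
    rts.dedup_by(|later, earlier| {
        if later.0 == earlier.0 {
            earlier.1 = earlier.1.max(later.1);
            true
        } else {
            false
        }
    });
    rts
}
```

A plain `dedup_by(|a, b| a.0 == b.0)` keeps whichever copy sorted first and discards the other copy's cutoff, which is the failure this commit addresses.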

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment thread src/range.rs Outdated
polaz added 2 commits March 21, 2026 09:42
dedup_by(|a, b|) passes a=later, b=earlier; b survives. The existing
code correctly updates b.1.  Expanded comment to prevent confusion
with the counter-intuitive parameter naming.
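
The argument order of `Vec::dedup_by` can be checked with a small trace. The helper `dedup_trace` is made up for this illustration; it records the (first argument, second argument) tags on every closure call.

```rust
/// Deduplicate by the `char` field and record which element was passed as
/// the first and second closure argument on each call.
fn dedup_trace(mut v: Vec<(char, u32)>) -> (Vec<(char, u32)>, Vec<(u32, u32)>) {
    let mut calls = Vec::new();
    v.dedup_by(|a, b| {
        // `a` is the later element, `b` the earlier one that is retained
        // if the closure returns true.
        calls.push((a.1, b.1));
        a.0 == b.0
    });
    (v, calls)
}
```

For the input `[('x', 1), ('x', 2), ('y', 3)]` the retained vector is `[('x', 1), ('y', 3)]` and the recorded calls are `[(2, 1), (3, 1)]`: the later element always arrives first, and the earlier one survives.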
@polaz polaz requested a review from Copilot March 21, 2026 08:19
@structured-world structured-world deleted a comment from coderabbitai Bot Mar 21, 2026

polaz commented Mar 21, 2026

@coderabbitai re-review

coderabbitai Bot commented Mar 21, 2026

@polaz Sure, I'll kick off a full re-review of PR #39 right away!


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:06.289Z
Learning: In `structured-world/lsm-tree`, the switch from `Mutex<IntervalTree>` to `RwLock<IntervalTree>` (or a freeze-on-seal immutable structure) for memtable range-tombstone storage in `src/memtable/mod.rs` is intentionally deferred from PR `#21` to a separate issue (`#30`). Do not flag the existing `Mutex<IntervalTree>` usage in `src/memtable/mod.rs` as a concurrency concern in the context of PR `#21`; it is tracked in issue `#30`.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 21
File: src/table/multi_writer.rs:129-206
Timestamp: 2026-03-20T11:14:02.461Z
Learning: In `structured-world/lsm-tree`, the `write_rts_to_writer` function in `src/table/multi_writer.rs` has a known limitation when `clip_range_tombstones=true` (compaction mode): RT fragments that don't intersect any output writer's KV span are silently dropped, which can lose delete-only gaps (e.g. an RT covering `[m,p)` while all output tables cover `[a,l]` and `[q,z]`). This limitation is intentionally deferred to issue `#32` and is documented as a Known Limitation in PR `#21`. Do not flag this as a blocking concern in PR `#21` or any PR until `#32` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:39.835Z
Learning: In `structured-world/lsm-tree`, the `Memtable` range-tombstone storage uses `Mutex<IntervalTree>` (introduced in PR `#21`). Switching to `RwLock<IntervalTree>` or a freeze-on-seal immutable structure is a deliberate future improvement tracked in a separate issue — do not flag the `Mutex` usage as a blocking concern in PR `#21` or future PRs that reference this design.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:39.835Z
Learning: In `structured-world/lsm-tree`, the `Memtable` range-tombstone storage uses `Mutex<IntervalTree>` (introduced in PR `#21`). Switching to `RwLock<IntervalTree>` or a freeze-on-seal immutable structure is a deliberate future improvement tracked in issue `#31` — do not flag the `Mutex` usage as a blocking concern in PR `#21` or future PRs that reference this design.

Learnt from: CR
Repo: structured-world/lsm-tree PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-20T22:03:44.692Z
Learning: Applies to src/range_tombstone_filter.rs : `src/range_tombstone_filter.rs` — MVCC-aware range tombstone filtering for iterators

Learnt from: CR
Repo: structured-world/lsm-tree PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-20T22:03:44.692Z
Learning: Applies to src/active_tombstone_set.rs : `src/active_tombstone_set.rs` — Tracks active range tombstones during compaction

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 21
File: src/table/writer/mod.rs:437-457
Timestamp: 2026-03-20T12:29:43.719Z
Learning: In `structured-world/lsm-tree` (`src/table/writer/mod.rs`), the synthetic weak-tombstone sentinel written for RT-only SSTs (when `item_count == 0` but `range_tombstones` is non-empty) intentionally increments `meta.tombstone_count`, `meta.weak_tombstone_count`, `meta.item_count`, and `meta.key_count`. All on-disk metadata counters must match actual block contents for recovery validation consistency (established in commit 7452427a for item_count/key_count; same reasoning applies to tombstone counters). The resulting +1 inflation on GC heuristics is harmless — it overestimates reclaimable space and triggers slightly earlier checks. Do not flag this as a metadata accuracy bug in PR `#21` or future PRs.

Learnt from: CR
Repo: structured-world/lsm-tree PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-20T22:03:44.692Z
Learning: Applies to src/range_tombstone.rs : `src/range_tombstone.rs` — Range tombstone data model and serialization

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 21
File: src/tree/mod.rs:747-749
Timestamp: 2026-03-20T20:05:24.184Z
Learning: In `structured-world/lsm-tree` (`src/tree/mod.rs`), `get_internal_entry_from_version` searches newest-first (active memtable → sealed memtables → SST tables). Each source's `get(key, seqno)` already returns the most-recent entry visible at that seqno within that source. Therefore, a point tombstone returned from the active memtable IS the authoritative latest version for that key — returning `Ok(None)` immediately is correct and does NOT hide a higher-seqno value from a lower source. Do not flag this early-return pattern as hiding newer entries.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 21
File: src/table/writer/mod.rs:394-463
Timestamp: 2026-03-20T20:05:18.309Z
Learning: In `structured-world/lsm-tree` (`src/table/writer/mod.rs`), the `key#min`/`key#max` widening for RT bounds in `Writer::finish()` is intentionally limited to the `item_count == 0` (RT-only) branch. For mixed KV+RT tables, `MultiWriter::write_rts_to_writer` (lines 167–199 of `src/table/multi_writer.rs`) already widens `meta.first_key`/`meta.last_key` conservatively in flush mode (`clip=false`). In compaction mode (`clip=true`), RTs are intersected with the KV key range so no widening is needed. Do not flag the absence of unconditional widening in `Writer::finish()` as a bug in PR `#21` or future PRs.
✅ Actions performed

Full review triggered.

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment thread src/active_tombstone_set.rs Outdated
Comment thread src/active_tombstone_set.rs Outdated
… assertions

Both forward_per_source_cutoff_mixed and reverse_per_source_cutoff_mixed
only checked max_active_seqno() which was ambiguous (same value whether
1 or 2 RTs activated).  Now expire past the visible RT's boundary and
assert the set becomes empty, proving the invisible RT was never activated.

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
src/active_tombstone_set.rs (1)

349-350: Rename the new tests to include the expected behavior.

Both names capture the setup, but not the assertion being protected. Encoding the expected outcome would make failures easier to scan.

As per coding guidelines, Test naming convention: fn <what>_<condition>_<expected>().

Also applies to: 430-431

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/active_tombstone_set.rs` around lines 349 - 350, Rename the newly added
test functions to follow the project convention fn
<what>_<condition>_<expected>() so the test name encodes the assertion; for
example, change fn forward_per_source_cutoff_mixed() to a name that includes the
expected behavior (e.g.,
forward_per_source_cutoff_mixed_keeps_expected_tombstones) and apply the same
renaming pattern to the other new test(s) in this file (the other test near the
new additions) so each name clearly states what is being tested, under what
condition, and what the expected outcome is.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: adafef29-a0bb-4055-a933-5ff62449b89c

📥 Commits

Reviewing files that changed from the base of the PR and between 090e5ba and e4bb194.

📒 Files selected for processing (1)
  • src/active_tombstone_set.rs

…ntion

- forward_per_source_cutoff_mixed → forward_mixed_cutoffs_activates_only_visible_rt
- reverse_per_source_cutoff_mixed → reverse_mixed_cutoffs_activates_only_visible_rt
- per_source_cutoff_different_visibility → mixed_cutoffs_suppress_only_visible_source

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment thread tests/range_tombstone_ephemeral.rs
Guard (IterGuard trait) is required for .key() calls on iterator items.
Add clarifying comment matching the pattern in range_tombstone.rs.

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

Comment thread tests/range_tombstone_ephemeral.rs
Comment thread tests/range_tombstone_ephemeral.rs
Comment thread tests/range_tombstone_ephemeral.rs
Comment thread tests/range_tombstone_ephemeral.rs
Comment thread tests/range_tombstone_ephemeral.rs
Comment thread tests/range_tombstone_ephemeral.rs
Vec<Vec<u8>> compares correctly with vec![b"a", b"b"] via Rust's
blanket PartialEq impl.  Add doc comment matching range_tombstone.rs.
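
The comparison this commit message relies on can be reproduced in isolation. The helpers `collect_keys` and `is_a_then_b` are hypothetical and not part of the test file; the comparison itself uses only the standard library's `PartialEq` impls for `Vec`.

```rust
/// Copy borrowed keys into owned buffers, as a test might do when
/// collecting keys from an iterator.
fn collect_keys<'a, I>(items: I) -> Vec<Vec<u8>>
where
    I: IntoIterator<Item = &'a [u8]>,
{
    items.into_iter().map(|k| k.to_vec()).collect()
}

/// `Vec<Vec<u8>>` compares directly against a `Vec` of byte-string literals:
/// `Vec<T>: PartialEq<Vec<U>>` holds when `T: PartialEq<U>`, and
/// `Vec<u8>: PartialEq<&[u8; N]>` is provided by the standard library.
fn is_a_then_b(keys: &Vec<Vec<u8>>) -> bool {
    *keys == vec![b"a", b"b"]
}
```

This keeps expected values short in assertions, since no `.to_vec()` calls are needed on the right-hand side.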
@polaz polaz requested a review from Copilot March 21, 2026 11:14

polaz commented Mar 21, 2026

@coderabbitai re-review

coderabbitai Bot commented Mar 21, 2026

@polaz Sure, I'll kick off a full re-review of PR #39 right away!


🧠 Learnings used
Learnt from: CR
Repo: structured-world/lsm-tree PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-20T22:03:44.692Z
Learning: Applies to **/*.rs : Before flagging any potential issue, trace the call chain visible in the PR diff. Check the caller to see if it handles the case, verify type-level guarantees, and read adjacent comments about deliberate design decisions.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 23
File: .github/instructions/rust.instructions.md:0-0
Timestamp: 2026-03-17T00:34:54.145Z
Learning: In `structured-world/lsm-tree`, the lint suppression policy distinguishes between legacy and new code:
- Existing/upstream `#[allow(clippy::...)]` in **unchanged** code: treat as legacy — do **not** flag it in reviews.
- New code introduced in a PR **must** use `#[expect(clippy::...)]` with a justification reason.
This avoids churning upstream code while enforcing the `#[expect]` standard for all new contributions.

Learnt from: CR
Repo: structured-world/lsm-tree PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-20T22:03:44.692Z
Learning: Read the 'out of scope' section in the PR description before reviewing. Do not flag items listed as out of scope — they are tracked in another PR.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 12
File: src/table/block/mod.rs:109-119
Timestamp: 2026-03-15T16:10:30.764Z
Learning: In `structured-world/lsm-tree`, pre-allocation bounds checks on `uncompressed_length`, `data_length`, `handle.size()`, and `real_val_len` (256 MiB cap, returning `DecompressedSizeTooLarge`) are intentionally deferred to PR `#7` (branch `feat/#258-security-validate-uncompressedlength-before-decomp`, closes issues `#258` and `#257`). Do not flag missing size-cap guards in `Block::from_reader`, `Block::from_file`, or `blob_file::Reader::get` as unaddressed — they are handled in that separate security PR.

✅ Actions performed

Full review triggered.

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Comment thread tests/range_tombstone_ephemeral.rs Outdated
Comment thread src/range_tombstone_filter.rs
Comment thread src/active_tombstone_set.rs
Comment thread src/range.rs
build_ephemeral silently discarded insert_range_tombstone return
value.  Now asserts size > 0 to catch rejected (start >= end) RTs.
Also document defensive re-sort in RangeTombstoneFilter::new.
@polaz polaz requested a review from Copilot March 21, 2026 11:43

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment thread tests/range_tombstone_ephemeral.rs Outdated
Comment thread tests/range_tombstone_ephemeral.rs Outdated
- Replace magic 999 with EPHEMERAL_MT_ID const
- Align key-range doc with actual inserts (a..d, not a..e)

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

@polaz polaz merged commit c03a443 into main Mar 21, 2026
31 checks passed
polaz added a commit that referenced this pull request Mar 21, 2026
- MvccStream RT suppression uses per-source cutoffs (u64) from the
  new (RangeTombstone, cutoff) tuple format after upstream PR #39
- resolve_merge_buffered initializes result_seqno from entries.last()
  instead of using 0 as sentinel (seqno can legitimately be 0)
- key_entries_buf reuses allocation across next_back() calls
- Document RT clone rationale in range pipeline
Development

Successfully merging this pull request may close these issues.

feat: per-source RT visibility in range/prefix iteration when ephemeral memtable index_seqno differs from outer scan seqno

2 participants