
feat(range-tombstone): add structured error context to decode_range_tombstones #37

Merged
polaz merged 28 commits into main from feat/#26-add-structured-error-context-to-range-tombstone-de
Mar 21, 2026

polaz (Member) commented Mar 20, 2026

Summary

  • Replace generic Error::Unrecoverable with RangeTombstoneDecode { field, offset } variant in decode_range_tombstones, preserving which field or validation failed and the pre-read byte offset
  • Capture offset before each read for consistent diagnostics; use entry_offset for interval validation errors
  • Extract read_checked_slice helper with checked_add overflow protection and cursor.get_ref() for data access
  • Reject empty RT block payload as corruption (prevents silent key resurrection)
  • Rename key-data error fields from implementation terms (start_buf, end_buf) to wire-format names (start, end); length fields (start_len, end_len) and validation (interval) retain descriptive names
  • Add 7 unit tests covering all decode error paths with offset assertions
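The read_checked_slice helper described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the code merged in this PR: DecodeError is a hypothetical stand-in for the crate's Error, and the (cursor, field, len) signature is an assumption. It shows the three ideas from the summary: the offset is captured before the read, checked_add rejects a length that would overflow the end position, and cursor.get_ref() plus slice::get turns an out-of-range read into an error instead of a panic.

```rust
use std::io::Cursor;

/// Hypothetical stand-in for the crate's error type (only the new variant).
#[derive(Debug, PartialEq, Eq)]
pub enum DecodeError {
    RangeTombstoneDecode { field: &'static str, offset: u64 },
}

/// Reads `len` bytes at the cursor's position and advances past them.
/// On failure the cursor is left where it was.
fn read_checked_slice(
    cursor: &mut Cursor<&[u8]>,
    field: &'static str,
    len: usize,
) -> Result<Vec<u8>, DecodeError> {
    // Offset of the field's first byte, captured before anything is read.
    let offset = cursor.position();
    // `move` copies `field` and `offset`, so the closure is `Copy` and reusable.
    let err = move || DecodeError::RangeTombstoneDecode { field, offset };

    let pos = usize::try_from(offset).map_err(|_| err())?;
    // `checked_add` turns an absurd length into an error instead of wrapping.
    let end = pos.checked_add(len).ok_or_else(err)?;
    // `get_ref` exposes the underlying slice; `get` returns `None` when out of range.
    let bytes = cursor.get_ref().get(pos..end).ok_or_else(err)?.to_vec();

    cursor.set_position(end as u64);
    Ok(bytes)
}
```

Because the cursor only moves on success, a caller can report the error offset and still inspect the untouched position if needed.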

Test plan

  • cargo test decode_range_tombstones — 7 unit tests pass (invalid interval, truncated start_len/end_len/seqno, oversized start_len/end_len, empty block corruption)
  • cargo test range_tombstone — all range tombstone tests pass
  • cargo clippy --lib --tests — no new errors in changed files
  • cargo fmt -- --check — clean
  • codecov/patch — passing

Closes #26

…ombstones

- Replace generic Error::Unrecoverable with RangeTombstoneDecode variant
  carrying field name and byte offset for each decode failure
- Add unit test for invalid interval (start >= end) corruption path
- Update tampered-block integration test to accept new error variant

Closes #26
Copilot AI review requested due to automatic review settings March 20, 2026 22:39

coderabbitai Bot commented Mar 20, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Decode path for on-disk range tombstone blocks now returns structured errors: Error::RangeTombstoneDecode { field, offset } when parsing or validating tombstone fields (start_len, start, end_len, end, seqno, interval) fails. Decoder uses slice-based bounds checks and records per-field offsets; tests and integration expectations were updated.

Changes

  • Error enum (src/error.rs): Added public variant RangeTombstoneDecode { field: &'static str, offset: u64 } to report the failing field name and byte offset when tombstone decoding fails.
  • Range tombstone decoder (src/table/mod.rs): Rewrote Table::decode_range_tombstones to track per-field cursor offsets, return RangeTombstoneDecode { field, offset } on failures for start_len, start, end_len, end, seqno, and interval, and read variable-length fields via bounds-checked slices from block.data instead of read_exact. Empty RT payloads are treated as corruption.
  • Unit tests for decoding (src/table/tests.rs): Added helpers (rt_block, assert_rt_decode_error, assert_rt_decode_error_at) and multiple unit tests that construct malformed or truncated RT blocks and assert RangeTombstoneDecode with the expected field and offset.
  • Integration test tolerance (tests/range_tombstone.rs): Updated range_tombstone_tampered_rt_block_fails_recovery to accept Error::RangeTombstoneDecode { .. } as an expected failure alongside the previous variants.
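For illustration, a variant shaped like the one described for src/error.rs might look like this. Only the variant's field names and types come from the walkthrough; the enum is trimmed to two variants, and the Display wording is an assumption, not the crate's actual message.

```rust
use std::fmt;

#[derive(Debug)]
pub enum Error {
    /// Placeholder for the pre-existing catch-all variant.
    Unrecoverable,
    /// A range tombstone block failed to decode.
    RangeTombstoneDecode {
        /// Wire-format field or validation that failed, e.g. "start_len" or "interval".
        field: &'static str,
        /// Byte offset within the block payload, captured before the failing read.
        offset: u64,
    },
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Unrecoverable => write!(f, "unrecoverable error"),
            Self::RangeTombstoneDecode { field, offset } => {
                write!(f, "range tombstone decode failed at field `{field}` (offset {offset})")
            }
        }
    }
}

impl std::error::Error for Error {}
```

Using &'static str for field keeps the variant allocation-free, since every field name is a compile-time constant at the call site.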

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I nibble bytes and mark the spot,
Where offsets whisper what they got.
Field names sing where parsers fell,
Tiny hops now tell the tale well.
Tests clap paws — the bug found, unknotted. 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Title check: ✅ Passed. The title clearly and specifically describes the main change: adding structured error context to the decode_range_tombstones function.
  • Linked Issues check: ✅ Passed. The pull request meets all coding requirements from issue #26: adds RangeTombstoneDecode error variant with field and offset context, replaces generic Unrecoverable with structured errors, uses bounds-checked slice access, rejects empty RT blocks, and includes comprehensive unit tests covering all decode error paths.
  • Out of Scope Changes check: ✅ Passed. All changes directly support the linked issue objectives: error enum variant addition, decode function refactoring, and test coverage for error cases. No unrelated changes detected.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.

codecov Bot commented Mar 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Copilot AI left a comment

Pull request overview

Adds structured error context for range tombstone block decoding so corruption/debugging can identify which wire-format field failed and at what offset, and updates tests to assert/accept the new error.

Changes:

  • Introduce Error::RangeTombstoneDecode { field, offset } and use it throughout Table::decode_range_tombstones.
  • Add a unit test that encodes an invalid interval (start >= end) via the production wire format and asserts the new error variant.
  • Update the tampered RT-block recovery integration test to accept the new error variant.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

  • src/table/mod.rs: Replaces generic decode failures with RangeTombstoneDecode { field, offset } during RT block parsing.
  • src/error.rs: Adds the new structured error variant to the public Error enum.
  • src/table/tests.rs: Adds a new unit test covering invalid interval decoding behavior.
  • tests/range_tombstone.rs: Expands the expected error set for tampered RT-block recovery.

Comment thread src/table/mod.rs
Comment thread src/table/tests.rs Outdated
polaz added 2 commits March 21, 2026 00:57
… in test

- Snapshot cursor.position() into local `offset` before each read so
  error reports the field's start byte, not the post-advance position
- Replace `buf.len() as u32` with `u32::try_from().expect()` in test
  to get a checked conversion and drop cast_possible_truncation suppress
…rors

- Test truncated start_len, end_len, seqno reads against short buffers
- Test start_len and end_len exceeding remaining block data
- Test empty block decodes to empty vec
- Extract rt_block() and assert_rt_decode_error() test helpers
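A test helper along the lines of rt_block() could encode entries in the order the decoder reads them (start_len, start, end_len, end, seqno). This is a hypothetical sketch: the helper names and signatures are illustrative, and the little-endian u16 width for end_len is an assumption (the review excerpts only show start_len read as a u16 and seqno as a u64). Using u16::try_from(...).expect(...) follows the checked-conversion approach from the previous commit rather than an `as` cast.

```rust
/// Hypothetical helper: encodes one range tombstone entry.
fn encode_rt_entry(start: &[u8], end: &[u8], seqno: u64) -> Vec<u8> {
    // Checked conversions: an oversized key fails loudly instead of truncating.
    let start_len = u16::try_from(start.len()).expect("start key exceeds u16 length prefix");
    let end_len = u16::try_from(end.len()).expect("end key exceeds u16 length prefix");

    let mut out = Vec::with_capacity(2 + start.len() + 2 + end.len() + 8);
    out.extend_from_slice(&start_len.to_le_bytes());
    out.extend_from_slice(start);
    out.extend_from_slice(&end_len.to_le_bytes());
    out.extend_from_slice(end);
    out.extend_from_slice(&seqno.to_le_bytes());
    out
}

/// Hypothetical helper: concatenates encoded entries into one block payload.
fn rt_block(entries: &[(&str, &str, u64)]) -> Vec<u8> {
    entries
        .iter()
        .flat_map(|&(start, end, seqno)| encode_rt_entry(start.as_bytes(), end.as_bytes(), seqno))
        .collect()
}
```

Corrupted inputs for the error-path tests can then be derived from a valid block, for example by truncating it or overwriting a length prefix.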
read_exact on start_buf/end_buf was unreachable after bounds validation
— replace with direct slice + set_position to eliminate dead error paths

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment thread src/table/tests.rs Outdated
Comment thread src/table/tests.rs Outdated
polaz added 3 commits March 21, 2026 01:13
- Add #[expect(clippy::expect_used)] to rt_block() helper
- Replace .expect() with match in empty-block test
…copy

- Replace panicking data[pos..pos+len] with data.get().ok_or()
- Backtick-escape identifiers in test helper doc strings
@polaz polaz requested a review from Copilot March 21, 2026 07:39

polaz (Member, Author) commented Mar 21, 2026

@coderabbitai re-review

coderabbitai Bot commented Mar 21, 2026

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== src/error.rs ===" && cat src/error.rs
echo ""
echo "=== src/table/mod.rs (decode_range_tombstones) ===" && rg -n "decode_range_tombstones" src/table/mod.rs -A 80
echo ""
echo "=== src/table/tests.rs (new test) ===" && rg -n "decode_range_tombstones_invalid" src/table/tests.rs -A 50
echo ""
echo "=== tests/range_tombstone.rs (tampered test) ===" && rg -n "tampered_rt_block" tests/range_tombstone.rs -A 40
echo ""
echo "=== src/range_tombstone.rs ===" && cat src/range_tombstone.rs

Length of output: 23495


🏁 Script executed:

#!/bin/bash
echo "=== Full decode_range_tombstones function ===" && rg -n "fn decode_range_tombstones" src/table/mod.rs -A 120
echo ""
echo "=== Test helpers (rt_block, assert_rt_decode_error) ===" && rg -n "fn rt_block\|fn assert_rt_decode_error" src/table/tests.rs -A 25
echo ""
echo "=== Full new test + surrounding tests ===" && sed -n '1440,1560p' src/table/tests.rs
echo ""
echo "=== Error Display impl or PartialEq for Error ===" && rg -n "PartialEq\|RangeTombstoneDecode" src/error.rs
echo ""
echo "=== integration test full match arm + panic message ===" && rg -n "RangeTombstoneDecode\|tampered" tests/range_tombstone.rs -A 5 -B 3

Length of output: 8820


@polaz: The PR is well-structured and the new error variant cleanly replaces the generic Unrecoverable with field-level context. The test suite coverage across all 7 failure paths is solid. Three findings below:


Finding 1 — interval error's offset points past the full record (minor correctness)

In src/table/mod.rs Line 728, the interval error captures cursor.position() after read_u64 has consumed the seqno bytes — so offset points to the byte immediately following the complete tombstone record, not to the start of the malformed tombstone:

// Line 711
let offset = cursor.position();  // after end_buf
let seqno = cursor.read_u64::<LE>()…?;

// Line 724-729
if start >= end {
    return Err(crate::Error::RangeTombstoneDecode {
        field: "interval",
        offset: cursor.position(),  // ← points PAST the record (after seqno)
    });
}

The interval field refers to the relationship between start and end, both of which live at the start of the record. Capturing the record-start offset (the offset held before reading start_len) would make this consistent with the other error sites and point to the actual corrupted data. Suggested fix:

// capture record start before start_len is read:
let record_start = cursor.position();
let offset = record_start;
let start_len = cursor.read_u16::<LE>().map_err(|_| …)?;
// … rest of parsing …

if start >= end {
    return Err(crate::Error::RangeTombstoneDecode {
        field: "interval",
        offset: record_start,  // points to the start of the tombstone entry
    });
}
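A self-contained sketch of the suggested fix, assuming the wire layout implied by the review (LE u16 start_len, start bytes, LE u16 end_len, end bytes, LE u64 seqno). Per-field errors carry the offset captured just before that field is read, while the interval check reports entry_offset, the first byte of the record. The types, the byte-wise key comparison, and the "block" field name for an empty payload are hypothetical, not the crate's actual code.

```rust
use std::io::{Cursor, Read};

#[derive(Debug, PartialEq, Eq)]
pub struct RtDecodeError {
    pub field: &'static str,
    pub offset: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub struct RangeTombstone {
    pub start: Vec<u8>,
    pub end: Vec<u8>,
    pub seqno: u64,
}

/// Reads a fixed-width field, reporting the offset where the field starts.
fn read_fixed<const N: usize>(c: &mut Cursor<&[u8]>, field: &'static str) -> Result<[u8; N], RtDecodeError> {
    let offset = c.position();
    let mut buf = [0u8; N];
    c.read_exact(&mut buf).map_err(|_| RtDecodeError { field, offset })?;
    Ok(buf)
}

/// Reads `len` key bytes, reporting the offset where the key data starts.
fn read_key(c: &mut Cursor<&[u8]>, field: &'static str, len: usize) -> Result<Vec<u8>, RtDecodeError> {
    let offset = c.position();
    let err = move || RtDecodeError { field, offset };
    let pos = usize::try_from(offset).map_err(|_| err())?;
    let end = pos.checked_add(len).ok_or_else(err)?;
    let key = c.get_ref().get(pos..end).ok_or_else(err)?.to_vec();
    c.set_position(end as u64);
    Ok(key)
}

pub fn decode_range_tombstones(data: &[u8]) -> Result<Vec<RangeTombstone>, RtDecodeError> {
    if data.is_empty() {
        // Empty payload treated as corruption; "block" is a hypothetical field name.
        return Err(RtDecodeError { field: "block", offset: 0 });
    }
    let mut c = Cursor::new(data);
    let mut out = Vec::new();
    while c.position() < data.len() as u64 {
        // First byte of this record: the offset reported for interval errors.
        let entry_offset = c.position();
        let start_len = u16::from_le_bytes(read_fixed::<2>(&mut c, "start_len")?);
        let start = read_key(&mut c, "start", usize::from(start_len))?;
        let end_len = u16::from_le_bytes(read_fixed::<2>(&mut c, "end_len")?);
        let end = read_key(&mut c, "end", usize::from(end_len))?;
        let seqno = u64::from_le_bytes(read_fixed::<8>(&mut c, "seqno")?);
        // `start` and `end` sit at the front of the record, so report where it begins
        // rather than `c.position()`, which now points past the seqno.
        if start >= end {
            return Err(RtDecodeError { field: "interval", offset: entry_offset });
        }
        out.push(RangeTombstone { start, end, seqno });
    }
    Ok(out)
}
```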

Finding 2 — Unreachable .ok_or() branches for start_buf / end_buf (nit)

After the bounds guard (if start_len > remaining { return Err(…) }), the subsequent .get(pos..pos + start_len).ok_or(…) can never return None — making the start_buf and end_buf error variants dead code:

// Bounds already validated: start_len <= remaining
let start_buf = data
    .get(pos..pos + start_len)
    .ok_or(crate::Error::RangeTombstoneDecode {  // ← unreachable
        field: "start_buf",
        offset,
    })?
    .to_vec();

Direct slice indexing would remove the ambiguity:

let start_buf = data[pos..pos + start_len].to_vec();

Same applies to end_buf. This also aligns with the existing comment // Bounds already validated.


Finding 3 — #[expect(clippy::cast_possible_truncation)] does not cover inner casts (nit)

The #[expect] attribute at Line 643–646 suppresses the lint only for the while condition cursor.position() as usize. The multiple cursor.position() as usize casts inside the loop body (Lines 658, 670, 678, 689, 701, 709) are not covered. Since blocks are size-bounded in practice this is a low-risk cosmetic issue, but it may surface clippy::cast_possible_truncation warnings on future clippy runs. Consider extracting a small helper:

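#[inline]
#[allow(clippy::cast_possible_truncation)] // the helper still casts, so the lint is silenced here, once
fn pos_as_usize(cursor: &Cursor<&[u8]>) -> usize {
    // Not a safety invariant: block sizes are bounded far below usize::MAX, so the cast cannot truncate in practice
    cursor.position() as usize
}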

or adding #[allow(clippy::cast_possible_truncation)] at function scope with a reason.
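A third option, not raised in the review, is to avoid the `as` cast altogether with a checked conversion, so there is nothing for clippy::cast_possible_truncation to flag. The helper name below is hypothetical; the trade-off is one more (practically unreachable) failure path that the caller has to map into a decode error.

```rust
use std::io::Cursor;

/// Hypothetical helper: cursor position as `usize` via a checked conversion.
/// Returns `None` only if the position does not fit in `usize`
/// (possible in principle on 32-bit targets, not for realistic block sizes).
fn position_usize(cursor: &Cursor<&[u8]>) -> Option<usize> {
    usize::try_from(cursor.position()).ok()
}
```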


🧠 Learnings used
Learnt from: CR
Repo: structured-world/lsm-tree PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-20T22:03:44.692Z
Learning: Applies to src/range_tombstone.rs : `src/range_tombstone.rs` — Range tombstone data model and serialization

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:06.289Z
Learning: In `structured-world/lsm-tree`, the switch from `Mutex<IntervalTree>` to `RwLock<IntervalTree>` (or a freeze-on-seal immutable structure) for memtable range-tombstone storage in `src/memtable/mod.rs` is intentionally deferred from PR `#21` to a separate issue (`#30`). Do not flag the existing `Mutex<IntervalTree>` usage in `src/memtable/mod.rs` as a concurrency concern in the context of PR `#21`; it is tracked in issue `#30`.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 21
File: src/table/multi_writer.rs:129-206
Timestamp: 2026-03-20T11:14:02.461Z
Learning: In `structured-world/lsm-tree`, the `write_rts_to_writer` function in `src/table/multi_writer.rs` has a known limitation when `clip_range_tombstones=true` (compaction mode): RT fragments that don't intersect any output writer's KV span are silently dropped, which can lose delete-only gaps (e.g. an RT covering `[m,p)` while all output tables cover `[a,l]` and `[q,z]`). This limitation is intentionally deferred to issue `#32` and is documented as a Known Limitation in PR `#21`. Do not flag this as a blocking concern in PR `#21` or any PR until `#32` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 21
File: src/table/writer/mod.rs:437-457
Timestamp: 2026-03-20T12:29:43.719Z
Learning: In `structured-world/lsm-tree` (`src/table/writer/mod.rs`), the synthetic weak-tombstone sentinel written for RT-only SSTs (when `item_count == 0` but `range_tombstones` is non-empty) intentionally increments `meta.tombstone_count`, `meta.weak_tombstone_count`, `meta.item_count`, and `meta.key_count`. All on-disk metadata counters must match actual block contents for recovery validation consistency (established in commit 7452427a for item_count/key_count; same reasoning applies to tombstone counters). The resulting +1 inflation on GC heuristics is harmless — it overestimates reclaimable space and triggers slightly earlier checks. Do not flag this as a metadata accuracy bug in PR `#21` or future PRs.

Learnt from: CR
Repo: structured-world/lsm-tree PR: 0
File: .github/instructions/rust.instructions.md:0-0
Timestamp: 2026-03-20T22:04:21.227Z
Learning: Applies to **/*test*.rs : Corruption tests must tamper the relevant on-disk field (checksum, block header, segment metadata) and assert the expected error

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:39.835Z
Learning: In `structured-world/lsm-tree`, the `Memtable` range-tombstone storage uses `Mutex<IntervalTree>` (introduced in PR `#21`). Switching to `RwLock<IntervalTree>` or a freeze-on-seal immutable structure is a deliberate future improvement tracked in a separate issue — do not flag the `Mutex` usage as a blocking concern in PR `#21` or future PRs that reference this design.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:39.835Z
Learning: In `structured-world/lsm-tree`, the `Memtable` range-tombstone storage uses `Mutex<IntervalTree>` (introduced in PR `#21`). Switching to `RwLock<IntervalTree>` or a freeze-on-seal immutable structure is a deliberate future improvement tracked in issue `#31` — do not flag the `Mutex` usage as a blocking concern in PR `#21` or future PRs that reference this design.

Learnt from: CR
Repo: structured-world/lsm-tree PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-20T22:03:44.692Z
Learning: Applies to src/range_tombstone_filter.rs : `src/range_tombstone_filter.rs` — MVCC-aware range tombstone filtering for iterators

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 12
File: src/table/block/mod.rs:109-119
Timestamp: 2026-03-15T16:10:30.764Z
Learning: In `structured-world/lsm-tree`, pre-allocation bounds checks on `uncompressed_length`, `data_length`, `handle.size()`, and `real_val_len` (256 MiB cap, returning `DecompressedSizeTooLarge`) are intentionally deferred to PR `#7` (branch `feat/#258-security-validate-uncompressedlength-before-decomp`, closes issues `#258` and `#257`). Do not flag missing size-cap guards in `Block::from_reader`, `Block::from_file`, or `blob_file::Reader::get` as unaddressed — they are handled in that separate security PR.

Learnt from: CR
Repo: structured-world/lsm-tree PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-20T22:03:44.692Z
Learning: Applies to src/active_tombstone_set.rs : `src/active_tombstone_set.rs` — Tracks active range tombstones during compaction

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment thread src/table/mod.rs Outdated
Comment thread src/table/mod.rs Outdated
polaz added 2 commits March 21, 2026 10:17
- Use pos (data start) instead of offset (field-length start) in
  start_buf and end_buf error variants for accurate diagnostics
@polaz polaz requested a review from Copilot March 21, 2026 08:43
polaz (Member, Author) commented Mar 21, 2026

@coderabbitai re-review

coderabbitai Bot commented Mar 21, 2026

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== decode_range_tombstones full function ===" && rg -n "fn decode_range_tombstones" src/table/mod.rs -A 130
echo ""
echo "=== Error variants ===" && rg -n "RangeTombstoneDecode\|Unrecoverable" src/error.rs -B2 -A5
echo ""
echo "=== New unit test ===" && rg -n "decode_range_tombstones_invalid" src/table/tests.rs -A 60
echo ""
echo "=== Integration test match arm ===" && rg -n "RangeTombstoneDecode\|tampered_rt_block" tests/range_tombstone.rs -B3 -A10
echo ""
echo "=== cast_possible_truncation and expect attributes near decode_range_tombstones ===" && rg -n "cast_possible_truncation\|expect\|allow" src/table/mod.rs -B1 -A1

Length of output: 7845


@polaz: Thanks for the re-review request. After reading the current code, all three findings from the previous review are still present in the diff. The test suite has expanded nicely (6+ unit tests now covering all error paths), which is a good improvement. Here's the status of each finding:


Finding 1 — interval offset still points past the seqno (unaddressed)

Line 728 still captures cursor.position() after read_u64 has consumed the seqno:

// Line 711-718: seqno consumed here
let offset = cursor.position();  // before seqno
let seqno = cursor.read_u64::<LE>()…?;

// Line 724-729: cursor now points past seqno
if start >= end {
    return Err(crate::Error::RangeTombstoneDecode {
        field: "interval",
        offset: cursor.position(),  // ← still points AFTER seqno, not to record start
    });
}

The fix from the previous review still applies — save the record-start offset before reading start_len:

while (cursor.position() as usize) < data.len() {
    let record_start = cursor.position();  // ← add this
    let offset = record_start;
    let start_len = cursor.read_u16::<LE>()…?;
    // … rest of parsing unchanged …

    if start >= end {
        return Err(crate::Error::RangeTombstoneDecode {
            field: "interval",
            offset: record_start,  // ← points to the tombstone entry start
        });
    }
}

Finding 2 — Unreachable .ok_or() for start_buf / end_buf (unaddressed)

The // Bounds already validated comments at lines 669 and 700 themselves document that the .ok_or(...) below cannot fire:

// Bounds already validated: start_len <= remaining   ← comment says it all
let pos = cursor.position() as usize;
let start_buf = data
    .get(pos..pos + start_len)
    .ok_or(crate::Error::RangeTombstoneDecode {   // ← dead code
        field: "start_buf",
        offset: pos as u64,
    })?
    .to_vec();

Since the bounds guard has already returned an error if start_len > remaining, this slice is guaranteed to be in-bounds. Direct indexing is cleaner and removes the dead error variant:

let start_buf = data[pos..pos + start_len].to_vec();
cursor.set_position((pos + start_len) as u64);

Same applies to end_buf (lines 701-708).


Finding 3 — #[expect(clippy::cast_possible_truncation)] scope (unaddressed)

The attribute at lines 643-646 suppresses the lint only on the while condition. The as usize casts on lines 658, 670, 678, 689, 701, and 709 are outside its scope. A function-level #[allow] with a reason comment would cover all of them uniformly:

#[allow(clippy::cast_possible_truncation, reason = "block sizes are bounded well within usize")]
fn decode_range_tombstones(block: &Block) -> crate::Result<Vec<RangeTombstone>> {

The expanded test suite (6 unit tests covering all 7 error paths + integration test update) is solid — no new findings there.


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/lsm-tree PR: 21
File: src/table/writer/mod.rs:394-463
Timestamp: 2026-03-20T20:05:18.309Z
Learning: In `structured-world/lsm-tree` (`src/table/writer/mod.rs`), the `key#min`/`key#max` widening for RT bounds in `Writer::finish()` is intentionally limited to the `item_count == 0` (RT-only) branch. For mixed KV+RT tables, `MultiWriter::write_rts_to_writer` (lines 167–199 of `src/table/multi_writer.rs`) already widens `meta.first_key`/`meta.last_key` conservatively in flush mode (`clip=false`). In compaction mode (`clip=true`), RTs are intersected with the KV key range so no widening is needed. Do not flag the absence of unconditional widening in `Writer::finish()` as a bug in PR `#21` or future PRs.

Learnt from: CR
Repo: structured-world/lsm-tree PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-20T22:03:44.692Z
Learning: Applies to **/*.rs : Before flagging any potential issue, trace the call chain visible in the PR diff. Check the caller to see if it handles the case, verify type-level guarantees, and read adjacent comments about deliberate design decisions.

✅ Actions performed

Full review triggered.

coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/table/mod.rs`:
- Around line 723-729: The interval validation error is reporting
cursor.position() (end-of-entry) instead of the entry start, making offsets
inconsistent; fix by recording the entry start offset when beginning to parse
the range tombstone (e.g., let entry_start = cursor.position() at the top of the
tombstone parsing routine) and use that saved entry_start as the offset in the
RangeTombstoneDecode error (replace offset: cursor.position() with offset:
entry_start) in the start >= end branch.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cf7dfaf6-3bae-4347-9f2a-b95836196695

📥 Commits

Reviewing files that changed from the base of the PR and between c3f3232 and 1cdd844.

📒 Files selected for processing (4)
  • src/error.rs
  • src/table/mod.rs
  • src/table/tests.rs
  • tests/range_tombstone.rs

Comment thread src/table/mod.rs

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment thread src/error.rs Outdated
Comment thread src/table/tests.rs Outdated
polaz added 2 commits March 21, 2026 10:53
Use entry_offset (start of tombstone record) instead of
cursor.position() (end of record) for consistent diagnostics
…tests

- Update offset doc to specify pre-read field-start semantics
- Add offset assertions to truncated start_len, end_len, seqno tests

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment thread src/error.rs
Comment thread src/table/tests.rs Outdated
polaz added 2 commits March 21, 2026 20:48
Remove unused assert_rt_decode_error wrapper and Option<u64> indirection.
All decode error tests now assert both field and offset directly.
@polaz polaz requested a review from Copilot March 21, 2026 18:54

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

src/error.rs:1

  • Adding a new variant to a public Error enum is a semver-breaking change for downstream crates that exhaustively match on Error (unless the enum is #[non_exhaustive]). If this crate has stability/semver guarantees, consider marking Error as #[non_exhaustive] (or otherwise documenting this as a breaking change / bumping major version).
// Copyright (c) 2024-present, fjall-rs
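A minimal sketch of the #[non_exhaustive] option Copilot mentions. The variant list is illustrative, not the crate's. The attribute only has an effect outside the defining crate, where every match on the enum must include a wildcard arm; adding it to an existing public enum is itself a one-time breaking change, after which new variants no longer are.

```rust
/// Illustrative public error enum marked `#[non_exhaustive]`.
#[derive(Debug)]
#[non_exhaustive]
pub enum Error {
    Unrecoverable,
    RangeTombstoneDecode { field: &'static str, offset: u64 },
}

/// Downstream-style handling: outside the defining crate the wildcard arm is
/// mandatory, so adding a variant later does not break this match.
fn describe(err: &Error) -> String {
    match err {
        Error::RangeTombstoneDecode { field, offset } => {
            format!("corrupt range tombstone block: {field} at {offset}")
        }
        _ => "other storage error".to_string(),
    }
}
```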

Comment thread src/table/mod.rs Outdated
Comment thread src/table/mod.rs Outdated
…t/end

Deduplicate slice extraction logic (checked_add + get + set_position)
into a shared helper function called for both start and end keys.
@polaz polaz requested a review from Copilot March 21, 2026 19:22

polaz (Member, Author) commented Mar 21, 2026

@coderabbitai re-review

@coderabbitai

coderabbitai Bot commented Mar 21, 2026

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== Full decode_range_tombstones ===" && rg -n "fn decode_range_tombstones" src/table/mod.rs -A 140
echo ""
echo "=== expect/allow/cast near function ===" && rg -n "cast_possible_truncation\|#\[expect\]\|#\[allow\]" src/table/mod.rs -B2 -A2

Length of output: 5959


🏁 Script executed:

#!/bin/bash
echo "=== read_checked_slice function ===" && rg -n "fn read_checked_slice" src/table/mod.rs -A 30
echo ""
echo "=== Crate-level lint attributes in lib.rs ===" && rg -n "cast_possible_truncation\|#!\[allow\]\|#!\[expect\]" src/lib.rs | head -30
echo ""
echo "=== Any cast_possible_truncation in entire codebase ===" && rg -rn "cast_possible_truncation" src/

Length of output: 6945


@polaz: All three original findings are now resolved — great work iterating through these:

  • Finding 1 ✅ — entry_offset captured at line 689, used correctly for the interval error at line 754.
  • Finding 2 ✅ — Self::read_checked_slice helper extracted, eliminating the inline .get().ok_or() pattern.
  • Finding 3 ✅ — #[expect(clippy::cast_possible_truncation, reason = "…")] is now correctly placed at function scope on decode_range_tombstones (lines 665–668), covering all casts in the function body.

New nit — read_checked_slice (line 647) has an uncovered u64 as usize cast

The newly extracted helper read_checked_slice has no #[expect] of its own. The attribute on decode_range_tombstones does not extend to functions it calls. Line 647:

fn read_checked_slice(
    cursor: &mut std::io::Cursor<&[u8]>,
    data: &[u8],
    field: &'static str,
    len: usize,
) -> crate::Result<Vec<u8>> {
    let pos = cursor.position() as usize;  // ← u64→usize, clippy::cast_possible_truncation

The pos as u64 and end_pos as u64 casts below (lines 652, 658, 661) are usize→u64 (always widening) and don't need suppression. Only line 647 needs it. A minimal fix:

#[expect(
    clippy::cast_possible_truncation,
    reason = "block sizes are bounded well within usize on all supported platforms"
)]
fn read_checked_slice(
    cursor: &mut std::io::Cursor<&[u8]>,
    data: &[u8],
    field: &'static str,
    len: usize,
) -> crate::Result<Vec<u8>> {

This is the only remaining nit.


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:39.835Z
Learning: In `structured-world/lsm-tree`, the `Memtable` range-tombstone storage uses `Mutex<IntervalTree>` (introduced in PR `#21`). Switching to `RwLock<IntervalTree>` or a freeze-on-seal immutable structure is a deliberate future improvement tracked in a separate issue — do not flag the `Mutex` usage as a blocking concern in PR `#21` or future PRs that reference this design.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 21
File: src/table/multi_writer.rs:129-206
Timestamp: 2026-03-20T11:14:02.461Z
Learning: In `structured-world/lsm-tree`, the `write_rts_to_writer` function in `src/table/multi_writer.rs` has a known limitation when `clip_range_tombstones=true` (compaction mode): RT fragments that don't intersect any output writer's KV span are silently dropped, which can lose delete-only gaps (e.g. an RT covering `[m,p)` while all output tables cover `[a,l]` and `[q,z]`). This limitation is intentionally deferred to issue `#32` and is documented as a Known Limitation in PR `#21`. Do not flag this as a blocking concern in PR `#21` or any PR until `#32` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:39.835Z
Learning: In `structured-world/lsm-tree`, the `Memtable` range-tombstone storage uses `Mutex<IntervalTree>` (introduced in PR `#21`). Switching to `RwLock<IntervalTree>` or a freeze-on-seal immutable structure is a deliberate future improvement tracked in issue `#31` — do not flag the `Mutex` usage as a blocking concern in PR `#21` or future PRs that reference this design.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:06.289Z
Learning: In `structured-world/lsm-tree`, the switch from `Mutex<IntervalTree>` to `RwLock<IntervalTree>` (or a freeze-on-seal immutable structure) for memtable range-tombstone storage in `src/memtable/mod.rs` is intentionally deferred from PR `#21` to a separate issue (`#30`). Do not flag the existing `Mutex<IntervalTree>` usage in `src/memtable/mod.rs` as a concurrency concern in the context of PR `#21`; it is tracked in issue `#30`.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 21
File: src/table/writer/mod.rs:437-457
Timestamp: 2026-03-20T12:29:43.719Z
Learning: In `structured-world/lsm-tree` (`src/table/writer/mod.rs`), the synthetic weak-tombstone sentinel written for RT-only SSTs (when `item_count == 0` but `range_tombstones` is non-empty) intentionally increments `meta.tombstone_count`, `meta.weak_tombstone_count`, `meta.item_count`, and `meta.key_count`. All on-disk metadata counters must match actual block contents for recovery validation consistency (established in commit 7452427a for item_count/key_count; same reasoning applies to tombstone counters). The resulting +1 inflation on GC heuristics is harmless — it overestimates reclaimable space and triggers slightly earlier checks. Do not flag this as a metadata accuracy bug in PR `#21` or future PRs.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 21
File: src/table/writer/mod.rs:394-463
Timestamp: 2026-03-20T20:05:18.309Z
Learning: In `structured-world/lsm-tree` (`src/table/writer/mod.rs`), the `key#min`/`key#max` widening for RT bounds in `Writer::finish()` is intentionally limited to the `item_count == 0` (RT-only) branch. For mixed KV+RT tables, `MultiWriter::write_rts_to_writer` (lines 167–199 of `src/table/multi_writer.rs`) already widens `meta.first_key`/`meta.last_key` conservatively in flush mode (`clip=false`). In compaction mode (`clip=true`), RTs are intersected with the KV key range so no widening is needed. Do not flag the absence of unconditional widening in `Writer::finish()` as a bug in PR `#21` or future PRs.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 12
File: src/table/block/mod.rs:109-119
Timestamp: 2026-03-15T16:10:30.764Z
Learning: In `structured-world/lsm-tree`, pre-allocation bounds checks on `uncompressed_length`, `data_length`, `handle.size()`, and `real_val_len` (256 MiB cap, returning `DecompressedSizeTooLarge`) are intentionally deferred to PR `#7` (branch `feat/#258-security-validate-uncompressedlength-before-decomp`, closes issues `#258` and `#257`). Do not flag missing size-cap guards in `Block::from_reader`, `Block::from_file`, or `blob_file::Reader::get` as unaddressed — they are handled in that separate security PR.

Learnt from: CR
Repo: structured-world/lsm-tree PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-20T22:03:44.692Z
Learning: Applies to src/range_tombstone_filter.rs : `src/range_tombstone_filter.rs` — MVCC-aware range tombstone filtering for iterators

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 23
File: .github/instructions/rust.instructions.md:0-0
Timestamp: 2026-03-17T00:34:54.145Z
Learning: In `structured-world/lsm-tree`, the lint suppression policy distinguishes between legacy and new code:
- Existing/upstream `#[allow(clippy::...)]` in **unchanged** code: treat as legacy — do **not** flag it in reviews.
- New code introduced in a PR **must** use `#[expect(clippy::...)]` with a justification reason.
This avoids churning upstream code while enforcing the `#[expect]` standard for all new contributions.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 43
File: src/tree/mod.rs:1045-1063
Timestamp: 2026-03-21T17:23:51.325Z
Learning: In `structured-world/lsm-tree` (`src/tree/mod.rs` and `src/blob_tree/mod.rs`), the prefix bloom filter boundary guard `extractor.prefixes(prefix_bytes).any(|p| p == prefix_bytes)` is intentionally used instead of a dedicated `is_valid_scan_boundary()` trait method. The `PrefixExtractor` contract requires `prefixes()` to return sub-slices of the input key, so a well-behaved extractor that emits `prefix` for `prefix` itself will also emit it for any key starting with `prefix`. Adding `is_valid_scan_boundary()` to the public trait was deferred to issue `#47` to avoid API complexity for a theoretical misuse case. Do not flag the current predicate as insufficient in PR `#43` or future PRs.

Learnt from: CR
Repo: structured-world/lsm-tree PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-20T22:03:44.692Z
Learning: Applies to **/*.rs : Before flagging any potential issue, trace the call chain visible in the PR diff. Check the caller to see if it handles the case, verify type-level guarantees, and read adjacent comments about deliberate design decisions.

Learnt from: CR
Repo: structured-world/lsm-tree PR: 0
File: .github/instructions/rust.instructions.md:0-0
Timestamp: 2026-03-20T22:04:21.226Z
Learning: Applies to **/*.rs : Flag incorrect merge semantics: tombstones not propagated to lower levels, point deletes applied out of order (Tier 1 — MUST flag)

Learnt from: CR
Repo: structured-world/lsm-tree PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-20T22:03:44.692Z
Learning: Applies to src/range_tombstone.rs : `src/range_tombstone.rs` — Range tombstone data model and serialization

Learnt from: CR
Repo: structured-world/lsm-tree PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-20T22:03:44.692Z
Learning: Applies to src/active_tombstone_set.rs : `src/active_tombstone_set.rs` — Tracks active range tombstones during compaction

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment thread src/table/mod.rs Outdated
Comment thread src/table/mod.rs Outdated
…checked_slice

- Remove redundant data parameter, use cursor.get_ref() instead
- Move wire-format doc block back to decode_range_tombstones
@polaz polaz requested a review from Copilot March 21, 2026 19:45
Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comment thread src/table/mod.rs Outdated
Comment thread src/table/mod.rs
Comment thread src/table/mod.rs
polaz added 2 commits March 21, 2026 22:10
…ead_checked_slice

Avoid u64→usize→u64 round-trip for error offset by capturing
cursor.position() as u64 before the usize cast.
@polaz polaz requested a review from Copilot March 21, 2026 20:37
@polaz
Member Author

polaz commented Mar 21, 2026

@coderabbitai re-review

@coderabbitai

coderabbitai Bot commented Mar 21, 2026

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== read_checked_slice and surrounding context ===" && rg -n "fn read_checked_slice" src/table/mod.rs -A 30
echo ""
echo "=== decode_range_tombstones and cast_possible_truncation attributes ===" && rg -n "cast_possible_truncation\|#\[expect\]\|#\[allow\]" src/table/mod.rs -B2 -A3
echo ""
echo "=== Any new changes near the function ===" && sed -n '635,720p' src/table/mod.rs

Length of output: 4703


@polaz: The refactored read_checked_slice is cleaner — inlining cursor.get_ref() removes the need for a separate data parameter. The one remaining nit from the previous review is still present:

Finding — read_checked_slice missing #[expect] for its u64 as usize cast

Line 641 performs a potentially truncating u64 → usize cast with no lint suppression:

fn read_checked_slice(
    cursor: &mut std::io::Cursor<&[u8]>,
    field: &'static str,
    len: usize,
) -> crate::Result<Vec<u8>> {
    let offset = cursor.position();
    let data = cursor.get_ref();
    let pos = offset as usize;  // ← u64→usize, no #[expect]

The #[expect(clippy::cast_possible_truncation)] on decode_range_tombstones (lines 660–663) does not cover this helper. Per the repo lint policy, new code must use #[expect] with a justification. Suggested fix:

#[expect(
    clippy::cast_possible_truncation,
    reason = "block sizes are bounded well within usize on all supported platforms"
)]
fn read_checked_slice(
    cursor: &mut std::io::Cursor<&[u8]>,
    field: &'static str,
    len: usize,
) -> crate::Result<Vec<u8>> {

The end_pos as u64 cast on line 649 is a usize → u64 widening and needs no suppression.


Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

@polaz polaz merged commit c1e43d5 into main Mar 21, 2026
16 checks passed

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add structured error context to range tombstone decoding in decode_range_tombstones

2 participants