fix: thread UserComparator through Run, KeyRange, and Version methods #117

Merged: polaz merged 24 commits into main from feat/#98-bug-runpush-sorts-tables-lexicographically-ignorin on Mar 23, 2026
Conversation

@polaz
Member

@polaz polaz commented Mar 22, 2026

Summary

Extends comparator-aware coverage (#98 core fix landed in #100) to remaining code paths, plus fixes #122.

  • Leveled compaction choose() — all overlap detection, key range aggregation, trivial move decisions now use comparator
  • pick_minimal_compaction is now multi-run aware (#122, "fix: multi-level compaction, relax disjoint assert + merge input ranges optimization"): it accepts &Level instead of &Run and scans all runs for overlap/containment. This eliminates missed tables in transient multi-run levels produced by multi-level compaction (#108, "feat(compaction): compute L2 overlaps per-range in multi-level path")
  • RunReader::new_cmp — comparator-aware table selection for range scans (create_range + create_range_point)
  • OwnedBounds::contains — comparator-aware containment for drop_range strategy
  • get_contained_cmp — comparator-aware table containment in runs
  • Level::aggregate_key_range_cmp + KeyRange::aggregate_cmp + KeyRange::contains_range_cmp — cross-run aggregation with comparator
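
To illustrate what "comparator-aware" containment and aggregation mean in the bullets above, here is a minimal sketch. The `UserComparator` trait, the `Reverse` comparator, and the `KeyRange` struct below are simplified, hypothetical stand-ins and not the crate's actual definitions; only the method names `contains_range_cmp` and `aggregate_cmp` are taken from this PR.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for the crate's `UserComparator` trait.
pub trait UserComparator {
    fn compare(&self, a: &[u8], b: &[u8]) -> Ordering;
}

/// Plain bytewise (lexicographic) order.
pub struct Bytewise;
impl UserComparator for Bytewise {
    fn compare(&self, a: &[u8], b: &[u8]) -> Ordering {
        a.cmp(b)
    }
}

/// Reversed bytewise order, in the spirit of the `ReverseComparator` used by the tests.
pub struct Reverse;
impl UserComparator for Reverse {
    fn compare(&self, a: &[u8], b: &[u8]) -> Ordering {
        b.cmp(a)
    }
}

/// Simplified key range: `min` and `max` are inclusive and ordered by the comparator.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct KeyRange {
    pub min: Vec<u8>,
    pub max: Vec<u8>,
}

impl KeyRange {
    pub fn new(min: &[u8], max: &[u8]) -> Self {
        Self { min: min.to_vec(), max: max.to_vec() }
    }

    /// True if `other` lies entirely within `self` under `cmp`.
    pub fn contains_range_cmp(&self, other: &KeyRange, cmp: &dyn UserComparator) -> bool {
        cmp.compare(&self.min, &other.min) != Ordering::Greater
            && cmp.compare(&self.max, &other.max) != Ordering::Less
    }

    /// Smallest range covering all inputs under `cmp`; `None` for empty input.
    pub fn aggregate_cmp<'a, I>(ranges: I, cmp: &dyn UserComparator) -> Option<KeyRange>
    where
        I: IntoIterator<Item = &'a KeyRange>,
    {
        let mut iter = ranges.into_iter();
        let mut agg = iter.next()?.clone();
        for r in iter {
            if cmp.compare(&r.min, &agg.min) == Ordering::Less {
                agg.min = r.min.clone();
            }
            if cmp.compare(&r.max, &agg.max) == Ordering::Greater {
                agg.max = r.max.clone();
            }
        }
        Some(agg)
    }
}
```

With a reversed comparator, the range `z..m` contains `x..p`, whereas a bytewise check on the same endpoints rejects it. That mismatch is the kind of disagreement the `_cmp` variants are meant to avoid.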

What #100 covered vs what this PR adds

| Area | #100 | This PR |
|---|---|---|
| Run::push_cmp, get_overlapping_cmp, range_overlap_indexes_cmp | Done | |
| optimize_runs + Version::with_* comparator threading | Done | |
| Leveled choose() comparator threading | | Done |
| pick_minimal_compaction multi-run aware (#122) | | Done |
| RunReader::new_cmp for range scans | | Done |
| OwnedBounds::contains with comparator | | Done |
| get_contained_cmp, contains_range_cmp, aggregate_cmp | | Done |
| Level::aggregate_key_range_cmp | | Done |
| RunReader::new public API preservation | | Done |
| trim_slice deduplication | | Done |

Test Plan

Closes #122

Related

Copilot AI review requested due to automatic review settings March 22, 2026 22:33
@coderabbitai

coderabbitai Bot commented Mar 22, 2026

Important

Review skipped

This PR was authored by the user configured for CodeRabbit reviews. CodeRabbit does not review PRs authored by this user. It's recommended to use a dedicated user account to post CodeRabbit review feedback.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 40345c90-99e3-4dc3-8319-1b842a3c72b5

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Threads a UserComparator through key-range, run, reader, iteration, and compaction codepaths; adds comparator-aware APIs (_cmp) across KeyRange, Run, RunReader, Level/Version, and compaction Strategy::choose; refactors leveled picker to consider all runs in a level; adds regression tests for a ReverseComparator.

Changes

| Cohort / File(s) | Summary |
|---|---|
| KeyRange APIs: src/key_range.rs | Add comparator-aware methods: contains_range_cmp and aggregate_cmp for containment and extrema using UserComparator. |
| Run / Containment: src/version/run.rs | Extract trim_slice helper; add Run::get_contained_cmp(...) using comparator-aware overlap/containment; extend tests (new reverse-comparator cases). |
| Level / Version: src/version/mod.rs | Add Level::aggregate_key_range_cmp(...) to compute per-level key-range extrema with a comparator. |
| RunReader & Iteration: src/run_reader.rs, src/range.rs | Introduce RunReader::new_cmp(...); make RunReader::new forward to new_cmp (default comparator); update TreeIter::create_range[_point] to construct readers with the active comparator. |
| Compaction: drop_range: src/compaction/drop_range.rs | OwnedBounds::contains now takes cmp; Strategy::choose uses config to obtain cmp and switches to comparator-aware overlap/index APIs. |
| Compaction: leveled picker: src/compaction/leveled/mod.rs | Refactor pick_minimal_compaction to operate on entire Level references, thread cmp throughout, use comparator-aware overlap/containment APIs, and remove strict disjoint debug assertions. |
| Tests: tests/custom_comparator.rs, tests/custom_comparator_compaction.rs | Add comprehensive regression tests for ReverseComparator behavior across reopen, compaction, merge, and tombstone propagation; adjust two test ignore messages. |
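
The leveled-picker change (operating on a whole level rather than a single run) can be sketched as follows. This is an illustrative simplification only: `Table`, `Run`, `Level`, and `overlapping_ids_cmp` are hypothetical names, and the real `pick_minimal_compaction` does considerably more than collect overlapping table IDs.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for the crate's `UserComparator` trait.
pub trait UserComparator {
    fn compare(&self, a: &[u8], b: &[u8]) -> Ordering;
}

/// Reversed bytewise order.
pub struct Reverse;
impl UserComparator for Reverse {
    fn compare(&self, a: &[u8], b: &[u8]) -> Ordering {
        b.cmp(a)
    }
}

/// Simplified table metadata: an ID plus an inclusive key range.
pub struct Table {
    pub id: u64,
    pub min: Vec<u8>,
    pub max: Vec<u8>,
}

/// A run is a list of tables; a level may transiently hold several runs.
pub struct Run(pub Vec<Table>);
pub struct Level(pub Vec<Run>);

/// True if the table's range intersects `[min, max]` under `cmp`.
fn overlaps(t: &Table, min: &[u8], max: &[u8], cmp: &dyn UserComparator) -> bool {
    cmp.compare(&t.min, max) != Ordering::Greater
        && cmp.compare(&t.max, min) != Ordering::Less
}

impl Level {
    /// Collects IDs of overlapping tables across *all* runs in the level,
    /// not just the first run.
    pub fn overlapping_ids_cmp(
        &self,
        min: &[u8],
        max: &[u8],
        cmp: &dyn UserComparator,
    ) -> Vec<u64> {
        self.0
            .iter()
            .flat_map(|run| run.0.iter())
            .filter(|t| overlaps(t, min, max, cmp))
            .map(|t| t.id)
            .collect()
    }
}
```

In this sketch, a picker that inspected only the first run would miss any overlapping table that sits in a second run, which is the failure mode the summary describes for transient multi-run levels.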

Sequence Diagram(s)

sequenceDiagram
  participant Client as "Strategy::choose"
  participant Version as "Version / Level / Run"
  participant RunReader as "RunReader::new_cmp"
  participant KeyRange as "KeyRange::*_cmp"

  Client->>Version: Level::aggregate_key_range_cmp(cmp)
  Client->>Version: run.range_overlap_indexes_cmp(bounds, cmp)
  Client->>RunReader: RunReader::new_cmp(run, range, cmp)
  RunReader->>Version: run.range_overlap_indexes_cmp(range, cmp)
  RunReader->>KeyRange: key_range.contains_range_cmp(..., cmp)
  KeyRange-->>RunReader: containment result
  RunReader-->>Client: candidate readers / tables
  Client-->>Client: build table_ids, apply hidden-table checks, return Choice

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Possibly related PRs

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the primary change: threading the UserComparator through Run, KeyRange, and Version methods across multiple modules. |
| Description check | ✅ Passed | The description comprehensively explains the changes made across modules, relates them to previous PRs (#100), and provides clear context for the work done. |
| Linked Issues check | ✅ Passed | The PR successfully addresses all coding requirements from #122: Part 1 (relaxed debug assertions), Part 3 (multi-run aware pick_minimal_compaction accepting &Level), and comprehensive comparator threading throughout the codebase. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to threading UserComparator through the specified modules and improving multi-level compaction; no unrelated modifications were introduced. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |


@polaz polaz force-pushed the feat/#98-bug-runpush-sorts-tables-lexicographically-ignorin branch from 54f896c to 66cfb53 Compare March 22, 2026 22:36

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/compaction/drop_range.rs (1)

82-86: ⚠️ Potential issue | 🟠 Major

Use comparator-aware containment check to match overlap logic.

Line 82 correctly uses range_overlap_indexes_cmp_bounds(&self.bounds, cmp), but line 86 filters with self.bounds.contains(x.key_range()), which relies on lexicographic byte comparison only. After getting tables that overlap in comparator order, filtering them by lexicographic containment breaks correctness for custom comparators like ReverseComparator.

The codebase has established the pattern (see src/version/run.rs:233): after range_overlap_indexes_cmp_bounds(), follow with contains_range_cmp() on the KeyRange. Add a contains_cmp method to OwnedBounds that accepts the comparator:

pub fn contains_cmp(&self, range: &KeyRange, cmp: &dyn crate::comparator::UserComparator) -> bool {
    // Implement using cmp.compare() similar to KeyRange::contains_range_cmp
}

Then update line 86 to: .filter(|x| self.bounds.contains_cmp(x.key_range(), cmp))
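
One way the suggested `contains_cmp` could look is sketched below, assuming `OwnedBounds` holds a pair of `std::ops::Bound` values over owned keys. That layout, the simplified `KeyRange`, and the trait definition are assumptions for illustration; the actual types in `src/compaction/drop_range.rs` may differ.

```rust
use std::cmp::Ordering;
use std::ops::Bound;

/// Hypothetical stand-in for the crate's `UserComparator` trait.
pub trait UserComparator {
    fn compare(&self, a: &[u8], b: &[u8]) -> Ordering;
}

/// Plain bytewise order.
pub struct Bytewise;
impl UserComparator for Bytewise {
    fn compare(&self, a: &[u8], b: &[u8]) -> Ordering {
        a.cmp(b)
    }
}

/// Reversed bytewise order.
pub struct Reverse;
impl UserComparator for Reverse {
    fn compare(&self, a: &[u8], b: &[u8]) -> Ordering {
        b.cmp(a)
    }
}

/// Simplified inclusive key range of a table.
pub struct KeyRange {
    pub min: Vec<u8>,
    pub max: Vec<u8>,
}

/// Assumed shape of `OwnedBounds`: owned start and end bounds.
pub struct OwnedBounds {
    pub start: Bound<Vec<u8>>,
    pub end: Bound<Vec<u8>>,
}

impl OwnedBounds {
    /// True if `range` lies fully inside the bounds under `cmp`.
    pub fn contains_cmp(&self, range: &KeyRange, cmp: &dyn UserComparator) -> bool {
        let lower_ok = match &self.start {
            Bound::Unbounded => true,
            // start <= range.min
            Bound::Included(k) => cmp.compare(k, &range.min) != Ordering::Greater,
            // start < range.min
            Bound::Excluded(k) => cmp.compare(k, &range.min) == Ordering::Less,
        };
        let upper_ok = match &self.end {
            Bound::Unbounded => true,
            // end >= range.max
            Bound::Included(k) => cmp.compare(k, &range.max) != Ordering::Less,
            // end > range.max
            Bound::Excluded(k) => cmp.compare(k, &range.max) == Ordering::Greater,
        };
        lower_ok && upper_ok
    }
}
```

Using the same comparator for both the overlap lookup and this containment filter keeps the two steps consistent, which is the point of the review comment.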

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/compaction/drop_range.rs` around lines 82 - 86, The filter uses
lexicographic contains but the overlap was computed with a comparator; add a
comparator-aware containment to OwnedBounds and use it: implement
OwnedBounds::contains_cmp(&self, range: &KeyRange, cmp: &dyn
crate::comparator::UserComparator) (mirroring KeyRange::contains_range_cmp and
using cmp.compare()), then replace the current .filter(|x|
self.bounds.contains(x.key_range())) with .filter(|x|
self.bounds.contains_cmp(x.key_range(), cmp)) so containment uses the same
comparator as range_overlap_indexes_cmp_bounds.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/blob_tree/ingest.rs`:
- Around line 254-259: The BlobIngestion write guards currently check ordering
with native byte Ord (e.g., `key > *prev`) which breaks custom comparators;
update all ordering checks inside BlobIngestion (the guards at the spots
referenced around Line 76, 116, 137) to use the index comparator instead (access
`index.config.comparator.as_ref()` or accept the comparator where BlobIngestion
is constructed) and replace the `>`/`<`/`==` checks with comparator-based
comparisons (e.g., call the comparator's compare/compare_bytes method and test
for Ordering::Greater/Equal/etc.) so validation uses the configured comparator
semantics.

In `@src/tree/ingest.rs`:
- Around line 325-330: The guarded write-path comparisons currently use plain
lexicographic operators like `key > *prev`, which is incorrect for custom
comparators; update each guard (the checks at the sites referenced near the
`with_new_l0_run` usage where `prev` and `key` are compared) to use the
configured comparator from `self.tree.config.comparator` instead of `>`: call
the comparator (via its `as_ref()` or its compare method) to compare `prev` and
`key` and interpret the returned Ordering to enforce monotonicity consistent
with the comparator (replace `key > *prev` semantics with a comparator-based
ordering test). Ensure this change is applied at all four guard sites so
ingestion ordering matches `with_new_l0_run`’s comparator.
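
As a rough illustration of the comparator-based guard described in the two inline comments above, the sketch below replaces a `key > *prev` check with a call to the comparator. The `Ingestion` struct here is a hypothetical, stripped-down stand-in and does not reflect the real fields or error types in `src/tree/ingest.rs`.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for the crate's `UserComparator` trait.
pub trait UserComparator {
    fn compare(&self, a: &[u8], b: &[u8]) -> Ordering;
}

/// Reversed bytewise order.
pub struct Reverse;
impl UserComparator for Reverse {
    fn compare(&self, a: &[u8], b: &[u8]) -> Ordering {
        b.cmp(a)
    }
}

/// Simplified ingestion state: remembers only the last accepted key.
pub struct Ingestion<'a> {
    cmp: &'a dyn UserComparator,
    last_key: Option<Vec<u8>>,
}

impl<'a> Ingestion<'a> {
    pub fn new(cmp: &'a dyn UserComparator) -> Self {
        Self { cmp, last_key: None }
    }

    /// Accepts `key` only if it is strictly greater than the previous key
    /// under the configured comparator (not under bytewise `Ord`).
    pub fn write(&mut self, key: &[u8]) -> Result<(), String> {
        if let Some(prev) = &self.last_key {
            if self.cmp.compare(key, prev) != Ordering::Greater {
                return Err("keys must be written in strictly ascending comparator order".to_string());
            }
        }
        self.last_key = Some(key.to_vec());
        Ok(())
    }
}
```

Under the reversed comparator, writing `c`, `b`, `a` in that order is accepted, while a bytewise `key > *prev` guard would reject the second write.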


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 64855ab2-089b-4a47-9a5c-85ac53af537b

📥 Commits

Reviewing files that changed from the base of the PR and between 7c3fa37 and 54f896c.

📒 Files selected for processing (14)
  • src/blob_tree/ingest.rs
  • src/compaction/drop_range.rs
  • src/compaction/flavour.rs
  • src/compaction/leveled/mod.rs
  • src/compaction/worker.rs
  • src/key_range.rs
  • src/range.rs
  • src/run_reader.rs
  • src/tree/ingest.rs
  • src/tree/mod.rs
  • src/version/mod.rs
  • src/version/optimize.rs
  • src/version/run.rs
  • tests/custom_comparator.rs

Comment thread src/blob_tree/ingest.rs
Comment thread src/tree/ingest.rs

Copilot AI left a comment


Pull request overview

This PR fixes incorrect table ordering and range selection when a non-lexicographic UserComparator is configured by introducing comparator-aware variants across Run/KeyRange/Level/RunReader and threading the comparator through Version mutation and compaction/range-scan call sites.

Changes:

  • Add _cmp variants for sorting/searching/overlap checks that use UserComparator instead of bytewise lexicographic order.
  • Thread SharedComparator through optimize_runs, Version::with_* methods, compaction strategies, and range scan construction (RunReader::new_cmp).
  • Add regression tests using ReverseComparator to cover compaction, leveled compaction, merges, and tombstones.
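
The first bullet (sorting and searching with a comparator instead of bytewise order) can be sketched like this. The method names `push_cmp` and `range_overlap_indexes_cmp` come from the PR, but the signatures, the `Table` and `Run` structs, and the use of `partition_point` are assumptions made for illustration, not the crate's actual implementation.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for the crate's `UserComparator` trait.
pub trait UserComparator {
    fn compare(&self, a: &[u8], b: &[u8]) -> Ordering;
}

/// Reversed bytewise order.
pub struct Reverse;
impl UserComparator for Reverse {
    fn compare(&self, a: &[u8], b: &[u8]) -> Ordering {
        b.cmp(a)
    }
}

/// Simplified table metadata with an inclusive key range.
pub struct Table {
    pub id: u64,
    pub min: Vec<u8>,
    pub max: Vec<u8>,
}

/// A run of disjoint tables kept sorted by `min` under the comparator.
#[derive(Default)]
pub struct Run {
    pub tables: Vec<Table>,
}

impl Run {
    /// Inserts `table` at the position that keeps the run sorted under `cmp`.
    pub fn push_cmp(&mut self, table: Table, cmp: &dyn UserComparator) {
        let idx = self
            .tables
            .partition_point(|t| cmp.compare(&t.min, &table.min) == Ordering::Less);
        self.tables.insert(idx, table);
    }

    /// Returns the inclusive index span of tables overlapping `[min, max]`,
    /// assuming the tables are disjoint and sorted under `cmp`.
    pub fn range_overlap_indexes_cmp(
        &self,
        min: &[u8],
        max: &[u8],
        cmp: &dyn UserComparator,
    ) -> Option<(usize, usize)> {
        // Index of the first table whose max is not before `min`.
        let lo = self
            .tables
            .partition_point(|t| cmp.compare(&t.max, min) == Ordering::Less);
        // Index one past the last table whose min is not after `max`.
        let hi = self
            .tables
            .partition_point(|t| cmp.compare(&t.min, max) != Ordering::Greater);
        if lo < hi { Some((lo, hi - 1)) } else { None }
    }

    pub fn ids(&self) -> Vec<u64> {
        self.tables.iter().map(|t| t.id).collect()
    }
}
```

Both binary searches rely on the run already being sorted under the same comparator; mixing a comparator-sorted run with a bytewise search (or the reverse) breaks the precondition of `partition_point` and can return the wrong span.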

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| tests/custom_comparator.rs | Adds regression tests ensuring iteration/compaction behavior respects custom comparator ordering. |
| src/version/run.rs | Introduces comparator-aware run insertion and overlap/containment index selection helpers. |
| src/version/optimize.rs | Makes run optimization comparator-aware and updates unit tests accordingly. |
| src/version/mod.rs | Threads comparator into Version mutations and adds Level::aggregate_key_range_cmp. |
| src/tree/mod.rs | Passes configured comparator into Version updates during runtime operations. |
| src/tree/ingest.rs | Passes comparator into ingestion path when creating new L0 runs. |
| src/run_reader.rs | Adds RunReader::new_cmp to cull tables for range scans using comparator-aware overlap logic. |
| src/range.rs | Updates range iteration to use comparator-aware RunReader construction. |
| src/key_range.rs | Adds comparator-aware key-range operations (contains/overlaps/aggregate). |
| src/compaction/worker.rs | Threads comparator into with_moved and with_dropped version updates. |
| src/compaction/leveled/mod.rs | Uses comparator-aware range aggregation and overlap detection in leveled strategy decisions. |
| src/compaction/flavour.rs | Threads comparator into version-building during compaction application. |
| src/compaction/drop_range.rs | Uses comparator-aware overlap selection for dropping ranges. |
| src/blob_tree/ingest.rs | Passes comparator into blob ingestion path when creating new L0 runs. |

Comment thread src/version/mod.rs
Comment thread tests/custom_comparator.rs Outdated
@polaz
Member Author

polaz commented Mar 22, 2026

@coderabbitai re-review

@coderabbitai

coderabbitai Bot commented Mar 22, 2026

@polaz: Sure! I'll re-review the changes in PR #117 now.


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/lsm-tree PR: 67
File: src/memtable/mod.rs:72-77
Timestamp: 2026-03-22T12:26:47.795Z
Learning: In `structured-world/lsm-tree` (`src/memtable/interval_tree.rs` and `src/memtable/mod.rs`), the memtable `IntervalTree` used for range tombstone containment queries (`is_key_suppressed_by_range_tombstone`, `query_suppression`, `insert_range_tombstone`) still uses lexicographic `Ord` on `UserKey` rather than the pluggable `SharedComparator` introduced in PR `#67`. Threading the comparator into the AVL interval tree is a non-trivial refactor intentionally deferred from PR `#67` and tracked in issue `#81`. Do not flag the absence of comparator-aware RT suppression in the memtable interval tree as a blocking concern in PR `#67` or future PRs until issue `#81` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 100
File: tests/custom_comparator_compaction.rs:421-447
Timestamp: 2026-03-22T20:07:10.672Z
Learning: In `structured-world/lsm-tree` (`tests/custom_comparator_compaction.rs`), the two range-scan integration tests (`reverse_comparator_range_scan_after_compaction` and the second range-scan test) are intentionally `#[ignore]`d in PR `#100` because `RunReader::new()` still uses lexicographic `range_overlap_indexes` rather than the comparator-aware variant for on-disk range reads. Threading the `SharedComparator` through `RunReader` is a known deferred limitation documented in the PR body and tracked as a follow-up. Do not flag the ignored range-scan tests or the absence of comparator-aware `RunReader` overlap logic as a blocking concern in PR `#100` or future PRs until that follow-up issue is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 117
File: src/tree/ingest.rs:325-330
Timestamp: 2026-03-22T22:44:14.193Z
Learning: In `structured-world/lsm-tree` (`src/tree/ingest.rs`), the write-path monotonicity guards in `Ingestion::write`, `write_tombstone`, `write_weak_tombstone`, and `write_indirection` use plain lexicographic `key > *prev` comparisons instead of the configured `UserComparator`. This is a pre-existing issue acknowledged as valid but intentionally out of scope for PR `#98` (issue `#98` covers `Run::push()` and related methods). It is tracked in a separate follow-up issue. Do not flag the lexicographic ordering guards in `src/tree/ingest.rs` ingestion write paths as a concern in PR `#98` or future PRs until that separate issue is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 67
File: src/comparator.rs:7-33
Timestamp: 2026-03-22T11:14:39.924Z
Learning: In `structured-world/lsm-tree` (`src/comparator.rs`), the `UserComparator` trait introduced in PR `#67` intentionally has no stable persisted identity (no `name()` method). Comparator identity persistence and mismatch detection on tree reopen is deferred to issue `#74`. The current design follows RocksDB's approach of placing responsibility on the caller (documented in `src/comparator.rs` lines 31–33 and in `Config::comparator` doc). Do not flag the absence of `UserComparator::name()` or comparator identity persistence as a blocking concern in PR `#67` or future PRs until issue `#74` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 101
File: src/tree/mod.rs:1370-1381
Timestamp: 2026-03-22T20:30:38.753Z
Learning: In `structured-world/lsm-tree`, the `UserComparator` trait (and all custom-comparator support) was introduced in PR `#67` within the same release cycle as the comparator-name persistence added in PR `#101`. All trees created before PR `#67` used only `DefaultUserComparator` (whose `name()` returns `"default"`). Manifests without a `comparator_name` section default to `"default"` in `Manifest::decode_from` (src/manifest.rs). Therefore, there are no legacy trees with a custom comparator and no upgrade path is needed — do not flag the missing-section fallback to `"default"` as a legacy-compatibility risk in PR `#101` or future PRs.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 101
File: src/tree/mod.rs:1370-1381
Timestamp: 2026-03-22T20:38:37.516Z
Learning: In `structured-world/lsm-tree`, the `UserComparator` trait was introduced in PR `#67` in the same release cycle as comparator-name persistence (PR `#101`, closes issue `#74`). All trees created before PR `#101` used `DefaultUserComparator` (no custom-comparator tree could have existed without the trait). Therefore, the manifest fallback of `"default"` for a missing `comparator_name` section is safe: it matches every pre-existing tree exactly, and no legacy custom-comparator trees exist that would be incorrectly rejected on reopen. Do not flag the missing-section → `"default"` fallback as an unsafe upgrade path.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-22T09:55:39.749Z
Learning: In `structured-world/lsm-tree` (`src/compaction/leveled/mod.rs`), the multi-level compaction path (lines ~749–759) uses a single aggregate key range across all L0+L1 inputs to query L2 overlapping tables. Using per-range L2 overlap queries (one per contiguous L1 range segment) instead would avoid pulling in L2 tables that sit in gaps between disjoint L1 ranges on sparse keyspaces. This improvement is intentionally deferred from PR `#66` to a separate follow-up issue. Do not flag the coarse aggregate L2 range query as a blocking concern in PR `#66` or future PRs until that issue is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-22T09:55:39.749Z
Learning: In `structured-world/lsm-tree` (`src/compaction/leveled/mod.rs`), the multi-level compaction path uses a single aggregate key range across all L0+L1 inputs to query L2 overlapping tables. Using per-range L2 overlap queries (one per contiguous L1 range segment) instead would avoid pulling in L2 tables that sit in gaps between disjoint L1 ranges on sparse keyspaces. This improvement is intentionally deferred from PR `#66` to issue `#72`. Do not flag the coarse aggregate L2 range query as a blocking concern in PR `#66` or future PRs until issue `#72` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-22T09:21:07.612Z
Learning: In `structured-world/lsm-tree` (`src/range.rs`), prefix bloom filter skipping for multi-table runs (L0) has been implemented in PR `#64` (closes issue `#57`). The `_ =>` branch in `TreeIter::create_range` now applies a per-table key-range guard followed by a bloom check; 0 survivors skips the run, 1 survivor is demoted to `single_tables` for the range-tombstone table-skip optimization, and 2+ survivors are collected into a new `Run`. Range tombstones are still collected from ALL tables in the run regardless of bloom filtering. Do not flag the absence of multi-table run prefix bloom skipping in this repository — it has been implemented.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:06.298Z
Learning: In `structured-world/lsm-tree`, the switch from `Mutex<IntervalTree>` to `RwLock<IntervalTree>` (or a freeze-on-seal immutable structure) for memtable range-tombstone storage in `src/memtable/mod.rs` is intentionally deferred from PR `#21` to a separate issue (`#30`). Do not flag the existing `Mutex<IntervalTree>` usage in `src/memtable/mod.rs` as a concurrency concern in the context of PR `#21`; it is tracked in issue `#30`.
✅ Actions performed

Full review triggered.


Copilot AI left a comment


Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.

Comment thread src/version/run.rs Outdated
Comment thread src/version/mod.rs
Comment thread src/version/optimize.rs Outdated
@codecov

codecov Bot commented Mar 22, 2026

Codecov Report

❌ Patch coverage is 90.06211% with 16 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/compaction/leveled/mod.rs | 88.73% | 8 Missing ⚠️ |
| src/range.rs | 28.57% | 5 Missing ⚠️ |
| src/version/run.rs | 88.00% | 3 Missing ⚠️ |


@polaz polaz requested a review from Copilot March 23, 2026 00:32
@polaz polaz force-pushed the feat/#98-bug-runpush-sorts-tables-lexicographically-ignorin branch from a11eb34 to efd0bc2 Compare March 23, 2026 00:38

Copilot AI left a comment


Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated no new comments.

@polaz
Member Author

polaz commented Mar 23, 2026

@coderabbitai re-review

@coderabbitai

coderabbitai Bot commented Mar 23, 2026

@polaz: Sure! I'll re-review the changes in PR #117 now.


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:39.855Z
Learning: In `structured-world/lsm-tree`, the `Memtable` range-tombstone storage uses `Mutex<IntervalTree>` (introduced in PR `#21`). Switching to `RwLock<IntervalTree>` or a freeze-on-seal immutable structure is a deliberate future improvement tracked in a separate issue — do not flag the `Mutex` usage as a blocking concern in PR `#21` or future PRs that reference this design.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:39.855Z
Learning: In `structured-world/lsm-tree`, the `Memtable` range-tombstone storage uses `Mutex<IntervalTree>` (introduced in PR `#21`). Switching to `RwLock<IntervalTree>` or a freeze-on-seal immutable structure is a deliberate future improvement tracked in issue `#31` — do not flag the `Mutex` usage as a blocking concern in PR `#21` or future PRs that reference this design.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 108
File: src/compaction/leveled/test.rs:681-727
Timestamp: 2026-03-23T00:40:16.100Z
Learning: In `structured-world/lsm-tree` (`src/compaction/leveled/test.rs`), the regression test `multi_level_sparse_keyspace_data_integrity` (added in PR `#108` for issue `#72`) uses full-span SSTs (writes both "a" and "z" in each flush) rather than genuinely disjoint key ranges. This is a known, accepted coverage gap: at unit-test data volumes the force-trivial-move scoring (99.99) cascades all intermediate levels to Lmax, making it structurally impossible to simultaneously populate L1 and L2 in a unit test. The test verifies data integrity (read-back correctness) only, not the per-range L2 sparse-overlap optimization. Do not flag the full-span flush pattern in this test as a deficiency in PR `#108` or future PRs.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-23T00:44:56.025Z
Learning: In `structured-world/lsm-tree`, issue `#83` has been resolved in PR `#102` (`feat/#83-perf-extend-bloommaycontainkeyhash-to-support-part`). `Table::bloom_may_contain_key(key, key_hash)` was added to `src/table/mod.rs`, which seeks the partitioned filter TLI by user key and queries only the matching partition's bloom filter, returning `Ok(false)` for keys beyond all partition boundaries. `IterState` in `src/range.rs` gained `bloom_key: Option<UserKey>`; `bloom_passes()` dispatches to `bloom_may_contain_key` when `bloom_key` is set (single-key pipeline path) and falls back to `bloom_may_contain_key_hash` otherwise (prefix/range scans). The prior conservative `Ok(true)` fallback in `bloom_may_contain_key_hash` for partitioned/TLI filters is preserved for callers without a key. Do not flag the conservative fallback in `bloom_may_contain_key_hash` for partitioned/TLI filters as a blocking concern in any future PR — it is intentional for callers that lack a user key.
✅ Actions performed

Full review triggered.

@polaz polaz force-pushed the feat/#98-bug-runpush-sorts-tables-lexicographically-ignorin branch from efd0bc2 to 47aac04 Compare March 23, 2026 01:40

@sw-release-bot sw-release-bot Bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'lsm-tree db_bench'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.15.

| Benchmark suite | Current: 9dfceab | Previous: 3b54ecb | Ratio |
|---|---|---|---|
| fillrandom | 820414.8419444144 ops/sec | 1205121.0074657367 ops/sec | 1.47 |
| readrandom | 466762.9383897613 ops/sec | 614429.2119898967 ops/sec | 1.32 |
| seekrandom | 332287.8196148191 ops/sec | 402284.5910887458 ops/sec | 1.21 |
| overwrite | 913550.8727567153 ops/sec | 1146265.159478525 ops/sec | 1.25 |
| readwhilewriting | 388242.9573256522 ops/sec | 525608.465712753 ops/sec | 1.35 |

This comment was automatically generated by workflow using github-action-benchmark.

CC: @polaz

@polaz polaz requested a review from Copilot March 23, 2026 08:38
@polaz polaz force-pushed the feat/#98-bug-runpush-sorts-tables-lexicographically-ignorin branch from 47aac04 to 2c86f4d Compare March 23, 2026 08:44

Copilot AI left a comment

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated no new comments.

@polaz
Member Author

polaz commented Mar 23, 2026

@coderabbitai re-review

@coderabbitai

coderabbitai Bot commented Mar 23, 2026

@polaz: Sure! I'll re-review the changes in PR #117 now.


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/lsm-tree PR: 67
File: src/memtable/mod.rs:72-77
Timestamp: 2026-03-22T12:26:47.795Z
Learning: In `structured-world/lsm-tree` (`src/memtable/interval_tree.rs` and `src/memtable/mod.rs`), the memtable `IntervalTree` used for range tombstone containment queries (`is_key_suppressed_by_range_tombstone`, `query_suppression`, `insert_range_tombstone`) still uses lexicographic `Ord` on `UserKey` rather than the pluggable `SharedComparator` introduced in PR `#67`. Threading the comparator into the AVL interval tree is a non-trivial refactor intentionally deferred from PR `#67` and tracked in issue `#81`. Do not flag the absence of comparator-aware RT suppression in the memtable interval tree as a blocking concern in PR `#67` or future PRs until issue `#81` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 100
File: tests/custom_comparator_compaction.rs:421-447
Timestamp: 2026-03-22T20:07:10.672Z
Learning: In `structured-world/lsm-tree` (`tests/custom_comparator_compaction.rs`), the two range-scan integration tests (`reverse_comparator_range_scan_after_compaction` and the second range-scan test) are intentionally `#[ignore]`d in PR `#100` because `RunReader::new()` still uses lexicographic `range_overlap_indexes` rather than the comparator-aware variant for on-disk range reads. Threading the `SharedComparator` through `RunReader` is a known deferred limitation documented in the PR body and tracked as a follow-up. Do not flag the ignored range-scan tests or the absence of comparator-aware `RunReader` overlap logic as a blocking concern in PR `#100` or future PRs until that follow-up issue is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 117
File: src/tree/ingest.rs:325-330
Timestamp: 2026-03-22T22:44:14.193Z
Learning: In `structured-world/lsm-tree` (`src/tree/ingest.rs`), the write-path monotonicity guards in `Ingestion::write`, `write_tombstone`, `write_weak_tombstone`, and `write_indirection` use plain lexicographic `key > *prev` comparisons instead of the configured `UserComparator`. This is a pre-existing issue acknowledged as valid but intentionally out of scope for PR `#98` (issue `#98` covers `Run::push()` and related methods). It is tracked in a separate follow-up issue. Do not flag the lexicographic ordering guards in `src/tree/ingest.rs` ingestion write paths as a concern in PR `#98` or future PRs until that separate issue is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:06.298Z
Learning: In `structured-world/lsm-tree`, the switch from `Mutex<IntervalTree>` to `RwLock<IntervalTree>` (or a freeze-on-seal immutable structure) for memtable range-tombstone storage in `src/memtable/mod.rs` is intentionally deferred from PR `#21` to a separate issue (`#30`). Do not flag the existing `Mutex<IntervalTree>` usage in `src/memtable/mod.rs` as a concurrency concern in the context of PR `#21`; it is tracked in issue `#30`.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-22T13:03:58.016Z
Learning: In `structured-world/lsm-tree` (`src/table/mod.rs`), `Table::bloom_may_contain_key_hash` returns `Ok(true)` (conservative fallback) for partitioned bloom filters (`pinned_filter_index.is_some()`) and filter TLI (`regions.filter_tli.is_some()`). Extending it to accept a key+seqno for partition-aware seeking is tracked in issue `#83`. Do not flag the conservative fallback in `bloom_may_contain_key_hash` for partitioned/TLI filters as a blocking concern in PR `#69` or future PRs until issue `#83` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 108
File: src/compaction/leveled/test.rs:681-727
Timestamp: 2026-03-23T00:40:16.100Z
Learning: In `structured-world/lsm-tree` (`src/compaction/leveled/test.rs`), the regression test `multi_level_sparse_keyspace_data_integrity` (added in PR `#108` for issue `#72`) uses full-span SSTs (writes both "a" and "z" in each flush) rather than genuinely disjoint key ranges. This is a known, accepted coverage gap: at unit-test data volumes the force-trivial-move scoring (99.99) cascades all intermediate levels to Lmax, making it structurally impossible to simultaneously populate L1 and L2 in a unit test. The test verifies data integrity (read-back correctness) only, not the per-range L2 sparse-overlap optimization. Do not flag the full-span flush pattern in this test as a deficiency in PR `#108` or future PRs.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:39.855Z
Learning: In `structured-world/lsm-tree`, the `Memtable` range-tombstone storage uses `Mutex<IntervalTree>` (introduced in PR `#21`). Switching to `RwLock<IntervalTree>` or a freeze-on-seal immutable structure is a deliberate future improvement tracked in a separate issue — do not flag the `Mutex` usage as a blocking concern in PR `#21` or future PRs that reference this design.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:39.855Z
Learning: In `structured-world/lsm-tree`, the `Memtable` range-tombstone storage uses `Mutex<IntervalTree>` (introduced in PR `#21`). Switching to `RwLock<IntervalTree>` or a freeze-on-seal immutable structure is a deliberate future improvement tracked in issue `#31` — do not flag the `Mutex` usage as a blocking concern in PR `#21` or future PRs that reference this design.
✅ Actions performed

Full review triggered.

@polaz polaz force-pushed the feat/#98-bug-runpush-sorts-tables-lexicographically-ignorin branch from 2c86f4d to ce54e3e Compare March 23, 2026 10:21
@polaz polaz requested a review from Copilot March 23, 2026 10:24
@polaz
Member Author

polaz commented Mar 23, 2026

@coderabbitai re-review

polaz added 16 commits March 23, 2026 20:34
pick_minimal_compaction operates on first_run() only. If a level
has two runs (transient state from multi-level compaction #108),
the second run would be missed. Return DoNothing and let the next
compaction pass heal the multi-run state first.

Replaces the relaxed debug_assert (run_count <= 2) with a runtime
guard that avoids the problematic code path entirely.

Runtime DoNothing guard can stall compaction if multi-run state
persists. Revert to relaxed debug_assert (run_count <= 2) —
pick_minimal_compaction with first_run() is suboptimal but not
incorrect for transient multi-run levels, and compaction still
makes forward progress.

Tests exercise RunReader::new so the lint doesn't fire in test
builds. Unconditional #[expect(dead_code)] would trigger
unused-expect warning with deny(unused).

- Use crate::comparator::UserComparator in doc link to avoid
  broken intra-doc reference
- Change reason to "crate-internal API" since run_reader is a
  private module

Deduplicate identical trim_slice inner functions in get_contained
and get_contained_cmp into a single module-level helper.

…ect message

- key_range contains_key doc: list contains_range_cmp as existing
- leveled first_run expect: "at least one run" not "exactly one"

…run levels

debug_assert with a hard run_count limit can panic in debug builds
for valid transient states. Replace with log::debug since multi-run
L1+ is a performance concern, not a correctness issue.

The manifest round-trips table order within each run, so recovered
runs are already in comparator-sorted order. No re-sort needed.

RunReader comparator plumbing is done (new_cmp), but range bounds
interpretation for reverse comparator remains unresolved (#116).
Update ignore annotations to reflect the actual blocker.

Also update ignore annotations on range scan tests to reflect
the actual blocker (#116 range bounds interpretation).

Accept &Level instead of &Run<Table> so the picker scans ALL runs
in both levels. Trivial move checks overlap across all next-level
runs; merge pull-in collects contained tables from all curr-level
runs. Eliminates missed tables in transient multi-run levels from
multi-level compaction (#108).

Closes #122
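
The difference between checking only the first run and scanning every run can be sketched as follows. This is a simplified illustration, not the crate's code: `KeyRange`, `Run`, and both `can_trivially_move*` functions are hypothetical stand-ins, and the comparator is modeled as a plain closure rather than `UserComparator`.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for a table: only its inclusive key bounds.
struct KeyRange {
    min: Vec<u8>,
    max: Vec<u8>,
}

/// Hypothetical stand-ins: a run is a list of tables, a level is a list of runs.
type Run = Vec<KeyRange>;

/// Two inclusive ranges overlap when neither starts after the other ends,
/// with "after" decided by the supplied comparator.
fn overlaps<F>(a: &KeyRange, b: &KeyRange, cmp: &F) -> bool
where
    F: Fn(&[u8], &[u8]) -> Ordering,
{
    cmp(a.min.as_slice(), b.max.as_slice()) != Ordering::Greater
        && cmp(b.min.as_slice(), a.max.as_slice()) != Ordering::Greater
}

/// Old behavior (sketch): only the first run of the next level is consulted.
fn can_trivially_move_first_run_only<F>(candidate: &KeyRange, next_level: &[Run], cmp: &F) -> bool
where
    F: Fn(&[u8], &[u8]) -> Ordering,
{
    match next_level.first() {
        Some(run) => !run.iter().any(|t| overlaps(candidate, t, cmp)),
        None => true,
    }
}

/// New behavior (sketch): every table of every run is consulted.
fn can_trivially_move<F>(candidate: &KeyRange, next_level: &[Run], cmp: &F) -> bool
where
    F: Fn(&[u8], &[u8]) -> Ordering,
{
    !next_level
        .iter()
        .flatten()
        .any(|t| overlaps(candidate, t, cmp))
}
```

With a next level holding runs `[a..c]` and `[m..p]`, a candidate `n..o` passes the first-run-only check but is rejected once all runs are scanned, which is the missed-table case described above.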
Production SSTs store (comparator_min, comparator_max). With reverse
comparator "z" < "p", so key range is (z,p) not (p,z).
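
A small sketch of why the stored bounds flip under a reverse comparator. `reverse_cmp` and `key_range_cmp` are hypothetical helpers written for this illustration; they are not part of the crate's API.

```rust
use std::cmp::Ordering;

/// Hypothetical reverse comparator: larger byte strings sort first.
fn reverse_cmp(a: &[u8], b: &[u8]) -> Ordering {
    b.cmp(a)
}

/// Returns (min, max) of `keys` as ordered by `cmp`, or `None` when empty.
fn key_range_cmp<'a>(
    keys: &[&'a [u8]],
    cmp: impl Fn(&[u8], &[u8]) -> Ordering,
) -> Option<(&'a [u8], &'a [u8])> {
    let min = keys.iter().copied().min_by(|a, b| cmp(*a, *b))?;
    let max = keys.iter().copied().max_by(|a, b| cmp(*a, *b))?;
    Some((min, max))
}
```

For keys `p`, `t`, `z`, the lexicographic range is `(p, z)`, while the same helper with `reverse_cmp` yields `(z, p)`, matching the commit message.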
take_while after flat_map kills the entire iterator when one run's
window exceeds the size cap. Move inside flat_map so each run's
windows are capped independently.
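
The iterator pitfall can be reproduced in isolation. In this sketch each inner `Vec` is a hypothetical stand-in for one run's candidate window sizes; the real picker works on table windows, not plain integers.

```rust
/// `take_while` applied after `flat_map`: the first oversized window ends
/// the whole flattened stream, so later runs are never visited.
fn capped_globally(runs: &[Vec<u64>], cap: u64) -> Vec<u64> {
    runs.iter()
        .flat_map(|windows| windows.iter().copied())
        .take_while(|&size| size <= cap)
        .collect()
}

/// `take_while` applied inside `flat_map`: each run stops at its own first
/// oversized window, and the following runs are still visited.
fn capped_per_run(runs: &[Vec<u64>], cap: u64) -> Vec<u64> {
    runs.iter()
        .flat_map(|windows| windows.iter().copied().take_while(move |&size| size <= cap))
        .collect()
}
```

With runs `[1, 2, 50]` and `[3, 4]` and a cap of 10, the first variant stops at 50 and never reaches the second run; the second variant still yields 3 and 4.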
Exercises RunReader::new_cmp path in create_range — needs multiple
SSTs in a single L1 run, which only happens after leveled compaction.
Covers both full and bounded range scans.
@polaz polaz force-pushed the feat/#98-bug-runpush-sorts-tables-lexicographically-ignorin branch from 1f8fe4e to 9dfceab Compare March 23, 2026 18:34
@polaz polaz requested review from Copilot and removed request for Copilot March 23, 2026 18:35

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

tests/custom_comparator.rs:1

  • flush_active_memtable is used elsewhere in this file with a monotonically increasing seqno that is >= the writes being flushed (e.g., flush_active_memtable(2) after seqnos 0/1). Here it is repeatedly called with 0 while inserts use seqnos 10..=90, which risks producing incorrect SST metadata or MVCC visibility assumptions and can make the test flaky/non-representative. Fix: pass a monotonically increasing flush seqno (e.g., key + 1, or a local counter) that is >= the max seqno written to the memtable being flushed.
use lsm_tree::{AbstractTree, Config, Guard as _, SharedComparator, UserComparator};

Comment thread src/compaction/leveled/mod.rs
Comment thread src/compaction/leveled/mod.rs
Takes first/last from already-sorted slice — no key comparison,
works correctly for any comparator.
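
A minimal sketch of the idea, assuming the tables in the slice are disjoint and already sorted under the tree's comparator. `TableRange` and `aggregate_sorted` are hypothetical names for illustration; the commit does not say which function it touches.

```rust
/// Hypothetical stand-in for a table: only its (min, max) key bounds,
/// where min and max are already expressed in comparator order.
struct TableRange {
    min: Vec<u8>,
    max: Vec<u8>,
}

/// Aggregates the key range of a sorted, disjoint slice of tables by
/// reading the first table's min and the last table's max.
/// No key comparison happens, so the result is comparator-agnostic.
fn aggregate_sorted(tables: &[TableRange]) -> Option<(&[u8], &[u8])> {
    let first = tables.first()?;
    let last = tables.last()?;
    Some((first.min.as_slice(), last.max.as_slice()))
}
```

For a reverse-ordered run with tables `(z, t)` and `(s, p)`, this returns `(z, p)` without ever comparing keys.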

Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.

@polaz polaz merged commit 7586739 into main Mar 23, 2026
19 of 20 checks passed
@polaz polaz deleted the feat/#98-bug-runpush-sorts-tables-lexicographically-ignorin branch March 23, 2026 19:04
@sw-release-bot sw-release-bot Bot mentioned this pull request Mar 23, 2026
polaz added a commit that referenced this pull request Mar 24, 2026
## Summary

- Add `KeyRange::merge_sorted_cmp()` to coalesce sorted key ranges into
disjoint intervals using a custom comparator
- Replace per-table L2 overlap queries in multi-level compaction with
merged-interval queries, reducing redundant binary searches when L0
tables overlap
- Parts 1 and 3 of #122 were already completed in #117; this PR
implements Part 2 (merge input ranges optimization)

## Technical Details

Previously, multi-level compaction queried L2 once per input table —
O(L2_runs × input_tables × log L2_run_size). With overlapping L0 tables,
many queries hit the same L2 regions redundantly.

Now, input key ranges from L0+L1 are sorted and merged into disjoint
intervals first, then L2 is queried with the (typically much smaller)
set of merged intervals.
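
The merge step can be sketched as a single pass over ranges sorted by lower bound. This is an illustration under stated assumptions, not the actual `KeyRange::merge_sorted_cmp`: ranges are modeled as `(Vec<u8>, Vec<u8>)` pairs with inclusive bounds, the comparator is a plain closure, and touching ranges are treated as overlapping.

```rust
use std::cmp::Ordering;

/// Hypothetical range representation: inclusive (lower, upper) bounds.
type Range = (Vec<u8>, Vec<u8>);

/// Coalesces `ranges`, which must already be sorted by lower bound under
/// `cmp`, into disjoint intervals.
fn merge_sorted_cmp(ranges: &[Range], cmp: impl Fn(&[u8], &[u8]) -> Ordering) -> Vec<Range> {
    let mut merged: Vec<Range> = Vec::new();
    for (lo, hi) in ranges {
        if let Some(last) = merged.last_mut() {
            // The next range starts at or before the current upper bound:
            // extend the current interval instead of opening a new one.
            if cmp(lo.as_slice(), last.1.as_slice()) != Ordering::Greater {
                if cmp(hi.as_slice(), last.1.as_slice()) == Ordering::Greater {
                    last.1 = hi.clone();
                }
                continue;
            }
        }
        merged.push((lo.clone(), hi.clone()));
    }
    merged
}
```

Under lexicographic order, `(a, c)`, `(b, d)`, `(f, g)` collapse to `(a, d)`, `(f, g)`, so L2 would be queried twice instead of three times.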

## Test Plan

- 8 unit tests for `merge_sorted_cmp` (empty, single, disjoint,
overlapping, adjacent, contained, mixed, reverse comparator)
- All 21 existing leveled compaction tests pass (including multi-level
data integrity tests)
- Full suite: 490 lib + 33 doc tests pass, zero clippy warnings

Closes #122

Development

Successfully merging this pull request may close these issues.

fix: multi-level compaction — relax disjoint assert + merge input ranges optimization

2 participants