
perf(compaction): merge input ranges before L2 overlap query #146

Merged
polaz merged 4 commits into main from fix/#122-fix-multi-level-compaction--relax-disjoint-assert
Mar 24, 2026

Conversation

@polaz
Member

@polaz polaz commented Mar 23, 2026

Summary

Technical Details

Previously, multi-level compaction queried L2 once per input table — O(L2_runs × input_tables × log L2_run_size). With overlapping L0 tables, many queries hit the same L2 regions redundantly.

Now, input key ranges from L0+L1 are sorted and merged into disjoint intervals first, then L2 is queried with the (typically much smaller) set of merged intervals.
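
The merge step described above can be sketched as a single pass over ranges that are already sorted by their lower bound. This is a minimal illustration, not the crate's implementation: `Range`, `merge_sorted`, and the closure-based comparator are hypothetical stand-ins for `KeyRange`, `merge_sorted_cmp`, and `UserComparator`, and it assumes that ranges sharing an endpoint count as mergeable.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for the crate's `KeyRange`: an inclusive (min, max) pair.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Range {
    min: Vec<u8>,
    max: Vec<u8>,
}

/// Coalesces ranges that are already sorted by `min` (under `cmp`) into
/// disjoint intervals. Ranges that overlap or share an endpoint are merged.
fn merge_sorted(
    ranges: impl IntoIterator<Item = Range>,
    cmp: impl Fn(&[u8], &[u8]) -> Ordering,
) -> Vec<Range> {
    let mut merged: Vec<Range> = Vec::new();
    for r in ranges {
        if let Some(last) = merged.last_mut() {
            // The next range starts at or before the current end: absorb it.
            if cmp(&r.min, &last.max) != Ordering::Greater {
                // Only extend the end if the new range reaches further.
                if cmp(&r.max, &last.max) == Ordering::Greater {
                    last.max = r.max;
                }
                continue;
            }
        }
        // No overlap with the current interval: start a new one.
        merged.push(r);
    }
    merged
}
```

With a bytewise comparator, the inputs `[a, c]`, `[b, d]`, `[f, g]` collapse to `[a, d]`, `[f, g]`, so a later overlap query runs twice instead of three times.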

Test Plan

  • 8 unit tests for merge_sorted_cmp (empty, single, disjoint, overlapping, adjacent, contained, mixed, reverse comparator)
  • All 21 existing leveled compaction tests pass (including multi-level data integrity tests)
  • Full suite: 490 lib + 33 doc tests pass, zero clippy warnings

Closes #122

Copilot AI review requested due to automatic review settings March 23, 2026 20:29
@coderabbitai

coderabbitai Bot commented Mar 23, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Collect L0+L1 input key ranges, sort and coalesce them into disjoint intervals via the new KeyRange::merge_sorted_cmp, then query each L2 run once per merged interval to assemble overlapping L2 tables for multi-level compaction selection.

Changes

| Cohort / File(s) | Summary |
|---|---|
| KeyRange merging utility: src/key_range.rs | Add `pub(crate) fn merge_sorted_cmp(ranges: impl IntoIterator<Item = KeyRange>, cmp: &dyn crate::comparator::UserComparator) -> Vec<KeyRange>` (`#[must_use]`) that coalesces sorted ranges into disjoint/adjacent intervals. Includes unit tests for empty, single, disjoint, overlapping, adjacent, contained, mixed, and a custom comparator. |
| Multi-level compaction optimization: src/compaction/leveled/mod.rs | Replace per-input-table L2 overlap queries with: gather input key ranges from target (L1) and first (L0) levels, sort and merge them with KeyRange::merge_sorted_cmp, then call run.get_overlapping_cmp(...) once per merged interval to build L2 table_ids for compaction selection. Core choice/CompactionInput shape unchanged. |
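
The per-interval L2 query can be pictured with the sketch below. It is illustrative only: `Table`, `overlapping`, and `collect_l2_ids` are hypothetical names, keys are plain `u32` values rather than byte strings with a custom comparator, and each run is assumed to hold tables sorted by key with no overlap between them, which is what makes a binary search valid.

```rust
use std::collections::BTreeSet;

/// Hypothetical table descriptor: an id plus an inclusive key range.
struct Table {
    id: u64,
    min: u32,
    max: u32,
}

/// Returns the tables in a sorted, disjoint run that overlap [lo, hi],
/// found with two binary searches.
fn overlapping(run: &[Table], lo: u32, hi: u32) -> &[Table] {
    // First table whose end is not entirely before the interval.
    let start = run.partition_point(|t| t.max < lo);
    // First table whose start is past the interval.
    let end = run.partition_point(|t| t.min <= hi);
    &run[start..end]
}

/// Queries every run once per merged interval and collects the table ids.
fn collect_l2_ids(l2_runs: &[Vec<Table>], merged: &[(u32, u32)]) -> BTreeSet<u64> {
    let mut ids = BTreeSet::new();
    for run in l2_runs {
        for &(lo, hi) in merged {
            ids.extend(overlapping(run, lo, hi).iter().map(|t| t.id));
        }
    }
    ids
}
```

The number of binary searches now scales with the count of merged intervals instead of the count of input tables, which is where the saving comes from when many L0 tables overlap.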

Sequence Diagram(s)

sequenceDiagram
    participant Caller as Caller
    participant Strategy as Strategy::choose
    participant KeyRange as KeyRange::merge_sorted_cmp
    participant L2Run as L2 Run(s)
    participant Selector as Compaction Selector

    Caller->>Strategy: request compaction choice (L0→L1→L2)
    Strategy->>Strategy: collect key ranges from L0 and L1
    Strategy->>KeyRange: sort & merge ranges
    KeyRange-->>Strategy: disjoint intervals
    loop for each merged interval
        Strategy->>L2Run: get_overlapping_cmp(interval)
        L2Run-->>Strategy: table_ids
    end
    Strategy->>Selector: assemble Choice::Merge / CompactionInput
    Selector-->>Caller: return compaction choice

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'perf(compaction): merge input ranges before L2 overlap query' clearly summarizes the main change: merging input key ranges before L2 overlap queries in compaction. |
| Description check | ✅ Passed | The description thoroughly explains the PR objectives, technical details, and test plan, directly relating to the code changes in both files. |
| Linked Issues check | ✅ Passed | The PR successfully implements Part 2 of issue #122 by adding KeyRange::merge_sorted_cmp() and optimizing L2 overlap queries in multi-level compaction, exactly as required. |
| Out of Scope Changes check | ✅ Passed | All changes are in scope: merge_sorted_cmp implementation in key_range.rs and optimization in leveled/mod.rs directly address Part 2 of #122 with no unrelated modifications. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


@sw-release-bot sw-release-bot Bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'lsm-tree db_bench'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.15.

| Benchmark suite | Current: 9f4f0cf | Previous: 471fffd | Ratio |
|---|---|---|---|
| readrandom | 528432.9910175243 ops/sec | 624353.0044436295 ops/sec | 1.18 |
| seekrandom | 356991.9712987594 ops/sec | 416320.2526557688 ops/sec | 1.17 |
| overwrite | 960089.119312411 ops/sec | 1238960.6668705791 ops/sec | 1.29 |
| readwhilewriting | 457560.11536325916 ops/sec | 526228.9063116263 ops/sec | 1.15 |
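
The Ratio column is consistent with previous throughput divided by current throughput, so values above 1 mean the current commit is slower. The sketch below reproduces that arithmetic; the function names are hypothetical, and treating the 1.15 threshold as a strict "greater than" comparison is an assumption about the benchmark tool.

```rust
/// Ratio of previous to current throughput (ops/sec); above 1.0 means slower now.
fn regression_ratio(previous_ops: f64, current_ops: f64) -> f64 {
    previous_ops / current_ops
}

/// Assumed alert rule: flag when the ratio is strictly above the threshold.
fn is_regression(ratio: f64, threshold: f64) -> bool {
    ratio > threshold
}
```

For readrandom, 624353.00 / 528432.99 is roughly 1.18, which matches the reported value.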

This comment was automatically generated by workflow using github-action-benchmark.

CC: @polaz

Copilot AI left a comment

Pull request overview

This PR improves multi-level compaction performance by reducing redundant L2 overlap lookups when L0/L1 input tables overlap, while preserving comparator-aware ordering semantics.

Changes:

  • Add KeyRange::merge_sorted_cmp() to coalesce comparator-sorted key ranges into disjoint (touching/overlapping) intervals.
  • Update leveled compaction multi-level path (L0+L1 → L2) to merge input ranges first, then query L2 overlaps per merged interval.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| src/key_range.rs | Adds comparator-aware merging of sorted KeyRanges plus unit tests covering core interval cases and reverse comparator behavior. |
| src/compaction/leveled/mod.rs | Replaces per-input-table L2 overlap probing with merged-interval probing to cut redundant overlap searches. |

@polaz
Member Author

polaz commented Mar 23, 2026

@coderabbitai re-review

@polaz polaz force-pushed the fix/#122-fix-multi-level-compaction--relax-disjoint-assert branch from ae55fcb to d9ff760 Compare March 23, 2026 21:49
@coderabbitai

coderabbitai Bot commented Mar 23, 2026

@polaz Sure! I'll re-review the changes in this PR right away.

✅ Actions performed

Full review triggered.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/key_range.rs`:
- Around line 157-160: Add the #[must_use] attribute to the public helper
function KeyRange::merge_sorted_cmp to prevent callers from accidentally
discarding its returned Vec<Self>; locate the pub fn merge_sorted_cmp(...)
signature and place #[must_use] immediately above it so the compiler warns when
the result is ignored.
- Around line 432-537: The tests inside mod merge_sorted_cmp should be renamed
to follow the repository convention fn <what>_<condition>_<expected>(); update
each test function identifier (e.g., empty_input, single_range,
disjoint_ranges_stay_separate, overlapping_ranges_merge, adjacent_ranges_merge,
contained_range_absorbed, mixed_disjoint_and_overlapping, reverse_comparator) to
descriptive names like merge_sorted_cmp_empty_input_returns_empty,
merge_sorted_cmp_single_range_returns_same,
merge_sorted_cmp_disjoint_ranges_remain_separate,
merge_sorted_cmp_overlapping_ranges_merge_into_one,
merge_sorted_cmp_adjacent_ranges_merge,
merge_sorted_cmp_contained_range_absorbed,
merge_sorted_cmp_mixed_disjoint_and_overlapping_results, and
merge_sorted_cmp_reverse_comparator_returns_coalesced_ranges respectively; keep
the test bodies and assertions unchanged and only change the fn names so
references to KeyRange::merge_sorted_cmp and DefaultUserComparator/ReverseCmp
still match.
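
As a small illustration of the first finding, the sketch below shows where `#[must_use]` goes on a helper that returns a freshly built `Vec`. The function `dedup_sorted` is a hypothetical stand-in, not part of the crate; with the attribute in place, calling it as a bare statement and dropping the result triggers the compiler's `unused_must_use` warning.

```rust
/// Hypothetical helper standing in for `KeyRange::merge_sorted_cmp`:
/// returns a new vector, so ignoring the result is almost certainly a bug.
#[must_use]
fn dedup_sorted(values: Vec<u32>) -> Vec<u32> {
    let mut out = values;
    // Removes consecutive duplicates; the input is assumed to be sorted.
    out.dedup();
    out
}
```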

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1f04ff92-6f89-4daf-8ffb-6d960db2d733

📥 Commits

Reviewing files that changed from the base of the PR and between 600baee and d9ff760.

📒 Files selected for processing (2)
  • src/compaction/leveled/mod.rs
  • src/key_range.rs

Comment thread src/key_range.rs Outdated
Comment thread src/key_range.rs
@polaz polaz requested a review from Copilot March 23, 2026 22:49
@polaz polaz force-pushed the fix/#122-fix-multi-level-compaction--relax-disjoint-assert branch from f82e1d0 to eefaca8 Compare March 23, 2026 22:52
Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

src/key_range.rs:467

  • New test function names (e.g., empty_input, single_range, disjoint_ranges_stay_separate) don't match the repository's test naming convention fn <what>_<condition>_<expected>(). Renaming these tests to follow that pattern will make failures and cargo test <name> searches more consistent across the codebase.
            let key_range = KeyRange(UserKey::from("key1"), UserKey::from("key5"));
            let bounds = (Unbounded, Included(b"key5" as &[u8]));
            assert!(key_range.overlaps_with_bounds(&bounds));
        }

        #[test]
        fn semi_open_5() {
            let key_range = KeyRange(UserKey::from("key1"), UserKey::from("key5"));
            let bounds = (Unbounded, Included(b"key6" as &[u8]));
            assert!(key_range.overlaps_with_bounds(&bounds));
        }

        #[test]
        fn semi_open_6() {
            let key_range = KeyRange(UserKey::from("key1"), UserKey::from("key5"));
            let bounds = (Included(b"key0" as &[u8]), Unbounded);
            assert!(key_range.overlaps_with_bounds(&bounds));
        }

        #[test]
        fn semi_open_7() {
            let key_range = KeyRange(UserKey::from("key5"), UserKey::from("key8"));
            let bounds = (Unbounded, Excluded(b"key6" as &[u8]));
            assert!(key_range.overlaps_with_bounds(&bounds));
        }
    }

    mod overlaps_with_bounds_cmp {
        use super::*;
        use crate::comparator::UserComparator;

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/key_range.rs`:
- Around line 205-225: The function merge_sorted_cmp currently assumes its input
iterator is sorted and will silently produce incorrect merges if it's not;
change its visibility from pub to pub(crate) and add a debug_assert that
validates the monotonic order of incoming ranges by checking that each r.min()
is >= previous last.max() according to the provided comparator (use
cmp.compare(last.max(), r.min()) == Ordering::Less or equal as appropriate)
before merging inside merge_sorted_cmp so unsorted callers fail fast in tests;
keep the comparator-based comparisons (cmp.compare) in the assertion to match
the function's ordering semantics.
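
A debug-only sortedness check along these lines could look like the sketch below. The names and the tuple-based `Range` type are hypothetical. One deliberate difference from the wording above: because overlapping inputs are exactly what the merge is meant to handle, this sketch compares each range's lower bound with the previous range's lower bound, not with the previous upper bound, which would reject valid overlapping input.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for `KeyRange`: an inclusive (min, max) pair of keys.
type Range = (Vec<u8>, Vec<u8>);

/// Returns true when consecutive ranges have non-decreasing `min` under `cmp`.
fn is_sorted_by_min(ranges: &[Range], cmp: impl Fn(&[u8], &[u8]) -> Ordering) -> bool {
    ranges
        .windows(2)
        .all(|w| cmp(&w[0].0, &w[1].0) != Ordering::Greater)
}

/// Checks the precondition in debug builds only; release builds skip the scan.
fn assert_merge_precondition(ranges: &[Range], cmp: impl Fn(&[u8], &[u8]) -> Ordering) {
    debug_assert!(
        is_sorted_by_min(ranges, cmp),
        "input ranges must be sorted by their lower bound"
    );
}
```

Since `debug_assert!` compiles to nothing in release builds, the extra linear scan only costs time in tests and debug runs.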

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1e975353-9024-4f65-99df-c1932663e64f

📥 Commits

Reviewing files that changed from the base of the PR and between f82e1d0 and eefaca8.

📒 Files selected for processing (2)
  • src/compaction/leveled/mod.rs
  • src/key_range.rs

Comment thread src/key_range.rs Outdated
Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment thread src/key_range.rs Outdated
@polaz polaz force-pushed the fix/#122-fix-multi-level-compaction--relax-disjoint-assert branch from 3ff9199 to 2c595a0 Compare March 23, 2026 23:28
@polaz polaz requested a review from Copilot March 23, 2026 23:54
Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

@polaz polaz force-pushed the fix/#122-fix-multi-level-compaction--relax-disjoint-assert branch from 2c595a0 to 9f4f0cf Compare March 24, 2026 00:12
polaz added 4 commits March 24, 2026 02:22
- Add KeyRange::merge_sorted_cmp() to coalesce sorted ranges into
  disjoint intervals using a custom comparator
- Replace per-table L2 overlap queries in multi-level compaction with
  merged-interval queries, reducing redundant binary searches when
  L0 tables overlap
- Add unit tests for merge_sorted_cmp (empty, single, disjoint,
  overlapping, adjacent, contained, mixed, reverse comparator)

Closes #122
@polaz polaz force-pushed the fix/#122-fix-multi-level-compaction--relax-disjoint-assert branch from 9f4f0cf to 03c3070 Compare March 24, 2026 00:23
@polaz polaz requested a review from Copilot March 24, 2026 00:25
Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

@polaz polaz merged commit 19a4ebb into main Mar 24, 2026
19 checks passed
@polaz polaz deleted the fix/#122-fix-multi-level-compaction--relax-disjoint-assert branch March 24, 2026 01:03
@sw-release-bot sw-release-bot Bot mentioned this pull request Mar 24, 2026
polaz pushed a commit that referenced this pull request Mar 24, 2026
## 🤖 New release

* `coordinode-lsm-tree`: 4.0.0 -> 4.1.0

<details><summary><i><b>Changelog</b></i></summary><p>

<blockquote>

## [4.1.0](v4.0.0...v4.1.0) - 2026-03-24

### Added

- *(fs)* io_uring Fs implementation for high-throughput I/O ([#106](#106))
- *(compression)* zstd dictionary compression support ([#131](#131))

### Documentation

- add benchmark dashboard link and update badges ([#151](#151))
- add v4.0.0 fork epoch changelog (all changes since upstream v3.1.1)

### Fixed

- *(version)* fsync version file before rewriting CURRENT pointer ([#152](#152))
- thread UserComparator through ingestion guards and range overlap ([#139](#139))

### Performance

- *(bench)* add multi-threaded support to all db_bench workloads ([#155](#155))
- *(merge)* replace IntervalHeap with sorted-vec heap + replace_min/replace_max ([#148](#148))
- *(compaction)* merge input ranges before L2 overlap query ([#146](#146))

### Refactored

- *(version)* comparator API cleanup — TransformContext + rename Run::push() ([#153](#153))
- add #[non_exhaustive] to CompressionType enum
</blockquote>


</p></details>

---
This PR was generated with
[release-plz](https://github.com/release-plz/release-plz/).

Co-authored-by: sw-release-bot[bot] <255865126+sw-release-bot[bot]@users.noreply.github.com>


Development

Successfully merging this pull request may close these issues.

fix: multi-level compaction — relax disjoint assert + merge input ranges optimization

2 participants