perf(compaction): merge input ranges before L2 overlap query#146
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughCollect L0+L1 input key ranges, sort and coalesce them into disjoint intervals via the new Changes
Sequence Diagram(s)sequenceDiagram
participant Caller as Caller
participant Strategy as Strategy::choose
participant KeyRange as KeyRange::merge_sorted_cmp
participant L2Run as L2 Run(s)
participant Selector as Compaction Selector
Caller->>Strategy: request compaction choice (L0→L1→L2)
Strategy->>Strategy: collect key ranges from L0 and L1
Strategy->>KeyRange: sort & merge ranges
KeyRange-->>Strategy: disjoint intervals
loop for each merged interval
Strategy->>L2Run: get_overlapping_cmp(interval)
L2Run-->>Strategy: table_ids
end
Strategy->>Selector: assemble Choice::Merge / CompactionInput
Selector-->>Caller: return compaction choice
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'lsm-tree db_bench'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.15.
| Benchmark suite | Current: 9f4f0cf | Previous: 471fffd | Ratio |
|---|---|---|---|
readrandom |
528432.9910175243 ops/sec |
624353.0044436295 ops/sec |
1.18 |
seekrandom |
356991.9712987594 ops/sec |
416320.2526557688 ops/sec |
1.17 |
overwrite |
960089.119312411 ops/sec |
1238960.6668705791 ops/sec |
1.29 |
readwhilewriting |
457560.11536325916 ops/sec |
526228.9063116263 ops/sec |
1.15 |
This comment was automatically generated by workflow using github-action-benchmark.
CC: @polaz
There was a problem hiding this comment.
Pull request overview
This PR improves multi-level compaction performance by reducing redundant L2 overlap lookups when L0/L1 input tables overlap, while preserving comparator-aware ordering semantics.
Changes:
- Add
KeyRange::merge_sorted_cmp()to coalesce comparator-sorted key ranges into disjoint (touching/overlapping) intervals. - Update leveled compaction multi-level path (L0+L1 → L2) to merge input ranges first, then query L2 overlaps per merged interval.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
src/key_range.rs |
Adds comparator-aware merging of sorted KeyRanges plus unit tests covering core interval cases and reverse comparator behavior. |
src/compaction/leveled/mod.rs |
Replaces per-input-table L2 overlap probing with merged-interval probing to cut redundant overlap searches. |
|
@coderabbitai re-review |
ae55fcb to
d9ff760
Compare
|
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/key_range.rs`:
- Around line 157-160: Add the #[must_use] attribute to the public helper
function KeyRange::merge_sorted_cmp to prevent callers from accidentally
discarding its returned Vec<Self>; locate the pub fn merge_sorted_cmp(...)
signature and place #[must_use] immediately above it so the compiler warns when
the result is ignored.
- Around line 432-537: The tests inside mod merge_sorted_cmp should be renamed
to follow the repository convention fn <what>_<condition>_<expected>(); update
each test function identifier (e.g., empty_input, single_range,
disjoint_ranges_stay_separate, overlapping_ranges_merge, adjacent_ranges_merge,
contained_range_absorbed, mixed_disjoint_and_overlapping, reverse_comparator) to
descriptive names like merge_sorted_cmp_empty_input_returns_empty,
merge_sorted_cmp_single_range_returns_same,
merge_sorted_cmp_disjoint_ranges_remain_separate,
merge_sorted_cmp_overlapping_ranges_merge_into_one,
merge_sorted_cmp_adjacent_ranges_merge,
merge_sorted_cmp_contained_range_absorbed,
merge_sorted_cmp_mixed_disjoint_and_overlapping_results, and
merge_sorted_cmp_reverse_comparator_returns_coalesced_ranges respectively; keep
the test bodies and assertions unchanged and only change the fn names so
references to KeyRange::merge_sorted_cmp and DefaultUserComparator/ReverseCmp
still match.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1f04ff92-6f89-4daf-8ffb-6d960db2d733
📒 Files selected for processing (2)
src/compaction/leveled/mod.rssrc/key_range.rs
f82e1d0 to
eefaca8
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
src/key_range.rs:467
- New test function names (e.g.,
empty_input,single_range,disjoint_ranges_stay_separate) don't match the repository's test naming conventionfn <what>_<condition>_<expected>(). Renaming these tests to follow that pattern will make failures andcargo test <name>searches more consistent across the codebase.
let key_range = KeyRange(UserKey::from("key1"), UserKey::from("key5"));
let bounds = (Unbounded, Included(b"key5" as &[u8]));
assert!(key_range.overlaps_with_bounds(&bounds));
}
#[test]
fn semi_open_5() {
let key_range = KeyRange(UserKey::from("key1"), UserKey::from("key5"));
let bounds = (Unbounded, Included(b"key6" as &[u8]));
assert!(key_range.overlaps_with_bounds(&bounds));
}
#[test]
fn semi_open_6() {
let key_range = KeyRange(UserKey::from("key1"), UserKey::from("key5"));
let bounds = (Included(b"key0" as &[u8]), Unbounded);
assert!(key_range.overlaps_with_bounds(&bounds));
}
#[test]
fn semi_open_7() {
let key_range = KeyRange(UserKey::from("key5"), UserKey::from("key8"));
let bounds = (Unbounded, Excluded(b"key6" as &[u8]));
assert!(key_range.overlaps_with_bounds(&bounds));
}
}
mod overlaps_with_bounds_cmp {
use super::*;
use crate::comparator::UserComparator;
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/key_range.rs`:
- Around line 205-225: The function merge_sorted_cmp currently assumes its input
iterator is sorted and will silently produce incorrect merges if it's not;
change its visibility from pub to pub(crate) and add a debug_assert that
validates the monotonic order of incoming ranges by checking that each r.min()
is >= previous last.max() according to the provided comparator (use
cmp.compare(last.max(), r.min()) == Ordering::Less or equal as appropriate)
before merging inside merge_sorted_cmp so unsorted callers fail fast in tests;
keep the comparator-based comparisons (cmp.compare) in the assertion to match
the function's ordering semantics.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1e975353-9024-4f65-99df-c1932663e64f
📒 Files selected for processing (2)
src/compaction/leveled/mod.rssrc/key_range.rs
3ff9199 to
2c595a0
Compare
2c595a0 to
9f4f0cf
Compare
- Add KeyRange::merge_sorted_cmp() to coalesce sorted ranges into disjoint intervals using a custom comparator - Replace per-table L2 overlap queries in multi-level compaction with merged-interval queries, reducing redundant binary searches when L0 tables overlap - Add unit tests for merge_sorted_cmp (empty, single, disjoint, overlapping, adjacent, contained, mixed, reverse comparator) Closes #122
…ug_assert for sorted input
9f4f0cf to
03c3070
Compare
## 🤖 New release * `coordinode-lsm-tree`: 4.0.0 -> 4.1.0 <details><summary><i><b>Changelog</b></i></summary><p> <blockquote> ## [4.1.0](v4.0.0...v4.1.0) - 2026-03-24 ### Added - *(fs)* io_uring Fs implementation for high-throughput I/O ([#106](#106)) - *(compression)* zstd dictionary compression support ([#131](#131)) ### Documentation - add benchmark dashboard link and update badges ([#151](#151)) - add v4.0.0 fork epoch changelog (all changes since upstream v3.1.1) ### Fixed - *(version)* fsync version file before rewriting CURRENT pointer ([#152](#152)) - thread UserComparator through ingestion guards and range overlap ([#139](#139)) ### Performance - *(bench)* add multi-threaded support to all db_bench workloads ([#155](#155)) - *(merge)* replace IntervalHeap with sorted-vec heap + replace_min/replace_max ([#148](#148)) - *(compaction)* merge input ranges before L2 overlap query ([#146](#146)) ### Refactored - *(version)* comparator API cleanup — TransformContext + rename Run::push() ([#153](#153)) - add #[non_exhaustive] to CompressionType enum </blockquote> </p></details> --- This PR was generated with [release-plz](https://github.com/release-plz/release-plz/). Co-authored-by: sw-release-bot[bot] <255865126+sw-release-bot[bot]@users.noreply.github.com>
Summary
KeyRange::merge_sorted_cmp()to coalesce sorted key ranges into disjoint intervals using a custom comparatorTechnical Details
Previously, multi-level compaction queried L2 once per input table — O(L2_runs × input_tables × log L2_run_size). With overlapping L0 tables, many queries hit the same L2 regions redundantly.
Now, input key ranges from L0+L1 are sorted and merged into disjoint intervals first, then L2 is queried with the (typically much smaller) set of merged intervals.
Test Plan
merge_sorted_cmp(empty, single, disjoint, overlapping, adjacent, contained, mixed, reverse comparator)Closes #122