fix: thread UserComparator through Run, KeyRange, and Version methods#117
Conversation
|
Important Review skippedThis PR was authored by the user configured for CodeRabbit reviews. CodeRabbit does not review PRs authored by this user. It's recommended to use a dedicated user account to post CodeRabbit review feedback. ⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThreads a UserComparator through key-range, run, reader, iteration, and compaction codepaths; adds comparator-aware APIs (_cmp) across KeyRange, Run, RunReader, Level/Version, and compaction Strategy::choose; refactors leveled picker to consider all runs in a level; adds regression tests for a ReverseComparator. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as "Strategy::choose"
participant Version as "Version / Level / Run"
participant RunReader as "RunReader::new_cmp"
participant KeyRange as "KeyRange::*_cmp"
Client->>Version: Level::aggregate_key_range_cmp(cmp)
Client->>Version: run.range_overlap_indexes_cmp(bounds, cmp)
Client->>RunReader: RunReader::new_cmp(run, range, cmp)
RunReader->>Version: run.range_overlap_indexes_cmp(range, cmp)
RunReader->>KeyRange: key_range.contains_range_cmp(..., cmp)
KeyRange-->>RunReader: containment result
RunReader-->>Client: candidate readers / tables
Client-->>Client: build table_ids, apply hidden-table checks, return Choice
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
54f896c to
66cfb53
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/compaction/drop_range.rs (1)
82-86:⚠️ Potential issue | 🟠 MajorUse comparator-aware containment check to match overlap logic.
Line 82 correctly uses
range_overlap_indexes_cmp_bounds(&self.bounds, cmp), but line 86 filters withself.bounds.contains(x.key_range()), which relies on lexicographic byte comparison only. After getting tables that overlap in comparator order, filtering them by lexicographic containment breaks correctness for custom comparators likeReverseComparator.The codebase has established the pattern (see
src/version/run.rs:233): afterrange_overlap_indexes_cmp_bounds(), follow withcontains_range_cmp()on theKeyRange. Add acontains_cmpmethod toOwnedBoundsthat accepts the comparator:pub fn contains_cmp(&self, range: &KeyRange, cmp: &dyn crate::comparator::UserComparator) -> bool { // Implement using cmp.compare() similar to KeyRange::contains_range_cmp }Then update line 86 to:
.filter(|x| self.bounds.contains_cmp(x.key_range(), cmp))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/compaction/drop_range.rs` around lines 82 - 86, The filter uses lexicographic contains but the overlap was computed with a comparator; add a comparator-aware containment to OwnedBounds and use it: implement OwnedBounds::contains_cmp(&self, range: &KeyRange, cmp: &dyn crate::comparator::UserComparator) (mirroring KeyRange::contains_range_cmp and using cmp.compare()), then replace the current .filter(|x| self.bounds.contains(x.key_range())) with .filter(|x| self.bounds.contains_cmp(x.key_range(), cmp)) so containment uses the same comparator as range_overlap_indexes_cmp_bounds.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/blob_tree/ingest.rs`:
- Around line 254-259: The BlobIngestion write guards currently check ordering
with native byte Ord (e.g., `key > *prev`) which breaks custom comparators;
update all ordering checks inside BlobIngestion (the guards at the spots
referenced around Line 76, 116, 137) to use the index comparator instead (access
`index.config.comparator.as_ref()` or accept the comparator where BlobIngestion
is constructed) and replace the `>`/`<`/`==` checks with comparator-based
comparisons (e.g., call the comparator's compare/compare_bytes method and test
for Ordering::Greater/Equal/etc.) so validation uses the configured comparator
semantics.
In `@src/tree/ingest.rs`:
- Around line 325-330: The guarded write-path comparisons currently use plain
lexicographic operators like `key > *prev`, which is incorrect for custom
comparators; update each guard (the checks at the sites referenced near the
`with_new_l0_run` usage where `prev` and `key` are compared) to use the
configured comparator from `self.tree.config.comparator` instead of `>`: call
the comparator (via its `as_ref()` or its compare method) to compare `prev` and
`key` and interpret the returned Ordering to enforce monotonicity consistent
with the comparator (replace `key > *prev` semantics with a comparator-based
ordering test). Ensure this change is applied at all four guard sites so
ingestion ordering matches `with_new_l0_run`’s comparator.
---
Outside diff comments:
In `@src/compaction/drop_range.rs`:
- Around line 82-86: The filter uses lexicographic contains but the overlap was
computed with a comparator; add a comparator-aware containment to OwnedBounds
and use it: implement OwnedBounds::contains_cmp(&self, range: &KeyRange, cmp:
&dyn crate::comparator::UserComparator) (mirroring KeyRange::contains_range_cmp
and using cmp.compare()), then replace the current .filter(|x|
self.bounds.contains(x.key_range())) with .filter(|x|
self.bounds.contains_cmp(x.key_range(), cmp)) so containment uses the same
comparator as range_overlap_indexes_cmp_bounds.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 64855ab2-089b-4a47-9a5c-85ac53af537b
📒 Files selected for processing (14)
src/blob_tree/ingest.rssrc/compaction/drop_range.rssrc/compaction/flavour.rssrc/compaction/leveled/mod.rssrc/compaction/worker.rssrc/key_range.rssrc/range.rssrc/run_reader.rssrc/tree/ingest.rssrc/tree/mod.rssrc/version/mod.rssrc/version/optimize.rssrc/version/run.rstests/custom_comparator.rs
There was a problem hiding this comment.
Pull request overview
This PR fixes incorrect table ordering and range selection when a non-lexicographic UserComparator is configured by introducing comparator-aware variants across Run/KeyRange/Level/RunReader and threading the comparator through Version mutation and compaction/range-scan call sites.
Changes:
- Add
_cmpvariants for sorting/searching/overlap checks that useUserComparatorinstead of bytewise lexicographic order. - Thread
SharedComparatorthroughoptimize_runs,Version::with_*methods, compaction strategies, and range scan construction (RunReader::new_cmp). - Add regression tests using
ReverseComparatorto cover compaction, leveled compaction, merges, and tombstones.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/custom_comparator.rs | Adds regression tests ensuring iteration/compaction behavior respects custom comparator ordering. |
| src/version/run.rs | Introduces comparator-aware run insertion and overlap/containment index selection helpers. |
| src/version/optimize.rs | Makes run optimization comparator-aware and updates unit tests accordingly. |
| src/version/mod.rs | Threads comparator into Version mutations and adds Level::aggregate_key_range_cmp. |
| src/tree/mod.rs | Passes configured comparator into Version updates during runtime operations. |
| src/tree/ingest.rs | Passes comparator into ingestion path when creating new L0 runs. |
| src/run_reader.rs | Adds RunReader::new_cmp to cull tables for range scans using comparator-aware overlap logic. |
| src/range.rs | Updates range iteration to use comparator-aware RunReader construction. |
| src/key_range.rs | Adds comparator-aware key-range operations (contains/overlaps/aggregate). |
| src/compaction/worker.rs | Threads comparator into with_moved and with_dropped version updates. |
| src/compaction/leveled/mod.rs | Uses comparator-aware range aggregation and overlap detection in leveled strategy decisions. |
| src/compaction/flavour.rs | Threads comparator into version-building during compaction application. |
| src/compaction/drop_range.rs | Uses comparator-aware overlap selection for dropping ranges. |
| src/blob_tree/ingest.rs | Passes comparator into blob ingestion path when creating new L0 runs. |
|
@coderabbitai re-review |
|
🧠 Learnings used✅ Actions performedFull review triggered. |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
a11eb34 to
efd0bc2
Compare
|
@coderabbitai re-review |
|
🧠 Learnings used✅ Actions performedFull review triggered. |
efd0bc2 to
47aac04
Compare
There was a problem hiding this comment.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'lsm-tree db_bench'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.15.
| Benchmark suite | Current: 9dfceab | Previous: 3b54ecb | Ratio |
|---|---|---|---|
fillrandom |
820414.8419444144 ops/sec |
1205121.0074657367 ops/sec |
1.47 |
readrandom |
466762.9383897613 ops/sec |
614429.2119898967 ops/sec |
1.32 |
seekrandom |
332287.8196148191 ops/sec |
402284.5910887458 ops/sec |
1.21 |
overwrite |
913550.8727567153 ops/sec |
1146265.159478525 ops/sec |
1.25 |
readwhilewriting |
388242.9573256522 ops/sec |
525608.465712753 ops/sec |
1.35 |
This comment was automatically generated by workflow using github-action-benchmark.
CC: @polaz
47aac04 to
2c86f4d
Compare
|
@coderabbitai re-review |
|
🧠 Learnings used✅ Actions performedFull review triggered. |
2c86f4d to
ce54e3e
Compare
|
@coderabbitai re-review |
pick_minimal_compaction operates on first_run() only. If a level has two runs (transient state from multi-level compaction #108), the second run would be missed. Return DoNothing and let the next compaction pass heal the multi-run state first. Replaces the relaxed debug_assert (run_count <= 2) with a runtime guard that avoids the problematic code path entirely.
Runtime DoNothing guard can stall compaction if multi-run state persists. Revert to relaxed debug_assert (run_count <= 2) — pick_minimal_compaction with first_run() is suboptimal but not incorrect for transient multi-run levels, and compaction still makes forward progress.
Tests exercise RunReader::new so the lint doesn't fire in test builds. Unconditional #[expect(dead_code)] would trigger unused-expect warning with deny(unused).
- Use crate::comparator::UserComparator in doc link to avoid broken intra-doc reference - Change reason to "crate-internal API" since run_reader is a private module
Deduplicate identical trim_slice inner functions in get_contained and get_contained_cmp into a single module-level helper.
…ect message - key_range contains_key doc: list contains_range_cmp as existing - leveled first_run expect: "at least one run" not "exactly one"
…run levels debug_assert with a hard run_count limit can panic in debug builds for valid transient states. Replace with log::debug since multi-run L1+ is a performance concern, not a correctness issue.
The manifest round-trips table order within each run, so recovered runs are already in comparator-sorted order. No re-sort needed.
RunReader comparator plumbing is done (new_cmp), but range bounds interpretation for reverse comparator remains unresolved (#116). Update ignore annotations to reflect the actual blocker.
Also update ignore annotations on range scan tests to reflect the actual blocker (#116 range bounds interpretation).
Accept &Level instead of &Run<Table> so the picker scans ALL runs in both levels. Trivial move checks overlap across all next-level runs; merge pull-in collects contained tables from all curr-level runs. Eliminates missed tables in transient multi-run levels from multi-level compaction (#108). Closes #122
Production SSTs store (comparator_min, comparator_max). With reverse comparator "z" < "p", so key range is (z,p) not (p,z).
take_while after flat_map kills the entire iterator when one run's window exceeds the size cap. Move inside flat_map so each run's windows are capped independently.
Exercises RunReader::new_cmp path in create_range — needs multiple SSTs in a single L1 run, which only happens after leveled compaction. Covers both full and bounded range scans.
1f8fe4e to
9dfceab
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
tests/custom_comparator.rs:1
flush_active_memtableis used elsewhere in this file with a monotonically increasing seqno that is >= the writes being flushed (e.g.,flush_active_memtable(2)after seqnos 0/1). Here it is repeatedly called with0while inserts use seqnos10..=90, which risks producing incorrect SST metadata or MVCC visibility assumptions and can make the test flaky/non-representative. Fix: pass a monotonically increasing flush seqno (e.g.,key + 1, or a local counter) that is >= the max seqno written to the memtable being flushed.
use lsm_tree::{AbstractTree, Config, Guard as _, SharedComparator, UserComparator};
Takes first/last from already-sorted slice — no key comparison, works correctly for any comparator.
## Summary - Add `KeyRange::merge_sorted_cmp()` to coalesce sorted key ranges into disjoint intervals using a custom comparator - Replace per-table L2 overlap queries in multi-level compaction with merged-interval queries, reducing redundant binary searches when L0 tables overlap - Parts 1 and 3 of #122 were already completed in #117; this PR implements Part 2 (merge input ranges optimization) ## Technical Details Previously, multi-level compaction queried L2 once per input table — O(L2_runs × input_tables × log L2_run_size). With overlapping L0 tables, many queries hit the same L2 regions redundantly. Now, input key ranges from L0+L1 are sorted and merged into disjoint intervals first, then L2 is queried with the (typically much smaller) set of merged intervals. ## Test Plan - 8 unit tests for `merge_sorted_cmp` (empty, single, disjoint, overlapping, adjacent, contained, mixed, reverse comparator) - All 21 existing leveled compaction tests pass (including multi-level data integrity tests) - Full suite: 490 lib + 33 doc tests pass, zero clippy warnings Closes #122
Summary
Extends comparator-aware coverage (#98 core fix landed in #100) to remaining code paths, plus fixes #122.
choose()— all overlap detection, key range aggregation, trivial move decisions now use comparatorpick_minimal_compactionmulti-run aware (fix: multi-level compaction — relax disjoint assert + merge input ranges optimization #122) — accepts&Levelinstead of&Run, scans all runs for overlap/containment. Eliminates missed tables in transient multi-run levels from multi-level compaction (feat(compaction): compute L2 overlaps per-range in multi-level path #108)RunReader::new_cmp— comparator-aware table selection for range scans (create_range+create_range_point)OwnedBounds::contains— comparator-aware containment fordrop_rangestrategyget_contained_cmp— comparator-aware table containment in runsLevel::aggregate_key_range_cmp+KeyRange::aggregate_cmp+KeyRange::contains_range_cmp— cross-run aggregation with comparatorWhat #100 covered vs what this PR adds
Run::push_cmp,get_overlapping_cmp,range_overlap_indexes_cmpoptimize_runs+Version::with_*comparator threadingchoose()comparator threadingpick_minimal_compactionmulti-run aware (#122)RunReader::new_cmpfor range scansOwnedBounds::containswith comparatorget_contained_cmp,contains_range_cmp,aggregate_cmpLevel::aggregate_key_range_cmpRunReader::newpublic API preservationtrim_slicededuplicationTest Plan
ReverseComparator(compaction, leveled, merge operator, tombstone)get_contained_cmpwith reverse comparatorcargo check+cargo clippy --libcleanCloses #122
Related