forked from fjall-rs/lsm-tree
feat: custom key comparison / comparator #67
Merged
22 commits
3121610 feat: custom key comparison / comparator (polaz)
bbaa9d0 perf(comparator): zero-alloc prefixed slice compare, static default (polaz)
97b230d fix(comparator): use comparator for RT sort, suppression, equality doc (polaz)
01cb591 style: fix unused import warnings in merge and table/util (polaz)
0e53007 style: fix CI clippy lints (redundant pub(crate), doc backticks, stal… (polaz)
3f13b57 fix(comparator): doc example bytewise-equality invariant, unsafe reason (polaz)
83d9494 style: backtick RocksDB in doc comment (polaz)
7a44516 fix(comparator): restore key-range early reject in RT suppression, SA… (polaz)
45177ea fix(memtable): account for SharedComparator in approximate_size (polaz)
dd10000 docs(merge): explain Arc-per-HeapItem overhead is negligible (polaz)
e7919ce refactor(config): make comparator field pub(crate), document run lookup (polaz)
bdf6dfa style: backtick key_range in doc comment (polaz)
7efb69d docs(memtable): note interval tree lexicographic limitation for custo… (polaz)
ccf07aa docs(index_block): note test coverage location for iter seek behavior (polaz)
47f7879 fix(comparator): RT decode validation, SAFETY docs, slow-path test (polaz)
ce06c00 docs(memtable): clarify Memtable::new is pub for host crate (fjall), … (polaz)
167485d docs(tests): clarify RT decode test uses default comparator intention… (polaz)
6a474f1 test(comparator): add bounded range scan tests for reverse and u64 co… (polaz)
9045f27 perf(comparator): stack buffer for custom comparator prefix comparison (polaz)
7dfe07c fix(comparator): remove lexicographic debug_assert in RangeTombstone:… (polaz)
4f3a441 style: fix clippy items_after_statements and indexing_slicing in stac… (polaz)
0efc392 docs(run): strengthen precondition doc for get_for_key_cmp ordering i… (polaz)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,104 @@ | ||
| // Copyright (c) 2024-present, fjall-rs | ||
| // This source code is licensed under both the Apache 2.0 and MIT License | ||
| // (found in the LICENSE-* files in the repository) | ||
| | ||
| use std::sync::Arc; | ||
| | ||
| /// Trait for custom user key comparison. | ||
| /// | ||
| /// Comparators must be safe across unwind boundaries since they are stored | ||
| /// in tree structures that may be referenced inside `catch_unwind` blocks. | ||
| /// | ||
| /// Implementations must define a **strict total order** suitable for use in | ||
| /// sorted data structures (memtable skip list, SST block index, merge heap). | ||
| /// Specifically: | ||
| /// | ||
| /// - **Totality**: for all `a`, `b`, exactly one of `Less`, `Equal`, `Greater` holds | ||
| /// - **Transitivity**: `a < b` and `b < c` implies `a < c` | ||
| /// - **Antisymmetry**: `compare(a, b) == Less` iff `compare(b, a) == Greater` | ||
| /// - **Reflexivity**: `compare(a, a) == Equal` | ||
| /// | ||
| /// - **Bytewise equality**: `compare(a, b) == Equal` **must** imply `a == b` | ||
| /// byte-for-byte. Bloom filters and hash indexes operate on raw bytes; | ||
| /// if two byte-different keys compare as equal, hash-based lookups will | ||
| /// produce false negatives. | ||
| /// | ||
| /// Violating these invariants corrupts the sort order and produces incorrect | ||
| /// query results. | ||
| /// | ||
| /// # Important | ||
| /// | ||
| /// Once a tree is created with a comparator, it must always be opened with the | ||
| /// same comparator. Using a different comparator on an existing tree will produce | ||
| /// incorrect results. | ||
| /// | ||
| /// # Examples | ||
| /// | ||
| /// ``` | ||
| /// use lsm_tree::UserComparator; | ||
| /// use std::cmp::Ordering; | ||
| /// | ||
| /// /// Comparator that orders u64 keys stored as big-endian bytes. | ||
| /// struct U64Comparator; | ||
| /// | ||
| /// impl UserComparator for U64Comparator { | ||
| /// fn compare(&self, a: &[u8], b: &[u8]) -> Ordering { | ||
| /// if a.len() == 8 && b.len() == 8 { | ||
| /// // Length checked, conversion cannot fail. | ||
| /// let a_u64 = u64::from_be_bytes(a.try_into().unwrap()); | ||
| /// let b_u64 = u64::from_be_bytes(b.try_into().unwrap()); | ||
| /// a_u64.cmp(&b_u64) | ||
| /// } else { | ||
| /// // Non-8-byte keys: fall back to lexicographic ordering | ||
| /// // to preserve the bytewise-equality invariant. | ||
| /// a.cmp(b) | ||
| /// } | ||
| /// } | ||
| /// } | ||
| /// ``` | ||
| pub trait UserComparator: Send + Sync + std::panic::RefUnwindSafe + 'static { | ||
| /// Compares two user keys, returning their ordering. | ||
| fn compare(&self, a: &[u8], b: &[u8]) -> std::cmp::Ordering; | ||
| | ||
| /// Returns `true` if this comparator is lexicographic byte ordering. | ||
| /// | ||
| /// When `true`, internal optimizations can avoid allocations in | ||
| /// prefix-compressed block comparisons. Override only if your | ||
| /// comparator is truly equivalent to `a.cmp(b)` on raw bytes. | ||
| fn is_lexicographic(&self) -> bool { | ||
| false | ||
| } | ||
| } | ||
| | ||
| /// Default comparator using lexicographic byte ordering. | ||
| /// | ||
| /// This is the comparator used when no custom comparator is configured, | ||
| /// preserving backward compatibility with existing trees. | ||
| #[derive(Clone, Debug)] | ||
| pub struct DefaultUserComparator; | ||
| | ||
| impl UserComparator for DefaultUserComparator { | ||
| #[inline] | ||
| fn compare(&self, a: &[u8], b: &[u8]) -> std::cmp::Ordering { | ||
| a.cmp(b) | ||
| } | ||
| | ||
| #[inline] | ||
| fn is_lexicographic(&self) -> bool { | ||
| true | ||
| } | ||
| } | ||
| | ||
| /// Shared reference to a [`UserComparator`]. | ||
| pub type SharedComparator = Arc<dyn UserComparator>; | ||
| | ||
| /// Returns the default comparator (lexicographic byte ordering). | ||
| /// | ||
| /// Uses a shared static instance to avoid repeated allocations. | ||
| #[must_use] | ||
| pub fn default_comparator() -> SharedComparator { | ||
| // LazyLock creates the Arc once; subsequent calls just clone the Arc (ref-count bump). | ||
| static DEFAULT: std::sync::LazyLock<SharedComparator> = | ||
| std::sync::LazyLock::new(|| Arc::new(DefaultUserComparator)); | ||
| DEFAULT.clone() | ||
| } | ||
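
The shared-static pattern used by `default_comparator` can be tried in isolation. The following is a minimal sketch, not the crate's code: the trait is a local stand-in copied from the diff so the snippet compiles without `lsm_tree`, and it assumes Rust 1.80 or later for `std::sync::LazyLock`. It illustrates that every call hands out a clone of one `Arc` rather than a fresh allocation.

```rust
use std::cmp::Ordering;
use std::sync::{Arc, LazyLock};

/// Local stand-in for the trait in the diff (assumption: same shape,
/// redeclared here only so the sketch is self-contained).
pub trait UserComparator: Send + Sync + std::panic::RefUnwindSafe + 'static {
    fn compare(&self, a: &[u8], b: &[u8]) -> Ordering;

    fn is_lexicographic(&self) -> bool {
        false
    }
}

/// Lexicographic byte ordering, as in the diff.
#[derive(Clone, Debug)]
pub struct DefaultUserComparator;

impl UserComparator for DefaultUserComparator {
    fn compare(&self, a: &[u8], b: &[u8]) -> Ordering {
        a.cmp(b)
    }

    fn is_lexicographic(&self) -> bool {
        true
    }
}

pub type SharedComparator = Arc<dyn UserComparator>;

/// The Arc is built once on first use; later calls only bump the
/// reference count of that same allocation.
pub fn default_comparator() -> SharedComparator {
    static DEFAULT: LazyLock<SharedComparator> =
        LazyLock::new(|| Arc::new(DefaultUserComparator));
    DEFAULT.clone()
}
```

Two handles returned by `default_comparator()` should satisfy `Arc::ptr_eq`, which is a cheap way to confirm that no per-call allocation happens.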
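
The invariants listed in the trait documentation can be spot-checked on a small sample of keys before a comparator is trusted with real data. The sketch below is illustrative only: the trait is again a local stand-in mirroring the diff, and `ReverseComparator`, `CaseInsensitiveComparator`, and `check_invariants` are hypothetical names introduced here, not part of the PR. A passing check on a sample does not prove the invariants hold for all keys.

```rust
use std::cmp::Ordering;

/// Local stand-in for the trait in the diff (assumption: same shape).
pub trait UserComparator: Send + Sync + std::panic::RefUnwindSafe + 'static {
    fn compare(&self, a: &[u8], b: &[u8]) -> Ordering;

    fn is_lexicographic(&self) -> bool {
        false
    }
}

/// Hypothetical comparator: reverse lexicographic byte order.
pub struct ReverseComparator;

impl UserComparator for ReverseComparator {
    fn compare(&self, a: &[u8], b: &[u8]) -> Ordering {
        b.cmp(a)
    }
}

/// Hypothetical comparator that breaks the bytewise-equality rule:
/// `b"a"` and `b"A"` compare as `Equal` although the bytes differ.
pub struct CaseInsensitiveComparator;

impl UserComparator for CaseInsensitiveComparator {
    fn compare(&self, a: &[u8], b: &[u8]) -> Ordering {
        a.to_ascii_lowercase().cmp(&b.to_ascii_lowercase())
    }
}

/// Hypothetical helper: returns `false` as soon as any sampled pair or
/// triple violates reflexivity, antisymmetry, bytewise equality, or
/// transitivity.
pub fn check_invariants(cmp: &dyn UserComparator, keys: &[&[u8]]) -> bool {
    for &a in keys {
        // Reflexivity: a key must compare equal to itself.
        if cmp.compare(a, a) != Ordering::Equal {
            return false;
        }
        for &b in keys {
            let ab = cmp.compare(a, b);
            // Antisymmetry: swapping the arguments reverses the result.
            if ab != cmp.compare(b, a).reverse() {
                return false;
            }
            // Bytewise equality: Equal must imply identical bytes.
            if ab == Ordering::Equal && a != b {
                return false;
            }
            for &c in keys {
                // Transitivity: a < b and b < c implies a < c.
                if ab == Ordering::Less
                    && cmp.compare(b, c) == Ordering::Less
                    && cmp.compare(a, c) != Ordering::Less
                {
                    return false;
                }
            }
        }
    }
    true
}
```

On a sample such as `b""`, `b"a"`, `b"ab"`, `b"b"` the reverse comparator passes, while the case-insensitive one fails on `b"a"` versus `b"A"`. That failure is the situation the documentation warns about for bloom filters and hash indexes.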