feat: add UserComparator::name() for stable identity persistence#101
Conversation
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (11)
📝 WalkthroughWalkthroughThis PR implements comparator identity persistence and validation for tree recovery. It adds a Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Config
participant TreeInner
participant VersionPersist
participant Manifest
participant TreeRecover
rect rgba(100, 150, 200, 0.5)
Note over Client,Manifest: Tree Creation Flow
Client->>Config: new with custom comparator
Config->>TreeInner: create_new
TreeInner->>VersionPersist: persist_version with comparator.name()
VersionPersist->>Manifest: encode comparator_name field
Manifest->>Manifest: write to manifest metadata
end
rect rgba(200, 100, 100, 0.5)
Note over Client,TreeRecover: Tree Reopen Flow
Client->>TreeRecover: open tree with comparator
TreeRecover->>Manifest: decode_from (read comparator_name)
Manifest-->>TreeRecover: return stored comparator_name
TreeRecover->>TreeRecover: compare supplied.name() vs stored
alt Names Match
TreeRecover-->>Client: success, tree recovered
else Names Mismatch
TreeRecover-->>Client: Error::ComparatorMismatch
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Pull request overview
Adds a stable, persisted identity for key comparators so trees can refuse to open with an incompatible UserComparator, preventing silent corruption when comparator configuration changes across open/close cycles.
Changes:
- Extend
UserComparatorwithname() -> &'static strand implement it for the default comparator and test comparators. - Persist
comparator_nameinto the version/manifest SFA archive duringpersist_version, and plumb it throughSuperVersionsso subsequent version writes keep the same identity. - On
Tree::recover, decode the stored comparator name and returnError::ComparatorMismatchif it differs from the supplied comparator’s name; add integration tests covering reopen success/failure cases.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/custom_comparator.rs | Adds comparator name() implementations and reopen/mismatch integration tests. |
| src/comparator.rs | Extends the public comparator trait with name() and documents persisted identity behavior. |
| src/config/mod.rs | Updates Config::comparator docs to reflect persisted name + mismatch failure. |
| src/error.rs | Introduces Error::ComparatorMismatch { stored, supplied }. |
| src/manifest.rs | Decodes optional comparator_name from manifest with a legacy fallback to "default". |
| src/tree/mod.rs | Enforces comparator-name match during recovery before accessing any data. |
| src/tree/inner.rs | Persists comparator name when creating the initial version file. |
| src/version/mod.rs | Writes comparator_name into the version/manifest encoding. |
| src/version/persist.rs | Threads comparator_name through persist_version into Version::encode_into. |
| src/version/super_version.rs | Stores comparator name for subsequent version upgrades and passes it to persistence. |
| src/table/util.rs | Updates internal test comparator implementation to satisfy the new trait method. |
296df9d to
cf3b2f0
Compare
2d1bed3 to
ae88449
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/tree/mod.rs (1)
1340-1346:⚠️ Potential issue | 🟠 MajorValidate the comparator before calling
recover_levels().
recover_levels()already usesconfig.comparatorduring table recovery and orphan cleanup, so a reopen that is going to fail at Lines 1370-1381 still does comparator-sensitive work and can delete files first. Move the manifest decode/name check ahead of this call so rejected opens are side-effect free.Also applies to: 1370-1381
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/tree/mod.rs` around lines 1340 - 1346, Move the comparator/manifest validation earlier so no side-effecting work runs before an open is rejected: before calling Self::recover_levels(...) validate the manifest decode and that config.comparator matches the manifest/name check (the same checks currently around the reopen logic at the block handling Lines 1370-1381), and only call recover_levels (which uses config.comparator) after those validations succeed; update the code paths around recover_levels and the reopen logic (referencing recover_levels, config.comparator, and the manifest decode/name check) so rejected opens return early without invoking recover_levels or performing orphan/table cleanup.
🧹 Nitpick comments (1)
src/manifest.rs (1)
169-210: Good test coverage for comparator name handling.The tests verify the three critical scenarios:
- Missing section defaults to
"default"- Present section round-trips correctly
- Oversized section is rejected
Consider adding a test for invalid UTF-8 bytes in the
comparator_namesection to verify theError::Utf8path.🧪 Optional: Add invalid UTF-8 test
#[test] fn manifest_rejects_invalid_utf8_comparator_name() -> crate::Result<()> { let dir = tempfile::tempdir()?; let path = dir.path().join("manifest"); // Write manifest with invalid UTF-8 bytes directly let file = std::fs::File::create(&path)?; let mut writer = sfa::Writer::from_writer(std::io::BufWriter::new(file)); writer.start("format_version")?; writer.write_u8(FormatVersion::V4.into())?; writer.start("tree_type")?; writer.write_u8(TreeType::Standard.into())?; writer.start("level_count")?; writer.write_u8(7)?; writer.start("filter_hash_type")?; writer.write_u8(u8::from(ChecksumType::Xxh3))?; writer.start("comparator_name")?; // Invalid UTF-8 sequence writer.write_all(&[0xFF, 0xFE])?; writer.finish().map_err(|e| match e { sfa::Error::Io(e) => crate::Error::from(e), _ => unreachable!(), })?; let reader = sfa::Reader::new(&path)?; let result = Manifest::decode_from(&path, &reader); assert!( matches!(result, Err(crate::Error::Utf8(_))), "expected Utf8 error" ); Ok(()) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/manifest.rs` around lines 169 - 210, Add a test that writes a manifest with an invalid UTF-8 byte sequence in the "comparator_name" section using sfa::Writer (write the required sections: format_version, tree_type, level_count, filter_hash_type, then start "comparator_name" and write invalid bytes like 0xFF, 0xFE), open it with sfa::Reader and call Manifest::decode_from(&path, &reader), and assert the result matches Err(crate::Error::Utf8(_)); reference Manifest::decode_from, sfa::Writer, and the "comparator_name" section to locate where to create the invalid-UTF8 test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/version/persist.rs`:
- Around line 13-21: The code currently panics with assert! in persist_version
when comparator_name.len() > MAX_COMPARATOR_NAME_LEN; replace the assert with a
runtime validation that returns an Err via the existing crate::Result return
type. In persist_version, check if comparator_name.len() >
MAX_COMPARATOR_NAME_LEN and return an appropriate crate::Result::Err (construct
an error using your crate's error type or helper) containing a clear message
that the comparator name is too long and include comparator_name and
MAX_COMPARATOR_NAME_LEN for context, instead of panicking.
---
Outside diff comments:
In `@src/tree/mod.rs`:
- Around line 1340-1346: Move the comparator/manifest validation earlier so no
side-effecting work runs before an open is rejected: before calling
Self::recover_levels(...) validate the manifest decode and that
config.comparator matches the manifest/name check (the same checks currently
around the reopen logic at the block handling Lines 1370-1381), and only call
recover_levels (which uses config.comparator) after those validations succeed;
update the code paths around recover_levels and the reopen logic (referencing
recover_levels, config.comparator, and the manifest decode/name check) so
rejected opens return early without invoking recover_levels or performing
orphan/table cleanup.
---
Nitpick comments:
In `@src/manifest.rs`:
- Around line 169-210: Add a test that writes a manifest with an invalid UTF-8
byte sequence in the "comparator_name" section using sfa::Writer (write the
required sections: format_version, tree_type, level_count, filter_hash_type,
then start "comparator_name" and write invalid bytes like 0xFF, 0xFE), open it
with sfa::Reader and call Manifest::decode_from(&path, &reader), and assert the
result matches Err(crate::Error::Utf8(_)); reference Manifest::decode_from,
sfa::Writer, and the "comparator_name" section to locate where to create the
invalid-UTF8 test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 94127766-c8a8-43d8-8f42-84e476a094de
📒 Files selected for processing (11)
src/comparator.rssrc/config/mod.rssrc/error.rssrc/manifest.rssrc/table/util.rssrc/tree/inner.rssrc/tree/mod.rssrc/version/mod.rssrc/version/persist.rssrc/version/super_version.rstests/custom_comparator.rs
91a9dae to
73de2ae
Compare
There was a problem hiding this comment.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'lsm-tree db_bench'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.15.
| Benchmark suite | Current: 0555155 | Previous: 2c7d5dd | Ratio |
|---|---|---|---|
fillrandom |
865687.9591046238 ops/sec |
1012070.7707286807 ops/sec |
1.17 |
mergerandom |
600997.8818580902 ops/sec |
693381.2502304625 ops/sec |
1.15 |
This comment was automatically generated by workflow using github-action-benchmark.
CC: @polaz
0555155 to
b927cd9
Compare
Cover the three paths in Manifest::decode_from: - missing section defaults to "default" - present section round-trips correctly - oversized section (>256 bytes) is rejected
- Assert comparator_name <= 256 bytes in persist_version (symmetric with the read-side bound in Manifest::decode_from) - Replace unwrap() with ? in manifest unit tests (clippy lint) - Clarify backward-compat comment: custom comparators cannot exist in pre-change manifests because UserComparator was introduced in the same release cycle
Cover the name() and is_lexicographic() methods directly to satisfy codecov patch coverage threshold.
Use `crate::UserComparator::name` in doc comments within config and error modules where UserComparator is not directly in scope.
UserComparator::name() returns &'static str — an oversized name is a programmer error, not a runtime condition. Assert is appropriate.
Move comparator name and format version checks ahead of recover_levels() so a rejected open is side-effect free. recover_levels loads tables and cleans up orphans — running it before validation wastes work and can modify disk state for an open that will be rejected anyway.
Cover the Error::Utf8 path when comparator_name section contains non-UTF-8 bytes.
The test-only ReverseComparator's name() method was never called, leaving it uncovered in the patch diff.
- Extract shared MAX_COMPARATOR_NAME_BYTES constant to comparator module, replacing duplicated literals in persist.rs and manifest.rs - Downgrade comparator mismatch log from error to warn — this is a user-correctable misconfiguration, not an internal error
- MAX_COMPARATOR_NAME_BYTES: pub(crate) → pub (clippy::redundant_pub_crate) - Pre-recovery validation scope: remove tree_type from list (checked post-recovery)
Replace assert! with an Err return for comparator names exceeding MAX_COMPARATOR_NAME_BYTES, giving callers a recoverable error instead of a process crash.
Exercise the persist_version error path for comparator names exceeding MAX_COMPARATOR_NAME_BYTES (256 bytes).
9f4e2b2 to
27e5c0e
Compare
|
@coderabbitai re-review |
|
🧠 Learnings used✅ Actions performedFull review triggered. |
Use ? directly — From<sfa::Error> for crate::Error is implemented, no need for manual map_err with unreachable branch.
Required after #101 added name() as a required trait method.
Required after #101 added name() as a required trait method.
Required after #101 added name() as a required trait method.
Summary
name() -> &'static strmethod toUserComparatortrait for stable comparator identityError::ComparatorMismatch"default"(matchingDefaultUserComparator)Technical Details
comparator_namesection in sfa archive duringpersist_versionSuperVersionsstorescomparator_name: Arc<str>so flush/compaction version upgrades include it without extra plumbingTree::recoverafter manifest decode, before any data accessComparator::Name()pattern (requested in feat: custom key comparison / comparator #67 review)Test Plan
ComparatorMismatchCloses #74
Summary by CodeRabbit
New Features
Bug Fixes
Tests