
feat: add intra-L0 compaction for overlapping runs #260

Closed
polaz wants to merge 146 commits into
fjall-rs:main from structured-world:feat/#163-intra-l0-compaction

Conversation


@polaz polaz commented Mar 14, 2026

Summary

  • During write bursts, L0 accumulates multiple overlapping runs from memtable flushes, causing point-read latency spikes (each run must be checked)
  • Adds intra-L0 compaction: when L0 has run_count > 1 and table_count < l0_threshold, all L0 runs are merged into a single sorted run within L0 (dest_level: 0)
  • Falls through to existing L0→L1 compaction when table_count reaches threshold, preventing infinite loops
  • ~26 LOC in strategy, ~44 LOC test
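The selection rule described above can be sketched roughly as follows. All names here (`CompactionChoice`, `L0Stats`, `choose`) are illustrative stand-ins, not the crate's actual types:

```rust
// Simplified sketch of the intra-L0 selection logic, assuming hypothetical
// types; the real strategy lives in src/compaction/leveled/mod.rs.

#[derive(Debug, PartialEq)]
enum CompactionChoice {
    /// Merge all L0 runs into a single sorted run that stays in L0.
    IntraL0 { dest_level: u8 },
    /// Existing behavior: compact L0 into L1.
    L0ToL1,
    DoNothing,
}

struct L0Stats {
    run_count: usize,
    table_count: usize,
}

fn choose(l0: &L0Stats, l0_threshold: usize) -> CompactionChoice {
    if l0.table_count >= l0_threshold {
        // Fall through to the existing L0 -> L1 path once the table budget
        // is reached, which prevents an infinite intra-L0 loop.
        CompactionChoice::L0ToL1
    } else if l0.run_count > 1 {
        // Multiple overlapping runs but still under the table budget:
        // consolidate within L0 to cut point-read amplification.
        CompactionChoice::IntraL0 { dest_level: 0 }
    } else {
        CompactionChoice::DoNothing
    }
}
```

Checking the table count first is what guarantees termination: every intra-L0 merge reduces run_count to 1, so the rule cannot keep re-selecting itself.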

Test plan

  • New test leveled_intra_l0_compaction: flush 3 overlapping memtables, verify multiple L0 runs, compact, verify single L0 run, verify all data readable and L1 remains empty
  • cargo test --all-features — all existing tests pass (no regressions)

Closes #163

Summary by CodeRabbit

  • New Features

    • Intra-L0 compaction to merge overlapping runs below configured threshold.
    • Batch key lookup via multi_get API.
    • Key prefix existence checking via contains_prefix API.
    • Integrity verification for SST and blob files.
    • Customizable sequence number generators.
  • Bug Fixes

    • Safer LZ4 decompression with validation of uncompressed length.
    • Sequence number overflow prevention.
  • Documentation

    • Added code review guidelines and CI instructions.
    • Updated README with maintained fork details.
  • Chores

    • CI/CD automation: Dependabot, GitHub Actions workflows, release automation.
  • Tests

    • Added comprehensive tests for new features and edge cases.

polaz added 10 commits March 14, 2026 11:33
Adds a public verify module with verify_integrity() that scans all SST
and blob files in a tree, computing XXHash-3 128-bit checksums and
comparing them against the checksums stored in the version manifest.

This enables detection of silent bit-rot, partial writes, and other
on-disk corruption without reading individual blocks.

The function returns an IntegrityReport with:
- sst_files_checked / blob_files_checked counters
- Per-file errors (SstFileCorrupted, BlobFileCorrupted, IoError)
- is_ok() / files_checked() convenience methods
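The report shape described above can be sketched like this; field and variant names follow the commit message but may differ from the actual `verify` module:

```rust
use std::io;

// Illustrative sketch of IntegrityReport as described: per-file counters,
// a list of per-file errors, and two convenience accessors.

#[derive(Debug)]
enum IntegrityError {
    SstFileCorrupted { table_id: u64 },
    BlobFileCorrupted { blob_file_id: u64 },
    IoError(io::Error),
}

#[derive(Debug, Default)]
struct IntegrityReport {
    sst_files_checked: usize,
    blob_files_checked: usize,
    errors: Vec<IntegrityError>,
}

impl IntegrityReport {
    /// True if no corruption or I/O error was found.
    fn is_ok(&self) -> bool {
        self.errors.is_empty()
    }

    /// Total number of files scanned, SST and blob combined.
    fn files_checked(&self) -> usize {
        self.sst_files_checked + self.blob_files_checked
    }
}
```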

Tests cover:
- Clean tree verification (standard + blob tree)
- SST corruption detection
- Blob file corruption detection
- Multiple SST files

Closes fjall-rs#187
- Compute XXH3 via BufReader instead of loading entire files
  into memory (prevents OOM on large SST/blob files)
- Store std::io::Error directly in IoError variant instead of String
- Use BlobFileId type alias instead of raw u64
- Add #[non_exhaustive] to IntegrityError for forward compatibility
- Fix rustdoc: remove incorrect # Errors section
- Add test for missing file (IoError path)
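The BufReader change above amounts to hashing the file in fixed-size chunks instead of reading it whole. A minimal sketch of that streaming pattern, using std's `DefaultHasher` purely as a dependency-free stand-in for the 128-bit XXH3 the crate actually uses:

```rust
use std::fs::File;
use std::hash::Hasher;
use std::io::{BufReader, Read};

// Streaming checksum: feed the file to the hasher chunk by chunk so memory
// use stays O(buffer size) regardless of file size. DefaultHasher is a
// stand-in; the real code uses Xxh3Default to match ChecksummedWriter.

fn stream_checksum(path: &str) -> std::io::Result<u64> {
    let file = File::open(path)?;
    let mut reader = BufReader::new(file);
    let mut hasher = std::collections::hash_map::DefaultHasher::new();
    let mut buf = [0u8; 8192];

    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // EOF
        }
        // Hash only the bytes actually read this iteration.
        hasher.write(&buf[..n]);
    }

    Ok(hasher.finish())
}
```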
BlobFileId is defined in the private vlog module and not re-exported,
so using it in the public IntegrityError enum would leak a private
type to external consumers.
Enables error chain integration (anyhow, thiserror, ? propagation).
IoError variant returns the underlying std::io::Error via source().
- Switch stream_checksum to Xxh3Default matching ChecksummedWriter
- Use path reference comparison instead of deref in test assertion
- Remove redundant doc comment on enum field visibility
Add contains_prefix() to AbstractTree trait that checks if any key
with the given prefix exists, stopping at the first match instead
of materializing a full iterator.

- Default implementation on AbstractTree uses prefix().next()
- BlobTree overrides to delegate to index tree, avoiding value log reads
- MVCC-correct: respects seqno visibility and tombstones

Closes fjall-rs#138
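The short-circuit idea behind the default implementation can be sketched with a `BTreeMap` standing in for the tree: seek to the first key at or after the prefix and check whether it still starts with it, instead of materializing a full iterator. The real default is `self.prefix(..).next()` on `AbstractTree`; this is only an illustration of the technique:

```rust
use std::collections::BTreeMap;

// Stop at the first match: one ordered seek, one comparison.
fn contains_prefix(map: &BTreeMap<Vec<u8>, Vec<u8>>, prefix: &[u8]) -> bool {
    map.range(prefix.to_vec()..)          // first key >= prefix
        .next()
        .map_or(false, |(k, _)| k.starts_with(prefix))
}
```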
Copilot AI review requested due to automatic review settings March 14, 2026 11:54

coderabbitai Bot commented Mar 14, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

This PR adds intra-L0 compaction to merge overlapping runs, refactors sequence number generation to a trait-based system with clamped MAX_SEQNO boundaries, implements seqno-aware data block seeking, introduces multi_get and contains_prefix query APIs, adds integrity verification for SST/blob files, and establishes CI/CD workflows and code review guidelines.

Changes

Cohort / File(s) Summary
Intra-L0 Compaction
src/compaction/leveled/mod.rs, src/compaction/leveled/test.rs, src/version/mod.rs
Adds intra-L0 compaction path merging overlapping L0 runs when count is below threshold. Modified Run insertion logic in with_merge to append new runs to L0 instead of prepending. Includes comprehensive test validating consolidation, table count, and L1 behavior.
Sequence Number Generation Refactor
src/seqno.rs, src/config/mod.rs, src/compaction/worker.rs, src/version/super_version.rs
Introduces SequenceNumberGenerator trait with MAX_SEQNO constant (0x7FFF_FFFF_FFFF_FFFF), SharedSequenceNumberGenerator type alias, and refactors SequenceNumberCounter to implement trait. Updates Config and Options to use new generator types. Implements saturating_add clamping for visible_seqno advancement.
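The trait-plus-clamping design in this cohort can be sketched as follows. Method names (`next`, `get`, `fetch_max`) are inferred from the test excerpts later in this PR and may not match the crate exactly:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

type SeqNo = u64;

/// The MSB is reserved, so seqnos are bounded by this constant.
const MAX_SEQNO: SeqNo = 0x7FFF_FFFF_FFFF_FFFF;

trait SequenceNumberGenerator: Send + Sync {
    fn next(&self) -> SeqNo;
    fn get(&self) -> SeqNo;
    fn fetch_max(&self, candidate: SeqNo) -> SeqNo;
}

/// Default counter-backed implementation, as a shared atomic.
#[derive(Clone, Default)]
struct SequenceNumberCounter(Arc<AtomicU64>);

impl SequenceNumberGenerator for SequenceNumberCounter {
    fn next(&self) -> SeqNo {
        self.0.fetch_add(1, Ordering::SeqCst)
    }

    fn get(&self) -> SeqNo {
        self.0.load(Ordering::SeqCst)
    }

    fn fetch_max(&self, candidate: SeqNo) -> SeqNo {
        // Clamp so visible_seqno can never advance into the reserved MSB
        // range, mirroring the saturating behavior described above.
        self.0.fetch_max(candidate.min(MAX_SEQNO), Ordering::SeqCst)
    }
}
```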
SeqNo-Aware Data Block Operations
src/table/data_block/iter.rs, src/table/data_block/iter_test.rs, src/table/data_block/mod.rs, src/table/iter.rs
Updates data block iterator seek methods to accept seqno parameter for snapshot-aware positioning. Adds seek_to_key_seqno for seqno-aware binary search. Modifies point_read to use seqno-aware seeking. Adds extensive tests validating seqno behavior across restart intervals and mixed-key scenarios.
Query APIs: Multi-Get and Contains-Prefix
src/abstract_tree.rs, src/blob_tree/mod.rs, src/tree/mod.rs, tests/multi_get.rs, tests/tree_contains_prefix.rs
Adds multi_get and contains_prefix trait methods to AbstractTree. Implements resolve_key helper and method overrides in BlobTree. Includes batch resolution with shared SuperVersion. Comprehensive test suites validating snapshot isolation, tombstones, blob indirection, and MVCC semantics.
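The batch-lookup contract described here is that every key resolves against one shared snapshot, so results are mutually consistent. A minimal sketch with a single `&BTreeMap` standing in for one SuperVersion acquisition (the real trait signature will differ):

```rust
use std::collections::BTreeMap;

// Resolve all keys against one snapshot reference: one "lock acquisition",
// mutually consistent results, missing keys yield None.
fn multi_get<'a, K: AsRef<[u8]>>(
    snapshot: &'a BTreeMap<Vec<u8>, Vec<u8>>,
    keys: impl IntoIterator<Item = K>,
) -> Vec<Option<&'a Vec<u8>>> {
    keys.into_iter().map(|k| snapshot.get(k.as_ref())).collect()
}
```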
Integrity Verification System
src/verify.rs, src/table/block/mod.rs, src/vlog/blob_file/reader.rs, src/vlog/mod.rs, tests/verify_integrity.rs
New module with IntegrityError enum and IntegrityReport struct for validating SST and blob checksums. Replaces unsafe LZ4 buffer construction with safe Vec-based approach including decompression validation. Adds streaming checksum computation. Extensive tests covering clean trees, corruption detection, and error handling.
Block and Storage Improvements
src/table/block/mod.rs, src/vlog/blob_file/reader.rs, src/slice/slice_bytes/mod.rs
Replaces unsafe buffer construction for LZ4 decompression with Vec-based safe approach and runtime validation of uncompressed length. Adds test for corrupted length detection. Updates clippy annotations for explicit expect usage. Changes to usize-based arithmetic for blob value reading.
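The safe-decompression pattern in this cohort can be sketched as: allocate a zeroed `Vec` sized from the block header, decompress into it, and reject the block if the produced length disagrees with the stored one. `decompress_into` below is a hypothetical stand-in for the real LZ4 call, and the error type is illustrative:

```rust
#[derive(Debug, PartialEq)]
enum BlockError {
    /// Stored uncompressed length disagrees with the decompressed output.
    CorruptedLength { expected: usize, got: usize },
}

fn decompress_block(
    compressed: &[u8],
    stored_uncompressed_len: usize,
    decompress_into: impl Fn(&[u8], &mut [u8]) -> usize,
) -> Result<Vec<u8>, BlockError> {
    // Safe: zero-initialized Vec, no set_len over uninitialized memory.
    let mut out = vec![0u8; stored_uncompressed_len];
    let written = decompress_into(compressed, &mut out);

    // Runtime validation of the untrusted on-disk length field.
    if written != stored_uncompressed_len {
        return Err(BlockError::CorruptedLength {
            expected: stored_uncompressed_len,
            got: written,
        });
    }
    Ok(out)
}
```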
Minor Configuration and Lint Updates
src/lib.rs, src/manifest.rs, src/table/filter/mod.rs, src/table/util.rs, src/table/mod.rs, src/version/run.rs, clippy.toml
Exports new seqno types and verify module. Adds dead_code and expect attributes with justifications. Updates clippy lints from warn to expect with detailed reasons. Adds allowed-duplicate-crates config for hashbrown. Changes highest_seqno calculation to offset by global_seqno.
Test Suites
tests/custom_seqno_generator.rs, tests/ingestion_seqno.rs
Introduces OffsetGenerator implementing SequenceNumberGenerator for testing custom generators. Validates seqno progression through Config builder and new_with_generators constructor. Tests ingestion persisted seqno tracking across operations.
CI/CD Infrastructure
.github/workflows/coordinode-ci.yml, .github/workflows/coordinode-release.yml, .github/workflows/upstream-monitor.yml, .github/dependabot.yml, .release-plz.toml
Adds comprehensive GitHub Actions workflows for lint/test on push/PR with matrix testing (stable and 1.90.0), code coverage with nightly and llvm-cov, automated release PRs via release-plz, and upstream synchronization monitor. Configures Dependabot for cargo and GitHub Actions updates with conventional commits.
Documentation and Guidelines
README.md, .github/copilot-instructions.md, .github/instructions/code-review.instructions.md
Updates CI badges and adds Maintained fork note with Structured World Foundation context. Adds comprehensive code review guidelines covering scope, Rust standards, testing practices, commit format, feature flags, and architecture overview.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested labels

epic

🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed

❌ Failed checks (1 warning)

  • Out of Scope Changes check — ⚠️ Warning. The PR includes significant out-of-scope changes: new API methods (multi_get, contains_prefix), sequence number generator refactoring, integrity verification, GitHub workflows, and documentation. Resolution: separate the intra-L0 compaction feature (#163) from infrastructure changes (workflows, generators, multi_get API); create distinct PRs for API additions and build tooling.

✅ Passed checks (4 passed)

  • Description Check — ✅ Passed. Check skipped: CodeRabbit’s high-level summary is enabled.
  • Title check — ✅ Passed. The title 'feat: add intra-L0 compaction for overlapping runs' accurately and concisely summarizes the main change in the PR.
  • Linked Issues check — ✅ Passed. The PR implements the core requirement from #163: adding intra-L0 compaction to merge overlapping L0 runs when below threshold, with proper test coverage.
  • Docstring Coverage — ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/compaction/leveled/test.rs`:
- Around line 76-81: Add an assertion that verifies the table count after
intra-L0 compaction to ensure runs were merged into the expected number of
tables; immediately after the existing assert_eq!(1, tree.l0_run_count()), add
an assertion like assert_eq!(1, tree.table_count()) (or the project's equivalent
method that returns total table count) so the test checks both l0_run_count()
and the final table count consistent with leveled_l0_reached_limit.
- Around line 83-88: The test currently verifies that data is readable after
intra-L0 compaction but doesn't assert that files remained in L0; update the
test (the block using tree.get and the loop over i in 0..3u8) to also assert
that no tables were promoted to L1 or higher by inspecting the tree's level
metadata (e.g., check that level 1 and above have zero tables/entries or that
tree.levels[1].is_empty() / equivalent API returns true). Use the tree's public
API or metadata accessor (the same test's tree variable) to assert levels > 0
are empty immediately after compaction to explicitly verify intra-L0 behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: aef431da-cfb3-4ee9-978e-8b00204942b2

📥 Commits

Reviewing files that changed from the base of the PR and between e11c791 and 74b5521.

📒 Files selected for processing (2)
  • src/compaction/leveled/mod.rs
  • src/compaction/leveled/test.rs


Copilot AI left a comment


Pull request overview

Adds an intra-L0 compaction step to reduce point-read amplification during write bursts by consolidating multiple overlapping L0 runs into a single run when L0 is below its compaction trigger threshold.

Changes:

  • Add intra-L0 “merge all L0 tables into one run” compaction choice when l0.run_count > 1 and l0.table_count < l0_threshold.
  • Add a unit test covering overlapping flushes and verifying L0 run consolidation and read availability.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/compaction/leveled/mod.rs Adds intra-L0 compaction selection logic before the existing scoring/L0→L1 path.
src/compaction/leveled/test.rs Adds leveled_intra_l0_compaction test that flushes overlapping memtables and validates run consolidation.


polaz added 2 commits March 14, 2026 14:03
- Doc describes convenience and error propagation, not false optimization claim
- Test asserts "ab" matches "abc:*" keys at visible seqno
- Add BlobTree test covering delegated index-only path
Delegate to index tree avoids BlobGuard construction overhead,
not value-log reads (key() never resolves blob indirections).

Copilot AI left a comment


Pull request overview

Adds an intra-L0 compaction step to the leveled compaction strategy to reduce point-read amplification caused by multiple overlapping L0 runs during write bursts, while preserving the existing L0→L1 trigger behavior at the L0 table threshold.

Changes:

  • Introduce an intra-L0 merge choice when L0 has multiple runs but remains below the L0 table-count threshold.
  • Add a unit test that flushes overlapping memtables, triggers compaction, and verifies L0 run consolidation plus data correctness.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/compaction/leveled/mod.rs Adds intra-L0 compaction selection logic (merge L0 tables back into L0 when below threshold).
src/compaction/leveled/test.rs Adds leveled_intra_l0_compaction test to validate run consolidation and read correctness.



Copilot AI left a comment


Pull request overview

This PR adds an “intra-L0” compaction path to reduce point-read latency during write bursts by merging multiple overlapping L0 runs into a single sorted run that stays in L0, while preserving the existing L0→L1 compaction trigger at the configured threshold.

Changes:

  • Extend the leveled compaction strategy to choose an intra-L0 merge when L0 has multiple runs and is below the L0 table threshold.
  • Add a unit test that flushes overlapping memtables, runs compaction, and asserts L0 run consolidation + read correctness.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/compaction/leveled/mod.rs Adds a new compaction choice that merges all L0 tables into dest_level: 0 when multiple L0 runs exist but L0 is still below threshold.
src/compaction/leveled/test.rs Adds leveled_intra_l0_compaction to validate that overlapping L0 runs are consolidated within L0 and data remains readable.


polaz added 2 commits March 14, 2026 14:38
- Add multi_get to AbstractTree trait with single-snapshot semantics
- Implement for Tree and BlobTree with one lock acquisition for all keys
- Add tests: all-existing, mixed, empty, snapshot isolation, tombstones, disk

Closes fjall-rs#96
Exploit internal key ordering (user_key ASC, seqno DESC) to include
seqno in the binary search predicate. This skips entire restart
intervals containing only versions newer than the target snapshot,
reducing O(versions) linear scan to O(restart_interval) for keys
with many MVCC versions.

Closes fjall-rs#237
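The predicate described in this commit message can be sketched with `partition_point`. Given entries sorted by (user_key ASC, seqno DESC), versions newer than the snapshot sort before the seek target and are skipped by the binary search itself rather than by a linear scan. Types and visibility semantics here are illustrative:

```rust
type SeqNo = u64;

struct InternalKey {
    user_key: Vec<u8>,
    seqno: SeqNo,
}

/// Index of the first entry visible at `snapshot` for `user_key`,
/// assuming `entries` is sorted by (user_key ASC, seqno DESC).
fn seek_to_key_seqno(entries: &[InternalKey], user_key: &[u8], snapshot: SeqNo) -> usize {
    entries.partition_point(|e| {
        // True for entries strictly before the seek target: smaller user
        // keys, or the same key with a version too new for the snapshot.
        e.user_key.as_slice() < user_key
            || (e.user_key.as_slice() == user_key && e.seqno > snapshot)
    })
}
```

With many MVCC versions per key, this turns the per-key cost from linear in the version count into logarithmic search plus at most one restart interval of scanning.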

Copilot AI left a comment


Pull request overview

Adds an intra-L0 compaction step to the leveled compaction strategy to reduce point-read amplification during write bursts by consolidating multiple overlapping L0 runs into a single run while staying in L0.

Changes:

  • Add an intra-L0 merge choice in the leveled compaction strategy when L0.run_count > 1 and L0.table_count < l0_threshold.
  • Adjust version merge behavior for dest_level == 0 to preserve L0 “newest-first” run ordering in the presence of concurrent flushes.
  • Add a unit test covering intra-L0 compaction behavior and validating read correctness and that data remains in L0.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
src/compaction/leveled/mod.rs Introduces intra-L0 compaction selection logic in the leveled strategy.
src/version/mod.rs Ensures merged output targeting L0 is appended to maintain correct run recency ordering.
src/compaction/leveled/test.rs Adds a test verifying intra-L0 consolidation and read correctness.


@polaz polaz requested a review from Copilot March 14, 2026 12:57
polaz added 3 commits March 16, 2026 02:13
Store compressed bytes in `compressed_buf: Option<Vec<u8>>` instead
of taking a reference to the return value of compress(). Feature-gate
the variable behind `#[cfg(any(feature = "lz4", feature = "zstd"))]`.
- encode_into must remain infallible since encode_into_vec() uses
  expect() internally, assuming encoding cannot fail
- level validation stays in CompressionType::zstd() constructor and
  Decode::decode_from for untrusted on-disk data
- update test to validate through the constructor instead of encode_into

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/table/filter/mod.rs (1)

99-116: 🧹 Nitpick | 🔵 Trivial

Consider adding a test for the n == 0 edge case.

The new early return at lines 46-48 handles the zero-item case, but there's no test coverage for it. A simple test would document the expected behavior:

📝 Suggested test
+    #[test]
+    fn bloom_estimated_size_zero_items() {
+        let bpk_policy = BloomConstructionPolicy::BitsPerKey(10.0);
+        assert_eq!(bpk_policy.estimated_filter_size(0), 0);
+
+        let fpr_policy = BloomConstructionPolicy::FalsePositiveRate(0.01);
+        assert_eq!(fpr_policy.estimated_filter_size(0), 0);
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/table/filter/mod.rs` around lines 99 - 116, Add a unit test that covers
the n == 0 edge case by calling BloomConstructionPolicy::BitsPerKey(...) and
BloomConstructionPolicy::FalsePositiveRate(...). For each policy call the
estimated_filter_size(0) method and assert the returned size equals 0 (this
verifies the early-return added around estimated_filter_size), placing the test
in the same mod/tests area alongside bloom_estimated_size_bpk and
bloom_estimated_size_fpr.
src/version/super_version.rs (1)

115-147: ⚠️ Potential issue | 🟠 Major

Validate custom-generator seqnos before persisting a new version.

Config can now accept arbitrary SequenceNumberGenerator implementations, but next_version.seqno = seqno is still written unchecked. Clamping visible_seqno only protects the watermark; a buggy generator can still persist a reserved-MSB seqno and leave snapshot selection inconsistent. Guard seqno <= MAX_SEQNO before persist_version.

🛡️ Proposed guard
     ) -> crate::Result<()> {
+        assert!(
+            seqno <= MAX_SEQNO,
+            "Sequence number must not use the reserved MSB"
+        );
         let mut next_version = f(&self.latest_version())?;
         next_version.seqno = seqno;
         log::trace!("Next version seqno={}", next_version.seqno);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/version/super_version.rs` around lines 115 - 147, The seqno provided to
upgrade_version_with_seqno is not validated before being persisted, so add a
guard in upgrade_version_with_seqno (before calling persist_version and
append_version) that checks the provided SeqNo (seqno) is <= MAX_SEQNO; if the
check fails return an appropriate crate::Result error (e.g., Err with a
descriptive error) instead of persisting, and only proceed to call
persist_version(tree_path, &next_version.version) and
self.append_version(next_version) after the validation; also keep the existing
visible_seqno.fetch_max logic but perform it only after the seqno passes
validation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/copilot-instructions.md:
- Around line 35-40: Update the fenced code block that contains "<type>(scope):
<description>" to include a language identifier (e.g., use "text") so markdown
linting passes and formatting is consistent; locate the triple-backtick block in
.github/copilot-instructions.md and change the opening fence from ``` to ```text
(or remove language if you prefer plain fence) while keeping the inner lines "-
Detail 1" and "- Detail 2" unchanged.

In `@tests/custom_seqno_generator.rs`:
- Around line 61-104: The tests currently only call seqno.next() /
visible_seqno.fetch_max() directly so they don't verify the Config wiring;
modify each test (custom_generator_via_builder and
custom_generator_via_new_with_generators) to force an internal version-upgrade
path that consumes the injected generators (e.g., call
tree.flush_active_memtable(), tree.clear(), or tree.compact() after inserting)
and then assert the injected OffsetGenerator instances (the seqno.clone() and
visible_seqno.clone() passed via seqno_generator/visible_seqno_generator or
Config::new_with_generators) have advanced as expected (or that tree operations
using visible_seqno reflect the new values), thereby proving the config
generators are actually used.

In `@tests/verify_integrity.rs`:
- Around line 53-62: The corruption injection assumes the SST file is at least
100 bytes; update the mutation in the test (the block that opens table.path
after calling tree.current_version() and version.iter_tables()) to first stat
the file and verify its size, and if the file is smaller than the hardcoded
offset write point, compute a safe relative offset (e.g., max(0,
file_size.saturating_sub(1).saturating_sub(corrupt_len)) or file_size / 2) and
use that offset instead; ensure you check and handle errors from metadata
retrieval and use that computed offset when seeking/writing so the corruption
never silently fails on small test files.

---

Outside diff comments:
In `@src/table/filter/mod.rs`:
- Around line 99-116: Add a unit test that covers the n == 0 edge case by
calling BloomConstructionPolicy::BitsPerKey(...) and
BloomConstructionPolicy::FalsePositiveRate(...). For each policy call the
estimated_filter_size(0) method and assert the returned size equals 0 (this
verifies the early-return added around estimated_filter_size), placing the test
in the same mod/tests area alongside bloom_estimated_size_bpk and
bloom_estimated_size_fpr.

In `@src/version/super_version.rs`:
- Around line 115-147: The seqno provided to upgrade_version_with_seqno is not
validated before being persisted, so add a guard in upgrade_version_with_seqno
(before calling persist_version and append_version) that checks the provided
SeqNo (seqno) is <= MAX_SEQNO; if the check fails return an appropriate
crate::Result error (e.g., Err with a descriptive error) instead of persisting,
and only proceed to call persist_version(tree_path, &next_version.version) and
self.append_version(next_version) after the validation; also keep the existing
visible_seqno.fetch_max logic but perform it only after the seqno passes
validation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c9622a6e-755b-4d40-903a-6f21909f6144

📥 Commits

Reviewing files that changed from the base of the PR and between 3de8408 and 507702e.

📒 Files selected for processing (34)
  • .github/copilot-instructions.md
  • .github/instructions/code-review.instructions.md
  • Cargo.toml
  • clippy.toml
  • src/abstract_tree.rs
  • src/blob_tree/mod.rs
  • src/compaction/leveled/mod.rs
  • src/compaction/leveled/test.rs
  • src/compaction/worker.rs
  • src/config/mod.rs
  • src/lib.rs
  • src/manifest.rs
  • src/seqno.rs
  • src/slice/slice_bytes/mod.rs
  • src/table/block/mod.rs
  • src/table/data_block/iter.rs
  • src/table/data_block/iter_test.rs
  • src/table/data_block/mod.rs
  • src/table/filter/mod.rs
  • src/table/iter.rs
  • src/table/mod.rs
  • src/table/util.rs
  • src/tree/mod.rs
  • src/verify.rs
  • src/version/mod.rs
  • src/version/run.rs
  • src/version/super_version.rs
  • src/vlog/blob_file/reader.rs
  • src/vlog/mod.rs
  • tests/custom_seqno_generator.rs
  • tests/ingestion_seqno.rs
  • tests/multi_get.rs
  • tests/tree_contains_prefix.rs
  • tests/verify_integrity.rs

Comment on lines +35 to +40
```
<type>(scope): <description>

- Detail 1
- Detail 2
```


🧹 Nitpick | 🔵 Trivial

Add language identifier to fenced code block.

The commit message format code block should have a language identifier for consistency and to satisfy markdown linting. Consider using text or leaving it empty with triple backticks alone if no syntax highlighting is needed.

📝 Suggested fix
-```
+```text
 <type>(scope): <description>
 
 - Detail 1
 - Detail 2

🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 35-35: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/copilot-instructions.md around lines 35 - 40, Update the fenced code
block that contains "<type>(scope): <description>" to include a language
identifier (e.g., use "text") so markdown linting passes and formatting is
consistent; locate the triple-backtick block in .github/copilot-instructions.md
and change the opening fence from ``` to ```text (or remove language if you
prefer plain fence) while keeping the inner lines "- Detail 1" and "- Detail 2"
unchanged.

Comment on lines +61 to +104
```rust
#[test]
fn custom_generator_via_builder() -> lsm_tree::Result<()> {
    let path = lsm_tree::get_tmp_folder();

    let seqno = OffsetGenerator::new(1000);
    let visible_seqno = OffsetGenerator::new(1000);

    let tree = Config::new(
        &path,
        SequenceNumberCounter::default(),
        SequenceNumberCounter::default(),
    )
    .seqno_generator(seqno.clone())
    .visible_seqno_generator(visible_seqno.clone())
    .open()?;

    let s = seqno.next();
    assert_eq!(s, 1000);
    tree.insert(b"key", b"value", s);
    visible_seqno.fetch_max(s + 1);

    assert_eq!(tree.len(visible_seqno.get(), None)?, 1);

    Ok(())
}

#[test]
fn custom_generator_via_new_with_generators() -> lsm_tree::Result<()> {
    let path = lsm_tree::get_tmp_folder();

    let seqno = OffsetGenerator::new(5000);
    let visible_seqno = OffsetGenerator::new(5000);

    let tree = Config::new_with_generators(&path, seqno.clone(), visible_seqno.clone()).open()?;

    let s = seqno.next();
    assert_eq!(s, 5000);
    tree.insert(b"hello", b"world", s);
    visible_seqno.fetch_max(s + 1);

    let val = tree.get(b"hello", visible_seqno.get())?.unwrap();
    assert_eq!(&*val, b"world");

    Ok(())
}
```


🧹 Nitpick | 🔵 Trivial

These tests don't actually prove the config wiring works.

Both cases drive seqno and visible_seqno directly, and neither one triggers a tree path that consumes self.config.seqno / self.config.visible_seqno. If seqno_generator(...), visible_seqno_generator(...), or new_with_generators(...) were accidentally ignored, these assertions would still pass. Please force an internal version-upgrade path such as clear(), flush_active_memtable(), or compaction and assert that the injected generators advance.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/custom_seqno_generator.rs` around lines 61 - 104, The tests currently
only call seqno.next() / visible_seqno.fetch_max() directly so they don't verify
the Config wiring; modify each test (custom_generator_via_builder and
custom_generator_via_new_with_generators) to force an internal version-upgrade
path that consumes the injected generators (e.g., call
tree.flush_active_memtable(), tree.clear(), or tree.compact() after inserting)
and then assert the injected OffsetGenerator instances (the seqno.clone() and
visible_seqno.clone() passed via seqno_generator/visible_seqno_generator or
Config::new_with_generators) have advanced as expected (or that tree operations
using visible_seqno reflect the new values), thereby proving the config
generators are actually used.

Comment on lines +53 to +62
```rust
// Corrupt a byte in the SST file
let version = tree.current_version();
let table = version.iter_tables().next().unwrap();
{
    use std::io::{Seek, Write};
    let mut f = std::fs::OpenOptions::new().write(true).open(&*table.path)?;
    f.seek(std::io::SeekFrom::Start(100))?;
    f.write_all(b"CORRUPT")?;
    f.sync_all()?;
}
```
🧹 Nitpick | 🔵 Trivial

Corruption injection approach is reasonable but assumes a minimum file size.

The tests corrupt bytes at offsets 100 (lines 59, 132) and 50 (lines 278, 324). This works for the test data (26 keys with values), but could fail silently if future test modifications reduce file sizes below these offsets.

Consider adding a defensive check or using a relative offset:

💡 Suggested defensive approach
 // Corrupt a byte in the SST file
 let version = tree.current_version();
 let table = version.iter_tables().next().unwrap();
 {
     use std::io::{Seek, Write};
+    let metadata = std::fs::metadata(&*table.path)?;
+    let offset = std::cmp::min(100, metadata.len().saturating_sub(10));
     let mut f = std::fs::OpenOptions::new().write(true).open(&*table.path)?;
-    f.seek(std::io::SeekFrom::Start(100))?;
+    f.seek(std::io::SeekFrom::Start(offset))?;
     f.write_all(b"CORRUPT")?;
     f.sync_all()?;
 }

Also applies to: 124-135

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/verify_integrity.rs` around lines 53 - 62, The corruption injection
assumes the SST file is at least 100 bytes; update the mutation in the test (the
block that opens table.path after calling tree.current_version() and
version.iter_tables()) to first stat the file and verify its size, and if the
file is smaller than the hardcoded offset write point, compute a safe relative
offset (e.g., max(0, file_size.saturating_sub(1).saturating_sub(corrupt_len)) or
file_size / 2) and use that offset instead; ensure you check and handle errors
from metadata retrieval and use that computed offset when seeking/writing so the
corruption never silently fails on small test files.
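The clamping logic from the suggestion above can be factored into a small pure helper (the name `safe_corruption_offset` is hypothetical, not part of the test file); keeping it as a function makes the boundary behavior easy to unit-test:

```rust
// Hypothetical helper: clamp a desired corruption offset so the write
// always lands inside the existing file contents, even if the file is
// smaller than expected. Without this, seeking past EOF and writing
// would merely extend the file instead of corrupting real data.
fn safe_corruption_offset(file_len: u64, desired: u64, corrupt_len: u64) -> u64 {
    // Last offset at which `corrupt_len` bytes still fit inside the file.
    let max_start = file_len.saturating_sub(corrupt_len);
    desired.min(max_start)
}
```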

polaz added 8 commits March 16, 2026 02:17
When L0 accumulates multiple overlapping runs below the L0→L1
threshold, consolidate them within L0 to reduce point-read latency.

- Merge all L0 runs into a single run when run_count > 1
  and table_count < l0_threshold
- Guard against infinite loops: falls through to L0→L1
  when table_count reaches threshold
- Skips if L0 is already busy with another compaction

Closes fjall-rs#163
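The trigger described in this commit can be sketched as a pure decision function. This is a self-contained model of the PR description, not the crate's actual strategy API; the names `run_count`, `table_count`, and `l0_threshold` follow the commit message:

```rust
// Sketch of the intra-L0 compaction decision described above (assumed
// names, not the real strategy trait).
#[derive(Debug, PartialEq)]
enum L0Choice {
    DoNothing,
    IntraL0Merge, // merge all L0 runs into one run, dest_level = 0
    MoveToL1,     // existing L0 -> L1 compaction
}

fn choose_l0(run_count: usize, table_count: usize, l0_threshold: usize) -> L0Choice {
    if table_count >= l0_threshold {
        // Fall through to the existing L0 -> L1 path; checking this first
        // is what prevents intra-L0 merges from looping forever.
        L0Choice::MoveToL1
    } else if run_count > 1 {
        // Multiple overlapping runs below the threshold: consolidate
        // within L0 to cut point-read fan-out.
        L0Choice::IntraL0Merge
    } else {
        L0Choice::DoNothing
    }
}
```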
- Set canonical_level to 0 so L0 table writer settings are used
  instead of L1 policies (block sizes, compression, filters)
- Assert post-compaction table count is 1
- Verify L1 remains empty after intra-L0 compaction
- Use distinct values per flush and verify latest version
When intra-L0 compaction merges back into L0 (dest_level == 0),
append the merged run instead of prepending it. This ensures
any concurrently flushed (newer) runs remain at the front of L0
and are searched first during point reads, preventing stale reads.
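The ordering invariant this commit fixes can be modeled with a plain `Vec` (a simplified sketch; the real code manipulates version levels, and `flush`/`finish_intra_l0_merge` are hypothetical names). Runs are kept newest-first, so a point read scans index 0 first; a merged run holds the oldest data and must therefore go to the back:

```rust
// Memtable flushes prepend: newest run sits at the front, searched first.
fn flush(l0: &mut Vec<String>, run: &str) {
    l0.insert(0, run.to_string());
}

// Finishing an intra-L0 merge replaces the `merged_count` oldest runs
// (at the back) with one merged run, APPENDED so that any run flushed
// concurrently with the merge stays in front of the merged (older) data.
fn finish_intra_l0_merge(l0: &mut Vec<String>, merged: &str, merged_count: usize) {
    l0.truncate(l0.len() - merged_count);
    l0.push(merged.to_string());
}
```

Prepending the merged run instead would put old values ahead of a concurrently flushed newer run, which is exactly the stale-read scenario the commit guards against.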
Use explicit with_l0_threshold(4) instead of relying on
Strategy::default() to avoid test brittleness if defaults change.
Add #[expect(clippy::expect_used)] for level(N).expect() calls that
are guaranteed to succeed (level count is exactly 7, asserted above).

Add #[expect(clippy::cast_possible_truncation)] for lmax_index as u8
cast (level count fits in u8).
The strategy was missing the actual intra-L0 compaction trigger.
Add merge path: when L0 has multiple overlapping runs below the
L0→L1 threshold, merge them into a single run within L0.

Gate on !is_disjoint() to avoid unnecessary rewrites of non-overlapping
runs that don't degrade point-read latency.
- Add note explaining why table_count==1 assertion is valid for small data
- Clarify that dest_level==0 only occurs for intra-L0 compaction
  (memtable flushes use rotate_memtable, not upgrade_version)
- Add length validation after zstd decompression in from_reader,
  from_file, and blob reader (parity with lz4 path)
- Add corruption tests for block and blob zstd decompression
- Document compressed_buf initialization safety
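The length validation added here can be sketched as a small check run after decompression (a stand-in function; the commit messages confirm the `Error::other` style but the helper name is assumed):

```rust
use std::io;

// Sketch of the post-decompression check: the block header records the
// expected uncompressed length, and any mismatch is treated as
// corruption rather than trusting the codec's output.
fn check_decompressed_len(actual: usize, expected: usize) -> io::Result<()> {
    if actual != expected {
        return Err(io::Error::other(format!(
            "decompressed length mismatch: got {actual}, expected {expected}",
        )));
    }
    Ok(())
}
```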
@polaz polaz force-pushed the feat/#163-intra-l0-compaction branch from 507702e to 21790e8 Compare March 16, 2026 00:23
polaz added 13 commits March 16, 2026 02:39
Mirror the existing lz4 NOTE annotations in the zstd decompress
paths so both codecs share the same validation roadmap.
- Document why Zstd uses i32 (upstream API match, not newtype)
- Document why Error::other (upstream error style, not InvalidInput)
- Document why Option<Vec<u8>> (upstream lz4 pattern, not Cow)
…ing test

- Remove !is_disjoint() from intra-L0 condition (implied by run_count > 1)
- Add test verifying newer L0 runs are searched before merged (older) run
…mpression

feat: add zstd compression support
@polaz
Author

polaz commented Mar 16, 2026

Sorry about the noise here — the branch accumulated unrelated changes from our fork. Opened a clean replacement rebased on upstream main: #276

@polaz polaz closed this Mar 16, 2026
@polaz polaz deleted the feat/#163-intra-l0-compaction branch March 16, 2026 17:31


Development

Successfully merging this pull request may close these issues.

Intra-L0 compaction

2 participants