
feat: add intra-L0 compaction for overlapping runs #260

Closed
polaz wants to merge 146 commits into
fjall-rs:main from structured-world:feat/#163-intra-l0-compaction

Conversation


@polaz polaz commented Mar 14, 2026

Summary

  • During write bursts, L0 accumulates multiple overlapping runs from memtable flushes, causing point-read latency spikes (each run must be checked)
  • Adds intra-L0 compaction: when L0 has run_count > 1 and table_count < l0_threshold, all L0 runs are merged into a single sorted run within L0 (dest_level: 0)
  • Falls through to existing L0→L1 compaction when table_count reaches threshold, preventing infinite loops
  • ~26 LOC in strategy, ~44 LOC test
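The selection rule described above can be sketched roughly as follows. All names here (`CompactionChoice`, `L0Stats`, `choose`) are illustrative stand-ins, not the crate's actual types:

```rust
// Simplified sketch of the intra-L0 selection logic, assuming hypothetical
// types; the real strategy lives in src/compaction/leveled/mod.rs.

#[derive(Debug, PartialEq)]
enum CompactionChoice {
    /// Merge all L0 runs into a single sorted run that stays in L0.
    IntraL0 { dest_level: u8 },
    /// Existing behavior: compact L0 into L1.
    L0ToL1,
    DoNothing,
}

struct L0Stats {
    run_count: usize,
    table_count: usize,
}

fn choose(l0: &L0Stats, l0_threshold: usize) -> CompactionChoice {
    if l0.table_count >= l0_threshold {
        // Fall through to the existing L0 -> L1 path once the table budget
        // is reached, which prevents an infinite intra-L0 loop.
        CompactionChoice::L0ToL1
    } else if l0.run_count > 1 {
        // Multiple overlapping runs but still under the table budget:
        // consolidate within L0 to cut point-read amplification.
        CompactionChoice::IntraL0 { dest_level: 0 }
    } else {
        CompactionChoice::DoNothing
    }
}
```

Checking the table count first is what guarantees termination: every intra-L0 merge reduces run_count to 1, so the rule cannot keep re-selecting itself.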

Test plan

  • New test leveled_intra_l0_compaction: flush 3 overlapping memtables, verify multiple L0 runs, compact, verify single L0 run, verify all data readable and L1 remains empty
  • cargo test --all-features — all existing tests pass (no regressions)

Closes #163

Summary by CodeRabbit

  • New Features

    • Intra-L0 compaction to merge overlapping runs below configured threshold.
    • Batch key lookup via multi_get API.
    • Key prefix existence checking via contains_prefix API.
    • Integrity verification for SST and blob files.
    • Customizable sequence number generators.
  • Bug Fixes

    • Safer LZ4 decompression with validation of uncompressed length.
    • Sequence number overflow prevention.
  • Documentation

    • Added code review guidelines and CI instructions.
    • Updated README with maintained fork details.
  • Chores

    • CI/CD automation: Dependabot, GitHub Actions workflows, release automation.
  • Tests

    • Added comprehensive tests for new features and edge cases.

polaz added 10 commits March 14, 2026 11:33
Adds a public verify module with verify_integrity() that scans all SST
and blob files in a tree, computing XXHash-3 128-bit checksums and
comparing them against the checksums stored in the version manifest.

This enables detection of silent bit-rot, partial writes, and other
on-disk corruption without reading individual blocks.

The function returns an IntegrityReport with:
- sst_files_checked / blob_files_checked counters
- Per-file errors (SstFileCorrupted, BlobFileCorrupted, IoError)
- is_ok() / files_checked() convenience methods
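The report shape described above can be sketched like this; field and variant names follow the commit message but may differ from the actual `verify` module:

```rust
use std::io;

// Illustrative sketch of IntegrityReport as described: per-file counters,
// a list of per-file errors, and two convenience accessors.

#[derive(Debug)]
enum IntegrityError {
    SstFileCorrupted { table_id: u64 },
    BlobFileCorrupted { blob_file_id: u64 },
    IoError(io::Error),
}

#[derive(Debug, Default)]
struct IntegrityReport {
    sst_files_checked: usize,
    blob_files_checked: usize,
    errors: Vec<IntegrityError>,
}

impl IntegrityReport {
    /// True if no corruption or I/O error was found.
    fn is_ok(&self) -> bool {
        self.errors.is_empty()
    }

    /// Total number of files scanned, SST and blob combined.
    fn files_checked(&self) -> usize {
        self.sst_files_checked + self.blob_files_checked
    }
}
```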

Tests cover:
- Clean tree verification (standard + blob tree)
- SST corruption detection
- Blob file corruption detection
- Multiple SST files

Closes fjall-rs#187
- Compute XXH3 via BufReader instead of loading entire files
  into memory (prevents OOM on large SST/blob files)
- Store std::io::Error directly in IoError variant instead of String
- Use BlobFileId type alias instead of raw u64
- Add #[non_exhaustive] to IntegrityError for forward compatibility
- Fix rustdoc: remove incorrect # Errors section
- Add test for missing file (IoError path)
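The BufReader change above amounts to hashing the file in fixed-size chunks instead of reading it whole. A minimal sketch of that streaming pattern, using std's `DefaultHasher` purely as a dependency-free stand-in for the 128-bit XXH3 the crate actually uses:

```rust
use std::fs::File;
use std::hash::Hasher;
use std::io::{BufReader, Read};

// Streaming checksum: feed the file to the hasher chunk by chunk so memory
// use stays O(buffer size) regardless of file size. DefaultHasher is a
// stand-in; the real code uses Xxh3Default to match ChecksummedWriter.

fn stream_checksum(path: &str) -> std::io::Result<u64> {
    let file = File::open(path)?;
    let mut reader = BufReader::new(file);
    let mut hasher = std::collections::hash_map::DefaultHasher::new();
    let mut buf = [0u8; 8192];

    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // EOF
        }
        // Hash only the bytes actually read this iteration.
        hasher.write(&buf[..n]);
    }

    Ok(hasher.finish())
}
```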
BlobFileId is defined in the private vlog module and not re-exported,
so using it in the public IntegrityError enum would leak a private
type to external consumers.
Enables error chain integration (anyhow, thiserror, ? propagation).
IoError variant returns the underlying std::io::Error via source().
- Switch stream_checksum to Xxh3Default matching ChecksummedWriter
- Use path reference comparison instead of deref in test assertion
- Remove redundant doc comment on enum field visibility
Add contains_prefix() to AbstractTree trait that checks if any key
with the given prefix exists, stopping at the first match instead
of materializing a full iterator.

- Default implementation on AbstractTree uses prefix().next()
- BlobTree overrides to delegate to index tree, avoiding value log reads
- MVCC-correct: respects seqno visibility and tombstones

Closes fjall-rs#138
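The short-circuit idea behind the default implementation can be sketched with a `BTreeMap` standing in for the tree: seek to the first key at or after the prefix and check whether it still starts with it, instead of materializing a full iterator. The real default is `self.prefix(..).next()` on `AbstractTree`; this is only an illustration of the technique:

```rust
use std::collections::BTreeMap;

// Stop at the first match: one ordered seek, one comparison.
fn contains_prefix(map: &BTreeMap<Vec<u8>, Vec<u8>>, prefix: &[u8]) -> bool {
    map.range(prefix.to_vec()..)          // first key >= prefix
        .next()
        .map_or(false, |(k, _)| k.starts_with(prefix))
}
```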
Copilot AI review requested due to automatic review settings March 14, 2026 11:54

coderabbitai Bot commented Mar 14, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

This PR adds intra-L0 compaction to merge overlapping runs, refactors sequence number generation to a trait-based system with clamped MAX_SEQNO boundaries, implements seqno-aware data block seeking, introduces multi_get and contains_prefix query APIs, adds integrity verification for SST/blob files, and establishes CI/CD workflows and code review guidelines.

Changes

Cohort / File(s) Summary
Intra-L0 Compaction
src/compaction/leveled/mod.rs, src/compaction/leveled/test.rs, src/version/mod.rs
Adds intra-L0 compaction path merging overlapping L0 runs when count is below threshold. Modified Run insertion logic in with_merge to append new runs to L0 instead of prepending. Includes comprehensive test validating consolidation, table count, and L1 behavior.
Sequence Number Generation Refactor
src/seqno.rs, src/config/mod.rs, src/compaction/worker.rs, src/version/super_version.rs
Introduces SequenceNumberGenerator trait with MAX_SEQNO constant (0x7FFF_FFFF_FFFF_FFFF), SharedSequenceNumberGenerator type alias, and refactors SequenceNumberCounter to implement trait. Updates Config and Options to use new generator types. Implements saturating_add clamping for visible_seqno advancement.
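The trait-plus-clamping design in this cohort can be sketched as follows. Method names (`next`, `get`, `fetch_max`) are inferred from the test excerpts later in this PR and may not match the crate exactly:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

type SeqNo = u64;

/// The MSB is reserved, so seqnos are bounded by this constant.
const MAX_SEQNO: SeqNo = 0x7FFF_FFFF_FFFF_FFFF;

trait SequenceNumberGenerator: Send + Sync {
    fn next(&self) -> SeqNo;
    fn get(&self) -> SeqNo;
    fn fetch_max(&self, candidate: SeqNo) -> SeqNo;
}

/// Default counter-backed implementation, as a shared atomic.
#[derive(Clone, Default)]
struct SequenceNumberCounter(Arc<AtomicU64>);

impl SequenceNumberGenerator for SequenceNumberCounter {
    fn next(&self) -> SeqNo {
        self.0.fetch_add(1, Ordering::SeqCst)
    }

    fn get(&self) -> SeqNo {
        self.0.load(Ordering::SeqCst)
    }

    fn fetch_max(&self, candidate: SeqNo) -> SeqNo {
        // Clamp so visible_seqno can never advance into the reserved MSB
        // range, mirroring the saturating behavior described above.
        self.0.fetch_max(candidate.min(MAX_SEQNO), Ordering::SeqCst)
    }
}
```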
SeqNo-Aware Data Block Operations
src/table/data_block/iter.rs, src/table/data_block/iter_test.rs, src/table/data_block/mod.rs, src/table/iter.rs
Updates data block iterator seek methods to accept seqno parameter for snapshot-aware positioning. Adds seek_to_key_seqno for seqno-aware binary search. Modifies point_read to use seqno-aware seeking. Adds extensive tests validating seqno behavior across restart intervals and mixed-key scenarios.
Query APIs: Multi-Get and Contains-Prefix
src/abstract_tree.rs, src/blob_tree/mod.rs, src/tree/mod.rs, tests/multi_get.rs, tests/tree_contains_prefix.rs
Adds multi_get and contains_prefix trait methods to AbstractTree. Implements resolve_key helper and method overrides in BlobTree. Includes batch resolution with shared SuperVersion. Comprehensive test suites validating snapshot isolation, tombstones, blob indirection, and MVCC semantics.
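The batch-lookup contract described here is that every key resolves against one shared snapshot, so results are mutually consistent. A minimal sketch with a single `&BTreeMap` standing in for one SuperVersion acquisition (the real trait signature will differ):

```rust
use std::collections::BTreeMap;

// Resolve all keys against one snapshot reference: one "lock acquisition",
// mutually consistent results, missing keys yield None.
fn multi_get<'a, K: AsRef<[u8]>>(
    snapshot: &'a BTreeMap<Vec<u8>, Vec<u8>>,
    keys: impl IntoIterator<Item = K>,
) -> Vec<Option<&'a Vec<u8>>> {
    keys.into_iter().map(|k| snapshot.get(k.as_ref())).collect()
}
```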
Integrity Verification System
src/verify.rs, src/table/block/mod.rs, src/vlog/blob_file/reader.rs, src/vlog/mod.rs, tests/verify_integrity.rs
New module with IntegrityError enum and IntegrityReport struct for validating SST and blob checksums. Replaces unsafe LZ4 buffer construction with safe Vec-based approach including decompression validation. Adds streaming checksum computation. Extensive tests covering clean trees, corruption detection, and error handling.
Block and Storage Improvements
src/table/block/mod.rs, src/vlog/blob_file/reader.rs, src/slice/slice_bytes/mod.rs
Replaces unsafe buffer construction for LZ4 decompression with Vec-based safe approach and runtime validation of uncompressed length. Adds test for corrupted length detection. Updates clippy annotations for explicit expect usage. Changes to usize-based arithmetic for blob value reading.
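The safe-decompression pattern in this cohort can be sketched as: allocate a zeroed `Vec` sized from the block header, decompress into it, and reject the block if the produced length disagrees with the stored one. `decompress_into` below is a hypothetical stand-in for the real LZ4 call, and the error type is illustrative:

```rust
#[derive(Debug, PartialEq)]
enum BlockError {
    /// Stored uncompressed length disagrees with the decompressed output.
    CorruptedLength { expected: usize, got: usize },
}

fn decompress_block(
    compressed: &[u8],
    stored_uncompressed_len: usize,
    decompress_into: impl Fn(&[u8], &mut [u8]) -> usize,
) -> Result<Vec<u8>, BlockError> {
    // Safe: zero-initialized Vec, no set_len over uninitialized memory.
    let mut out = vec![0u8; stored_uncompressed_len];
    let written = decompress_into(compressed, &mut out);

    // Runtime validation of the untrusted on-disk length field.
    if written != stored_uncompressed_len {
        return Err(BlockError::CorruptedLength {
            expected: stored_uncompressed_len,
            got: written,
        });
    }
    Ok(out)
}
```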
Minor Configuration and Lint Updates
src/lib.rs, src/manifest.rs, src/table/filter/mod.rs, src/table/util.rs, src/table/mod.rs, src/version/run.rs, clippy.toml
Exports new seqno types and verify module. Adds dead_code and expect attributes with justifications. Updates clippy lints from warn to expect with detailed reasons. Adds allowed-duplicate-crates config for hashbrown. Changes highest_seqno calculation to offset by global_seqno.
Test Suites
tests/custom_seqno_generator.rs, tests/ingestion_seqno.rs
Introduces OffsetGenerator implementing SequenceNumberGenerator for testing custom generators. Validates seqno progression through Config builder and new_with_generators constructor. Tests ingestion persisted seqno tracking across operations.
CI/CD Infrastructure
.github/workflows/coordinode-ci.yml, .github/workflows/coordinode-release.yml, .github/workflows/upstream-monitor.yml, .github/dependabot.yml, .release-plz.toml
Adds comprehensive GitHub Actions workflows for lint/test on push/PR with matrix testing (stable and 1.90.0), code coverage with nightly and llvm-cov, automated release PRs via release-plz, and upstream synchronization monitor. Configures Dependabot for cargo and GitHub Actions updates with conventional commits.
Documentation and Guidelines
README.md, .github/copilot-instructions.md, .github/instructions/code-review.instructions.md
Updates CI badges and adds Maintained fork note with Structured World Foundation context. Adds comprehensive code review guidelines covering scope, Rust standards, testing practices, commit format, feature flags, and architecture overview.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested labels

epic

🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed

❌ Failed checks (1 warning)

  • Out of Scope Changes check — ⚠️ Warning. The PR includes significant out-of-scope changes: new API methods (multi_get, contains_prefix), sequence number generator refactoring, integrity verification, GitHub workflows, and documentation. Resolution: separate the intra-L0 compaction feature (#163) from infrastructure changes (workflows, generators, multi_get API); create distinct PRs for API additions and build tooling.

✅ Passed checks (4 passed)

  • Description Check — ✅ Passed. Check skipped: CodeRabbit’s high-level summary is enabled.
  • Title check — ✅ Passed. The title 'feat: add intra-L0 compaction for overlapping runs' accurately and concisely summarizes the main change in the PR.
  • Linked Issues check — ✅ Passed. The PR implements the core requirement from #163: adding intra-L0 compaction to merge overlapping L0 runs when below threshold, with proper test coverage.
  • Docstring Coverage — ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/compaction/leveled/test.rs`:
- Around line 76-81: Add an assertion that verifies the table count after
intra-L0 compaction to ensure runs were merged into the expected number of
tables; immediately after the existing assert_eq!(1, tree.l0_run_count()), add
an assertion like assert_eq!(1, tree.table_count()) (or the project's equivalent
method that returns total table count) so the test checks both l0_run_count()
and the final table count consistent with leveled_l0_reached_limit.
- Around line 83-88: The test currently verifies that data is readable after
intra-L0 compaction but doesn't assert that files remained in L0; update the
test (the block using tree.get and the loop over i in 0..3u8) to also assert
that no tables were promoted to L1 or higher by inspecting the tree's level
metadata (e.g., check that level 1 and above have zero tables/entries or that
tree.levels[1].is_empty() / equivalent API returns true). Use the tree's public
API or metadata accessor (the same test's tree variable) to assert levels > 0
are empty immediately after compaction to explicitly verify intra-L0 behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: aef431da-cfb3-4ee9-978e-8b00204942b2

📥 Commits

Reviewing files that changed from the base of the PR and between e11c791 and 74b5521.

📒 Files selected for processing (2)
  • src/compaction/leveled/mod.rs
  • src/compaction/leveled/test.rs


Copilot AI left a comment


Pull request overview

Adds an intra-L0 compaction step to reduce point-read amplification during write bursts by consolidating multiple overlapping L0 runs into a single run when L0 is below its compaction trigger threshold.

Changes:

  • Add intra-L0 “merge all L0 tables into one run” compaction choice when l0.run_count > 1 and l0.table_count < l0_threshold.
  • Add a unit test covering overlapping flushes and verifying L0 run consolidation and read availability.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/compaction/leveled/mod.rs Adds intra-L0 compaction selection logic before the existing scoring/L0→L1 path.
src/compaction/leveled/test.rs Adds leveled_intra_l0_compaction test that flushes overlapping memtables and validates run consolidation.


polaz added 2 commits March 14, 2026 14:03
- Doc describes convenience and error propagation, not false optimization claim
- Test asserts "ab" matches "abc:*" keys at visible seqno
- Add BlobTree test covering delegated index-only path
Delegate to index tree avoids BlobGuard construction overhead,
not value-log reads (key() never resolves blob indirections).

Copilot AI left a comment


Pull request overview

Adds an intra-L0 compaction step to the leveled compaction strategy to reduce point-read amplification caused by multiple overlapping L0 runs during write bursts, while preserving the existing L0→L1 trigger behavior at the L0 table threshold.

Changes:

  • Introduce an intra-L0 merge choice when L0 has multiple runs but remains below the L0 table-count threshold.
  • Add a unit test that flushes overlapping memtables, triggers compaction, and verifies L0 run consolidation plus data correctness.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/compaction/leveled/mod.rs Adds intra-L0 compaction selection logic (merge L0 tables back into L0 when below threshold).
src/compaction/leveled/test.rs Adds leveled_intra_l0_compaction test to validate run consolidation and read correctness.



Copilot AI left a comment


Pull request overview

This PR adds an “intra-L0” compaction path to reduce point-read latency during write bursts by merging multiple overlapping L0 runs into a single sorted run that stays in L0, while preserving the existing L0→L1 compaction trigger at the configured threshold.

Changes:

  • Extend the leveled compaction strategy to choose an intra-L0 merge when L0 has multiple runs and is below the L0 table threshold.
  • Add a unit test that flushes overlapping memtables, runs compaction, and asserts L0 run consolidation + read correctness.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/compaction/leveled/mod.rs Adds a new compaction choice that merges all L0 tables into dest_level: 0 when multiple L0 runs exist but L0 is still below threshold.
src/compaction/leveled/test.rs Adds leveled_intra_l0_compaction to validate that overlapping L0 runs are consolidated within L0 and data remains readable.


polaz added 2 commits March 14, 2026 14:38
- Add multi_get to AbstractTree trait with single-snapshot semantics
- Implement for Tree and BlobTree with one lock acquisition for all keys
- Add tests: all-existing, mixed, empty, snapshot isolation, tombstones, disk

Closes fjall-rs#96
Exploit internal key ordering (user_key ASC, seqno DESC) to include
seqno in the binary search predicate. This skips entire restart
intervals containing only versions newer than the target snapshot,
reducing O(versions) linear scan to O(restart_interval) for keys
with many MVCC versions.

Closes fjall-rs#237
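The predicate described in this commit message can be sketched with `partition_point`. Given entries sorted by (user_key ASC, seqno DESC), versions newer than the snapshot sort before the seek target and are skipped by the binary search itself rather than by a linear scan. Types and visibility semantics here are illustrative:

```rust
type SeqNo = u64;

struct InternalKey {
    user_key: Vec<u8>,
    seqno: SeqNo,
}

/// Index of the first entry visible at `snapshot` for `user_key`,
/// assuming `entries` is sorted by (user_key ASC, seqno DESC).
fn seek_to_key_seqno(entries: &[InternalKey], user_key: &[u8], snapshot: SeqNo) -> usize {
    entries.partition_point(|e| {
        // True for entries strictly before the seek target: smaller user
        // keys, or the same key with a version too new for the snapshot.
        e.user_key.as_slice() < user_key
            || (e.user_key.as_slice() == user_key && e.seqno > snapshot)
    })
}
```

With many MVCC versions per key, this turns the per-key cost from linear in the version count into logarithmic search plus at most one restart interval of scanning.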

Copilot AI left a comment


Pull request overview

Adds an intra-L0 compaction step to the leveled compaction strategy to reduce point-read amplification during write bursts by consolidating multiple overlapping L0 runs into a single run while staying in L0.

Changes:

  • Add an intra-L0 merge choice in the leveled compaction strategy when L0.run_count > 1 and L0.table_count < l0_threshold.
  • Adjust version merge behavior for dest_level == 0 to preserve L0 “newest-first” run ordering in the presence of concurrent flushes.
  • Add a unit test covering intra-L0 compaction behavior and validating read correctness and that data remains in L0.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
src/compaction/leveled/mod.rs Introduces intra-L0 compaction selection logic in the leveled strategy.
src/version/mod.rs Ensures merged output targeting L0 is appended to maintain correct run recency ordering.
src/compaction/leveled/test.rs Adds a test verifying intra-L0 consolidation and read correctness.


@polaz polaz requested a review from Copilot March 14, 2026 12:57
polaz added 3 commits March 16, 2026 02:13
Store compressed bytes in `compressed_buf: Option<Vec<u8>>` instead
of taking a reference to the return value of compress(). Feature-gate
the variable behind `#[cfg(any(feature = "lz4", feature = "zstd"))]`.
- encode_into must remain infallible since encode_into_vec() uses
  expect() internally, assuming encoding cannot fail
- level validation stays in CompressionType::zstd() constructor and
  Decode::decode_from for untrusted on-disk data
- update test to validate through the constructor instead of encode_into

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/table/filter/mod.rs (1)

99-116: 🧹 Nitpick | 🔵 Trivial

Consider adding a test for the n == 0 edge case.

The new early return at lines 46-48 handles the zero-item case, but there's no test coverage for it. A simple test would document the expected behavior:

📝 Suggested test
+    #[test]
+    fn bloom_estimated_size_zero_items() {
+        let bpk_policy = BloomConstructionPolicy::BitsPerKey(10.0);
+        assert_eq!(bpk_policy.estimated_filter_size(0), 0);
+
+        let fpr_policy = BloomConstructionPolicy::FalsePositiveRate(0.01);
+        assert_eq!(fpr_policy.estimated_filter_size(0), 0);
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/table/filter/mod.rs` around lines 99 - 116, Add a unit test that covers
the n == 0 edge case by calling BloomConstructionPolicy::BitsPerKey(...) and
BloomConstructionPolicy::FalsePositiveRate(...). For each policy call the
estimated_filter_size(0) method and assert the returned size equals 0 (this
verifies the early-return added around estimated_filter_size), placing the test
in the same mod/tests area alongside bloom_estimated_size_bpk and
bloom_estimated_size_fpr.
src/version/super_version.rs (1)

115-147: ⚠️ Potential issue | 🟠 Major

Validate custom-generator seqnos before persisting a new version.

Config can now accept arbitrary SequenceNumberGenerator implementations, but next_version.seqno = seqno is still written unchecked. Clamping visible_seqno only protects the watermark; a buggy generator can still persist a reserved-MSB seqno and leave snapshot selection inconsistent. Guard seqno <= MAX_SEQNO before persist_version.

🛡️ Proposed guard
     ) -> crate::Result<()> {
+        assert!(
+            seqno <= MAX_SEQNO,
+            "Sequence number must not use the reserved MSB"
+        );
         let mut next_version = f(&self.latest_version())?;
         next_version.seqno = seqno;
         log::trace!("Next version seqno={}", next_version.seqno);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/version/super_version.rs` around lines 115 - 147, The seqno provided to
upgrade_version_with_seqno is not validated before being persisted, so add a
guard in upgrade_version_with_seqno (before calling persist_version and
append_version) that checks the provided SeqNo (seqno) is <= MAX_SEQNO; if the
check fails return an appropriate crate::Result error (e.g., Err with a
descriptive error) instead of persisting, and only proceed to call
persist_version(tree_path, &next_version.version) and
self.append_version(next_version) after the validation; also keep the existing
visible_seqno.fetch_max logic but perform it only after the seqno passes
validation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/copilot-instructions.md:
- Around line 35-40: Update the fenced code block that contains "<type>(scope):
<description>" to include a language identifier (e.g., use "text") so markdown
linting passes and formatting is consistent; locate the triple-backtick block in
.github/copilot-instructions.md and change the opening fence from ``` to ```text
(or remove language if you prefer plain fence) while keeping the inner lines "-
Detail 1" and "- Detail 2" unchanged.

In `@tests/custom_seqno_generator.rs`:
- Around line 61-104: The tests currently only call seqno.next() /
visible_seqno.fetch_max() directly so they don't verify the Config wiring;
modify each test (custom_generator_via_builder and
custom_generator_via_new_with_generators) to force an internal version-upgrade
path that consumes the injected generators (e.g., call
tree.flush_active_memtable(), tree.clear(), or tree.compact() after inserting)
and then assert the injected OffsetGenerator instances (the seqno.clone() and
visible_seqno.clone() passed via seqno_generator/visible_seqno_generator or
Config::new_with_generators) have advanced as expected (or that tree operations
using visible_seqno reflect the new values), thereby proving the config
generators are actually used.

In `@tests/verify_integrity.rs`:
- Around line 53-62: The corruption injection assumes the SST file is at least
100 bytes; update the mutation in the test (the block that opens table.path
after calling tree.current_version() and version.iter_tables()) to first stat
the file and verify its size, and if the file is smaller than the hardcoded
offset write point, compute a safe relative offset (e.g., max(0,
file_size.saturating_sub(1).saturating_sub(corrupt_len)) or file_size / 2) and
use that offset instead; ensure you check and handle errors from metadata
retrieval and use that computed offset when seeking/writing so the corruption
never silently fails on small test files.

---

Outside diff comments:
In `@src/table/filter/mod.rs`:
- Around line 99-116: Add a unit test that covers the n == 0 edge case by
calling BloomConstructionPolicy::BitsPerKey(...) and
BloomConstructionPolicy::FalsePositiveRate(...). For each policy call the
estimated_filter_size(0) method and assert the returned size equals 0 (this
verifies the early-return added around estimated_filter_size), placing the test
in the same mod/tests area alongside bloom_estimated_size_bpk and
bloom_estimated_size_fpr.

In `@src/version/super_version.rs`:
- Around line 115-147: The seqno provided to upgrade_version_with_seqno is not
validated before being persisted, so add a guard in upgrade_version_with_seqno
(before calling persist_version and append_version) that checks the provided
SeqNo (seqno) is <= MAX_SEQNO; if the check fails return an appropriate
crate::Result error (e.g., Err with a descriptive error) instead of persisting,
and only proceed to call persist_version(tree_path, &next_version.version) and
self.append_version(next_version) after the validation; also keep the existing
visible_seqno.fetch_max logic but perform it only after the seqno passes
validation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c9622a6e-755b-4d40-903a-6f21909f6144

📥 Commits

Reviewing files that changed from the base of the PR and between 3de8408 and 507702e.

📒 Files selected for processing (34)
  • .github/copilot-instructions.md
  • .github/instructions/code-review.instructions.md
  • Cargo.toml
  • clippy.toml
  • src/abstract_tree.rs
  • src/blob_tree/mod.rs
  • src/compaction/leveled/mod.rs
  • src/compaction/leveled/test.rs
  • src/compaction/worker.rs
  • src/config/mod.rs
  • src/lib.rs
  • src/manifest.rs
  • src/seqno.rs
  • src/slice/slice_bytes/mod.rs
  • src/table/block/mod.rs
  • src/table/data_block/iter.rs
  • src/table/data_block/iter_test.rs
  • src/table/data_block/mod.rs
  • src/table/filter/mod.rs
  • src/table/iter.rs
  • src/table/mod.rs
  • src/table/util.rs
  • src/tree/mod.rs
  • src/verify.rs
  • src/version/mod.rs
  • src/version/run.rs
  • src/version/super_version.rs
  • src/vlog/blob_file/reader.rs
  • src/vlog/mod.rs
  • tests/custom_seqno_generator.rs
  • tests/ingestion_seqno.rs
  • tests/multi_get.rs
  • tests/tree_contains_prefix.rs
  • tests/verify_integrity.rs

Comment on lines +35 to +40
```
<type>(scope): <description>

- Detail 1
- Detail 2
```


🧹 Nitpick | 🔵 Trivial

Add language identifier to fenced code block.

The commit message format code block should have a language identifier for consistency and to satisfy markdown linting. Consider using text or leaving it empty with triple backticks alone if no syntax highlighting is needed.

📝 Suggested fix
-```
+```text
 <type>(scope): <description>
 
 - Detail 1
 - Detail 2

🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 35-35: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/copilot-instructions.md around lines 35 - 40, Update the fenced code
block that contains "<type>(scope): <description>" to include a language
identifier (e.g., use "text") so markdown linting passes and formatting is
consistent; locate the triple-backtick block in .github/copilot-instructions.md
and change the opening fence from ``` to ```text (or remove language if you
prefer plain fence) while keeping the inner lines "- Detail 1" and "- Detail 2"
unchanged.

Comment on lines +61 to +104
```rust
#[test]
fn custom_generator_via_builder() -> lsm_tree::Result<()> {
    let path = lsm_tree::get_tmp_folder();

    let seqno = OffsetGenerator::new(1000);
    let visible_seqno = OffsetGenerator::new(1000);

    let tree = Config::new(
        &path,
        SequenceNumberCounter::default(),
        SequenceNumberCounter::default(),
    )
    .seqno_generator(seqno.clone())
    .visible_seqno_generator(visible_seqno.clone())
    .open()?;

    let s = seqno.next();
    assert_eq!(s, 1000);
    tree.insert(b"key", b"value", s);
    visible_seqno.fetch_max(s + 1);

    assert_eq!(tree.len(visible_seqno.get(), None)?, 1);

    Ok(())
}

#[test]
fn custom_generator_via_new_with_generators() -> lsm_tree::Result<()> {
    let path = lsm_tree::get_tmp_folder();

    let seqno = OffsetGenerator::new(5000);
    let visible_seqno = OffsetGenerator::new(5000);

    let tree = Config::new_with_generators(&path, seqno.clone(), visible_seqno.clone()).open()?;

    let s = seqno.next();
    assert_eq!(s, 5000);
    tree.insert(b"hello", b"world", s);
    visible_seqno.fetch_max(s + 1);

    let val = tree.get(b"hello", visible_seqno.get())?.unwrap();
    assert_eq!(&*val, b"world");

    Ok(())
}
```


🧹 Nitpick | 🔵 Trivial

These tests don't actually prove the config wiring works.

Both cases drive seqno and visible_seqno directly, and neither one triggers a tree path that consumes self.config.seqno / self.config.visible_seqno. If seqno_generator(...), visible_seqno_generator(...), or new_with_generators(...) were accidentally ignored, these assertions would still pass. Please force an internal version-upgrade path such as clear(), flush_active_memtable(), or compaction and assert that the injected generators advance.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/custom_seqno_generator.rs` around lines 61 - 104, The tests currently
only call seqno.next() / visible_seqno.fetch_max() directly so they don't verify
the Config wiring; modify each test (custom_generator_via_builder and
custom_generator_via_new_with_generators) to force an internal version-upgrade
path that consumes the injected generators (e.g., call
tree.flush_active_memtable(), tree.clear(), or tree.compact() after inserting)
and then assert the injected OffsetGenerator instances (the seqno.clone() and
visible_seqno.clone() passed via seqno_generator/visible_seqno_generator or
Config::new_with_generators) have advanced as expected (or that tree operations
using visible_seqno reflect the new values), thereby proving the config
generators are actually used.

Comment on lines +53 to +62
```rust
// Corrupt a byte in the SST file
let version = tree.current_version();
let table = version.iter_tables().next().unwrap();
{
    use std::io::{Seek, Write};
    let mut f = std::fs::OpenOptions::new().write(true).open(&*table.path)?;
    f.seek(std::io::SeekFrom::Start(100))?;
    f.write_all(b"CORRUPT")?;
    f.sync_all()?;
}
```
🧹 Nitpick | 🔵 Trivial

Corruption injection approach is reasonable but assumes a minimum file size.

The tests corrupt bytes at offsets 100 (lines 59, 132) and 50 (lines 278, 324). This works for the test data (26 keys with values), but could fail silently if future test modifications reduce file sizes below these offsets.

Consider adding a defensive check or using a relative offset:

💡 Suggested defensive approach
 // Corrupt a byte in the SST file
 let version = tree.current_version();
 let table = version.iter_tables().next().unwrap();
 {
     use std::io::{Seek, Write};
+    let metadata = std::fs::metadata(&*table.path)?;
+    let offset = std::cmp::min(100, metadata.len().saturating_sub(10));
     let mut f = std::fs::OpenOptions::new().write(true).open(&*table.path)?;
-    f.seek(std::io::SeekFrom::Start(100))?;
+    f.seek(std::io::SeekFrom::Start(offset))?;
     f.write_all(b"CORRUPT")?;
     f.sync_all()?;
 }

Also applies to: 124-135

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/verify_integrity.rs` around lines 53 - 62, The corruption injection
assumes the SST file is at least 100 bytes; update the mutation in the test (the
block that opens table.path after calling tree.current_version() and
version.iter_tables()) to first stat the file and verify its size, and if the
file is smaller than the hardcoded offset write point, compute a safe relative
offset (e.g., max(0, file_size.saturating_sub(1).saturating_sub(corrupt_len)) or
file_size / 2) and use that offset instead; ensure you check and handle errors
from metadata retrieval and use that computed offset when seeking/writing so the
corruption never silently fails on small test files.
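The clamping logic from the suggestion above can be factored into a small pure helper (the name `safe_corruption_offset` is hypothetical, not part of the test file); keeping it as a function makes the boundary behavior easy to unit-test:

```rust
// Hypothetical helper: clamp a desired corruption offset so the write
// always lands inside the existing file contents, even if the file is
// smaller than expected. Without this, seeking past EOF and writing
// would merely extend the file instead of corrupting real data.
fn safe_corruption_offset(file_len: u64, desired: u64, corrupt_len: u64) -> u64 {
    // Last offset at which `corrupt_len` bytes still fit inside the file.
    let max_start = file_len.saturating_sub(corrupt_len);
    desired.min(max_start)
}
```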

polaz added 8 commits March 16, 2026 02:17
When L0 accumulates multiple overlapping runs below the L0→L1
threshold, consolidate them within L0 to reduce point-read latency.

- Merge all L0 runs into a single run when run_count > 1
  and table_count < l0_threshold
- Guard against infinite loops: falls through to L0→L1
  when table_count reaches threshold
- Skips if L0 is already busy with another compaction

Closes fjall-rs#163
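The trigger described in this commit can be sketched as a pure decision function. This is a self-contained model of the PR description, not the crate's actual strategy API; the names `run_count`, `table_count`, and `l0_threshold` follow the commit message:

```rust
// Sketch of the intra-L0 compaction decision described above (assumed
// names, not the real strategy trait).
#[derive(Debug, PartialEq)]
enum L0Choice {
    DoNothing,
    IntraL0Merge, // merge all L0 runs into one run, dest_level = 0
    MoveToL1,     // existing L0 -> L1 compaction
}

fn choose_l0(run_count: usize, table_count: usize, l0_threshold: usize) -> L0Choice {
    if table_count >= l0_threshold {
        // Fall through to the existing L0 -> L1 path; checking this first
        // is what prevents intra-L0 merges from looping forever.
        L0Choice::MoveToL1
    } else if run_count > 1 {
        // Multiple overlapping runs below the threshold: consolidate
        // within L0 to cut point-read fan-out.
        L0Choice::IntraL0Merge
    } else {
        L0Choice::DoNothing
    }
}
```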
- Set canonical_level to 0 so L0 table writer settings are used
  instead of L1 policies (block sizes, compression, filters)
- Assert post-compaction table count is 1
- Verify L1 remains empty after intra-L0 compaction
- Use distinct values per flush and verify latest version
When intra-L0 compaction merges back into L0 (dest_level == 0),
append the merged run instead of prepending it. This ensures
any concurrently flushed (newer) runs remain at the front of L0
and are searched first during point reads, preventing stale reads.
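The ordering invariant this commit fixes can be modeled with a plain `Vec` (a simplified sketch; the real code manipulates version levels, and `flush`/`finish_intra_l0_merge` are hypothetical names). Runs are kept newest-first, so a point read scans index 0 first; a merged run holds the oldest data and must therefore go to the back:

```rust
// Memtable flushes prepend: newest run sits at the front, searched first.
fn flush(l0: &mut Vec<String>, run: &str) {
    l0.insert(0, run.to_string());
}

// Finishing an intra-L0 merge replaces the `merged_count` oldest runs
// (at the back) with one merged run, APPENDED so that any run flushed
// concurrently with the merge stays in front of the merged (older) data.
fn finish_intra_l0_merge(l0: &mut Vec<String>, merged: &str, merged_count: usize) {
    l0.truncate(l0.len() - merged_count);
    l0.push(merged.to_string());
}
```

Prepending the merged run instead would put old values ahead of a concurrently flushed newer run, which is exactly the stale-read scenario the commit guards against.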
Use explicit with_l0_threshold(4) instead of relying on
Strategy::default() to avoid test brittleness if defaults change.
Add #[expect(clippy::expect_used)] for level(N).expect() calls that
are guaranteed to succeed (level count is exactly 7, asserted above).

Add #[expect(clippy::cast_possible_truncation)] for lmax_index as u8
cast (level count fits in u8).
The strategy was missing the actual intra-L0 compaction trigger.
Add merge path: when L0 has multiple overlapping runs below the
L0→L1 threshold, merge them into a single run within L0.

Gate on !is_disjoint() to avoid unnecessary rewrites of non-overlapping
runs that don't degrade point-read latency.
- Add note explaining why table_count==1 assertion is valid for small data
- Clarify that dest_level==0 only occurs for intra-L0 compaction
  (memtable flushes use rotate_memtable, not upgrade_version)
- Add length validation after zstd decompression in from_reader,
  from_file, and blob reader (parity with lz4 path)
- Add corruption tests for block and blob zstd decompression
- Document compressed_buf initialization safety
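The length validation added here can be sketched as a small check run after decompression (a stand-in function; the commit messages confirm the `Error::other` style but the helper name is assumed):

```rust
use std::io;

// Sketch of the post-decompression check: the block header records the
// expected uncompressed length, and any mismatch is treated as
// corruption rather than trusting the codec's output.
fn check_decompressed_len(actual: usize, expected: usize) -> io::Result<()> {
    if actual != expected {
        return Err(io::Error::other(format!(
            "decompressed length mismatch: got {actual}, expected {expected}",
        )));
    }
    Ok(())
}
```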
@polaz polaz force-pushed the feat/#163-intra-l0-compaction branch from 507702e to 21790e8 Compare March 16, 2026 00:23
polaz added 13 commits March 16, 2026 02:39
Mirror the existing lz4 NOTE annotations in the zstd decompress
paths so both codecs share the same validation roadmap.
- Document why Zstd uses i32 (upstream API match, not newtype)
- Document why Error::other (upstream error style, not InvalidInput)
- Document why Option<Vec<u8>> (upstream lz4 pattern, not Cow)
…ing test

- Remove !is_disjoint() from intra-L0 condition (implied by run_count > 1)
- Add test verifying newer L0 runs are searched before merged (older) run
…mpression

feat: add zstd compression support
@polaz
Author

polaz commented Mar 16, 2026

Sorry about the noise here — the branch accumulated unrelated changes from our fork. Opened a clean replacement rebased on upstream main: #276

@polaz polaz closed this Mar 16, 2026
@polaz polaz deleted the feat/#163-intra-l0-compaction branch March 16, 2026 17:31


Development

Successfully merging this pull request may close these issues.

Intra-L0 compaction

2 participants