feat: add intra-L0 compaction for overlapping runs #260
Conversation
Adds a public `verify` module with `verify_integrity()` that scans all SST and blob files in a tree, computing XXHash-3 128-bit checksums and comparing them against the checksums stored in the version manifest. This enables detection of silent bit-rot, partial writes, and other on-disk corruption without reading individual blocks.

The function returns an `IntegrityReport` with:
- `sst_files_checked` / `blob_files_checked` counters
- Per-file errors (`SstFileCorrupted`, `BlobFileCorrupted`, `IoError`)
- `is_ok()` / `files_checked()` convenience methods

Tests cover:
- Clean tree verification (standard + blob tree)
- SST corruption detection
- Blob file corruption detection
- Multiple SST files

Closes fjall-rs#187
- Compute XXH3 via `BufReader` instead of loading entire files into memory (prevents OOM on large SST/blob files)
- Store `std::io::Error` directly in the `IoError` variant instead of a `String`
- Use the `BlobFileId` type alias instead of a raw `u64`
- Add `#[non_exhaustive]` to `IntegrityError` for forward compatibility
- Fix rustdoc: remove incorrect `# Errors` section
- Add test for a missing file (`IoError` path)
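The streaming-checksum idea can be sketched with plain std I/O. This is an illustration only: a hand-rolled FNV-1a stands in for XXH3 so the sketch stays dependency-free, and `stream_checksum` here is a toy, not the crate's function of the same name. The point it demonstrates is the one above: the file is folded into the hash through a fixed-size buffer, so memory use stays constant regardless of file size.

```rust
use std::io::{BufReader, Read, Write};

/// Stream a file through a fixed-size buffer, folding each chunk into a
/// running checksum. FNV-1a is used purely for illustration.
fn stream_checksum(path: &std::path::Path) -> std::io::Result<u64> {
    let mut reader = BufReader::new(std::fs::File::open(path)?);
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    let mut buf = [0u8; 8192];
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // EOF: the whole file has been hashed
        }
        for &b in &buf[..n] {
            hash ^= u64::from(b);
            hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }
    Ok(hash)
}

fn main() -> std::io::Result<()> {
    let mut path = std::env::temp_dir();
    path.push("stream_checksum_demo.bin");
    std::fs::File::create(&path)?.write_all(b"hello world")?;

    // Hashing the same bytes twice must be deterministic.
    let a = stream_checksum(&path)?;
    let b = stream_checksum(&path)?;
    assert_eq!(a, b);
    println!("checksum = {a:#x}");
    Ok(())
}
```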
`BlobFileId` is defined in the private `vlog` module and not re-exported, so using it in the public `IntegrityError` enum would leak a private type to external consumers.
Enables error chain integration (`anyhow`, `thiserror`, `?` propagation). The `IoError` variant returns the underlying `std::io::Error` via `source()`.
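A minimal sketch of the pattern this enables, with illustrative variant names (the real `IntegrityError` definition may differ): implementing `std::error::Error::source()` lets callers walk the error chain, which is what `anyhow`/`thiserror` interoperability depends on.

```rust
use std::error::Error;
use std::fmt;

/// Illustrative integrity-error enum; only the source() wiring matters here.
#[derive(Debug)]
#[non_exhaustive]
enum IntegrityError {
    SstFileCorrupted(u64),
    IoError(std::io::Error),
}

impl fmt::Display for IntegrityError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::SstFileCorrupted(id) => write!(f, "SST file {id} is corrupted"),
            Self::IoError(_) => write!(f, "I/O error during verification"),
        }
    }
}

impl Error for IntegrityError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            // Expose the wrapped io::Error so chain-walking tools can reach it.
            Self::IoError(e) => Some(e),
            _ => None,
        }
    }
}

fn main() {
    let err = IntegrityError::IoError(std::io::Error::new(
        std::io::ErrorKind::NotFound,
        "missing blob file",
    ));
    assert!(err.source().is_some());
    assert!(IntegrityError::SstFileCorrupted(7).source().is_none());
    println!("{err}: caused by {}", err.source().unwrap());
}
```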
- Switch `stream_checksum` to `Xxh3Default`, matching `ChecksummedWriter`
- Use path reference comparison instead of deref in the test assertion
- Remove redundant doc comment on enum field visibility
Add `contains_prefix()` to the `AbstractTree` trait that checks whether any key with the given prefix exists, stopping at the first match instead of materializing a full iterator.

- Default implementation on `AbstractTree` uses `prefix().next()`
- `BlobTree` overrides to delegate to the index tree, avoiding value log reads
- MVCC-correct: respects seqno visibility and tombstones

Closes fjall-rs#138
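The trait-default pattern described above can be sketched with a toy trait. Names mirror the commit message, but this is a stand-in: the real `AbstractTree` has a far richer interface, and `VecTree` exists only for this example.

```rust
/// Toy trait: contains_prefix() reuses the prefix iterator but
/// short-circuits on the first hit instead of collecting all matches.
trait AbstractTree {
    fn prefix<'a>(&'a self, prefix: &'a [u8]) -> Box<dyn Iterator<Item = Vec<u8>> + 'a>;

    /// Default implementation: stop at the first matching key.
    fn contains_prefix(&self, prefix: &[u8]) -> bool {
        self.prefix(prefix).next().is_some()
    }
}

/// Minimal in-memory "tree" to exercise the default method.
struct VecTree(Vec<Vec<u8>>);

impl AbstractTree for VecTree {
    fn prefix<'a>(&'a self, prefix: &'a [u8]) -> Box<dyn Iterator<Item = Vec<u8>> + 'a> {
        Box::new(
            self.0
                .iter()
                .filter(move |k| k.starts_with(prefix))
                .cloned(),
        )
    }
}

fn main() {
    let tree = VecTree(vec![b"abc:1".to_vec(), b"abc:2".to_vec(), b"xyz".to_vec()]);
    assert!(tree.contains_prefix(b"ab")); // "ab" matches the "abc:*" keys
    assert!(!tree.contains_prefix(b"zz"));
    println!("contains_prefix sketch ok");
}
```

An implementor like `BlobTree` can override the method to answer from cheaper data (per the commit, its index tree) while callers keep the same API.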
Note: Reviews paused. This branch is under active development; to avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

This PR adds intra-L0 compaction to merge overlapping runs, refactors sequence number generation to a trait-based system with clamped MAX_SEQNO boundaries, implements seqno-aware data block seeking, introduces multi_get and contains_prefix query APIs, adds integrity verification for SST/blob files, and establishes CI/CD workflows and code review guidelines.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/compaction/leveled/test.rs`:
- Around line 76-81: Add an assertion that verifies the table count after
intra-L0 compaction to ensure runs were merged into the expected number of
tables; immediately after the existing assert_eq!(1, tree.l0_run_count()), add
an assertion like assert_eq!(1, tree.table_count()) (or the project's equivalent
method that returns total table count) so the test checks both l0_run_count()
and the final table count consistent with leveled_l0_reached_limit.
- Around line 83-88: The test currently verifies that data is readable after
intra-L0 compaction but doesn't assert that files remained in L0; update the
test (the block using tree.get and the loop over i in 0..3u8) to also assert
that no tables were promoted to L1 or higher by inspecting the tree's level
metadata (e.g., check that level 1 and above have zero tables/entries or that
tree.levels[1].is_empty() / equivalent API returns true). Use the tree's public
API or metadata accessor (the same test's tree variable) to assert levels > 0
are empty immediately after compaction to explicitly verify intra-L0 behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: aef431da-cfb3-4ee9-978e-8b00204942b2
📒 Files selected for processing (2)
- src/compaction/leveled/mod.rs
- src/compaction/leveled/test.rs
Pull request overview
Adds an intra-L0 compaction step to reduce point-read amplification during write bursts by consolidating multiple overlapping L0 runs into a single run when L0 is below its compaction trigger threshold.
Changes:
- Add an intra-L0 "merge all L0 tables into one run" compaction choice when `l0.run_count > 1` and `l0.table_count < l0_threshold`.
- Add a unit test covering overlapping flushes and verifying L0 run consolidation and read availability.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/compaction/leveled/mod.rs | Adds intra-L0 compaction selection logic before the existing scoring/L0→L1 path. |
| src/compaction/leveled/test.rs | Adds leveled_intra_l0_compaction test that flushes overlapping memtables and validates run consolidation. |
- Doc describes convenience and error propagation, not a false optimization claim
- Test asserts "ab" matches "abc:*" keys at a visible seqno
- Add a BlobTree test covering the delegated index-only path
Delegating to the index tree avoids `BlobGuard` construction overhead, not value-log reads (`key()` never resolves blob indirections).
Pull request overview
Adds an intra-L0 compaction step to the leveled compaction strategy to reduce point-read amplification caused by multiple overlapping L0 runs during write bursts, while preserving the existing L0→L1 trigger behavior at the L0 table threshold.
Changes:
- Introduce an intra-L0 merge choice when L0 has multiple runs but remains below the L0 table-count threshold.
- Add a unit test that flushes overlapping memtables, triggers compaction, and verifies L0 run consolidation plus data correctness.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/compaction/leveled/mod.rs | Adds intra-L0 compaction selection logic (merge L0 tables back into L0 when below threshold). |
| src/compaction/leveled/test.rs | Adds leveled_intra_l0_compaction test to validate run consolidation and read correctness. |
Pull request overview
This PR adds an “intra-L0” compaction path to reduce point-read latency during write bursts by merging multiple overlapping L0 runs into a single sorted run that stays in L0, while preserving the existing L0→L1 compaction trigger at the configured threshold.
Changes:
- Extend the leveled compaction strategy to choose an intra-L0 merge when L0 has multiple runs and is below the L0 table threshold.
- Add a unit test that flushes overlapping memtables, runs compaction, and asserts L0 run consolidation + read correctness.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/compaction/leveled/mod.rs | Adds a new compaction choice that merges all L0 tables into dest_level: 0 when multiple L0 runs exist but L0 is still below threshold. |
| src/compaction/leveled/test.rs | Adds leveled_intra_l0_compaction to validate that overlapping L0 runs are consolidated within L0 and data remains readable. |
- Add `multi_get` to the `AbstractTree` trait with single-snapshot semantics
- Implement for `Tree` and `BlobTree` with one lock acquisition for all keys
- Add tests: all-existing, mixed, empty, snapshot isolation, tombstones, disk

Closes fjall-rs#96
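The single-snapshot semantics can be illustrated with a toy map behind one `RwLock`. This is purely a sketch: the real implementation snapshots a versioned LSM-tree, not a `BTreeMap`, but the shape is the same — the lock is taken once, so every key in the batch observes an identical state.

```rust
use std::collections::BTreeMap;
use std::sync::RwLock;

/// Toy stand-in for a tree guarded by a lock.
struct Tree {
    data: RwLock<BTreeMap<Vec<u8>, Vec<u8>>>,
}

impl Tree {
    /// Look up many keys under a single read-lock acquisition, so all
    /// lookups see the same snapshot of the data.
    fn multi_get<'a, I>(&self, keys: I) -> Vec<Option<Vec<u8>>>
    where
        I: IntoIterator<Item = &'a [u8]>,
    {
        let guard = self.data.read().unwrap(); // one lock for the whole batch
        keys.into_iter().map(|k| guard.get(k).cloned()).collect()
    }
}

fn main() {
    let tree = Tree { data: RwLock::new(BTreeMap::new()) };
    tree.data.write().unwrap().insert(b"a".to_vec(), b"1".to_vec());

    let results = tree.multi_get([b"a".as_slice(), b"missing".as_slice()]);
    assert_eq!(results, vec![Some(b"1".to_vec()), None]);
    println!("multi_get sketch ok");
}
```

Compared with calling `get` in a loop, this avoids interleaved writes changing visibility mid-batch, which is what the "snapshot isolation" test in the commit message exercises.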
Exploit internal key ordering (user_key ASC, seqno DESC) to include seqno in the binary search predicate. This skips entire restart intervals containing only versions newer than the target snapshot, reducing O(versions) linear scan to O(restart_interval) for keys with many MVCC versions. Closes fjall-rs#237
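The ordering trick can be sketched over a flat sorted slice standing in for a data block's entries. `seek_visible` is illustrative, not the crate's API: because entries sort by (user_key ASC, seqno DESC), the predicate "smaller key, or same key with a too-new seqno" is monotone, so `partition_point` lands directly on the first version visible at the snapshot.

```rust
/// Find the first entry for `key` whose seqno is visible at `snapshot`,
/// given entries ordered (user_key ASC, seqno DESC).
fn seek_visible<'a>(
    entries: &'a [(&'a [u8], u64)],
    key: &[u8],
    snapshot: u64,
) -> Option<&'a (&'a [u8], u64)> {
    // Predicate is true for every entry strictly "before" the target:
    // a smaller key, or the same key with a version newer than the snapshot.
    let idx = entries.partition_point(|(k, seqno)| {
        *k < key || (*k == key && *seqno > snapshot)
    });
    // The entry at idx is the answer only if it is still the same user key.
    entries.get(idx).filter(|e| e.0 == key)
}

fn main() {
    // "a" has three MVCC versions, newest first (seqno DESC).
    let entries: Vec<(&[u8], u64)> = vec![
        (b"a", 30), (b"a", 20), (b"a", 10),
        (b"b", 5),
    ];
    // Snapshot 25 skips version 30 and sees version 20.
    assert_eq!(seek_visible(&entries, b"a", 25), Some(&(b"a".as_slice(), 20)));
    // Snapshot 5 is older than every version of "a".
    assert_eq!(seek_visible(&entries, b"a", 5), None);
    println!("seqno-aware seek sketch ok");
}
```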
Pull request overview
Adds an intra-L0 compaction step to the leveled compaction strategy to reduce point-read amplification during write bursts by consolidating multiple overlapping L0 runs into a single run while staying in L0.
Changes:
- Add an intra-L0 merge choice in the leveled compaction strategy when `L0.run_count > 1` and `L0.table_count < l0_threshold`.
- Adjust version merge behavior for `dest_level == 0` to preserve L0 "newest-first" run ordering in the presence of concurrent flushes.
- Add a unit test covering intra-L0 compaction behavior and validating read correctness and that data remains in L0.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/compaction/leveled/mod.rs | Introduces intra-L0 compaction selection logic in the leveled strategy. |
| src/version/mod.rs | Ensures merged output targeting L0 is appended to maintain correct run recency ordering. |
| src/compaction/leveled/test.rs | Adds a test verifying intra-L0 consolidation and read correctness. |
Store compressed bytes in `compressed_buf: Option<Vec<u8>>` instead of taking a reference to the return value of compress(). Feature-gate the variable behind `#[cfg(any(feature = "lz4", feature = "zstd"))]`.
- `encode_into` must remain infallible, since `encode_into_vec()` uses `expect()` internally, assuming encoding cannot fail
- Level validation stays in the `CompressionType::zstd()` constructor and `Decode::decode_from` for untrusted on-disk data
- Update the test to validate through the constructor instead of `encode_into`
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/table/filter/mod.rs (1)
99-116: 🧹 Nitpick | 🔵 Trivial

Consider adding a test for the `n == 0` edge case.

The new early return at lines 46-48 handles the zero-item case, but there's no test coverage for it. A simple test would document the expected behavior:

📝 Suggested test

```diff
+#[test]
+fn bloom_estimated_size_zero_items() {
+    let bpk_policy = BloomConstructionPolicy::BitsPerKey(10.0);
+    assert_eq!(bpk_policy.estimated_filter_size(0), 0);
+
+    let fpr_policy = BloomConstructionPolicy::FalsePositiveRate(0.01);
+    assert_eq!(fpr_policy.estimated_filter_size(0), 0);
+}
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/table/filter/mod.rs` around lines 99-116: Add a unit test that covers the n == 0 edge case by calling BloomConstructionPolicy::BitsPerKey(...) and BloomConstructionPolicy::FalsePositiveRate(...). For each policy, call the estimated_filter_size(0) method and assert the returned size equals 0 (this verifies the early return added around estimated_filter_size), placing the test in the same mod/tests area alongside bloom_estimated_size_bpk and bloom_estimated_size_fpr.

src/version/super_version.rs (1)
115-147: ⚠️ Potential issue | 🟠 Major

Validate custom-generator seqnos before persisting a new version.

`Config` can now accept arbitrary `SequenceNumberGenerator` implementations, but `next_version.seqno = seqno` is still written unchecked. Clamping `visible_seqno` only protects the watermark; a buggy generator can still persist a reserved-MSB seqno and leave snapshot selection inconsistent. Guard `seqno <= MAX_SEQNO` before `persist_version`.

🛡️ Proposed guard

```diff
 ) -> crate::Result<()> {
+    assert!(
+        seqno <= MAX_SEQNO,
+        "Sequence number must not use the reserved MSB"
+    );
     let mut next_version = f(&self.latest_version())?;
     next_version.seqno = seqno;
     log::trace!("Next version seqno={}", next_version.seqno);
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/version/super_version.rs` around lines 115-147: The seqno provided to upgrade_version_with_seqno is not validated before being persisted, so add a guard in upgrade_version_with_seqno (before calling persist_version and append_version) that checks the provided SeqNo (seqno) is <= MAX_SEQNO; if the check fails, return an appropriate crate::Result error (e.g., Err with a descriptive error) instead of persisting, and only proceed to call persist_version(tree_path, &next_version.version) and self.append_version(next_version) after the validation; also keep the existing visible_seqno.fetch_max logic, but perform it only after the seqno passes validation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/copilot-instructions.md:
- Around line 35-40: Update the fenced code block that contains "<type>(scope):
<description>" to include a language identifier (e.g., use "text") so markdown
linting passes and formatting is consistent; locate the triple-backtick block in
.github/copilot-instructions.md and change the opening fence from ``` to ```text
(or remove language if you prefer plain fence) while keeping the inner lines "-
Detail 1" and "- Detail 2" unchanged.
In `@tests/custom_seqno_generator.rs`:
- Around line 61-104: The tests currently only call seqno.next() /
visible_seqno.fetch_max() directly so they don't verify the Config wiring;
modify each test (custom_generator_via_builder and
custom_generator_via_new_with_generators) to force an internal version-upgrade
path that consumes the injected generators (e.g., call
tree.flush_active_memtable(), tree.clear(), or tree.compact() after inserting)
and then assert the injected OffsetGenerator instances (the seqno.clone() and
visible_seqno.clone() passed via seqno_generator/visible_seqno_generator or
Config::new_with_generators) have advanced as expected (or that tree operations
using visible_seqno reflect the new values), thereby proving the config
generators are actually used.
In `@tests/verify_integrity.rs`:
- Around line 53-62: The corruption injection assumes the SST file is at least
100 bytes; update the mutation in the test (the block that opens table.path
after calling tree.current_version() and version.iter_tables()) to first stat
the file and verify its size, and if the file is smaller than the hardcoded
offset write point, compute a safe relative offset (e.g., max(0,
file_size.saturating_sub(1).saturating_sub(corrupt_len)) or file_size / 2) and
use that offset instead; ensure you check and handle errors from metadata
retrieval and use that computed offset when seeking/writing so the corruption
never silently fails on small test files.
---
Outside diff comments:
In `@src/table/filter/mod.rs`:
- Around line 99-116: Add a unit test that covers the n == 0 edge case by
calling BloomConstructionPolicy::BitsPerKey(...) and
BloomConstructionPolicy::FalsePositiveRate(...). For each policy call the
estimated_filter_size(0) method and assert the returned size equals 0 (this
verifies the early-return added around estimated_filter_size), placing the test
in the same mod/tests area alongside bloom_estimated_size_bpk and
bloom_estimated_size_fpr.
In `@src/version/super_version.rs`:
- Around line 115-147: The seqno provided to upgrade_version_with_seqno is not
validated before being persisted, so add a guard in upgrade_version_with_seqno
(before calling persist_version and append_version) that checks the provided
SeqNo (seqno) is <= MAX_SEQNO; if the check fails return an appropriate
crate::Result error (e.g., Err with a descriptive error) instead of persisting,
and only proceed to call persist_version(tree_path, &next_version.version) and
self.append_version(next_version) after the validation; also keep the existing
visible_seqno.fetch_max logic but perform it only after the seqno passes
validation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: c9622a6e-755b-4d40-903a-6f21909f6144
📒 Files selected for processing (34)
- .github/copilot-instructions.md
- .github/instructions/code-review.instructions.md
- Cargo.toml
- clippy.toml
- src/abstract_tree.rs
- src/blob_tree/mod.rs
- src/compaction/leveled/mod.rs
- src/compaction/leveled/test.rs
- src/compaction/worker.rs
- src/config/mod.rs
- src/lib.rs
- src/manifest.rs
- src/seqno.rs
- src/slice/slice_bytes/mod.rs
- src/table/block/mod.rs
- src/table/data_block/iter.rs
- src/table/data_block/iter_test.rs
- src/table/data_block/mod.rs
- src/table/filter/mod.rs
- src/table/iter.rs
- src/table/mod.rs
- src/table/util.rs
- src/tree/mod.rs
- src/verify.rs
- src/version/mod.rs
- src/version/run.rs
- src/version/super_version.rs
- src/vlog/blob_file/reader.rs
- src/vlog/mod.rs
- tests/custom_seqno_generator.rs
- tests/ingestion_seqno.rs
- tests/multi_get.rs
- tests/tree_contains_prefix.rs
- tests/verify_integrity.rs
```
<type>(scope): <description>

- Detail 1
- Detail 2
```
🧹 Nitpick | 🔵 Trivial
Add language identifier to fenced code block.
The commit message format code block should have a language identifier for consistency and to satisfy markdown linting. Consider using text or leaving it empty with triple backticks alone if no syntax highlighting is needed.
📝 Suggested fix
````diff
-```
+```text
 <type>(scope): <description>

 - Detail 1
 - Detail 2
````
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)
[warning] 35-35: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/copilot-instructions.md around lines 35 - 40, Update the fenced code
block that contains "<type>(scope): <description>" to include a language
identifier (e.g., use "text") so markdown linting passes and formatting is
consistent; locate the triple-backtick block in .github/copilot-instructions.md
and change the opening fence from ``` to ```text (or remove language if you
prefer plain fence) while keeping the inner lines "- Detail 1" and "- Detail 2"
unchanged.
```rust
#[test]
fn custom_generator_via_builder() -> lsm_tree::Result<()> {
    let path = lsm_tree::get_tmp_folder();

    let seqno = OffsetGenerator::new(1000);
    let visible_seqno = OffsetGenerator::new(1000);

    let tree = Config::new(
        &path,
        SequenceNumberCounter::default(),
        SequenceNumberCounter::default(),
    )
    .seqno_generator(seqno.clone())
    .visible_seqno_generator(visible_seqno.clone())
    .open()?;

    let s = seqno.next();
    assert_eq!(s, 1000);
    tree.insert(b"key", b"value", s);
    visible_seqno.fetch_max(s + 1);

    assert_eq!(tree.len(visible_seqno.get(), None)?, 1);

    Ok(())
}

#[test]
fn custom_generator_via_new_with_generators() -> lsm_tree::Result<()> {
    let path = lsm_tree::get_tmp_folder();

    let seqno = OffsetGenerator::new(5000);
    let visible_seqno = OffsetGenerator::new(5000);

    let tree = Config::new_with_generators(&path, seqno.clone(), visible_seqno.clone()).open()?;

    let s = seqno.next();
    assert_eq!(s, 5000);
    tree.insert(b"hello", b"world", s);
    visible_seqno.fetch_max(s + 1);

    let val = tree.get(b"hello", visible_seqno.get())?.unwrap();
    assert_eq!(&*val, b"world");

    Ok(())
}
```
🧹 Nitpick | 🔵 Trivial
These tests don't actually prove the config wiring works.
Both cases drive seqno and visible_seqno directly, and neither one triggers a tree path that consumes self.config.seqno / self.config.visible_seqno. If seqno_generator(...), visible_seqno_generator(...), or new_with_generators(...) were accidentally ignored, these assertions would still pass. Please force an internal version-upgrade path such as clear(), flush_active_memtable(), or compaction and assert that the injected generators advance.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/custom_seqno_generator.rs` around lines 61 - 104, The tests currently
only call seqno.next() / visible_seqno.fetch_max() directly so they don't verify
the Config wiring; modify each test (custom_generator_via_builder and
custom_generator_via_new_with_generators) to force an internal version-upgrade
path that consumes the injected generators (e.g., call
tree.flush_active_memtable(), tree.clear(), or tree.compact() after inserting)
and then assert the injected OffsetGenerator instances (the seqno.clone() and
visible_seqno.clone() passed via seqno_generator/visible_seqno_generator or
Config::new_with_generators) have advanced as expected (or that tree operations
using visible_seqno reflect the new values), thereby proving the config
generators are actually used.
```rust
// Corrupt a byte in the SST file
let version = tree.current_version();
let table = version.iter_tables().next().unwrap();
{
    use std::io::{Seek, Write};
    let mut f = std::fs::OpenOptions::new().write(true).open(&*table.path)?;
    f.seek(std::io::SeekFrom::Start(100))?;
    f.write_all(b"CORRUPT")?;
    f.sync_all()?;
}
```
🧹 Nitpick | 🔵 Trivial
Corruption injection approach is reasonable but assumes minimum file size.
The tests corrupt bytes at offsets 100 (lines 59, 132) and 50 (lines 278, 324). This works for the test data (26 keys with values), but could fail silently if future test modifications reduce file sizes below these offsets.
Consider adding a defensive check or using a relative offset:
💡 Suggested defensive approach

```diff
 // Corrupt a byte in the SST file
 let version = tree.current_version();
 let table = version.iter_tables().next().unwrap();
 {
     use std::io::{Seek, Write};
+    let metadata = std::fs::metadata(&*table.path)?;
+    let offset = std::cmp::min(100, metadata.len().saturating_sub(10));
     let mut f = std::fs::OpenOptions::new().write(true).open(&*table.path)?;
-    f.seek(std::io::SeekFrom::Start(100))?;
+    f.seek(std::io::SeekFrom::Start(offset))?;
     f.write_all(b"CORRUPT")?;
     f.sync_all()?;
 }
```

Also applies to: 124-135
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/verify_integrity.rs` around lines 53 - 62, The corruption injection
assumes the SST file is at least 100 bytes; update the mutation in the test (the
block that opens table.path after calling tree.current_version() and
version.iter_tables()) to first stat the file and verify its size, and if the
file is smaller than the hardcoded offset write point, compute a safe relative
offset (e.g., max(0, file_size.saturating_sub(1).saturating_sub(corrupt_len)) or
file_size / 2) and use that offset instead; ensure you check and handle errors
from metadata retrieval and use that computed offset when seeking/writing so the
corruption never silently fails on small test files.
When L0 accumulates multiple overlapping runs below the L0→L1 threshold, consolidate them within L0 to reduce point-read latency.

- Merge all L0 runs into a single run when `run_count > 1` and `table_count < l0_threshold`
- Guard against infinite loops: falls through to L0→L1 when `table_count` reaches the threshold
- Skip if L0 is already busy with another compaction

Closes fjall-rs#163
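The trigger order above can be sketched as a pure decision function. Variant and parameter names are illustrative, not the crate's actual types; the sketch only captures the ordering of the guards: busy-check first, then the L0→L1 threshold (which prevents the intra-L0 merge from looping forever), and only then the intra-L0 merge.

```rust
/// Illustrative compaction choices.
#[derive(Debug, PartialEq)]
enum Choice {
    IntraL0Merge, // merge all L0 runs into one run, dest_level = 0
    L0ToL1,       // regular promotion once the threshold is reached
    DoNothing,
}

fn choose(run_count: usize, table_count: usize, l0_threshold: usize, l0_busy: bool) -> Choice {
    if l0_busy {
        // Another compaction already owns L0's tables.
        return Choice::DoNothing;
    }
    if table_count >= l0_threshold {
        // Threshold reached: fall through to L0→L1 so intra-L0 merging
        // cannot starve promotion or loop indefinitely.
        return Choice::L0ToL1;
    }
    if run_count > 1 {
        // Multiple overlapping runs below threshold: consolidate in place.
        return Choice::IntraL0Merge;
    }
    Choice::DoNothing
}

fn main() {
    assert_eq!(choose(3, 3, 4, false), Choice::IntraL0Merge);
    assert_eq!(choose(2, 4, 4, false), Choice::L0ToL1);
    assert_eq!(choose(1, 1, 4, false), Choice::DoNothing);
    assert_eq!(choose(3, 3, 4, true), Choice::DoNothing);
    println!("compaction choice sketch ok");
}
```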
- Set `canonical_level` to 0 so L0 table writer settings are used instead of L1 policies (block sizes, compression, filters)
- Assert post-compaction table count is 1
- Verify L1 remains empty after intra-L0 compaction
- Use distinct values per flush and verify the latest version
When intra-L0 compaction merges back into L0 (dest_level == 0), append the merged run instead of prepending it. This ensures any concurrently flushed (newer) runs remain at the front of L0 and are searched first during point reads, preventing stale reads.
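The append-not-prepend rule can be illustrated with a toy run list kept newest-first, as L0 is. This is a sketch of the invariant only (the helper and run representation are hypothetical): the merged output covers the oldest data, so it must go to the back, leaving any run flushed during the compaction at the front where point reads look first.

```rust
/// Install the output of an intra-L0 merge into a newest-first run list.
fn install_merged_run(l0_runs: &mut Vec<&'static str>, merged: &'static str) {
    // Append, not prepend: the merged run is older than any run flushed
    // while the compaction was in flight.
    l0_runs.push(merged);
}

fn main() {
    // A flush landed while the intra-L0 merge was running.
    let mut l0 = vec!["fresh-flush"];
    install_merged_run(&mut l0, "merged-old-runs");
    // Point reads scan front-to-back, so the newer run is searched first,
    // preventing stale reads.
    assert_eq!(l0, vec!["fresh-flush", "merged-old-runs"]);
    println!("L0 run-ordering sketch ok");
}
```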
Use explicit with_l0_threshold(4) instead of relying on Strategy::default() to avoid test brittleness if defaults change.
Add #[expect(clippy::expect_used)] for level(N).expect() calls that are guaranteed to succeed (level count is exactly 7, asserted above). Add #[expect(clippy::cast_possible_truncation)] for lmax_index as u8 cast (level count fits in u8).
The strategy was missing the actual intra-L0 compaction trigger. Add merge path: when L0 has multiple overlapping runs below the L0→L1 threshold, merge them into a single run within L0. Gate on !is_disjoint() to avoid unnecessary rewrites of non-overlapping runs that don't degrade point-read latency.
- Add a note explaining why the `table_count == 1` assertion is valid for small data
- Clarify that `dest_level == 0` only occurs for intra-L0 compaction (memtable flushes use `rotate_memtable`, not `upgrade_version`)
- Add length validation after zstd decompression in `from_reader`, `from_file`, and the blob reader (parity with the lz4 path)
- Add corruption tests for block and blob zstd decompression
- Document `compressed_buf` initialization safety
Force-pushed from `507702e` to `21790e8`.
Mirror the existing lz4 NOTE annotations in the zstd decompress paths so both codecs share the same validation roadmap.
- Document why Zstd uses i32 (upstream API match, not newtype) - Document why Error::other (upstream error style, not InvalidInput) - Document why Option<Vec<u8>> (upstream lz4 pattern, not Cow)
…ing test - Remove !is_disjoint() from intra-L0 condition (implied by run_count > 1) - Add test verifying newer L0 runs are searched before merged (older) run
…mpression feat: add zstd compression support
…nto feat/#163-intra-l0-compaction
Sorry about the noise here — the branch accumulated unrelated changes from our fork. Opened a clean replacement rebased on upstream main: #276 |
Summary

- When `run_count > 1` and `table_count < l0_threshold`, all L0 runs are merged into a single sorted run within L0 (`dest_level: 0`)

Test plan

- `leveled_intra_l0_compaction`: flush 3 overlapping memtables, verify multiple L0 runs, compact, verify a single L0 run, verify all data is readable and L1 remains empty
- `cargo test --all-features`: all existing tests pass (no regressions)

Closes #163
Summary by CodeRabbit
New Features

- New `multi_get` API.
- New `contains_prefix` API.
Documentation
Chores
Tests