Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughReplaces bloom with vendored ribbon-filter and BuRR multilayer filters; migrates filter hashing to ChangesBuRR Filter Integration
Estimated code review effort 🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'lsm-tree db_bench'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.15.
| Benchmark suite | Current: 10c5c23 | Previous: 6f0d811 | Ratio |
|---|---|---|---|
fillseq |
845704.925800426 ops/sec (normalized) |
1212148.0486146263 ops/sec (normalized) |
1.43 |
fillrandom |
409621.2250521961 ops/sec (normalized) |
656262.2987714246 ops/sec (normalized) |
1.60 |
readrandom |
210988.42737450165 ops/sec (normalized) |
287384.2141945515 ops/sec (normalized) |
1.36 |
seekrandom |
162348.55941495934 ops/sec (normalized) |
210244.43188624468 ops/sec (normalized) |
1.30 |
prefixscan |
99822.94229941237 ops/sec (normalized) |
118177.26381944011 ops/sec (normalized) |
1.18 |
overwrite |
413216.61157293693 ops/sec (normalized) |
683457.353426151 ops/sec (normalized) |
1.65 |
mergerandom |
343645.52916514216 ops/sec (normalized) |
490769.1783137203 ops/sec (normalized) |
1.43 |
readwhilewriting |
183875.06406959324 ops/sec (normalized) |
264726.89902541734 ops/sec (normalized) |
1.44 |
This comment was automatically generated by workflow using github-action-benchmark.
CC: @polaz
There was a problem hiding this comment.
Pull request overview
This PR replaces the LSM-table approximate-membership filter implementation from the legacy “standard bloom” filter to a new BuRR (Bumped Ribbon Retrieval) filter, backed by a vendored Ribbon (GF(2) banded solver) primitive. It also standardizes call sites on a stable crate::hash::hash64 entry point and rewires filter writers/readers and benchmarks accordingly.
Changes:
- Replace standard bloom + bit-array filter modules with a new
ribbonmodule and BuRR wire format + builder/probe APIs. - Rewire table filter writers/readers and tree/table call sites to use
crate::hash::hash64+ BuRRcontains_hashpaths. - Vendor Ribbon implementation + add BuRR tests/benches and make
hashmodule public.
Reviewed changes
Copilot reviewed 38 out of 38 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
| src/tree/mod.rs | Switch key hashing from bloom builder hash to crate::hash::hash64. |
| src/blob_tree/mod.rs | Switch key hashing from bloom builder hash to crate::hash::hash64. |
| src/table/writer/filter/partitioned.rs | Build partitioned filters via build_burr_filter_bytes; switch hashing to hash64. |
| src/table/writer/filter/full.rs | Build full filter via build_burr_filter_bytes; switch hashing to hash64. |
| src/table/tests.rs | Update tests to use hash64 instead of bloom builder hashing. |
| src/table/mod.rs | Update debug assertions to validate key_hash == hash64(key). |
| src/table/filter/mod.rs | Replace bloom builder plumbing with BuRR params + build_burr_filter_bytes; update size estimates and tests. |
| src/table/filter/block.rs | Switch filter block probing to BurrFilterReader. |
| src/table/filter/standard_bloom/mod.rs | Removed legacy standard bloom reader implementation. |
| src/table/filter/standard_bloom/builder.rs | Removed legacy standard bloom builder implementation. |
| src/table/filter/bit_array/mod.rs | Removed legacy bit array module. |
| src/table/filter/bit_array/builder.rs | Removed legacy bit array builder. |
| src/table/filter/bit_array/reader.rs | Removed legacy bit array reader. |
| src/table/filter/ribbon/mod.rs | Add vendored Ribbon module entry + exports + (currently) broad lint suppressions. |
| src/table/filter/ribbon/params.rs | Add Ribbon parameter validation/types. |
| src/table/filter/ribbon/error.rs | Add Ribbon build/param/serde repr error types. |
| src/table/filter/ribbon/hashing.rs | Add hashing/equation generation utilities (incl. splitmix64). |
| src/table/filter/ribbon/builder.rs | Add Ribbon builder (including “verbatim seed” build variants for BuRR). |
| src/table/filter/ribbon/filter.rs | Add Ribbon filter type + probe path + (optional) serde repr. |
| src/table/filter/ribbon/tests.rs | Add Ribbon unit/property/statistical tests. |
| src/table/filter/ribbon/burr/mod.rs | Add BuRR module entry + public exports. |
| src/table/filter/ribbon/burr/params.rs | Add BuRR parameter derivation from FPR/BPK. |
| src/table/filter/ribbon/burr/error.rs | Add BuRR build/construction error types. |
| src/table/filter/ribbon/burr/threshold.rs | Add per-block threshold computation + partitioning helpers. |
| src/table/filter/ribbon/burr/builder.rs | Add multi-layer BuRR builder (including hash-based build entrypoint). |
| src/table/filter/ribbon/burr/filter.rs | Add BuRR filter + wire-format reader wrapper. |
| src/table/filter/ribbon/burr/wire.rs | Add BuRR wire-format encode/decode + hash-based probe against decoded bytes. |
| src/table/filter/ribbon/burr/tests.rs | Add BuRR smoke/correctness + basic wire tests. |
| src/table/filter/ribbon/_vendored/LICENSE-MIT | Add vendored Ribbon license text. |
| src/table/filter/ribbon/_vendored/LICENSE-APACHE | Add vendored Ribbon license text. |
| src/prefix.rs | Switch prefix hashing to crate::hash::hash64. |
| src/lib.rs | Make hash module public. |
| src/hash.rs | Add docs + #[must_use] to stable hash64/hash128. |
| Cargo.toml | Add bitvec dependency + ribbon-serde feature. |
| benches/index_block.rs | Switch key hashing to hash64. |
| benches/bloom.rs | Rewrite benchmark to BuRR construction/probe (file name kept). |
There was a problem hiding this comment.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/table/writer/filter/full.rs (1)
94-110:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMove
start("filter")call after the empty-filter check to avoid leaving the writer section unfinalized.In
full.rs,file_writer.start("filter")is called before building filter bytes. Iffilter_bytes.is_empty(), the method returns early without writing any data to the section, leaving the writer in an inconsistent state. This violates thesfa::WriterAPI contract:start()must always be followed by a write operation.The safe fix is to defer
start()until after confirming that filter data will be written:Corrected ordering
- file_writer.start("filter")?; - let n = self.bloom_hash_buffer.len(); log::trace!( @@ -105,6 +104,7 @@ impl<W: std::io::Write + std::io::Seek> FilterWriter<W> for FullFilterWriter { log::trace!("BuRR policy produced empty filter — skipping block write"); return Ok(0); } + file_writer.start("filter")?; log::trace!(🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/table/writer/filter/full.rs` around lines 94 - 110, The call to file_writer.start("filter") is invoked before building the filter and may be left open when build_burr_filter_bytes returns an empty buffer; move the file_writer.start("filter") invocation to after calling build_burr_filter_bytes and after confirming !filter_bytes.is_empty() so start() is only called when you will write the section. Specifically, keep the call to build_burr_filter_bytes(self.bloom_policy, &self.bloom_hash_buffer) and the empty-check (filter_bytes.is_empty()), and only call file_writer.start("filter") immediately before writing the filter bytes (and then follow with the write/finalize calls that the sfa::Writer API expects).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@benches/bloom.rs`:
- Line 90: The benchmark is reconstructing BurrFilterReader each iteration
(BurrFilterReader::new(&filter_bytes)), so contains_hash measurements include
decode overhead; move the reader construction out of b.iter to create a
long‑lived reader instance (e.g., create reader once from filter_bytes before
calling b.iter) and then call reader.contains_hash(...) inside b.iter to measure
steady‑state probe latency; if you also want to measure decode+probe, add a
separate benchmark that constructs BurrFilterReader::new(&filter_bytes) inside
its own b.iter; update the contains_hash benchmark names/comments to reflect
probe‑only vs decode+probe and ensure you use Criterion’s latency reporting
(p99/p999) settings per benches/** guidelines.
In `@src/table/filter/mod.rs`:
- Around line 63-68: Replace the #[allow(...)] clippy suppressions with
#[expect(..., reason = "...")] so unnecessary suppressions are surfaced: change
the attribute that currently reads #[allow(clippy::cast_precision_loss,
clippy::cast_possible_truncation, clippy::cast_sign_loss, reason = "estimation,
precision loss is acceptable")] to #[expect(clippy::cast_precision_loss,
clippy::cast_possible_truncation, clippy::cast_sign_loss, reason = "estimation,
precision loss is acceptable")] and make the same replacement for the second
suppression site noted in the diff (the other #[allow(...)] occurrence around
the estimation/cast logic).
In `@src/table/filter/ribbon/burr/filter.rs`:
- Around line 127-176: contains_hash (and contains_in) allocate fresh Vec<u64>
for fingerprint and acc on every probe which breaks the allocation-free
contract; change these to use stack-sized arrays when params.r <= 64 or reuse
buffers from the existing Scratch allocator created by new_scratch.
Specifically, eliminate the local Vec<u64> allocations for fingerprint and acc
in contains_hash (and the same pattern in contains_in), compute stride from
self.params.r as before, and obtain mutable buffers from Scratch (or use a fixed
[u64;1] when stride==1) instead of calling vec![...]; keep usage with
super::super::hashing::xor_words and the equation/fingerprint flow unchanged but
operate on the borrowed/reused buffers so no heap allocation occurs per probe.
In `@src/table/filter/ribbon/burr/params.rs`:
- Around line 120-131: The function layer_m can divide by zero because the
public field b may be 0; add an explicit guard at the start of the division
section in fn layer_m to validate b (usize::from(self.b)) is > 0 and fail fast
with a clear message (e.g., panic!("invalid params: b must be > 0")) or convert
the API to return a Result instead; update the block that computes b and uses
raw.div_ceil(b) to first check if b == 0 and handle accordingly so no
divide-by-zero panic occurs (refer to layer_m and the public field b for the
exact locations to change).
- Around line 70-71: Replace the three #[allow(clippy::cast_possible_truncation,
clippy::cast_sign_loss)] attributes with
#[expect(clippy::cast_possible_truncation, reason = "...")] and
#[expect(clippy::cast_sign_loss, reason = "...")] on the specific casts: for the
r conversion (r_f -> r as u8) add an expect explaining r_f is validated to [1.0,
64.0] before cast; for the bpk conversion (bpk -> bpk as u8) add an expect
explaining bpk is validated/clamped to [1.0, 64.0] before cast; and for the
layer_input_keys -> f64 conversion add an expect indicating values were
validated earlier and the resulting f64 is bounded by self.b, so the cast is
safe; keep the existing validation/clamping logic and only replace allow with
expect and concise reasons referencing r_f, bpk, and layer_input_keys/self.b.
In `@src/table/filter/ribbon/burr/wire.rs`:
- Around line 173-250: The decode function is trusting layer metadata (r, w, b,
num_blocks, z_byte_len) and only uses debug_assert! for the z payload size;
replace that with explicit runtime validation in decode: validate r/w/b are
within acceptable non-zero ranges (e.g., b != 0), ensure num_blocks and
z_byte_len are sane (num_blocks <= m and z_byte_len == m * stride_words * 8),
and verify the remaining bytes length before slicing; if any check fails return
Err(crate::Error::InvalidHeader("BurrFilter layer ...")); update the checks
around pos/HEADER_LEN/LAYER_HEADER_LEN and the place building LayerView so
corrupt metadata cannot produce downstream panics in Params::new, is_bumped,
read_row or silent skips in contains_hash.
In `@src/table/filter/ribbon/params.rs`:
- Line 5: The build fails when --all-features because the ribbon-serde feature
enables #[cfg_attr(feature = "ribbon-serde", derive(serde::Serialize,
serde::Deserialize))] (seen in params.rs and filter.rs) but serde isn't declared
or wired to that feature; add serde as an optional dependency in Cargo.toml
(e.g. serde = { version = "1.0", features = ["derive"], optional = true }) and
update the [features] section to include "ribbon-serde" =
["your-crate-name/serde"] (or simply ["serde"] if in the same package) so the
serde::Serialize/Deserialize derives compile when ribbon-serde is enabled.
In `@src/table/mod.rs`:
- Line 1092: Update the stale debug assertion message used in
bloom_may_contain_key to reference the actual hashing function
crate::hash::hash64 rather than BloomBuilder::get_hash(key); locate the
assertion string in src/table/mod.rs and change the message to something like
"bloom_may_contain_key: key_hash must be crate::hash::hash64(key)" (or
equivalent wording) so the log correctly reflects the function used to compute
key_hash.
---
Outside diff comments:
In `@src/table/writer/filter/full.rs`:
- Around line 94-110: The call to file_writer.start("filter") is invoked before
building the filter and may be left open when build_burr_filter_bytes returns an
empty buffer; move the file_writer.start("filter") invocation to after calling
build_burr_filter_bytes and after confirming !filter_bytes.is_empty() so start()
is only called when you will write the section. Specifically, keep the call to
build_burr_filter_bytes(self.bloom_policy, &self.bloom_hash_buffer) and the
empty-check (filter_bytes.is_empty()), and only call file_writer.start("filter")
immediately before writing the filter bytes (and then follow with the
write/finalize calls that the sfa::Writer API expects).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 37f91efc-58c4-49ab-abbe-9ea23faecc9c
📒 Files selected for processing (36)
Cargo.tomlbenches/bloom.rsbenches/index_block.rssrc/blob_tree/mod.rssrc/hash.rssrc/lib.rssrc/prefix.rssrc/table/filter/bit_array/builder.rssrc/table/filter/bit_array/mod.rssrc/table/filter/bit_array/reader.rssrc/table/filter/block.rssrc/table/filter/mod.rssrc/table/filter/ribbon/_vendored/LICENSE-APACHEsrc/table/filter/ribbon/_vendored/LICENSE-MITsrc/table/filter/ribbon/builder.rssrc/table/filter/ribbon/burr/builder.rssrc/table/filter/ribbon/burr/error.rssrc/table/filter/ribbon/burr/filter.rssrc/table/filter/ribbon/burr/mod.rssrc/table/filter/ribbon/burr/params.rssrc/table/filter/ribbon/burr/tests.rssrc/table/filter/ribbon/burr/threshold.rssrc/table/filter/ribbon/burr/wire.rssrc/table/filter/ribbon/error.rssrc/table/filter/ribbon/filter.rssrc/table/filter/ribbon/hashing.rssrc/table/filter/ribbon/mod.rssrc/table/filter/ribbon/params.rssrc/table/filter/ribbon/tests.rssrc/table/filter/standard_bloom/builder.rssrc/table/filter/standard_bloom/mod.rssrc/table/mod.rssrc/table/tests.rssrc/table/writer/filter/full.rssrc/table/writer/filter/partitioned.rssrc/tree/mod.rs
💤 Files with no reviewable changes (5)
- src/table/filter/bit_array/builder.rs
- src/table/filter/bit_array/mod.rs
- src/table/filter/bit_array/reader.rs
- src/table/filter/standard_bloom/mod.rs
- src/table/filter/standard_bloom/builder.rs
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/table/filter/mod.rs`:
- Around line 39-47: is_active() currently returns true for any 0<fpr<1 for the
FalsePositiveRate variant, which can disagree with BurrParams::with_fp_rate;
update the FalsePositiveRate branch in is_active() to mirror
BurrParams::with_fp_rate by computing r = ceil(-log2(fpr)) (or equivalently
checking fpr >= 2^-64) and only return true when r is in 1..=64 (i.e., fpr > 0.0
&& fpr < 1.0 && ceil(-log2(*fpr)) between 1 and 64) so that is_active() and
BurrParams::with_fp_rate agree (reference: is_active, FalsePositiveRate,
BurrParams::with_fp_rate).
In `@src/table/filter/ribbon/burr/tests.rs`:
- Around line 54-55: Replace the non-self-checking lint suppression on the float
cast with an expect annotation: for the site that computes fpr (the let fpr =
false_positives as f64 / probe_count as f64;), change
#[allow(clippy::cast_precision_loss)] to #[expect(clippy::cast_precision_loss,
reason = "casting counts to f64 for rate calculation in tests")]; do the same
for the other two occurrences flagged (the other casts around the tests that
compute rates/statistics) so the suppression is self-checking when the casts
become unnecessary.
In `@src/table/filter/ribbon/burr/wire.rs`:
- Around line 85-96: Replace the newly added clippy suppressions in
src/table/filter/ribbon/burr/wire.rs with expect attributes: change the
#[allow(clippy::expect_used, reason = "...")] above the write_u8 sequence to
#[expect(clippy::expect_used, reason = "...")], and likewise replace the
#[allow(clippy::cast_possible_truncation, reason = "...")] that precedes let
num_layers_u8 = layers.len() as u8 with
#[expect(clippy::cast_possible_truncation, reason = "...")]; apply the same
pattern to the other two occurrences referenced (around the other blocks at the
later locations) so all three allow -> expect replacements keep the same reason
text.
In `@src/table/filter/ribbon/params.rs`:
- Around line 55-58: The builder-style method Params::with_seed currently
returns Self but lacks the #[must_use] attribute; add #[must_use] directly above
the pub fn with_seed signature so callers are warned if they discard the
returned updated Params, leaving the method otherwise unchanged (keep the
signature, body, and visibility the same).
In `@src/table/writer/filter/full.rs`:
- Around line 91-93: The early-empty check on bloom_hash_buffer currently logs
"not building filter" but then the function later returns Ok(1); change the
control flow so the "no hashes registered" path immediately returns Ok(0)
(matching the filter_bytes.is_empty() path) instead of falling through. Update
the branch that checks bloom_hash_buffer.is_empty() and the similar empty-filter
path around the later conditional (both places mentioned) to return Ok(0) when
no filter is written so the return value accurately reflects zero bytes/blocks
written.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7c86539a-552b-4b13-ade0-843000dfecc5
📒 Files selected for processing (12)
Cargo.tomlbenches/bloom.rssrc/table/filter/mod.rssrc/table/filter/ribbon/burr/filter.rssrc/table/filter/ribbon/burr/params.rssrc/table/filter/ribbon/burr/tests.rssrc/table/filter/ribbon/burr/threshold.rssrc/table/filter/ribbon/burr/wire.rssrc/table/filter/ribbon/mod.rssrc/table/filter/ribbon/params.rssrc/table/mod.rssrc/table/writer/filter/full.rs
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
src/table/filter/ribbon/burr/wire.rs (1)
83-97: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winReplace new
#[allow(clippy::...)]suppressions with#[expect(..., reason = "...")].These are new suppressions in PR-added Rust code; please switch them to
#[expect]so stale suppressions surface automatically.Suggested diff
- #[allow(clippy::expect_used, reason = "writing to a Vec<u8> cannot fail")] + #[expect(clippy::expect_used, reason = "writing to a Vec<u8> cannot fail")] { @@ - #[allow( + #[expect( clippy::cast_possible_truncation, reason = "max_layers fits u8 by construction" )] let num_layers_u8 = layers.len() as u8; @@ - #[allow(clippy::expect_used, reason = "writing to a Vec<u8> cannot fail")] + #[expect(clippy::expect_used, reason = "writing to a Vec<u8> cannot fail")] { @@ - #[allow(clippy::expect_used, reason = "len pre-validated by header check")] + #[expect(clippy::expect_used, reason = "len pre-validated by header check")] let arr: [u8; 8] = chunk.try_into().expect("chunks_exact yields 8-byte slices");As per coding guidelines, "Prefer
#[expect(lint)]over#[allow(lint)]—#[expect]warns when suppression becomes unnecessary".Also applies to: 119-126, 253-255
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/table/filter/ribbon/burr/wire.rs` around lines 83 - 97, Replace the new clippy suppression attributes with expectation attributes: change the block-level #[allow(clippy::expect_used, reason = "...")] to #[expect(clippy::expect_used, reason = "writing to a Vec<u8> cannot fail")] and similarly change the cast suppression around num_layers_u8 from #[allow(clippy::cast_possible_truncation, reason = "max_layers fits u8 by construction")] to #[expect(clippy::cast_possible_truncation, reason = "max_layers fits u8 by construction")]; apply the same pattern to the other occurrences noted (around the u8 casts at lines ~119-126 and ~253-255) so the clippy suppressions for buf.write_u8 / num_layers_u8 / params seed writes become #[expect(..., reason = "...")] instead of #[allow(...)].
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@benches/bloom.rs`:
- Around line 43-66: The probe benchmarks need percentile-oriented latency
reporting (p99/p999) rather than only mean throughput; update the probe
benchmark functions (notably the ones around burr_filter_construction and the
probe benches in the 68-127 and 129-165 ranges) to use a Criterion
BenchmarkGroup instead of plain c.bench_function, configure the group with an
appropriate sample_size and measurement_time so Criterion will collect latency
samples, and register the probe benchmark via group.bench_function or
group.bench_with_input so the collected samples can be used to generate
percentile stats (p99/p999) in Criterion’s report; keep the build bench
(burr_filter_construction and BurrBuilder::build_from_hashes) unchanged and
apply this BenchmarkGroup pattern to each probe path function.
- Around line 140-141: Replace the local suppression attribute on the expression
that computes r (the line containing "let r = (-fpr.log2()).ceil() as usize;")
so that the current #[allow(clippy::cast_possible_truncation,
clippy::cast_sign_loss)] becomes #[expect(clippy::cast_possible_truncation,
reason = "ceil() to usize is intentional and reviewed")] and
#[expect(clippy::cast_sign_loss, reason = "ceil() to usize is intentional and
reviewed")], i.e., use #[expect(..., reason = "...")] with the two specific lint
names instead of #[allow(...)] to ensure the suppression will warn if it becomes
unnecessary.
---
Duplicate comments:
In `@src/table/filter/ribbon/burr/wire.rs`:
- Around line 83-97: Replace the new clippy suppression attributes with
expectation attributes: change the block-level #[allow(clippy::expect_used,
reason = "...")] to #[expect(clippy::expect_used, reason = "writing to a Vec<u8>
cannot fail")] and similarly change the cast suppression around num_layers_u8
from #[allow(clippy::cast_possible_truncation, reason = "max_layers fits u8 by
construction")] to #[expect(clippy::cast_possible_truncation, reason =
"max_layers fits u8 by construction")]; apply the same pattern to the other
occurrences noted (around the u8 casts at lines ~119-126 and ~253-255) so the
clippy suppressions for buf.write_u8 / num_layers_u8 / params seed writes become
#[expect(..., reason = "...")] instead of #[allow(...)].
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: b1e7c573-f490-4292-ba80-92f74e0554eb
📒 Files selected for processing (5)
benches/bloom.rssrc/table/filter/ribbon/burr/filter.rssrc/table/filter/ribbon/burr/threshold.rssrc/table/filter/ribbon/burr/wire.rssrc/table/filter/ribbon/hashing.rs
Addresses round-2 review feedback on PR #269. # Correctness fixes src/table/filter/mod.rs: - is_active now delegates to burr_params(1).is_some() — exact equivalence with 'would this policy produce a non-empty filter'. Closes the case where 0 < fpr < 2^-64 reported is_active() == true but burr_params returned None, causing the writer to buffer hashes then drop them at the spill. src/table/writer/filter/full.rs: - finish() now returns Ok(0) on the empty bloom_hash_buffer path (was Ok(1)). The reported block_count#filter now matches the number of filter sections actually written. Restructured the flow to early-return on empty buffer instead of an else block, so the Ok(1) tail only fires when a filter block was written. src/table/writer/filter/partitioned.rs: - spill_filter_partition now skips the Block::write_into when build_burr_filter_bytes returns an empty Vec. Persisting an empty filter block would cause BurrFilterReader::new to error at read time. The TLI entry is also skipped so the reader sees no filter for this range. # Lint-policy migrations src/table/filter/ribbon/burr/wire.rs — allow → expect on the encode/header expect blocks + the num_layers as u8 cast. src/table/filter/ribbon/burr/tests.rs — three cast_precision_loss suppressions: allow → expect with concise reasons. src/table/filter/ribbon/params.rs — Params::with_seed marked must_use (builder-style method). benches/bloom.rs — cast_possible_truncation + cast_sign_loss on the bench's local r computation switched to paired expect attributes. # Vendored allow propagation src/table/filter/ribbon/burr/mod.rs comment expanded: the parent ribbon::* allow list propagates into burr/, which means expect/unwrap/indexing-slicing can be written in BuRR code despite the crate-level deny. Re-denying here would require migrating ~30 internal sites in builder.rs / wire.rs / threshold.rs / filter.rs / tests.rs — tracked separately as a follow-up (issue #270). New BuRR code in THIS PR uses per-site expect with reason for new suppressions. # Verification nextest --lib table::filter: 67 tests pass full-target lint pass (-D warnings): clean
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/table/filter/ribbon/burr/wire.rs`:
- Around line 327-331: The current probe path silently substitutes [0;8] when
the slice or conversion fails (using unwrap_or), which can mask data corruption;
instead, change the code around z.get(row_byte..row_byte + 8).and_then(|s|
s.try_into().ok()) so that if it yields None you immediately return true (fail
closed) from the probe function, otherwise bind the resulting [u8;8] to arr and
continue with acc ^= u64::from_le_bytes(arr); ensure you return from the same
function that contains acc and row_byte (the probe path where acc ^=
u64::from_le_bytes(arr) is located) so the impossible path no longer falls back
to zeros.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: ec0d18f1-9e2c-4bcc-a77f-7cf7234049fa
📒 Files selected for processing (9)
benches/bloom.rssrc/table/filter/mod.rssrc/table/filter/ribbon/burr/mod.rssrc/table/filter/ribbon/burr/tests.rssrc/table/filter/ribbon/burr/wire.rssrc/table/filter/ribbon/mod.rssrc/table/filter/ribbon/params.rssrc/table/writer/filter/full.rssrc/table/writer/filter/partitioned.rs
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@benches/bloom.rs`:
- Around line 135-154: The Ribbon benchmark currently builds the filter with n =
1_000_000 but only supplies keys: Vec<u64> = (0..100_000).collect(), causing a
lower effective load than BuRR; update the benchmark so the keys vector length
matches the intended n (or otherwise pad/expand to n unique keys) before calling
Params::new and building the filter (affecting symbols keys, n, Params::new,
RibbonBuilder::new, builder.build, filter.new_scratch) so the Ribbon baseline is
sized and populated at the same effective load as BuRR.
In `@src/table/filter/mod.rs`:
- Around line 74-87: estimated_filter_size() can return a positive size for
policies that are inactive according to burr_params(n); change it to mirror
build behavior by returning 0 when burr_params(n) is None. Update
estimated_filter_size to call or check the same activation logic used by
burr_params(n) (or call burr_params(n) directly) and early-return 0 for inactive
policies, then compute r_bits and the size only when the policy is active
(preserving the existing clamp and byte conversion logic).
In `@src/table/filter/ribbon/burr/tests.rs`:
- Around line 169-176: Update the test burr_wire_rejects_bad_magic to start from
a valid serialized filter, then mutate only the magic bytes to trigger the
magic-check path; specifically, construct a bona fide wire payload (e.g., by
creating a valid Burr filter via the same helpers used elsewhere or by
serializing a known-good Burr using the code paths under test), then flip
bytes[0]/bytes[1] (the magic) and call BurrFilterReader::new(&bytes) to assert
it returns an Err; ensure you reference the BurrFilterReader::new constructor
and keep mutation limited to the magic bytes so other validation steps do not
fail first.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 53a20187-e398-45a8-b066-d533b57efe16
📒 Files selected for processing (11)
Cargo.tomlbenches/bloom.rssrc/table/filter/mod.rssrc/table/filter/ribbon/burr/mod.rssrc/table/filter/ribbon/burr/tests.rssrc/table/filter/ribbon/burr/wire.rssrc/table/filter/ribbon/mod.rssrc/table/filter/ribbon/params.rssrc/table/filter/ribbon/tests.rssrc/table/writer/filter/full.rssrc/table/writer/filter/partitioned.rs
Round-3 review fixes on PR #269. src/table/writer/filter/partitioned.rs: spill_filter_partition: empty filter_bytes now fails closed with Error::Unrecoverable instead of silently skipping the partition. Silently skipping caused false negatives at read time: keys in the skipped range would binary-search to a later partition's filter (which does not contain them), and Table::check_bloom would report 'definitely not present' on a live key. The path is unreachable in practice (is_active is checked upstream before keys are buffered) but defence-in-depth. tests/burr_filter_end_to_end.rs: Header comment used 'false negatives' wrong (false negatives apply to inserted keys, not unknown keys). Rewritten to describe the two invariants the test actually checks: (a) inserted keys must resolve via tree.get — no false negatives, (b) unknown keys must never return Some — false positives only ever cost a wasted index lookup. benches/bloom.rs: ribbon_filter_contains now pads its key set to n with random fillers (same shape as the BuRR bench above), so the Ribbon body is loaded to the same density. Without this, Ribbon was running at ~10% load vs BuRR's ~70-90% — non-equivalent probe-latency comparison. src/table/filter/mod.rs: estimated_filter_size now delegates to burr_params for the n check and pulls r directly from the constructed params. Estimate is 0 exactly when the builder would also produce empty bytes — keeps the memory accounting in sync with build behavior. Previously the estimator clamped raw inputs and could overestimate for inactive policies. src/table/filter/ribbon/burr/tests.rs: burr_wire_rejects_bad_magic now starts from a valid wire payload and flips the first magic byte. The previous test built a buffer of zeros — that buffer fails decode regardless of the magic check (e.g. on the version byte), so it did not actually prove the magic check triggers.
Wire-format break: filter blocks are no longer Bloom-encoded — V4 binaries cannot read V5 tables and vice versa. The filter block decoder rejects the wrong magic at open time. Manifest version bump makes the contract explicit: src/format_version.rs: +V5 variant. Doc-comment explains the wire-format break and the cross-version read incompatibility. src/manifest.rs / src/version/mod.rs: Writers emit V5; manifest deserialiser assert pins V5 expectation in the round-trip test. src/tree/mod.rs: Tree::open accepts only V5 manifests. V3/V4 manifests now error with InvalidVersion at recovery time rather than reaching the filter decoder with mismatched bytes. tests/tree_recovery_versions.rs: Renamed tree_writes_v4_manifest → tree_writes_v5_manifest, added tree_rejects_pre_v5_manifest asserting V4 → InvalidVersion(4), replaced the V3-safe-recovery test (V3 is no longer a safe fall-back under V5). # zstd-pure alias dropped The 'zstd-pure' feature flag was a deprecated alias for 'zstd' that shipped while both a C-backed and pure-Rust implementation were on the roadmap. Only one zstd backend now exists (via structured-zstd), so the alias serves no purpose: Cargo.toml: Removed 'zstd-pure = ["zstd"]' feature entry. src/compression/zstd_pure.rs → src/compression/zstd_backend.rs: Renamed the file. Inside, ZstdPureProvider → ZstdProvider. No behaviour change. src/compression/mod.rs: mod zstd_pure; → mod zstd_backend;. ZstdBackend type alias points at the renamed provider. README.md: Feature-flag table no longer mentions zstd-pure. zstd description no longer hints at a 'pure Rust' alternative existing — it IS the zstd backend. # Other review-cycle fixes carried in this commit PR #269 round-N inline threads: - Hot-path probe alloc-free entry point: new burr::contains_hash_from_bytes(bytes, hash) single-pass parse + probe. FilterBlock::maybe_contains_hash uses it instead of BurrFilterReader::new(...).contains_hash(...), avoiding the Vec<LayerView> heap allocation per probe on Table::check_bloom. - src/table/filter/ribbon/params.rs preamble comment updated: ribbon-serde feature wires only 'dep:serde' now (bitvec dep was dropped for cross-arch portability in an earlier commit; the comment still referenced 'bitvec/serde'). - src/table/filter/ribbon/_vendored/LICENSE-MIT: copyright line corrected from '(c) 2026 ribbon-filter contributors' to '(c) William Rågstad and ribbon-filter contributors' to match the attribution in ribbon/mod.rs. # Package + docs.rs polish README rewritten as a flat factual description of the engine: features grouped by read / write / compaction / storage / concurrency, limit numbers, feature-flag table. Dropped the upstream-fjall logo + logo-image references. Crate-level doc in lib.rs aligned with the README (sans the donation block). Cargo.toml: description rewritten in factual terms (was K.I.S.S./fork branding, is now the actual feature surface). documentation = 'https://docs.rs/coordinode-lsm-tree' explicit. keywords + categories tightened ('embedded', 'filesystem', 'compression' replace 'coordinode'). [package.metadata.docs.rs] block added: all-features build, cfg(docsrs) rustdoc-args so feature/availability badges render, x86_64-unknown-linux-gnu target. # Coverage uplift Vendored ribbon error Display impls (ParamError variants, ConstructionFailure InconsistentEquation + OutOfBounds, BuildError variants, FilterReprError variants under ribbon-serde feature) get explicit display-format tests covering all previously-uncovered arms. ~50 new uncovered lines in src/table/filter/ribbon/error.rs now exercised. # Verification cargo fmt: clean cargo clippy --all-features --all-targets --lib --tests -- -D warnings: clean cargo nextest run: 1246 pass (was 1241, +5)
…city - burr::wire: layer-header bounds checks now compute `header_end` via `pos.checked_add(LAYER_HEADER_LEN)` and bail to InvalidHeader on overflow, matching the existing checked-arithmetic pattern used for `thresholds_end` and `z_end`. Applied at both occurrences (`decode` and `contains_hash_from_bytes`). On 32-bit targets a corrupted `pos` could otherwise wrap past `bytes.len()` and let the bounds guard succeed by wraparound, then panic at the slice indexing. - writer/filter/partitioned: replace `mem::take(&mut bloom_hash_buffer)` with `mem::replace(..., Vec::with_capacity(old_cap))` so the buffer retains its grown capacity across partition spills. `take` leaves a capacity-0 Vec behind, forcing a reallocation on every register_key after each spill — measurable overhead on tables that spill many partitions per flush/compaction. No regression tests added: the wire-overflow path requires a 32-bit target to actually trigger (the unchecked add does not wrap on 64-bit hosts), and the partitioned-writer change is a perf improvement with no behavioural delta.
…ts mod The regression test added in the preceding commits uses `matches!` and explicit `Ok(_)` patterns rather than `.unwrap()`, so the `#[expect(clippy::unwrap_used)]` attribute on the `mod tests` block has nothing to fulfil. Under the workspace lint config (`-D warnings -D unfulfilled-lint-expectations` in the zstd-only CI matrix) the unfulfilled expectation becomes a hard compile error. Drop the attribute; future test additions can re-introduce it if they actually use `.unwrap()`.
Covers all `ParamError` / `ConstructionFailure` / `BuildError` / `FilterReprError` variants — including the new `ConstructionFailure::StorageLengthOverflow` introduced earlier on this branch, the back-substitution branch of `OutOfBounds` (no key_index), and the `BuildError::InvalidParams` arm which previously had no Display assertion. Bumps patch coverage for `src/table/filter/ribbon/error.rs` past the codecov gate (was 94.59%, all uncovered lines were Display arms).
- burr::BurrBuilder::new now rejects params with `b < w` up-front with
`InvalidParams("b must be >= w")`. `BurrParams::layer_m` floors its
return at `b` and rounds up to a multiple of `b`; the vendored
Ribbon `Params::new(m, w=64, ...)` separately requires `m >= w`. If
a caller hand-builds `BurrParams` with `b < w` (= 64), `layer_m`
could otherwise produce an `m` between `b` and `w-1` that Ribbon
rejects with a generic "vendored ribbon param error" hiding the
real invariant.
- lib.rs crate docs: the range-tombstone bullet referenced "V4 disk
format"; this branch makes V5 the current/only readable format and
rejects V3/V4 manifests at `Tree::open`. Update to "V5 format
introduced for the BuRR filter".
- README License section: clarify that only first-party `.rs` files
carry the SPDX-per-file header. The vendored Ribbon module keeps
its upstream layout (module-level licensing commentary + preserved
`_vendored/LICENSE-*` texts), so a blanket "every .rs file" claim
was inaccurate.
- error.rs tests: switch single-character `.contains("x")` to char
patterns to satisfy `clippy::single_char_pattern` under the
zstd-only CI matrix's `-D warnings` profile.
Reviewer correctly flagged that `BurrParams::layer_m` only guarantees its return is `>= b`, while the vendored Ribbon `Params::new` additionally requires `m >= w`. The runtime enforcement now lives in `BurrBuilder::new` (preceding commit, rejects `b < w`). Add a docstring paragraph here cross-referencing that gate so a future reader of `layer_m` understands why no per-call `m >= w` check is needed.
- Add full unit test module for `Params` (previously had none): covers every `ParamError` arm reachable from `Params::new`, the `with_seed` / `with_retry_limit` / `with_retry_policy` chain, the `r_from_fpr` / `from_expected_items` constructors including their error paths, and the `start_range` / `fingerprint_words` / `fingerprint_last_word_mask` accessors. - Add `BurrBuilder::new` validation tests for every reject branch: zero `n`, `r` out of `[1, 64]`, `w != 64`, zero `b`, the newly added `b < w` gate, and zero `max_layers`. Plus a Debug-formatter smoke-test for `BurrBuilder` and `BurrFilter` so the `impl Debug` arms aren't silently uncovered. Pure coverage / no-behavior commit; nudges codecov patch% closer to the project-coverage gate.
The doc-comment on `tree_rejects_pre_v5_manifest` documents that V5 must reject BOTH V3 and V4 manifests (the wire-format break introduced for the BuRR filter), but the test body only rewrote the on-disk manifest to V4 and asserted that single boundary. Wrap the test in a `for pre_v5 in [3, 4]` loop so V3 and V4 are both exercised end-to-end (open → flush → rewrite manifest → reopen → expect InvalidVersion(pre_v5)). A future relaxation of the boundary — accepting V4 but not V3, say — now lights up the test rather than quietly passing.
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@README.md`:
- Around line 71-73: Update the README lines that document key/value size
limits: change "up to 65 536 bytes" to "up to 65,535 bytes (2^16 − 1)" and
change "up to 2³² bytes" to "up to 4,294,967,295 bytes (2^32 − 1)" so the
documented maxima match the u16/u32 encoded bounds used by the codebase; ensure
the wording around proportional performance cost remains unchanged.
In `@src/table/filter/ribbon/burr/params.rs`:
- Around line 147-151: The rounding expression raw.div_ceil(b) * b can overflow;
compute blocks = raw.div_ceil(b) first and then use a checked multiplication
(e.g., blocks.checked_mul(b)) to detect overflow instead of letting release-mode
wrap; handle the failure path (return an error, saturate to usize::MAX, or panic
with a clear message) so layer_m (the rounded value computed from raw, b,
layer_input_keys and self.b) never silently wraps.
In `@src/table/filter/ribbon/burr/tests.rs`:
- Around line 286-313: Add a new test (or extend the existing
burr_negative_keys_obey_fpr_envelope_at_low_target) to also verify the
very-tight FPR envelope at target 0.0001: create params with
BurrParams::with_fp_rate(n, 0.0001), build the filter via BurrBuilder::new(...)
and builder.build_from_hashes(...), probe the same probe_count range using
crate::hash::hash64 and filter.contains_hash, compute the realised fpr and
assert it stays within an appropriate envelope (similar style to the existing
0.001 case but with a stricter threshold for 0.0001).
- Around line 633-657: Update the test
contains_hash_from_bytes_round_trips_against_decoded to also iterate a disjoint
set of absent hashes and assert the single-pass contains_hash_from_bytes result
equals BurrFilterReader::contains_hash for each absent hash (mirroring the
present-hash loop), and similarly update the other test around lines 768-793 to
compare both present and absent corpora; reference the test function name
contains_hash_from_bytes_round_trips_against_decoded and the functions
contains_hash_from_bytes and BurrFilterReader::contains_hash so the absent-path
equivalence is verified rather than only sanity-checked.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0931a7af-3ae5-40a5-942b-6a1cc8a3bf5e
📒 Files selected for processing (153)
Cargo.tomlREADME.mdsrc/abstract_tree.rssrc/active_tombstone_set.rssrc/any_tree.rssrc/blob_tree/gc.rssrc/blob_tree/handle.rssrc/blob_tree/ingest.rssrc/blob_tree/mod.rssrc/cache.rssrc/checksum.rssrc/coding.rssrc/compaction/drop_range.rssrc/compaction/fifo.rssrc/compaction/flavour.rssrc/compaction/leveled/mod.rssrc/compaction/maintenance.rssrc/compaction/major.rssrc/compaction/mod.rssrc/compaction/movedown.rssrc/compaction/pulldown.rssrc/compaction/state/hidden_set.rssrc/compaction/state/mod.rssrc/compaction/stream.rssrc/compaction/tiered/mod.rssrc/compaction/worker.rssrc/comparator.rssrc/compression/mod.rssrc/compression/zstd_backend.rssrc/config/block_size.rssrc/config/compression.rssrc/config/filter.rssrc/config/hash_ratio.rssrc/config/mod.rssrc/config/pinning.rssrc/config/restart_interval.rssrc/descriptor_table.rssrc/double_ended_peekable.rssrc/encryption.rssrc/error.rssrc/file.rssrc/file_accessor.rssrc/format_version.rssrc/fs/io_uring_fs.rssrc/fs/mem_fs.rssrc/fs/mod.rssrc/fs/std_fs.rssrc/heap.rssrc/ingestion.rssrc/key.rssrc/key_range.rssrc/lib.rssrc/manifest.rssrc/memtable/arena.rssrc/memtable/interval_tree.rssrc/memtable/mod.rssrc/memtable/skiplist.rssrc/memtable/value_store.rssrc/merge.rssrc/merge_operator.rssrc/metrics.rssrc/mvcc_stream.rssrc/path.rssrc/pinnable_slice.rssrc/prefix.rssrc/range.rssrc/range_tombstone.rssrc/range_tombstone_filter.rssrc/run_reader.rssrc/run_scanner.rssrc/seqno.rssrc/slice/mod.rssrc/slice/slice_bytes/mod.rssrc/slice/slice_default/mod.rssrc/slice_windows.rssrc/stop_signal.rssrc/table/block/binary_index/builder.rssrc/table/block/binary_index/mod.rssrc/table/block/binary_index/reader.rssrc/table/block/decoder.rssrc/table/block/encoder.rssrc/table/block/hash_index/builder.rssrc/table/block/hash_index/mod.rssrc/table/block/hash_index/reader.rssrc/table/block/header.rssrc/table/block/mod.rssrc/table/block/offset.rssrc/table/block/trailer.rssrc/table/block/type.rssrc/table/block_index/full.rssrc/table/block_index/iter.rssrc/table/block_index/mod.rssrc/table/block_index/two_level.rssrc/table/block_index/volatile.rssrc/table/data_block/iter.rssrc/table/data_block/mod.rssrc/table/filter/block.rssrc/table/filter/mod.rssrc/table/filter/ribbon/builder.rssrc/table/filter/ribbon/burr/builder.rssrc/table/filter/ribbon/burr/filter.rssrc/table/filter/ribbon/burr/params.rssrc/table/filter/ribbon/burr/tests.rssrc/table/filter/ribbon/burr/wire.rssrc/table/filter/ribbon/error.rssrc/table/filter/ribbon/params.rssrc/table/id.rssrc/table/index_block/block_handle.rssrc/table/index_block/iter.rssrc/table/index_block/mod.rssrc/table/inner.rssrc/table/iter.rssrc/table/meta.rssrc/table/mod.rssrc/table/multi_writer.rssrc/table/regions.rssrc/table/scanner.rssrc/table/tests.rssrc/table/util.rssrc/table/writer/filter/full.rssrc/table/writer/filter/mod.rssrc/table/writer/filter/partitioned.rssrc/table/writer/index/full.rssrc/table/writer/index/mod.rssrc/table/writer/index/partitioned.rssrc/table/writer/meta.rssrc/table/writer/mod.rssrc/time.rssrc/tree/ingest.rssrc/tree/inner.rssrc/tree/mod.rssrc/tree/sealed.rssrc/util.rssrc/value.rssrc/value_type.rssrc/verify.rssrc/version/mod.rssrc/version/optimize.rssrc/version/recovery.rssrc/version/run.rssrc/version/super_version.rssrc/vlog/accessor.rssrc/vlog/blob_file/merge.rssrc/vlog/blob_file/meta.rssrc/vlog/blob_file/mod.rssrc/vlog/blob_file/multi_writer.rssrc/vlog/blob_file/reader.rssrc/vlog/blob_file/scanner.rssrc/vlog/blob_file/writer.rssrc/vlog/handle.rssrc/vlog/mod.rssrc/write_batch.rstests/tree_recovery_versions.rs
Exercise the four `RibbonBuilder` entry points BuRR uses but the existing tests only hit transitively through full BuRR builds: - `build_with_seed_verbatim` happy-path with generous m (single attempt, fixed seed lands), persistence of caller-supplied seed. - `build_with_seed_verbatim` failure-path with tight m → asserts `attempts == 1` (no retry budget on verbatim). - `build_with_seed_verbatim_from_hashes` round-trip with xxh3 hashes (the actual LSM-side path) + failure variant. - `build` in Homogeneous mode: must break out of the inner retry loop after a single attempt and must NOT grow `m` between outer iterations (Homogeneous has no retry/grow semantics — the algorithm assigns random fingerprints to unoccupied rows in a single shot). Pure coverage / no-behavior commit; targets uncovered branches in `src/table/filter/ribbon/builder.rs` flagged by the codecov patch report.
… tests
- README + crate-level docs claimed "up to 65 536 bytes" for keys and
"up to 2³² bytes" for values; both were off-by-one because the
on-disk encoding caps the length fields at u16 / u32. Documented
values are now 65,535 (`u16::MAX`) and 4,294,967,295 (`u32::MAX`,
`2³² − 1`) with the length-field type spelled out next to each.
- burr::BurrParams::layer_m: replace `raw.div_ceil(b) * b` (which can
wrap in release builds on large inputs and silently yield a SMALLER
m than the caller asked for) with checked_mul + saturation to the
largest usize-fitting multiple of b. The downstream
`RibbonBuilder::build_once` then surfaces the oversize cleanly via
`ConstructionFailure::StorageLengthOverflow` instead of allocating
an undersized filter.
- Tests:
* Add `burr_negative_keys_obey_fpr_envelope_at_very_low_target` —
the tightest FPR target (0.0001) called out in the acceptance
criteria was previously unverified; a regression there would have
merged unnoticed. Realised FPR over 50k disjoint probes must stay
below 0.1% (10× envelope around the 0.01% target).
* Extend `contains_hash_from_bytes_round_trips_against_decoded` and
`contains_hash_from_bytes_returns_false_for_non_inserted` with a
disjoint absent-hash corpus, asserting exact equality between the
single-pass `contains_hash_from_bytes` entry point and the
decoded-reader `BurrFilterReader::contains_hash` on both present
AND absent inputs. The two are separate implementations; a
mismatch on the not-found path would otherwise pass silently.
The crate-level docs implied range tombstones were introduced together with the V5 disk format. They were not: the SST-side range-tombstone encoding landed in V4. V5's breaking change is the filter wire format (BuRR replaces Bloom), nothing to do with tombstones. Reword the bullet so the on-disk history is accurate and the V4 → V5 distinction stays clear.
BREAKING CHANGE: on-disk format bumped to V5. V3/V4 databases are NOT readable by this version and vice versa. V5 introduces the BuRR filter wire-format break (replaces standard bloom) shipped in #269 and the PITR hard-link checkpoint primitive shipped in #210. Both touch the manifest layout / filter block encoding and are not backward-compatible at the on-disk level. Squash-merge marker for release-plz: this PR is the major bump for the V5 cycle. The Cargo.toml version stays at 4.5.0; release-plz picks up the BREAKING CHANGE footer on squash and opens the v5.0.0 release PR. Drive-by no-std and CI fixes bundled here: - replace interval-heap (transitive dep `compare v0.0.6` doesn't declare `#![no_std]`, blocking the no-std-check CI job) with `alloc::collections::BinaryHeap<Reverse<_>>`. The blob-file merger only needs pop-min / push, no double-ended interval semantics — switching to BinaryHeap removes a 10-year-old transitive dep. - src/vlog/blob_file/merge.rs uses `core::*` / `alloc::*` for the types previously imported through `std::*` re-exports (Ordering, BinaryHeap, Vec). Pure label change for std targets, unblocks no-std for this module. - tests/checkpoint_pitr.rs concurrent_writes watermark assertion uses `i < info.seqno` (the actual visible_seqno contract: lowest excluded seqno) rather than `<=`. The `<=` form raced on cross-target runners (aarch64/riscv64/musl/etc) because writer's `insert(seqno=i)` and `fetch_max(i + 1)` aren't atomic — main could observe visible_seqno = i between those steps, then assert k(i) present even though the watermark contract excludes it. Strict `<` matches the semantic and stops the cross-target flakes. - .github/instructions/rust.instructions.md gains a no-std-direction section with 20 rules split into suggest / reject / always-applies buckets. Per-crate tier tables are NOT required; reviewers should guide toward core/alloc primitives, not gatekeep. Closes #210
) ## Summary Implements [#210](#210) — the consistent hard-link checkpoint primitive that unblocks point-in-time recovery (PITR) backup for CoordiNode and any other consumer that needs an O(1)-per-file snapshot of an open LSM-tree. - New trait method `Fs::hard_link(src, dst)` with a default `Unsupported` impl (non-breaking for downstream backends). `StdFs` performs a real `hard_link(2)` with an EXDEV→copy fallback; `MemFs` produces an independent copy; `IoUringFs` delegates to `StdFs` since linking is a metadata-only syscall. - New trait method `Fs::backend_id() -> Option<u64>` — explicit shared-namespace capability check that gates cross-`Fs` hard-link attempts. `StdFs`/`IoUringFs` return a common `KERNEL_BACKEND_ID`; each `MemFs::new()` allocates a unique per-instance id; default impl returns `None` for safe-by-default third-party backends. - `DeletionPause` — a tree-wide refcount-gated deferred-delete queue installed into each `Table::Inner` / `BlobFile::Inner`. While at least one `Pause` guard is alive, both `Drop` impls queue removals instead of executing them; the queue drains when the last guard is dropped. Mirrors RocksDB's `DisableFileDeletions`/`EnableFileDeletions` pattern. Synchronisation uses `spin::Mutex` so the type is `no_std`-compatible. - `AbstractTree::create_checkpoint(target) -> CheckpointInfo` — implemented on `Tree`, `BlobTree`, and (via `enum_dispatch`) `AnyTree`. The driver captures `visible_seqno` first, flushes the active memtable (eviction seqno `0` — never expands MVCC GC beyond what would have happened anyway), snapshots the current version, hard-links every live SST + blob into `target/{tables,blobs}/`, copies manifest / `v<id>` / `current`, and fsyncs the target root. - `level_routes` (tiered storage) is honoured automatically — each `Table::Inner` already carries its own routed `Fs`, so the link is performed through the right backend. Cross-`Fs` situations (e.g. `MemFs` source vs. `StdFs` target) transparently fall back to a streamed byte copy after the `backend_id` namespace check declines the hard-link path. - `Tree::open` now rejects a directory that has version artifacts (`tables/`, `blobs/`, `vN`) but no `current` pointer — the on-disk signature of a half-written checkpoint. Previously the code silently called `create_new`, which would have overwritten the partial state with an empty tree. ## Test plan `cargo nextest run --all-features` — **1327/1327 passed** (6 environment-skipped), including 14 integration scenarios in `tests/checkpoint_pitr.rs`: - [x] Round-trip — checkpoint + reopen as standalone tree reads every key back - [x] Concurrent writes during checkpoint do not corrupt the snapshot - [x] Deferred-delete invariant: concurrent `major_compact` cannot remove an SST captured by an in-flight checkpoint - [x] BlobTree checkpoint captures both index SSTs and blob files - [x] `level_routes` SSTs land in the flattened `target/tables/` directory - [x] Source vs. checkpoint isolation — writes on either side do not bleed across - [x] Crash-safety — failed checkpoint leaves the source tree fully reopenable (chmod-driven `link_tables` failure) - [x] Empty tree checkpoint succeeds with `sst_files == 0` - [x] `CheckpointInfo.total_bytes` matches the on-disk sum - [x] Re-running a checkpoint into the same target is rejected with `AlreadyExists` - [x] MVCC regression — checkpoint-triggered flush must NOT drop history needed by source-tree snapshot readers (eviction seqno = 0, not `SeqNo::MAX`) - [x] Half-written-checkpoint detection — missing `current` pointer rejected by `Tree::open` - [x] Half-written-checkpoint detection — corrupt `current` pointer rejected by `Tree::open` - [x] Cross-`Fs` (`StdFs`↔`MemFs`) link-or-copy streams through both trait objects (inline in `src/checkpoint.rs`) Plus unit coverage for: - `DeletionPause` — defer / inactive-passthrough / nested-pauses / generation-race-under-lock - `StdFs::is_cross_device` — raw EXDEV + `ErrorKind` variants - `Fs::hard_link` — round-trip + duplicate / missing-source rejection (both `StdFs` and `MemFs`) - `copy_fallback` — independence + create-new semantics - [x] `cargo clippy --lib --tests` clean - [x] `cargo test --doc` passes for all touched modules Closes #210 <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Point-in-time checkpoint snapshots (optional blob inclusion) reporting file counts, total bytes, version id and visibility seqno; checkpoints can be opened as independent trees. * **Reliability / Bug Fixes** * Atomic, durable checkpoint creation with hard-link-first and streamed-copy fallback (debug-logged); best-effort cleanup on failure; opening rejects half-written or corrupt checkpoints with clearer diagnostics. * **Utilities** * Checkpoint-aware gate to defer physical file removals during snapshot lifetimes. * **Filesystem compatibility** * Improved create-dir/hard-link handling with cross-filesystem copy fallback and namespace signaling. * **Tests / Documentation** * Extensive end-to-end, isolation and crash-safety checkpoint suite; README updated to document PITR checkpoints. <!-- review_stack_entry_start --> [](https://app.coderabbit.ai/change-stack/structured-world/coordinode-lsm-tree/pull/276?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack) <!-- review_stack_entry_end --> <!-- end of auto-generated comment: release notes by coderabbit.ai --> --- BREAKING CHANGE: on-disk format V5. V3/V4 databases are NOT readable by this version and vice versa. V5 ships the BuRR filter wire-format change (replaces standard bloom, #269) plus the PITR hard-link checkpoint primitive (this PR, #210). The major bump goes through release-plz on squash.
Summary
Replaces the standard bloom filter in the LSM filter framework with BuRR (Bumped Ribbon Retrieval), the math-optimal approximate-membership filter (Walzer & Dillinger 2022, arXiv:2109.01892). BuRR storage is ≈ r bits per key + ~1% overhead vs the information-theoretic minimum — substantially tighter than the ~10 bpk Bloom needs for the same FPR, with comparable probe cost.
Closes #268.
Disk format
Manifest version bumped to V5. V3/V4 databases are rejected at `Tree::open` with `InvalidVersion`; tables written by this version are not readable by pre-V5 binaries. No migration shim — fresh tables only.
Breaking changes (semver major)
Both major-trigger commits in this PR carry the conventional `!` marker so release-plz will major-bump.
Implementation
Public API
`BloomConstructionPolicy` public config retained — same `BitsPerKey(f32)` / `FalsePositiveRate(f32)` variants, semantics now mapped to BuRR fingerprint-width `r`.
Pairing contract (probe APIs)
Mixing paths produces false negatives. Both API surfaces carry symmetric docstring warnings.
Test plan
Summary by CodeRabbit
New Features
Removed
Documentation
Tests
Chores