Skip to content

feat(filter): replace standard bloom with BuRR#269

Merged
polaz merged 50 commits into
mainfrom
feat/#268
May 19, 2026
Merged

feat(filter): replace standard bloom with BuRR#269
polaz merged 50 commits into
mainfrom
feat/#268

Conversation

@polaz
Copy link
Copy Markdown
Member

@polaz polaz commented May 19, 2026

Summary

Replaces the standard bloom filter in the LSM filter framework with BuRR (Bumped Ribbon Retrieval), the math-optimal approximate-membership filter (Walzer & Dillinger 2022, arXiv:2109.01892). BuRR storage is ≈ r bits per key + ~1% overhead vs the information-theoretic minimum — substantially tighter than the ~10 bpk Bloom needs for the same FPR, with comparable probe cost.

Closes #268.

Disk format

Manifest version bumped to V5. V3/V4 databases are rejected at `Tree::open` with `InvalidVersion`; tables written by this version are not readable by pre-V5 binaries. No migration shim — fresh tables only.

Breaking changes (semver major)

  • BuRR filter wire format replaces the legacy bloom filter blocks. Filter block reader rejects pre-V5 magic.
  • `FormatVersion::V5` is the only accepted manifest version.
  • `zstd-pure` feature flag removed (it was a deprecated alias for `zstd`; only one zstd backend remains).
  • `Cargo.toml` description, keywords, categories rewritten; no longer claims to be a fork derivative.

Both major-trigger commits in this PR carry the conventional `!` marker so release-plz will major-bump.

Implementation

  • Vendored ribbon-filter v0.2.0 under `src/table/filter/ribbon/` as the GF(2) banded-solver primitive (MIT/Apache-2.0 → in-tree copy under Apache-2.0). Plan to extract later as a standalone crate alongside BuRR.
  • New `src/table/filter/ribbon/burr/` module: per-block threshold scheme (~90% load), multi-layer construction over bumped keys, hash-based build + probe API, byteorder-stable wire format encoder/decoder.
  • Single-pass parse + probe entry point (`contains_hash_from_bytes`) used by `FilterBlock::maybe_contains_hash` — no per-call `Vec` allocation on the table-read hot path.
  • Deleted `src/table/filter/standard_bloom/` and `src/table/filter/bit_array/`.
  • Rewired filter writers (`full.rs`, `partitioned.rs`) and the filter block reader.
  • Replaced ~30 `standard_bloom::Builder::get_hash(key)` callsites with `crate::hash::hash64(key)`. Made `hash` module `pub` for stable hash entry point.
  • Cross-arch portability: dropped `bitvec` dep (`u64: BitStore` hard-gated on `target_has_atomic = "64"` — fails on i686/riscv32/etc.). The Ribbon body now stores a plain `Vec` directly.
  • Endian portability: default `Params::retry_limit` bumped from 1 to 8 so the construction succeeds across architectures regardless of `DefaultHasher::write_u64`'s native-endian hashing.

Public API

  • `BurrBuilder::new(BurrParams, BuildHasher)`
  • `BurrBuilder::build(&[K])` / `build_from_hashes(&[u64])` / `build_from_hashes_owned(Vec)`
  • `BurrFilter::contains(&Q)` / `contains_in(&Q, &mut Scratch)` / `contains_hash(u64)` / `to_wire_bytes()`
  • `BurrFilterReader::new(&[u8])` / `contains_hash(u64)`
  • Standalone `contains_hash_from_bytes(&[u8], u64)` for single-pass parse + probe
  • `BurrParams::with_fp_rate(n, fpr)` / `with_bpk(n, bpk)` / `with_seed(u64)`

`BloomConstructionPolicy` public config retained — same `BitsPerKey(f32)` / `FalsePositiveRate(f32)` variants, semantics now mapped to BuRR fingerprint-width `r`.

Pairing contract (probe APIs)

  • Filter built via `BurrBuilder::build(keys)` → probe with `contains`/`contains_in` (key path).
  • Filter built via `BurrBuilder::build_from_hashes` / `_owned` → probe with `contains_hash` / `contains_hash_from_bytes` (hash path).

Mixing paths produces false negatives. Both API surfaces carry symmetric docstring warnings.

Test plan

  • nextest workspace: 1254 tests pass (was 1193 before BuRR)
  • 44 BuRR unit tests (round-trip, FPR envelope, corruption rejection per-field with specific error variant)
  • end-to-end integration test through table writer + reader
  • doctests: 42 pass
  • full-feature lint pass clean
  • ribbon-serde feature build clean
  • db_bench: BuRR >= bloom on 8 of 9 workloads, parity on mergerandom (read-heavy +5 to +228%, write +14 to +123%)

Summary by CodeRabbit

  • New Features

    • BuRR (layered ribbon) AMQ filters with vendored ribbon filter and optional serde support; zstd provider selection updated.
  • Removed

    • Legacy standard Bloom filter implementation removed.
  • Documentation

    • Crate docs and README updated; on-disk format advanced to V5.
  • Tests

    • Extensive unit, benchmark, and integration suites covering BuRR/ribbon and wire-format round-trips.
  • Chores

    • Manifest/version tightened to V5, unified hashing usage, and standardized SPDX license headers.

Review Change Stack

Copilot AI review requested due to automatic review settings May 19, 2026 09:28
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 19, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9c69da84-feee-4e0a-959a-7ff539e13f71

📥 Commits

Reviewing files that changed from the base of the PR and between 0bdcecf and 10c5c23.

📒 Files selected for processing (1)
  • src/lib.rs

📝 Walkthrough

Walkthrough

Replaces bloom with vendored ribbon-filter and BuRR multilayer filters; migrates filter hashing to crate::hash::hash64; exposes hash module; adds BuRR wire format, builders, readers, and probes; updates manifest V5, benches, tests, Cargo metadata, zstd wiring, and many license headers.

Changes

BuRR Filter Integration

Layer / File(s) Summary
Ribbon vendoring and core primitives
src/table/filter/ribbon/*
Adds vendored ribbon primitives: params, hashing, builder, filter, errors, serde repr (feature-gated), and extensive ribbon tests.
BuRR layered filter (burr) modules
src/table/filter/ribbon/burr/*
Adds BuRR params, thresholding, builder, layer/filter types, wire encode/decode, single-pass probing, error types, and comprehensive unit tests including decode hardening.
Filter surface integration
src/table/filter/mod.rs, src/table/filter/block.rs
Switches public filter surface to ribbon/BuRR, adds build_burr_filter_bytes, changes FilterBlock probe to BuRR single-pass probe, and updates policy sizing to BuRR params.
Table writer/read-path migrations
src/table/writer/filter/*, src/table/*, src/tree/*, src/blob_tree/*
Wires BuRR bytes into full and partitioned filter writers, moves pre-hash call sites from Builder::get_hash to crate::hash::hash64, updates multi_get/point-read code and manifest V5 checks.
Ribbon & BuRR core algorithms
src/table/filter/ribbon/{builder,filter,hashing,params}, .../burr/{builder,filter,params,threshold,wire,error}
Implements equation generation, SplitMix64 PRNG, forward/back-substitution solver, threshold computation/partitioning, layered build logic, wire-format layout, zero-copy decode, and probe semantics.
Tests and benches
tests/*, src/table/filter/ribbon/tests.rs, src/table/filter/ribbon/burr/tests.rs, benches/*
Adds unit and integration tests (FN-free, FPR envelopes, decode hardening), serde round-trips (feature-gated), and new/updated benches for BuRR and ribbon baselines.
Metadata, docs, and packaging
Cargo.toml, README.md, src/lib.rs
Adds ribbon-serde feature, optional serde dependency, serde_json dev-dep, documents V5 on-disk break, updates README, and makes hash module public.
Compression backend rename
src/compression/mod.rs, src/compression/zstd_backend.rs
Rewires zstd feature to structured-zstd, renames provider struct to ZstdProvider, and updates related tests.
Header/license and removals
many src/* files
Adds SPDX headers/updates copyright years, removes standard_bloom and bit_array modules and re-exports, and annotates public hash functions with #[must_use].

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/#268

Copy link
Copy Markdown

@sw-release-bot sw-release-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'lsm-tree db_bench'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.15.

Benchmark suite Current: 10c5c23 Previous: 6f0d811 Ratio
fillseq 845704.925800426 ops/sec (normalized) 1212148.0486146263 ops/sec (normalized) 1.43
fillrandom 409621.2250521961 ops/sec (normalized) 656262.2987714246 ops/sec (normalized) 1.60
readrandom 210988.42737450165 ops/sec (normalized) 287384.2141945515 ops/sec (normalized) 1.36
seekrandom 162348.55941495934 ops/sec (normalized) 210244.43188624468 ops/sec (normalized) 1.30
prefixscan 99822.94229941237 ops/sec (normalized) 118177.26381944011 ops/sec (normalized) 1.18
overwrite 413216.61157293693 ops/sec (normalized) 683457.353426151 ops/sec (normalized) 1.65
mergerandom 343645.52916514216 ops/sec (normalized) 490769.1783137203 ops/sec (normalized) 1.43
readwhilewriting 183875.06406959324 ops/sec (normalized) 264726.89902541734 ops/sec (normalized) 1.44

This comment was automatically generated by workflow using github-action-benchmark.

CC: @polaz

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR replaces the LSM-table approximate-membership filter implementation from the legacy “standard bloom” filter to a new BuRR (Bumped Ribbon Retrieval) filter, backed by a vendored Ribbon (GF(2) banded solver) primitive. It also standardizes call sites on a stable crate::hash::hash64 entry point and rewires filter writers/readers and benchmarks accordingly.

Changes:

  • Replace standard bloom + bit-array filter modules with a new ribbon module and BuRR wire format + builder/probe APIs.
  • Rewire table filter writers/readers and tree/table call sites to use crate::hash::hash64 + BuRR contains_hash paths.
  • Vendor Ribbon implementation + add BuRR tests/benches and make hash module public.

Reviewed changes

Copilot reviewed 38 out of 38 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
src/tree/mod.rs Switch key hashing from bloom builder hash to crate::hash::hash64.
src/blob_tree/mod.rs Switch key hashing from bloom builder hash to crate::hash::hash64.
src/table/writer/filter/partitioned.rs Build partitioned filters via build_burr_filter_bytes; switch hashing to hash64.
src/table/writer/filter/full.rs Build full filter via build_burr_filter_bytes; switch hashing to hash64.
src/table/tests.rs Update tests to use hash64 instead of bloom builder hashing.
src/table/mod.rs Update debug assertions to validate key_hash == hash64(key).
src/table/filter/mod.rs Replace bloom builder plumbing with BuRR params + build_burr_filter_bytes; update size estimates and tests.
src/table/filter/block.rs Switch filter block probing to BurrFilterReader.
src/table/filter/standard_bloom/mod.rs Removed legacy standard bloom reader implementation.
src/table/filter/standard_bloom/builder.rs Removed legacy standard bloom builder implementation.
src/table/filter/bit_array/mod.rs Removed legacy bit array module.
src/table/filter/bit_array/builder.rs Removed legacy bit array builder.
src/table/filter/bit_array/reader.rs Removed legacy bit array reader.
src/table/filter/ribbon/mod.rs Add vendored Ribbon module entry + exports + (currently) broad lint suppressions.
src/table/filter/ribbon/params.rs Add Ribbon parameter validation/types.
src/table/filter/ribbon/error.rs Add Ribbon build/param/serde repr error types.
src/table/filter/ribbon/hashing.rs Add hashing/equation generation utilities (incl. splitmix64).
src/table/filter/ribbon/builder.rs Add Ribbon builder (including “verbatim seed” build variants for BuRR).
src/table/filter/ribbon/filter.rs Add Ribbon filter type + probe path + (optional) serde repr.
src/table/filter/ribbon/tests.rs Add Ribbon unit/property/statistical tests.
src/table/filter/ribbon/burr/mod.rs Add BuRR module entry + public exports.
src/table/filter/ribbon/burr/params.rs Add BuRR parameter derivation from FPR/BPK.
src/table/filter/ribbon/burr/error.rs Add BuRR build/construction error types.
src/table/filter/ribbon/burr/threshold.rs Add per-block threshold computation + partitioning helpers.
src/table/filter/ribbon/burr/builder.rs Add multi-layer BuRR builder (including hash-based build entrypoint).
src/table/filter/ribbon/burr/filter.rs Add BuRR filter + wire-format reader wrapper.
src/table/filter/ribbon/burr/wire.rs Add BuRR wire-format encode/decode + hash-based probe against decoded bytes.
src/table/filter/ribbon/burr/tests.rs Add BuRR smoke/correctness + basic wire tests.
src/table/filter/ribbon/_vendored/LICENSE-MIT Add vendored Ribbon license text.
src/table/filter/ribbon/_vendored/LICENSE-APACHE Add vendored Ribbon license text.
src/prefix.rs Switch prefix hashing to crate::hash::hash64.
src/lib.rs Make hash module public.
src/hash.rs Add docs + #[must_use] to stable hash64/hash128.
Cargo.toml Add bitvec dependency + ribbon-serde feature.
benches/index_block.rs Switch key hashing to hash64.
benches/bloom.rs Rewrite benchmark to BuRR construction/probe (file name kept).

Comment thread src/table/filter/ribbon/burr/wire.rs
Comment thread src/table/filter/ribbon/burr/wire.rs Outdated
Comment thread src/table/filter/ribbon/burr/wire.rs Outdated
Comment thread src/table/filter/ribbon/burr/wire.rs Outdated
Comment thread src/table/filter/ribbon/burr/filter.rs
Comment thread src/table/writer/filter/full.rs Outdated
Comment thread src/table/filter/ribbon/burr/tests.rs Outdated
Comment thread src/table/filter/ribbon/mod.rs
Comment thread src/table/filter/mod.rs
Comment thread Cargo.toml Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/table/writer/filter/full.rs (1)

94-110: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Move start("filter") call after the empty-filter check to avoid leaving the writer section unfinalized.

In full.rs, file_writer.start("filter") is called before building filter bytes. If filter_bytes.is_empty(), the method returns early without writing any data to the section, leaving the writer in an inconsistent state. This violates the sfa::Writer API contract: start() must always be followed by a write operation.

The safe fix is to defer start() until after confirming that filter data will be written:

Corrected ordering
-            file_writer.start("filter")?;
-
             let n = self.bloom_hash_buffer.len();
 
             log::trace!(
@@ -105,6 +104,7 @@ impl<W: std::io::Write + std::io::Seek> FilterWriter<W> for FullFilterWriter {
                 log::trace!("BuRR policy produced empty filter — skipping block write");
                 return Ok(0);
             }
+            file_writer.start("filter")?;
 
             log::trace!(
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/table/writer/filter/full.rs` around lines 94 - 110, The call to
file_writer.start("filter") is invoked before building the filter and may be
left open when build_burr_filter_bytes returns an empty buffer; move the
file_writer.start("filter") invocation to after calling build_burr_filter_bytes
and after confirming !filter_bytes.is_empty() so start() is only called when you
will write the section. Specifically, keep the call to
build_burr_filter_bytes(self.bloom_policy, &self.bloom_hash_buffer) and the
empty-check (filter_bytes.is_empty()), and only call file_writer.start("filter")
immediately before writing the filter bytes (and then follow with the
write/finalize calls that the sfa::Writer API expects).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@benches/bloom.rs`:
- Line 90: The benchmark is reconstructing BurrFilterReader each iteration
(BurrFilterReader::new(&filter_bytes)), so contains_hash measurements include
decode overhead; move the reader construction out of b.iter to create a
long‑lived reader instance (e.g., create reader once from filter_bytes before
calling b.iter) and then call reader.contains_hash(...) inside b.iter to measure
steady‑state probe latency; if you also want to measure decode+probe, add a
separate benchmark that constructs BurrFilterReader::new(&filter_bytes) inside
its own b.iter; update the contains_hash benchmark names/comments to reflect
probe‑only vs decode+probe and ensure you use Criterion’s latency reporting
(p99/p999) settings per benches/** guidelines.

In `@src/table/filter/mod.rs`:
- Around line 63-68: Replace the #[allow(...)] clippy suppressions with
#[expect(..., reason = "...")] so unnecessary suppressions are surfaced: change
the attribute that currently reads #[allow(clippy::cast_precision_loss,
clippy::cast_possible_truncation, clippy::cast_sign_loss, reason = "estimation,
precision loss is acceptable")] to #[expect(clippy::cast_precision_loss,
clippy::cast_possible_truncation, clippy::cast_sign_loss, reason = "estimation,
precision loss is acceptable")] and make the same replacement for the second
suppression site noted in the diff (the other #[allow(...)] occurrence around
the estimation/cast logic).

In `@src/table/filter/ribbon/burr/filter.rs`:
- Around line 127-176: contains_hash (and contains_in) allocate fresh Vec<u64>
for fingerprint and acc on every probe which breaks the allocation-free
contract; change these to use stack-sized arrays when params.r <= 64 or reuse
buffers from the existing Scratch allocator created by new_scratch.
Specifically, eliminate the local Vec<u64> allocations for fingerprint and acc
in contains_hash (and the same pattern in contains_in), compute stride from
self.params.r as before, and obtain mutable buffers from Scratch (or use a fixed
[u64;1] when stride==1) instead of calling vec![...]; keep usage with
super::super::hashing::xor_words and the equation/fingerprint flow unchanged but
operate on the borrowed/reused buffers so no heap allocation occurs per probe.

In `@src/table/filter/ribbon/burr/params.rs`:
- Around line 120-131: The function layer_m can divide by zero because the
public field b may be 0; add an explicit guard at the start of the division
section in fn layer_m to validate b (usize::from(self.b)) is > 0 and fail fast
with a clear message (e.g., panic!("invalid params: b must be > 0")) or convert
the API to return a Result instead; update the block that computes b and uses
raw.div_ceil(b) to first check if b == 0 and handle accordingly so no
divide-by-zero panic occurs (refer to layer_m and the public field b for the
exact locations to change).
- Around line 70-71: Replace the three #[allow(clippy::cast_possible_truncation,
clippy::cast_sign_loss)] attributes with
#[expect(clippy::cast_possible_truncation, reason = "...")] and
#[expect(clippy::cast_sign_loss, reason = "...")] on the specific casts: for the
r conversion (r_f -> r as u8) add an expect explaining r_f is validated to [1.0,
64.0] before cast; for the bpk conversion (bpk -> bpk as u8) add an expect
explaining bpk is validated/clamped to [1.0, 64.0] before cast; and for the
layer_input_keys -> f64 conversion add an expect indicating values were
validated earlier and the resulting f64 is bounded by self.b, so the cast is
safe; keep the existing validation/clamping logic and only replace allow with
expect and concise reasons referencing r_f, bpk, and layer_input_keys/self.b.

In `@src/table/filter/ribbon/burr/wire.rs`:
- Around line 173-250: The decode function is trusting layer metadata (r, w, b,
num_blocks, z_byte_len) and only uses debug_assert! for the z payload size;
replace that with explicit runtime validation in decode: validate r/w/b are
within acceptable non-zero ranges (e.g., b != 0), ensure num_blocks and
z_byte_len are sane (num_blocks <= m and z_byte_len == m * stride_words * 8),
and verify the remaining bytes length before slicing; if any check fails return
Err(crate::Error::InvalidHeader("BurrFilter layer ...")); update the checks
around pos/HEADER_LEN/LAYER_HEADER_LEN and the place building LayerView so
corrupt metadata cannot produce downstream panics in Params::new, is_bumped,
read_row or silent skips in contains_hash.

In `@src/table/filter/ribbon/params.rs`:
- Line 5: The build fails when --all-features because the ribbon-serde feature
enables #[cfg_attr(feature = "ribbon-serde", derive(serde::Serialize,
serde::Deserialize))] (seen in params.rs and filter.rs) but serde isn't declared
or wired to that feature; add serde as an optional dependency in Cargo.toml
(e.g. serde = { version = "1.0", features = ["derive"], optional = true }) and
update the [features] section to include "ribbon-serde" =
["your-crate-name/serde"] (or simply ["serde"] if in the same package) so the
serde::Serialize/Deserialize derives compile when ribbon-serde is enabled.

In `@src/table/mod.rs`:
- Line 1092: Update the stale debug assertion message used in
bloom_may_contain_key to reference the actual hashing function
crate::hash::hash64 rather than BloomBuilder::get_hash(key); locate the
assertion string in src/table/mod.rs and change the message to something like
"bloom_may_contain_key: key_hash must be crate::hash::hash64(key)" (or
equivalent wording) so the log correctly reflects the function used to compute
key_hash.

---

Outside diff comments:
In `@src/table/writer/filter/full.rs`:
- Around line 94-110: The call to file_writer.start("filter") is invoked before
building the filter and may be left open when build_burr_filter_bytes returns an
empty buffer; move the file_writer.start("filter") invocation to after calling
build_burr_filter_bytes and after confirming !filter_bytes.is_empty() so start()
is only called when you will write the section. Specifically, keep the call to
build_burr_filter_bytes(self.bloom_policy, &self.bloom_hash_buffer) and the
empty-check (filter_bytes.is_empty()), and only call file_writer.start("filter")
immediately before writing the filter bytes (and then follow with the
write/finalize calls that the sfa::Writer API expects).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 37f91efc-58c4-49ab-abbe-9ea23faecc9c

📥 Commits

Reviewing files that changed from the base of the PR and between 6f0d811 and 2778d71.

📒 Files selected for processing (36)
  • Cargo.toml
  • benches/bloom.rs
  • benches/index_block.rs
  • src/blob_tree/mod.rs
  • src/hash.rs
  • src/lib.rs
  • src/prefix.rs
  • src/table/filter/bit_array/builder.rs
  • src/table/filter/bit_array/mod.rs
  • src/table/filter/bit_array/reader.rs
  • src/table/filter/block.rs
  • src/table/filter/mod.rs
  • src/table/filter/ribbon/_vendored/LICENSE-APACHE
  • src/table/filter/ribbon/_vendored/LICENSE-MIT
  • src/table/filter/ribbon/builder.rs
  • src/table/filter/ribbon/burr/builder.rs
  • src/table/filter/ribbon/burr/error.rs
  • src/table/filter/ribbon/burr/filter.rs
  • src/table/filter/ribbon/burr/mod.rs
  • src/table/filter/ribbon/burr/params.rs
  • src/table/filter/ribbon/burr/tests.rs
  • src/table/filter/ribbon/burr/threshold.rs
  • src/table/filter/ribbon/burr/wire.rs
  • src/table/filter/ribbon/error.rs
  • src/table/filter/ribbon/filter.rs
  • src/table/filter/ribbon/hashing.rs
  • src/table/filter/ribbon/mod.rs
  • src/table/filter/ribbon/params.rs
  • src/table/filter/ribbon/tests.rs
  • src/table/filter/standard_bloom/builder.rs
  • src/table/filter/standard_bloom/mod.rs
  • src/table/mod.rs
  • src/table/tests.rs
  • src/table/writer/filter/full.rs
  • src/table/writer/filter/partitioned.rs
  • src/tree/mod.rs
💤 Files with no reviewable changes (5)
  • src/table/filter/bit_array/builder.rs
  • src/table/filter/bit_array/mod.rs
  • src/table/filter/bit_array/reader.rs
  • src/table/filter/standard_bloom/mod.rs
  • src/table/filter/standard_bloom/builder.rs

Comment thread benches/bloom.rs Outdated
Comment thread src/table/filter/mod.rs Outdated
Comment thread src/table/filter/ribbon/burr/filter.rs
Comment thread src/table/filter/ribbon/burr/params.rs Outdated
Comment thread src/table/filter/ribbon/burr/params.rs Outdated
Comment thread src/table/filter/ribbon/burr/wire.rs
Comment thread src/table/filter/ribbon/burr/wire.rs
Comment thread src/table/filter/ribbon/params.rs
Comment thread src/table/mod.rs Outdated
Copilot AI review requested due to automatic review settings May 19, 2026 10:12
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/table/filter/mod.rs`:
- Around line 39-47: is_active() currently returns true for any 0<fpr<1 for the
FalsePositiveRate variant, which can disagree with BurrParams::with_fp_rate;
update the FalsePositiveRate branch in is_active() to mirror
BurrParams::with_fp_rate by computing r = ceil(-log2(fpr)) (or equivalently
checking fpr >= 2^-64) and only return true when r is in 1..=64 (i.e., fpr > 0.0
&& fpr < 1.0 && ceil(-log2(*fpr)) between 1 and 64) so that is_active() and
BurrParams::with_fp_rate agree (reference: is_active, FalsePositiveRate,
BurrParams::with_fp_rate).

In `@src/table/filter/ribbon/burr/tests.rs`:
- Around line 54-55: Replace the non-self-checking lint suppression on the float
cast with an expect annotation: for the site that computes fpr (the let fpr =
false_positives as f64 / probe_count as f64;), change
#[allow(clippy::cast_precision_loss)] to #[expect(clippy::cast_precision_loss,
reason = "casting counts to f64 for rate calculation in tests")]; do the same
for the other two occurrences flagged (the other casts around the tests that
compute rates/statistics) so the suppression is self-checking when the casts
become unnecessary.

In `@src/table/filter/ribbon/burr/wire.rs`:
- Around line 85-96: Replace the newly added clippy suppressions in
src/table/filter/ribbon/burr/wire.rs with expect attributes: change the
#[allow(clippy::expect_used, reason = "...")] above the write_u8 sequence to
#[expect(clippy::expect_used, reason = "...")], and likewise replace the
#[allow(clippy::cast_possible_truncation, reason = "...")] that precedes let
num_layers_u8 = layers.len() as u8 with
#[expect(clippy::cast_possible_truncation, reason = "...")]; apply the same
pattern to the other two occurrences referenced (around the other blocks at the
later locations) so all three allow -> expect replacements keep the same reason
text.

In `@src/table/filter/ribbon/params.rs`:
- Around line 55-58: The builder-style method Params::with_seed currently
returns Self but lacks the #[must_use] attribute; add #[must_use] directly above
the pub fn with_seed signature so callers are warned if they discard the
returned updated Params, leaving the method otherwise unchanged (keep the
signature, body, and visibility the same).

In `@src/table/writer/filter/full.rs`:
- Around line 91-93: The early-empty check on bloom_hash_buffer currently logs
"not building filter" but then the function later returns Ok(1); change the
control flow so the "no hashes registered" path immediately returns Ok(0)
(matching the filter_bytes.is_empty() path) instead of falling through. Update
the branch that checks bloom_hash_buffer.is_empty() and the similar empty-filter
path around the later conditional (both places mentioned) to return Ok(0) when
no filter is written so the return value accurately reflects zero bytes/blocks
written.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7c86539a-552b-4b13-ade0-843000dfecc5

📥 Commits

Reviewing files that changed from the base of the PR and between 320aeb0 and 8fc62c8.

📒 Files selected for processing (12)
  • Cargo.toml
  • benches/bloom.rs
  • src/table/filter/mod.rs
  • src/table/filter/ribbon/burr/filter.rs
  • src/table/filter/ribbon/burr/params.rs
  • src/table/filter/ribbon/burr/tests.rs
  • src/table/filter/ribbon/burr/threshold.rs
  • src/table/filter/ribbon/burr/wire.rs
  • src/table/filter/ribbon/mod.rs
  • src/table/filter/ribbon/params.rs
  • src/table/mod.rs
  • src/table/writer/filter/full.rs

Comment thread src/table/filter/mod.rs Outdated
Comment thread src/table/filter/ribbon/burr/tests.rs Outdated
Comment thread src/table/filter/ribbon/burr/wire.rs Outdated
Comment thread src/table/filter/ribbon/params.rs
Comment thread src/table/writer/filter/full.rs Outdated
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 37 out of 37 changed files in this pull request and generated 5 comments.

Comment thread src/table/writer/filter/partitioned.rs
Comment thread src/table/writer/filter/full.rs Outdated
Comment thread src/table/filter/mod.rs Outdated
Comment thread src/table/filter/ribbon/mod.rs Outdated
Comment thread src/table/filter/ribbon/burr/wire.rs Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/table/filter/ribbon/burr/wire.rs (1)

83-97: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Replace new #[allow(clippy::...)] suppressions with #[expect(..., reason = "...")].

These are new suppressions in PR-added Rust code; please switch them to #[expect] so stale suppressions surface automatically.

Suggested diff
-    #[allow(clippy::expect_used, reason = "writing to a Vec<u8> cannot fail")]
+    #[expect(clippy::expect_used, reason = "writing to a Vec<u8> cannot fail")]
     {
@@
-        #[allow(
+        #[expect(
             clippy::cast_possible_truncation,
             reason = "max_layers fits u8 by construction"
         )]
         let num_layers_u8 = layers.len() as u8;
@@
-        #[allow(clippy::expect_used, reason = "writing to a Vec<u8> cannot fail")]
+        #[expect(clippy::expect_used, reason = "writing to a Vec<u8> cannot fail")]
         {
@@
-            #[allow(clippy::expect_used, reason = "len pre-validated by header check")]
+            #[expect(clippy::expect_used, reason = "len pre-validated by header check")]
             let arr: [u8; 8] = chunk.try_into().expect("chunks_exact yields 8-byte slices");

As per coding guidelines, "Prefer #[expect(lint)] over #[allow(lint)]#[expect] warns when suppression becomes unnecessary".

Also applies to: 119-126, 253-255

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/table/filter/ribbon/burr/wire.rs` around lines 83 - 97, Replace the new
clippy suppression attributes with expectation attributes: change the
block-level #[allow(clippy::expect_used, reason = "...")] to
#[expect(clippy::expect_used, reason = "writing to a Vec<u8> cannot fail")] and
similarly change the cast suppression around num_layers_u8 from
#[allow(clippy::cast_possible_truncation, reason = "max_layers fits u8 by
construction")] to #[expect(clippy::cast_possible_truncation, reason =
"max_layers fits u8 by construction")]; apply the same pattern to the other
occurrences noted (around the u8 casts at lines ~119-126 and ~253-255) so the
clippy suppressions for buf.write_u8 / num_layers_u8 / params seed writes become
#[expect(..., reason = "...")] instead of #[allow(...)].
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@benches/bloom.rs`:
- Around line 43-66: The probe benchmarks need percentile-oriented latency
reporting (p99/p999) rather than only mean throughput; update the probe
benchmark functions (notably the ones around burr_filter_construction and the
probe benches in the 68-127 and 129-165 ranges) to use a Criterion
BenchmarkGroup instead of plain c.bench_function, configure the group with an
appropriate sample_size and measurement_time so Criterion will collect latency
samples, and register the probe benchmark via group.bench_function or
group.bench_with_input so the collected samples can be used to generate
percentile stats (p99/p999) in Criterion’s report; keep the build bench
(burr_filter_construction and BurrBuilder::build_from_hashes) unchanged and
apply this BenchmarkGroup pattern to each probe path function.
- Around line 140-141: Replace the local suppression attribute on the expression
that computes r (the line containing "let r = (-fpr.log2()).ceil() as usize;")
so that the current #[allow(clippy::cast_possible_truncation,
clippy::cast_sign_loss)] becomes #[expect(clippy::cast_possible_truncation,
reason = "ceil() to usize is intentional and reviewed")] and
#[expect(clippy::cast_sign_loss, reason = "ceil() to usize is intentional and
reviewed")], i.e., use #[expect(..., reason = "...")] with the two specific lint
names instead of #[allow(...)] to ensure the suppression will warn if it becomes
unnecessary.

---

Duplicate comments:
In `@src/table/filter/ribbon/burr/wire.rs`:
- Around line 83-97: Replace the new clippy suppression attributes with
expectation attributes: change the block-level #[allow(clippy::expect_used,
reason = "...")] to #[expect(clippy::expect_used, reason = "writing to a Vec<u8>
cannot fail")] and similarly change the cast suppression around num_layers_u8
from #[allow(clippy::cast_possible_truncation, reason = "max_layers fits u8 by
construction")] to #[expect(clippy::cast_possible_truncation, reason =
"max_layers fits u8 by construction")]; apply the same pattern to the other
occurrences noted (around the u8 casts at lines ~119-126 and ~253-255) so the
clippy suppressions for buf.write_u8 / num_layers_u8 / params seed writes become
#[expect(..., reason = "...")] instead of #[allow(...)].
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b1e7c573-f490-4292-ba80-92f74e0554eb

📥 Commits

Reviewing files that changed from the base of the PR and between 8fc62c8 and 7fb54b7.

📒 Files selected for processing (5)
  • benches/bloom.rs
  • src/table/filter/ribbon/burr/filter.rs
  • src/table/filter/ribbon/burr/threshold.rs
  • src/table/filter/ribbon/burr/wire.rs
  • src/table/filter/ribbon/hashing.rs

Comment thread benches/bloom.rs
Comment thread benches/bloom.rs Outdated
polaz added a commit that referenced this pull request May 19, 2026
Addresses round-2 review feedback on PR #269.

# Correctness fixes

src/table/filter/mod.rs:
  - is_active now delegates to burr_params(1).is_some() — exact
    equivalence with 'would this policy produce a non-empty filter'.
    Closes the case where 0 < fpr < 2^-64 reported is_active() == true
    but burr_params returned None, causing the writer to buffer hashes
    then drop them at the spill.

src/table/writer/filter/full.rs:
  - finish() now returns Ok(0) on the empty bloom_hash_buffer path
    (was Ok(1)). The reported block_count#filter now matches the
    number of filter sections actually written. Restructured the
    flow to early-return on empty buffer instead of an else block,
    so the Ok(1) tail only fires when a filter block was written.

src/table/writer/filter/partitioned.rs:
  - spill_filter_partition now skips the Block::write_into when
    build_burr_filter_bytes returns an empty Vec. Persisting an empty
    filter block would cause BurrFilterReader::new to error at read
    time. The TLI entry is also skipped so the reader sees no filter
    for this range.

# Lint-policy migrations

  src/table/filter/ribbon/burr/wire.rs   — allow → expect on the
    encode/header expect blocks + the num_layers as u8 cast.
  src/table/filter/ribbon/burr/tests.rs  — three cast_precision_loss
    suppressions: allow → expect with concise reasons.
  src/table/filter/ribbon/params.rs       — Params::with_seed marked
    must_use (builder-style method).
  benches/bloom.rs                        — cast_possible_truncation
    + cast_sign_loss on the bench's local r computation switched to
    paired expect attributes.

# Vendored allow propagation

  src/table/filter/ribbon/burr/mod.rs comment expanded: the parent
  ribbon::* allow list propagates into burr/, which means
  expect/unwrap/indexing-slicing can be written in BuRR code despite
  the crate-level deny. Re-denying here would require migrating ~30
  internal sites in builder.rs / wire.rs / threshold.rs / filter.rs
  / tests.rs — tracked separately as a follow-up (issue #270). New
  BuRR code in THIS PR uses per-site expect with reason for new
  suppressions.

# Verification

  nextest --lib table::filter: 67 tests pass
  full-target lint pass (-D warnings): clean
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/table/filter/ribbon/burr/wire.rs`:
- Around line 327-331: The current probe path silently substitutes [0;8] when
the slice or conversion fails (using unwrap_or), which can mask data corruption;
instead, change the code around z.get(row_byte..row_byte + 8).and_then(|s|
s.try_into().ok()) so that if it yields None you immediately return true (fail
closed) from the probe function, otherwise bind the resulting [u8;8] to arr and
continue with acc ^= u64::from_le_bytes(arr); ensure you return from the same
function that contains acc and row_byte (the probe path where acc ^=
u64::from_le_bytes(arr) is located) so the impossible path no longer falls back
to zeros.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ec0d18f1-9e2c-4bcc-a77f-7cf7234049fa

📥 Commits

Reviewing files that changed from the base of the PR and between 7fb54b7 and 2263bac.

📒 Files selected for processing (9)
  • benches/bloom.rs
  • src/table/filter/mod.rs
  • src/table/filter/ribbon/burr/mod.rs
  • src/table/filter/ribbon/burr/tests.rs
  • src/table/filter/ribbon/burr/wire.rs
  • src/table/filter/ribbon/mod.rs
  • src/table/filter/ribbon/params.rs
  • src/table/writer/filter/full.rs
  • src/table/writer/filter/partitioned.rs

Comment thread src/table/filter/ribbon/burr/wire.rs Outdated
Copilot AI review requested due to automatic review settings May 19, 2026 10:54
@codecov
Copy link
Copy Markdown

codecov Bot commented May 19, 2026

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 37 out of 37 changed files in this pull request and generated 2 comments.

Comment thread src/table/writer/filter/partitioned.rs Outdated
Comment thread tests/burr_filter_end_to_end.rs Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@benches/bloom.rs`:
- Around line 135-154: The Ribbon benchmark currently builds the filter with n =
1_000_000 but only supplies keys: Vec<u64> = (0..100_000).collect(), causing a
lower effective load than BuRR; update the benchmark so the keys vector length
matches the intended n (or otherwise pad/expand to n unique keys) before calling
Params::new and building the filter (affecting symbols keys, n, Params::new,
RibbonBuilder::new, builder.build, filter.new_scratch) so the Ribbon baseline is
sized and populated at the same effective load as BuRR.

In `@src/table/filter/mod.rs`:
- Around line 74-87: estimated_filter_size() can return a positive size for
policies that are inactive according to burr_params(n); change it to mirror
build behavior by returning 0 when burr_params(n) is None. Update
estimated_filter_size to call or check the same activation logic used by
burr_params(n) (or call burr_params(n) directly) and early-return 0 for inactive
policies, then compute r_bits and the size only when the policy is active
(preserving the existing clamp and byte conversion logic).

In `@src/table/filter/ribbon/burr/tests.rs`:
- Around line 169-176: Update the test burr_wire_rejects_bad_magic to start from
a valid serialized filter, then mutate only the magic bytes to trigger the
magic-check path; specifically, construct a bona fide wire payload (e.g., by
creating a valid Burr filter via the same helpers used elsewhere or by
serializing a known-good Burr using the code paths under test), then flip
bytes[0]/bytes[1] (the magic) and call BurrFilterReader::new(&bytes) to assert
it returns an Err; ensure you reference the BurrFilterReader::new constructor
and keep mutation limited to the magic bytes so other validation steps do not
fail first.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 53a20187-e398-45a8-b066-d533b57efe16

📥 Commits

Reviewing files that changed from the base of the PR and between 7fb54b7 and 225fabc.

📒 Files selected for processing (11)
  • Cargo.toml
  • benches/bloom.rs
  • src/table/filter/mod.rs
  • src/table/filter/ribbon/burr/mod.rs
  • src/table/filter/ribbon/burr/tests.rs
  • src/table/filter/ribbon/burr/wire.rs
  • src/table/filter/ribbon/mod.rs
  • src/table/filter/ribbon/params.rs
  • src/table/filter/ribbon/tests.rs
  • src/table/writer/filter/full.rs
  • src/table/writer/filter/partitioned.rs

Comment thread benches/bloom.rs Outdated
Comment thread src/table/filter/mod.rs
Comment thread src/table/filter/ribbon/burr/tests.rs
polaz added a commit that referenced this pull request May 19, 2026
Round-3 review fixes on PR #269.

src/table/writer/filter/partitioned.rs:
  spill_filter_partition: empty filter_bytes now fails closed with
  Error::Unrecoverable instead of silently skipping the partition.
  Silently skipping caused false negatives at read time: keys in the
  skipped range would binary-search to a later partition's filter
  (which does not contain them), and Table::check_bloom would report
  'definitely not present' on a live key. The path is unreachable in
  practice (is_active is checked upstream before keys are buffered)
  but defence-in-depth.

tests/burr_filter_end_to_end.rs:
  Header comment used 'false negatives' wrong (false negatives apply
  to inserted keys, not unknown keys). Rewritten to describe the two
  invariants the test actually checks: (a) inserted keys must resolve
  via tree.get — no false negatives, (b) unknown keys must never
  return Some — false positives only ever cost a wasted index lookup.

benches/bloom.rs:
  ribbon_filter_contains now pads its key set to n with random fillers
  (same shape as the BuRR bench above), so the Ribbon body is loaded
  to the same density. Without this, Ribbon was running at ~10% load
  vs BuRR's ~70-90% — non-equivalent probe-latency comparison.

src/table/filter/mod.rs:
  estimated_filter_size now delegates to burr_params for the n check
  and pulls r directly from the constructed params. Estimate is 0
  exactly when the builder would also produce empty bytes — keeps the
  memory accounting in sync with build behavior. Previously the
  estimator clamped raw inputs and could overestimate for inactive
  policies.

src/table/filter/ribbon/burr/tests.rs:
  burr_wire_rejects_bad_magic now starts from a valid wire payload
  and flips the first magic byte. The previous test built a buffer
  of zeros — that buffer fails decode regardless of the magic check
  (e.g. on the version byte), so it did not actually prove the magic
  check triggers.
Copilot AI review requested due to automatic review settings May 19, 2026 11:28
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 37 out of 37 changed files in this pull request and generated 3 comments.

Comment thread src/table/writer/filter/partitioned.rs
Comment thread src/table/filter/ribbon/burr/builder.rs
Comment thread src/table/filter/ribbon/burr/filter.rs
Copilot AI review requested due to automatic review settings May 19, 2026 12:27
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 37 out of 37 changed files in this pull request and generated 3 comments.

Comment thread src/table/filter/block.rs
Comment thread src/table/filter/ribbon/params.rs Outdated
Comment thread src/table/filter/ribbon/_vendored/LICENSE-MIT Outdated
polaz added a commit that referenced this pull request May 19, 2026
Wire-format break: filter blocks are no longer Bloom-encoded — V4
binaries cannot read V5 tables and vice versa. The filter block
decoder rejects the wrong magic at open time. Manifest version bump
makes the contract explicit:

src/format_version.rs:
  +V5 variant. Doc-comment explains the wire-format break and the
  cross-version read incompatibility.

src/manifest.rs / src/version/mod.rs:
  Writers emit V5; manifest deserialiser assert pins V5 expectation
  in the round-trip test.

src/tree/mod.rs:
  Tree::open accepts only V5 manifests. V3/V4 manifests now error
  with InvalidVersion at recovery time rather than reaching the
  filter decoder with mismatched bytes.

tests/tree_recovery_versions.rs:
  Renamed tree_writes_v4_manifest → tree_writes_v5_manifest, added
  tree_rejects_pre_v5_manifest asserting V4 → InvalidVersion(4),
  replaced the V3-safe-recovery test (V3 is no longer a safe
  fall-back under V5).

# zstd-pure alias dropped

The 'zstd-pure' feature flag was a deprecated alias for 'zstd' that
shipped while both a C-backed and pure-Rust implementation were on
the roadmap. Only one zstd backend now exists (via structured-zstd),
so the alias serves no purpose:

Cargo.toml:
  Removed 'zstd-pure = ["zstd"]' feature entry.

src/compression/zstd_pure.rs → src/compression/zstd_backend.rs:
  Renamed the file. Inside, ZstdPureProvider → ZstdProvider. No
  behaviour change.

src/compression/mod.rs:
  mod zstd_pure; → mod zstd_backend;. ZstdBackend type alias points
  at the renamed provider.

README.md:
  Feature-flag table no longer mentions zstd-pure. zstd description
  no longer hints at a 'pure Rust' alternative existing — it IS the
  zstd backend.

# Other review-cycle fixes carried in this commit

PR #269 round-N inline threads:

- Hot-path probe alloc-free entry point: new
  burr::contains_hash_from_bytes(bytes, hash) single-pass parse +
  probe. FilterBlock::maybe_contains_hash uses it instead of
  BurrFilterReader::new(...).contains_hash(...), avoiding the
  Vec<LayerView> heap allocation per probe on Table::check_bloom.

- src/table/filter/ribbon/params.rs preamble comment updated:
  ribbon-serde feature wires only 'dep:serde' now (bitvec dep was
  dropped for cross-arch portability in an earlier commit; the
  comment still referenced 'bitvec/serde').

- src/table/filter/ribbon/_vendored/LICENSE-MIT: copyright line
  corrected from '(c) 2026 ribbon-filter contributors' to '(c)
  William Rågstad and ribbon-filter contributors' to match the
  attribution in ribbon/mod.rs.

# Package + docs.rs polish

README rewritten as a flat factual description of the engine:
features grouped by read / write / compaction / storage / concurrency,
limit numbers, feature-flag table. Dropped the upstream-fjall logo
+ logo-image references. Crate-level doc in lib.rs aligned with the
README (sans the donation block).

Cargo.toml:
  description rewritten in factual terms (was K.I.S.S./fork branding,
  is now the actual feature surface).
  documentation = 'https://docs.rs/coordinode-lsm-tree' explicit.
  keywords + categories tightened ('embedded', 'filesystem',
  'compression' replace 'coordinode').
  [package.metadata.docs.rs] block added: all-features build, cfg(docsrs)
  rustdoc-args so feature/availability badges render, x86_64-unknown-linux-gnu
  target.

# Coverage uplift

Vendored ribbon error Display impls (ParamError variants,
ConstructionFailure InconsistentEquation + OutOfBounds, BuildError
variants, FilterReprError variants under ribbon-serde feature) get
explicit display-format tests covering all previously-uncovered
arms. ~50 new uncovered lines in src/table/filter/ribbon/error.rs
now exercised.

# Verification

  cargo fmt: clean
  cargo clippy --all-features --all-targets --lib --tests -- -D warnings: clean
  cargo nextest run: 1246 pass (was 1241, +5)
Copilot AI review requested due to automatic review settings May 19, 2026 13:02
polaz added 2 commits May 19, 2026 20:00
…city

- burr::wire: layer-header bounds checks now compute `header_end` via
  `pos.checked_add(LAYER_HEADER_LEN)` and bail to InvalidHeader on
  overflow, matching the existing checked-arithmetic pattern used for
  `thresholds_end` and `z_end`. Applied at both occurrences (`decode`
  and `contains_hash_from_bytes`). On 32-bit targets a corrupted `pos`
  could otherwise wrap past `bytes.len()` and let the bounds guard
  succeed by wraparound, then panic at the slice indexing.
- writer/filter/partitioned: replace `mem::take(&mut bloom_hash_buffer)`
  with `mem::replace(..., Vec::with_capacity(old_cap))` so the buffer
  retains its grown capacity across partition spills. `take` leaves a
  capacity-0 Vec behind, forcing a reallocation on every register_key
  after each spill — measurable overhead on tables that spill many
  partitions per flush/compaction.

No regression tests added: the wire-overflow path requires a 32-bit
target to actually trigger (the unchecked add does not wrap on 64-bit
hosts), and the partitioned-writer change is a perf improvement with
no behavioural delta.
…ts mod

The regression test added in the preceding commits uses `matches!` and
explicit `Ok(_)` patterns rather than `.unwrap()`, so the
`#[expect(clippy::unwrap_used)]` attribute on the `mod tests` block
has nothing to fulfil. Under the workspace lint config
(`-D warnings -D unfulfilled-lint-expectations` in the zstd-only CI
matrix) the unfulfilled expectation becomes a hard compile error.

Drop the attribute; future test additions can re-introduce it if they
actually use `.unwrap()`.
Copilot AI review requested due to automatic review settings May 19, 2026 17:09
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 171 out of 172 changed files in this pull request and generated 3 comments.

Comment thread src/table/filter/ribbon/burr/params.rs Outdated
Comment thread src/lib.rs
Comment thread README.md Outdated
polaz added 4 commits May 19, 2026 20:18
Covers all `ParamError` / `ConstructionFailure` / `BuildError` /
`FilterReprError` variants — including the new
`ConstructionFailure::StorageLengthOverflow` introduced earlier on
this branch, the back-substitution branch of `OutOfBounds` (no
key_index), and the `BuildError::InvalidParams` arm which previously
had no Display assertion.

Bumps patch coverage for `src/table/filter/ribbon/error.rs` past the
codecov gate (was 94.59%, all uncovered lines were Display arms).
- burr::BurrBuilder::new now rejects params with `b < w` up-front with
  `InvalidParams("b must be >= w")`. `BurrParams::layer_m` floors its
  return at `b` and rounds up to a multiple of `b`; the vendored
  Ribbon `Params::new(m, w=64, ...)` separately requires `m >= w`. If
  a caller hand-builds `BurrParams` with `b < w` (= 64), `layer_m`
  could otherwise produce an `m` between `b` and `w-1` that Ribbon
  rejects with a generic "vendored ribbon param error" hiding the
  real invariant.
- lib.rs crate docs: the range-tombstone bullet referenced "V4 disk
  format"; this branch makes V5 the current/only readable format and
  rejects V3/V4 manifests at `Tree::open`. Update to "V5 format
  introduced for the BuRR filter".
- README License section: clarify that only first-party `.rs` files
  carry the SPDX-per-file header. The vendored Ribbon module keeps
  its upstream layout (module-level licensing commentary + preserved
  `_vendored/LICENSE-*` texts), so a blanket "every .rs file" claim
  was inaccurate.
- error.rs tests: switch single-character `.contains("x")` to char
  patterns to satisfy `clippy::single_char_pattern` under the
  zstd-only CI matrix's `-D warnings` profile.
Reviewer correctly flagged that `BurrParams::layer_m` only guarantees
its return is `>= b`, while the vendored Ribbon `Params::new`
additionally requires `m >= w`. The runtime enforcement now lives in
`BurrBuilder::new` (preceding commit, rejects `b < w`). Add a
docstring paragraph here cross-referencing that gate so a future
reader of `layer_m` understands why no per-call `m >= w` check is
needed.
- Add full unit test module for `Params` (previously had none): covers
  every `ParamError` arm reachable from `Params::new`, the
  `with_seed` / `with_retry_limit` / `with_retry_policy` chain, the
  `r_from_fpr` / `from_expected_items` constructors including their
  error paths, and the `start_range` / `fingerprint_words` /
  `fingerprint_last_word_mask` accessors.
- Add `BurrBuilder::new` validation tests for every reject branch:
  zero `n`, `r` out of `[1, 64]`, `w != 64`, zero `b`, the newly added
  `b < w` gate, and zero `max_layers`. Plus a Debug-formatter
  smoke-test for `BurrBuilder` and `BurrFilter` so the `impl Debug`
  arms aren't silently uncovered.

Pure coverage / no-behavior commit; nudges codecov patch% closer to
the project-coverage gate.
Copilot AI review requested due to automatic review settings May 19, 2026 17:46
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 171 out of 172 changed files in this pull request and generated 1 comment.

Comment thread tests/tree_recovery_versions.rs Outdated
The doc-comment on `tree_rejects_pre_v5_manifest` documents that V5
must reject BOTH V3 and V4 manifests (the wire-format break
introduced for the BuRR filter), but the test body only rewrote the
on-disk manifest to V4 and asserted that single boundary. Wrap the
test in a `for pre_v5 in [3, 4]` loop so V3 and V4 are both exercised
end-to-end (open → flush → rewrite manifest → reopen → expect
InvalidVersion(pre_v5)). A future relaxation of the boundary —
accepting V4 but not V3, say — now lights up the test rather than
quietly passing.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@README.md`:
- Around line 71-73: Update the README lines that document key/value size
limits: change "up to 65 536 bytes" to "up to 65,535 bytes (2^16 − 1)" and
change "up to 2³² bytes" to "up to 4,294,967,295 bytes (2^32 − 1)" so the
documented maxima match the u16/u32 encoded bounds used by the codebase; ensure
the wording around proportional performance cost remains unchanged.

In `@src/table/filter/ribbon/burr/params.rs`:
- Around line 147-151: The rounding expression raw.div_ceil(b) * b can overflow;
compute blocks = raw.div_ceil(b) first and then use a checked multiplication
(e.g., blocks.checked_mul(b)) to detect overflow instead of letting release-mode
wrap; handle the failure path (return an error, saturate to usize::MAX, or panic
with a clear message) so layer_m (the rounded value computed from raw, b,
layer_input_keys and self.b) never silently wraps.

In `@src/table/filter/ribbon/burr/tests.rs`:
- Around line 286-313: Add a new test (or extend the existing
burr_negative_keys_obey_fpr_envelope_at_low_target) to also verify the
very-tight FPR envelope at target 0.0001: create params with
BurrParams::with_fp_rate(n, 0.0001), build the filter via BurrBuilder::new(...)
and builder.build_from_hashes(...), probe the same probe_count range using
crate::hash::hash64 and filter.contains_hash, compute the realised fpr and
assert it stays within an appropriate envelope (similar style to the existing
0.001 case but with a stricter threshold for 0.0001).
- Around line 633-657: Update the test
contains_hash_from_bytes_round_trips_against_decoded to also iterate a disjoint
set of absent hashes and assert the single-pass contains_hash_from_bytes result
equals BurrFilterReader::contains_hash for each absent hash (mirroring the
present-hash loop), and similarly update the other test around lines 768-793 to
compare both present and absent corpora; reference the test function name
contains_hash_from_bytes_round_trips_against_decoded and the functions
contains_hash_from_bytes and BurrFilterReader::contains_hash so the absent-path
equivalence is verified rather than only sanity-checked.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0931a7af-3ae5-40a5-942b-6a1cc8a3bf5e

📥 Commits

Reviewing files that changed from the base of the PR and between dcdab54 and 9d50308.

📒 Files selected for processing (153)
  • Cargo.toml
  • README.md
  • src/abstract_tree.rs
  • src/active_tombstone_set.rs
  • src/any_tree.rs
  • src/blob_tree/gc.rs
  • src/blob_tree/handle.rs
  • src/blob_tree/ingest.rs
  • src/blob_tree/mod.rs
  • src/cache.rs
  • src/checksum.rs
  • src/coding.rs
  • src/compaction/drop_range.rs
  • src/compaction/fifo.rs
  • src/compaction/flavour.rs
  • src/compaction/leveled/mod.rs
  • src/compaction/maintenance.rs
  • src/compaction/major.rs
  • src/compaction/mod.rs
  • src/compaction/movedown.rs
  • src/compaction/pulldown.rs
  • src/compaction/state/hidden_set.rs
  • src/compaction/state/mod.rs
  • src/compaction/stream.rs
  • src/compaction/tiered/mod.rs
  • src/compaction/worker.rs
  • src/comparator.rs
  • src/compression/mod.rs
  • src/compression/zstd_backend.rs
  • src/config/block_size.rs
  • src/config/compression.rs
  • src/config/filter.rs
  • src/config/hash_ratio.rs
  • src/config/mod.rs
  • src/config/pinning.rs
  • src/config/restart_interval.rs
  • src/descriptor_table.rs
  • src/double_ended_peekable.rs
  • src/encryption.rs
  • src/error.rs
  • src/file.rs
  • src/file_accessor.rs
  • src/format_version.rs
  • src/fs/io_uring_fs.rs
  • src/fs/mem_fs.rs
  • src/fs/mod.rs
  • src/fs/std_fs.rs
  • src/heap.rs
  • src/ingestion.rs
  • src/key.rs
  • src/key_range.rs
  • src/lib.rs
  • src/manifest.rs
  • src/memtable/arena.rs
  • src/memtable/interval_tree.rs
  • src/memtable/mod.rs
  • src/memtable/skiplist.rs
  • src/memtable/value_store.rs
  • src/merge.rs
  • src/merge_operator.rs
  • src/metrics.rs
  • src/mvcc_stream.rs
  • src/path.rs
  • src/pinnable_slice.rs
  • src/prefix.rs
  • src/range.rs
  • src/range_tombstone.rs
  • src/range_tombstone_filter.rs
  • src/run_reader.rs
  • src/run_scanner.rs
  • src/seqno.rs
  • src/slice/mod.rs
  • src/slice/slice_bytes/mod.rs
  • src/slice/slice_default/mod.rs
  • src/slice_windows.rs
  • src/stop_signal.rs
  • src/table/block/binary_index/builder.rs
  • src/table/block/binary_index/mod.rs
  • src/table/block/binary_index/reader.rs
  • src/table/block/decoder.rs
  • src/table/block/encoder.rs
  • src/table/block/hash_index/builder.rs
  • src/table/block/hash_index/mod.rs
  • src/table/block/hash_index/reader.rs
  • src/table/block/header.rs
  • src/table/block/mod.rs
  • src/table/block/offset.rs
  • src/table/block/trailer.rs
  • src/table/block/type.rs
  • src/table/block_index/full.rs
  • src/table/block_index/iter.rs
  • src/table/block_index/mod.rs
  • src/table/block_index/two_level.rs
  • src/table/block_index/volatile.rs
  • src/table/data_block/iter.rs
  • src/table/data_block/mod.rs
  • src/table/filter/block.rs
  • src/table/filter/mod.rs
  • src/table/filter/ribbon/builder.rs
  • src/table/filter/ribbon/burr/builder.rs
  • src/table/filter/ribbon/burr/filter.rs
  • src/table/filter/ribbon/burr/params.rs
  • src/table/filter/ribbon/burr/tests.rs
  • src/table/filter/ribbon/burr/wire.rs
  • src/table/filter/ribbon/error.rs
  • src/table/filter/ribbon/params.rs
  • src/table/id.rs
  • src/table/index_block/block_handle.rs
  • src/table/index_block/iter.rs
  • src/table/index_block/mod.rs
  • src/table/inner.rs
  • src/table/iter.rs
  • src/table/meta.rs
  • src/table/mod.rs
  • src/table/multi_writer.rs
  • src/table/regions.rs
  • src/table/scanner.rs
  • src/table/tests.rs
  • src/table/util.rs
  • src/table/writer/filter/full.rs
  • src/table/writer/filter/mod.rs
  • src/table/writer/filter/partitioned.rs
  • src/table/writer/index/full.rs
  • src/table/writer/index/mod.rs
  • src/table/writer/index/partitioned.rs
  • src/table/writer/meta.rs
  • src/table/writer/mod.rs
  • src/time.rs
  • src/tree/ingest.rs
  • src/tree/inner.rs
  • src/tree/mod.rs
  • src/tree/sealed.rs
  • src/util.rs
  • src/value.rs
  • src/value_type.rs
  • src/verify.rs
  • src/version/mod.rs
  • src/version/optimize.rs
  • src/version/recovery.rs
  • src/version/run.rs
  • src/version/super_version.rs
  • src/vlog/accessor.rs
  • src/vlog/blob_file/merge.rs
  • src/vlog/blob_file/meta.rs
  • src/vlog/blob_file/mod.rs
  • src/vlog/blob_file/multi_writer.rs
  • src/vlog/blob_file/reader.rs
  • src/vlog/blob_file/scanner.rs
  • src/vlog/blob_file/writer.rs
  • src/vlog/handle.rs
  • src/vlog/mod.rs
  • src/write_batch.rs
  • tests/tree_recovery_versions.rs

Comment thread README.md Outdated
Comment thread src/table/filter/ribbon/burr/params.rs Outdated
Comment thread src/table/filter/ribbon/burr/tests.rs
Comment thread src/table/filter/ribbon/burr/tests.rs
polaz added 2 commits May 19, 2026 21:17
Exercise the four `RibbonBuilder` entry points BuRR uses but the
existing tests only hit transitively through full BuRR builds:

- `build_with_seed_verbatim` happy-path with generous m (single
  attempt, fixed seed lands), persistence of caller-supplied seed.
- `build_with_seed_verbatim` failure-path with tight m → asserts
  `attempts == 1` (no retry budget on verbatim).
- `build_with_seed_verbatim_from_hashes` round-trip with xxh3 hashes
  (the actual LSM-side path) + failure variant.
- `build` in Homogeneous mode: must break out of the inner retry loop
  after a single attempt and must NOT grow `m` between outer iterations
  (Homogeneous has no retry/grow semantics — the algorithm assigns
  random fingerprints to unoccupied rows in a single shot).

Pure coverage / no-behavior commit; targets uncovered branches in
`src/table/filter/ribbon/builder.rs` flagged by the codecov patch
report.
… tests

- README + crate-level docs claimed "up to 65 536 bytes" for keys and
  "up to 2³² bytes" for values; both were off-by-one because the
  on-disk encoding caps the length fields at u16 / u32. Documented
  values are now 65,535 (`u16::MAX`) and 4,294,967,295 (`u32::MAX`,
  `2³² − 1`) with the length-field type spelled out next to each.
- burr::BurrParams::layer_m: replace `raw.div_ceil(b) * b` (which can
  wrap in release builds on large inputs and silently yield a SMALLER
  m than the caller asked for) with checked_mul + saturation to the
  largest usize-fitting multiple of b. The downstream
  `RibbonBuilder::build_once` then surfaces the oversize cleanly via
  `ConstructionFailure::StorageLengthOverflow` instead of allocating
  an undersized filter.
- Tests:
  * Add `burr_negative_keys_obey_fpr_envelope_at_very_low_target` —
    the tightest FPR target (0.0001) called out in the acceptance
    criteria was previously unverified; a regression there would have
    merged unnoticed. Realised FPR over 50k disjoint probes must stay
    below 0.1% (10× envelope around the 0.01% target).
  * Extend `contains_hash_from_bytes_round_trips_against_decoded` and
    `contains_hash_from_bytes_returns_false_for_non_inserted` with a
    disjoint absent-hash corpus, asserting exact equality between the
    single-pass `contains_hash_from_bytes` entry point and the
    decoded-reader `BurrFilterReader::contains_hash` on both present
    AND absent inputs. The two are separate implementations; a
    mismatch on the not-found path would otherwise pass silently.
Copilot AI review requested due to automatic review settings May 19, 2026 18:22
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 171 out of 172 changed files in this pull request and generated 1 comment.

Comment thread src/lib.rs Outdated
The crate-level docs implied range tombstones were introduced together
with the V5 disk format. They were not: the SST-side range-tombstone
encoding landed in V4. V5's breaking change is the filter wire format
(BuRR replaces Bloom), nothing to do with tombstones.

Reword the bullet so the on-disk history is accurate and the V4 →
V5 distinction stays clear.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 171 out of 172 changed files in this pull request and generated no new comments.

@polaz polaz merged commit 4239a3d into main May 19, 2026
24 of 26 checks passed
@polaz polaz deleted the feat/#268 branch May 19, 2026 18:46
polaz added a commit that referenced this pull request May 20, 2026
BREAKING CHANGE: on-disk format bumped to V5. V3/V4 databases are NOT
readable by this version and vice versa.

V5 introduces the BuRR filter wire-format break (replaces standard
bloom) shipped in #269 and the PITR hard-link checkpoint primitive
shipped in #210. Both touch the manifest layout / filter block
encoding and are not backward-compatible at the on-disk level.

Squash-merge marker for release-plz: this PR is the major bump for
the V5 cycle. The Cargo.toml version stays at 4.5.0; release-plz
picks up the BREAKING CHANGE footer on squash and opens the v5.0.0
release PR.

Drive-by no-std and CI fixes bundled here:

- replace interval-heap (transitive dep `compare v0.0.6` doesn't
  declare `#![no_std]`, blocking the no-std-check CI job) with
  `alloc::collections::BinaryHeap<Reverse<_>>`. The blob-file
  merger only needs pop-min / push, no double-ended interval semantics
  — switching to BinaryHeap removes a 10-year-old transitive dep.

- src/vlog/blob_file/merge.rs uses `core::*` / `alloc::*` for the
  types previously imported through `std::*` re-exports (Ordering,
  BinaryHeap, Vec). Pure label change for std targets, unblocks
  no-std for this module.

- tests/checkpoint_pitr.rs concurrent_writes watermark assertion uses
  `i < info.seqno` (the actual visible_seqno contract: lowest excluded
  seqno) rather than `<=`. The `<=` form raced on cross-target runners
  (aarch64/riscv64/musl/etc) because writer's `insert(seqno=i)` and
  `fetch_max(i + 1)` aren't atomic — main could observe
  visible_seqno = i between those steps, then assert k(i) present even
  though the watermark contract excludes it. Strict `<` matches the
  semantic and stops the cross-target flakes.

- .github/instructions/rust.instructions.md gains a no-std-direction
  section with 20 rules split into suggest / reject / always-applies
  buckets. Per-crate tier tables are NOT required; reviewers should
  guide toward core/alloc primitives, not gatekeep.

Closes #210
polaz added a commit that referenced this pull request May 20, 2026
)

## Summary

Implements
[#210](#210)
— the consistent hard-link checkpoint primitive that unblocks
point-in-time recovery (PITR) backup for CoordiNode and any other
consumer that needs an O(1)-per-file snapshot of an open LSM-tree.

- New trait method `Fs::hard_link(src, dst)` with a default
`Unsupported` impl (non-breaking for downstream backends). `StdFs`
performs a real `hard_link(2)` with an EXDEV→copy fallback; `MemFs`
produces an independent copy; `IoUringFs` delegates to `StdFs` since
linking is a metadata-only syscall.
- New trait method `Fs::backend_id() -> Option<u64>` — explicit
shared-namespace capability check that gates cross-`Fs` hard-link
attempts. `StdFs`/`IoUringFs` return a common `KERNEL_BACKEND_ID`; each
`MemFs::new()` allocates a unique per-instance id; default impl returns
`None` for safe-by-default third-party backends.
- `DeletionPause` — a tree-wide refcount-gated deferred-delete queue
installed into each `Table::Inner` / `BlobFile::Inner`. While at least
one `Pause` guard is alive, both `Drop` impls queue removals instead of
executing them; the queue drains when the last guard is dropped. Mirrors
RocksDB's `DisableFileDeletions`/`EnableFileDeletions` pattern.
Synchronisation uses `spin::Mutex` so the type is `no_std`-compatible.
- `AbstractTree::create_checkpoint(target) -> CheckpointInfo` —
implemented on `Tree`, `BlobTree`, and (via `enum_dispatch`) `AnyTree`.
The driver captures `visible_seqno` first, flushes the active memtable
(eviction seqno `0` — never expands MVCC GC beyond what would have
happened anyway), snapshots the current version, hard-links every live
SST + blob into `target/{tables,blobs}/`, copies manifest / `v<id>` /
`current`, and fsyncs the target root.
- `level_routes` (tiered storage) is honoured automatically — each
`Table::Inner` already carries its own routed `Fs`, so the link is
performed through the right backend. Cross-`Fs` situations (e.g. `MemFs`
source vs. `StdFs` target) transparently fall back to a streamed byte
copy after the `backend_id` namespace check declines the hard-link path.
- `Tree::open` now rejects a directory that has version artifacts
(`tables/`, `blobs/`, `vN`) but no `current` pointer — the on-disk
signature of a half-written checkpoint. Previously the code silently
called `create_new`, which would have overwritten the partial state with
an empty tree.

## Test plan

`cargo nextest run --all-features` — **1327/1327 passed** (6
environment-skipped), including 14 integration scenarios in
`tests/checkpoint_pitr.rs`:

- [x] Round-trip — checkpoint + reopen as standalone tree reads every
key back
- [x] Concurrent writes during checkpoint do not corrupt the snapshot
- [x] Deferred-delete invariant: concurrent `major_compact` cannot
remove an SST captured by an in-flight checkpoint
- [x] BlobTree checkpoint captures both index SSTs and blob files
- [x] `level_routes` SSTs land in the flattened `target/tables/`
directory
- [x] Source vs. checkpoint isolation — writes on either side do not
bleed across
- [x] Crash-safety — failed checkpoint leaves the source tree fully
reopenable (chmod-driven `link_tables` failure)
- [x] Empty tree checkpoint succeeds with `sst_files == 0`
- [x] `CheckpointInfo.total_bytes` matches the on-disk sum
- [x] Re-running a checkpoint into the same target is rejected with
`AlreadyExists`
- [x] MVCC regression — checkpoint-triggered flush must NOT drop history
needed by source-tree snapshot readers (eviction seqno = 0, not
`SeqNo::MAX`)
- [x] Half-written-checkpoint detection — missing `current` pointer
rejected by `Tree::open`
- [x] Half-written-checkpoint detection — corrupt `current` pointer
rejected by `Tree::open`
- [x] Cross-`Fs` (`StdFs`↔`MemFs`) link-or-copy streams through both
trait objects (inline in `src/checkpoint.rs`)

Plus unit coverage for:
- `DeletionPause` — defer / inactive-passthrough / nested-pauses /
generation-race-under-lock
- `StdFs::is_cross_device` — raw EXDEV + `ErrorKind` variants
- `Fs::hard_link` — round-trip + duplicate / missing-source rejection
(both `StdFs` and `MemFs`)
- `copy_fallback` — independence + create-new semantics

- [x] `cargo clippy --lib --tests` clean
- [x] `cargo test --doc` passes for all touched modules

Closes #210


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Point-in-time checkpoint snapshots (optional blob inclusion) reporting
file counts, total bytes, version id and visibility seqno; checkpoints
can be opened as independent trees.

* **Reliability / Bug Fixes**
* Atomic, durable checkpoint creation with hard-link-first and
streamed-copy fallback (debug-logged); best-effort cleanup on failure;
opening rejects half-written or corrupt checkpoints with clearer
diagnostics.

* **Utilities**
* Checkpoint-aware gate to defer physical file removals during snapshot
lifetimes.

* **Filesystem compatibility**
* Improved create-dir/hard-link handling with cross-filesystem copy
fallback and namespace signaling.

* **Tests / Documentation**
* Extensive end-to-end, isolation and crash-safety checkpoint suite;
README updated to document PITR checkpoints.

<!-- review_stack_entry_start -->

[![Review Change
Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/structured-world/coordinode-lsm-tree/pull/276?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack)

<!-- review_stack_entry_end -->
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---

BREAKING CHANGE: on-disk format V5. V3/V4 databases are NOT readable by
this version and vice versa. V5 ships the BuRR filter wire-format change
(replaces standard bloom, #269) plus the PITR hard-link checkpoint
primitive (this PR, #210). The major bump goes through release-plz on
squash.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(filter): replace bloom with BuRR (Bumped Ribbon Retrieval) — math-optimal filter

2 participants