feat(compression): zstd dictionary compression support by polaz · Pull Request #131 · structured-world/coordinode-lsm-tree

polaz · 2026-03-23T15:15:41Z

Summary

Add CompressionType::ZstdDict { level, dict_id } variant for zstd dictionary-based block compression
Add ZstdDictionary struct (raw bytes + xxh3-based dict_id fingerprint)
Thread dictionary through Config → flush/compaction/ingestion/recovery → Block write/read
Add Error::ZstdDictMismatch { expected: u32, got: Option<u32> } for dict_id validation

Technical Details

On-disk format: tag 4 (1B tag + 1B level + 4B dict_id = 6 bytes), backward compatible — old readers get InvalidTag
Dictionary parameter uses #[cfg(feature = "zstd")] gating to avoid any overhead when the feature is disabled
Compression uses zstd::bulk::Compressor::with_dictionary(), decompression uses zstd::bulk::Decompressor::with_dictionary()
Config::open() validation (fail-fast):
- All ZstdDict entries in data block compression policies must match the provided dictionary's dict_id
- KvSeparationOptions::compression set to ZstdDict is rejected (ErrorKind::Unsupported)
Table::recover() validates the persisted data_block_compression dict_id against the provided dictionary
Writer::use_index_block_compression() silently downgrades ZstdDict to plain Zstd — dictionaries are trained on data block content, not index/filter structures
Blob files return ErrorKind::Unsupported for ZstdDict at both config and runtime levels

Known Limitations

Blob file (KV-separated large values) dictionary compression not yet supported
No built-in dictionary training API — users provide pre-trained dictionaries
Compressor/decompressor contexts created per-call (pre-built context caching is future optimization)

Test Plan

Unit tests: serialization roundtrip, level validation, dict_id computation, mismatch detection
Block-level roundtrip: from_reader, from_file, large data, encrypted+dict (both branches)
Block error paths: missing dict, wrong dict, write-side missing dict
Integration: full tree write→flush→read, range scan with value verification, per-level policy (ZstdDict at L0)
Validation: config open with mismatch, config open with missing dict, reopen with wrong dict fails at recovery
Blob writer: ZstdDict returns ErrorKind::Unsupported
Full test suite passes with --all-features (800+ tests, 0 failures)
Compiles clean with --no-default-features, --features lz4, --features zstd, --all-features

Closes #129

sw-release-bot

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'lsm-tree db_bench'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.15.

Benchmark suite	Current: `9439847`	Previous: `e99ede9`	Ratio
`overwrite`	`938669.3796464545` ops/sec	`1233004.5504094583` ops/sec	`1.31`

This comment was automatically generated by workflow using github-action-benchmark.

CC: @polaz

Copilot

Pull request overview

Adds zstd dictionary-based compression as a first-class CompressionType and threads an optional ZstdDictionary through configuration and (most) block write/read paths so SST data blocks can be compressed/decompressed with a caller-supplied dictionary, with explicit mismatch detection.

Changes:

Introduces CompressionType::ZstdDict { level, dict_id } plus ZstdDictionary (raw bytes + xxh3-derived fingerprint) and serialization support.
Threads an optional dictionary through config/tree/table writer+reader plumbing and enforces dict_id matching via Error::ZstdDictMismatch.
Adds integration/unit tests and README documentation; explicitly rejects ZstdDict for blob files (vlog) for now.

Reviewed changes

Copilot reviewed 30 out of 30 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
tests/zstd_dict_roundtrip.rs	New integration tests for tree roundtrip + mismatch behavior under zstd dict compression.
src/vlog/blob_file/writer.rs	Rejects `CompressionType::ZstdDict` for blob-file writes.
src/vlog/blob_file/reader.rs	Rejects `CompressionType::ZstdDict` for blob-file reads.
src/vlog/blob_file/meta.rs	Updates block read/write calls for new (cfg’d) zstd dict argument.
src/tree/mod.rs	Threads configured dictionary into table writer/table open paths (cfg zstd).
src/tree/ingest.rs	Threads configured dictionary into ingestion writer/table open paths (cfg zstd).
src/table/writer/mod.rs	Stores optional zstd dictionary in table writer and passes it to data-block writes.
src/table/writer/index/partitioned.rs	Updates index block writes with new zstd dict argument (currently `None`).
src/table/writer/index/full.rs	Updates index block writes with new zstd dict argument (currently `None`).
src/table/writer/filter/partitioned.rs	Updates filter/TLI block writes with new zstd dict argument (currently `None`).
src/table/writer/filter/full.rs	Updates filter block writes with new zstd dict argument.
src/table/util.rs	Extends `load_block` API to accept optional zstd dictionary for block decoding (cfg zstd).
src/table/tests.rs	Updates table tests to pass the new (cfg’d) zstd dict parameter.
src/table/scanner.rs	Updates sequential table scanner to call `Block::from_reader` with new argument (currently `None`).
src/table/multi_writer.rs	Threads optional zstd dictionary through multi-writer rotation (cfg zstd).
src/table/mod.rs	Threads dictionary into table reads/iterators and table construction (cfg zstd).
src/table/meta.rs	Updates meta block reads with new zstd dict argument.
src/table/iter.rs	Stores optional zstd dictionary in table iterator and passes it to data-block loads (cfg zstd).
src/table/inner.rs	Adds optional zstd dictionary field to table inner state (cfg zstd).
src/table/block_index/volatile.rs	Updates index-block loads with new zstd dict argument.
src/table/block_index/two_level.rs	Updates index-block loads with new zstd dict argument.
src/table/block/mod.rs	Implements zstd dict compress/decompress in block read/write and enforces dict_id matching.
src/lib.rs	Re-exports `ZstdDictionary` behind the `zstd` feature.
src/error.rs	Adds `Error::ZstdDictMismatch { expected, got }`.
src/config/mod.rs	Adds optional `zstd_dictionary` to config and a builder method (cfg zstd).
src/compression.rs	Adds `ZstdDictionary`, `CompressionType::ZstdDict`, and encoding/decoding + tests.
src/compaction/flavour.rs	Threads configured dictionary into compaction table-writer setup and table open path (cfg zstd).
src/blob_tree/mod.rs	Threads configured dictionary into blob-tree index table writer/open paths (cfg zstd).
src/blob_tree/ingest.rs	Threads configured dictionary into blob-tree ingestion table open path (cfg zstd).
README.md	Documents the `zstd` feature and mentions `ZstdDict` support.

Copilot

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 4 comments.

coderabbitai · 2026-03-23T17:09:40Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds feature-gated Zstd dictionary support: new ZstdDictionary type and CompressionType::ZstdDict, plumbing to thread an optional dictionary through writers, MultiWriter, Table recovery/load, scanners/iterators, and block IO, config validation and error variant, blob-file rejection for ZstdDict, tests, and README docs.

Changes

Cohort / File(s)	Summary
Compression core `src/compression.rs`	Add `ZstdDictionary` (Arc-backed, id computation), new `CompressionType::ZstdDict` variant, encode/decode/Display, and unit tests.
Config, errors, crate export `src/config/mod.rs`, `src/error.rs`, `src/lib.rs`	Add `Config::zstd_dictionary(...)`, open-time validation `validate_zstd_dictionary`, new `Error::ZstdDictMismatch`, and cfg-gated re-export `ZstdDictionary`.
Writer & MultiWriter plumbing `src/table/writer/mod.rs`, `src/table/multi_writer.rs`, `src/compaction/flavour.rs`, `src/blob_tree/mod.rs`, `src/tree/*.rs`, `src/tree/ingest.rs`, `src/blob_tree/ingest.rs`	Thread optional `zstd_dictionary` into Writer/MultiWriter/prepare_table_writer/ingestion/flush paths; add builder APIs to set/clear dictionary and apply it to active writers; pass dictionary into `Table::recover` under cfg(zstd).
Table, Inner, scanning & iterators `src/table/mod.rs`, `src/table/inner.rs`, `src/table/iter.rs`, `src/table/scanner.rs`, `src/table/util.rs`	Add `zstd_dictionary` storage on `Inner/Table`, extend `Table::recover` and `Table::load_block` to accept/forward optional dict, pass dict into Scanner/Iter, and forward into `load_block`/block loading.
Block IO & index/filter/meta blocks `src/table/block/mod.rs`, `src/table/block_index/...`, `src/table/writer/*`, `src/table/writer/filter/`, `src/table/writer/index/*`, `src/table/writer/mod.rs`, `src/table/writer/index/...`, `src/table/writer/filter/...`	Extend `Block::write_into` / `Block::from_reader` / `Block::from_file` signatures (cfg zstd) to accept `zstd_dict`; implement ZstdDict compress/decompress with dict validation and errors; explicitly disable dict for index/filter/meta blocks by passing `None`.
Table recovery/tests `src/table/tests.rs`, `src/table/tests.rs`	Update test callsites to include cfg-gated `None` zstd arg in `Table::recover` and `load_block` calls.
Vlog / blob files `src/vlog/blob_file/writer.rs`, `src/vlog/blob_file/reader.rs`, `src/vlog/blob_file/meta.rs`	Mark `CompressionType::ZstdDict` unsupported for blob files: writer/reader return Unsupported io error; metadata read/write adjusted to pass conditional dict arg.
High-level ingestion/flush `src/blob_tree/ingest.rs`, `src/tree/ingest.rs`, `src/tree/mod.rs`	Apply configured dictionary to MultiWriter creation and pass dictionary into `Table::recover` during ingestion/flush/recovery when cfg(zstd).
Integration tests & docs `tests/zstd_dict_roundtrip.rs`, `README.md`	Add new zstd-only integration tests covering roundtrips, mismatches, and recovery; document `zstd` feature and mark it disabled by default in README.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Config
    participant MultiWriter
    participant Table
    participant BlockIO as Block::write_into/from_file
    participant Storage

    rect rgba(200, 220, 255, 0.5)
    Client->>Config: open(config with zstd_dictionary)
    Config->>Config: validate_zstd_dictionary()
    end

    rect rgba(200, 255, 200, 0.5)
    Client->>MultiWriter: create and use_zstd_dictionary(dictionary)
    MultiWriter->>MultiWriter: store and apply zstd_dictionary to new Writer(s)
    end

    rect rgba(255, 230, 200, 0.5)
    Client->>Table: flush/write data blocks
    Table->>BlockIO: Block::write_into(data, CompressionType::ZstdDict, zstd_dict)
    BlockIO->>Storage: write compressed block + metadata(dict_id)
    end

    rect rgba(255, 200, 200, 0.5)
    Client->>Table: reopen / recover
    Table->>Table: Table::recover(zstd_dictionary)
    Table->>BlockIO: Block::from_file(..., zstd_dict)
    BlockIO->>Table: validate dict_id == zstd_dict.id() or return ZstdDictMismatch
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Possibly related PRs

perf(encryption): reduce allocations in encrypt/decrypt block pipeline #105 — Modifies the same Block I/O functions (Block::write_into, Block::from_reader, Block::from_file); likely overlapping changes to block read/write signatures.
feat: custom key comparison / comparator #67 — Threads extra context through table writer/recovery plumbing (similar pattern of adding args to Table::recover / writer construction).
feat: block-level encryption at rest #71 — Adds per-block context plumbing (encryption) through the same writer/Table/Block APIs; likely to intersect where optional context is threaded.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title 'feat(compression): zstd dictionary compression support' accurately and concisely summarizes the main change—adding zstd dictionary-based compression support as a new feature.
Description check	✅ Passed	The PR description is well-detailed and directly related to the changeset, explaining the summary, technical details, known limitations, and test plan for the zstd dictionary compression feature.
Linked Issues check	✅ Passed	The PR fully addresses the objectives from issue `#129`: introduces CompressionType::ZstdDict with dict_id, implements ZstdDictionary type, threads dictionary through Config/IO paths, adds validation, maintains backward compatibility, and defers training API.
Out of Scope Changes check	✅ Passed	All changes are directly related to implementing zstd dictionary compression support as specified in issue `#129`. No unrelated or extraneous changes were introduced.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/#129-featcompression-zstd-dictionary-compression-suppor

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-03-23T17:12:40Z

Codecov Report

❌ Patch coverage is 90.76923% with 42 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/table/block/mod.rs	86.17%	30 Missing ⚠️
src/config/mod.rs	89.47%	4 Missing ⚠️
src/vlog/blob_file/reader.rs	0.00%	4 Missing ⚠️
src/table/mod.rs	96.87%	1 Missing ⚠️
src/table/scanner.rs	94.44%	1 Missing ⚠️
src/table/util.rs	87.50%	1 Missing ⚠️
src/table/writer/mod.rs	93.75%	1 Missing ⚠️

📢 Thoughts on this report? Let us know!

Copilot

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 2 comments.

coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@README.md`:
- Around line 69-72: Update the README wording around Zstd to clarify that
CompressionType::ZstdDict (dictionary compression) currently only applies to
table blocks and is not supported for blob files; change the sentences that
imply universal support to explicitly state "dictionary compression
(CompressionType::ZstdDict) improves ratios for table blocks (4–64 KiB) but
blob-file dictionary compression is not yet supported." Also add a short note or
TODO calling out future support for blob-file dictionary compression so readers
know it's a known limitation.

In `@src/config/mod.rs`:
- Around line 283-291: Config currently holds a single zstd_dictionary
(Config::zstd_dictionary) but CompressionPolicy and CompressionType::ZstdDict {
dict_id, .. } can request different dict_id values (used in tree flushing and
compaction paths in src/tree/mod.rs and src/compaction/flavour.rs), which will
cause Error::ZstdDictMismatch when a different dict_id is selected; fix by
either (A) validating at Config initialization that every
CompressionPolicy/level and both data/index policies only reference the same
dict_id and fail early if multiple ids are configured, or (B) replace
Config::zstd_dictionary with a registry (e.g., HashMap<dict_id,
Arc<ZstdDictionary>>) and update call sites in tree::flush/recover and
compaction::flavour to look up the dictionary by dict_id before cloning/using
it.

In `@src/table/mod.rs`:
- Around line 453-460: read_tli() currently hardcodes None when calling
Block::from_file so the zstd_dictionary argument added later never reaches the
index/TLI block reader, allowing CompressionType::ZstdDict to be selected via
Writer::use_index_block_compression() but not decoded on read; fix by either
plumbing the zstd_dictionary through read_tli() into the Block::from_file call
(propagate the zstd_dictionary parameter to read_tli and pass it to
Block::from_file) or enforce rejection up front (modify
Writer::use_index_block_compression() to disallow CompressionType::ZstdDict for
index/filter TLI blocks and return an error), and add corresponding
validation/tests so the chosen approach prevents decode failures at recovery
time.

In `@src/table/writer/filter/partitioned.rs`:
- Around line 140-142: The current filter TLI write passes None for the zstd
dictionary regardless of self.compression which will cause a runtime
ZstdDictMismatch when CompressionType::ZstdDict is selected; update the call
site in partitioned.rs (the filter TLI write path) to explicitly handle
self.compression: match on CompressionType::ZstdDict and either supply the
proper dictionary (if available) or normalize the compression to the
non-dictionary variant (e.g., map ZstdDict -> Zstd) before calling the writer,
and keep passing None for other compression variants so filter blocks never
silently mismatch.

In `@src/table/writer/index/partitioned.rs`:
- Around line 70-72: The partitioned index write paths currently pass a
hardcoded None dictionary to the index writer even when self.compression may be
CompressionType::ZstdDict, which can cause ZstdDictMismatch; update both call
sites in partitioned.rs (the index block write calls that pass None for the dict
at the spots referenced around lines ~70 and ~135) to inspect self.compression
and only pass None when not CompressionType::ZstdDict—when it is ZstdDict,
obtain and pass the appropriate dictionary (or propagate an error if the dict is
unavailable) so the writer receives the matching Zstd dictionary.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f49c494d-ee29-4032-bc30-23f6aeb35f15

📥 Commits

Reviewing files that changed from the base of the PR and between e99ede9 and 9439847.

📒 Files selected for processing (30)

README.md
src/blob_tree/ingest.rs
src/blob_tree/mod.rs
src/compaction/flavour.rs
src/compression.rs
src/config/mod.rs
src/error.rs
src/lib.rs
src/table/block/mod.rs
src/table/block_index/two_level.rs
src/table/block_index/volatile.rs
src/table/inner.rs
src/table/iter.rs
src/table/meta.rs
src/table/mod.rs
src/table/multi_writer.rs
src/table/scanner.rs
src/table/tests.rs
src/table/util.rs
src/table/writer/filter/full.rs
src/table/writer/filter/partitioned.rs
src/table/writer/index/full.rs
src/table/writer/index/partitioned.rs
src/table/writer/mod.rs
src/tree/ingest.rs
src/tree/mod.rs
src/vlog/blob_file/meta.rs
src/vlog/blob_file/reader.rs
src/vlog/blob_file/writer.rs
tests/zstd_dict_roundtrip.rs

Copilot

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 1 comment.

Copilot

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 1 comment.

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

src/config/mod.rs (1)
414-439: ⚠️ Potential issue | 🟡 Minor

Normalize index policies before this dictionary check.

Writer::use_index_block_compression() downgrades CompressionType::ZstdDict { .. } to plain Zstd, but this loop still treats index_block_compression_policy as if it required a live dictionary. That makes open() reject configurations that can never write dict-compressed index/TLI blocks, including mixed data/index dict IDs that the writer would otherwise handle by downgrading the index side.

Based on learnings, Writer::use_index_block_compression() intentionally downgrades CompressionType::ZstdDict { .. } to CompressionType::Zstd(level) because index/filter/TLI blocks never carry a dictionary.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config/mod.rs` around lines 414 - 439, The loop in open() checks
index_block_compression_policy for CompressionType::ZstdDict and rejects
mismatched dict IDs, but Writer::use_index_block_compression() intentionally
downgrades index-side ZstdDict to CompressionType::Zstd, so you must normalize
index policies before this check; update the logic that iterates over
self.data_block_compression_policy and self.index_block_compression_policy to
first map or filter index_block_compression_policy through the same downgrading
used by Writer::use_index_block_compression() (or explicitly treat index entries
as Zstd without dict) so only true data-side ZstdDict entries are validated for
dict_id mismatches and index-side entries are ignored for dict requirements.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/config/mod.rs`:
- Around line 414-439: The loop in open() checks index_block_compression_policy
for CompressionType::ZstdDict and rejects mismatched dict IDs, but
Writer::use_index_block_compression() intentionally downgrades index-side
ZstdDict to CompressionType::Zstd, so you must normalize index policies before
this check; update the logic that iterates over
self.data_block_compression_policy and self.index_block_compression_policy to
first map or filter index_block_compression_policy through the same downgrading
used by Writer::use_index_block_compression() (or explicitly treat index entries
as Zstd without dict) so only true data-side ZstdDict entries are validated for
dict_id mismatches and index-side entries are ignored for dict requirements.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0a7c63f7-8aa0-4227-b2a6-d25802e5255e

📥 Commits

Reviewing files that changed from the base of the PR and between 9439847 and 0426b3b.

📒 Files selected for processing (30)

README.md
src/blob_tree/ingest.rs
src/blob_tree/mod.rs
src/compaction/flavour.rs
src/compression.rs
src/config/mod.rs
src/error.rs
src/lib.rs
src/table/block/mod.rs
src/table/block_index/two_level.rs
src/table/block_index/volatile.rs
src/table/inner.rs
src/table/iter.rs
src/table/meta.rs
src/table/mod.rs
src/table/multi_writer.rs
src/table/scanner.rs
src/table/tests.rs
src/table/util.rs
src/table/writer/filter/full.rs
src/table/writer/filter/partitioned.rs
src/table/writer/index/full.rs
src/table/writer/index/partitioned.rs
src/table/writer/mod.rs
src/tree/ingest.rs
src/tree/mod.rs
src/vlog/blob_file/meta.rs
src/vlog/blob_file/reader.rs
src/vlog/blob_file/writer.rs
tests/zstd_dict_roundtrip.rs

Copilot

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 1 comment.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/table/mod.rs (1)
593-600: ⚠️ Potential issue | 🟠 Major

Validate the filter TLI block type before constructing IndexBlock.

This branch bypasses read_tli() and never checks block.header.block_type == BlockType::Index. A corrupted filter_tli handle can therefore be interpreted as a partition index and feed bogus handles into the bloom path, which risks false negatives and skipped reads.
🛠 Suggested fix
         let block = Block::from_file(
             file_handle.as_ref(),
             filter_tli_handle,
             metadata.index_block_compression,
             encryption.as_deref(),
             #[cfg(feature = "zstd")]
             None,
         )?;
+        if block.header.block_type != BlockType::Index {
+            return Err(crate::Error::InvalidTag((
+                "BlockType",
+                block.header.block_type.into(),
+            )));
+        }
         Some(IndexBlock::new(block))
As per coding guidelines, "Flag missing validation: unchecked block offset, unvalidated segment metadata from disk" and "When adding validation for on-disk data, add a test that tampers the relevant field and asserts the error."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/table/mod.rs` around lines 593 - 600, The code constructs an IndexBlock
from a filter_tli without validating the on-disk block type; call the existing
read_tli() helper or, after Block::from_file(...) and before constructing
IndexBlock, assert that block.header.block_type == BlockType::Index and return
an error if not (use the same error type/path as read_tli), to avoid
interpreting corrupted filter_tli as a partition index; also add a
unit/integration test that tampers the filter_tli block_type on disk and
verifies the code returns the validation error rather than proceeding into the
bloom path.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/table/mod.rs`:
- Around line 593-600: The code constructs an IndexBlock from a filter_tli
without validating the on-disk block type; call the existing read_tli() helper
or, after Block::from_file(...) and before constructing IndexBlock, assert that
block.header.block_type == BlockType::Index and return an error if not (use the
same error type/path as read_tli), to avoid interpreting corrupted filter_tli as
a partition index; also add a unit/integration test that tampers the filter_tli
block_type on disk and verifies the code returns the validation error rather
than proceeding into the bloom path.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c5fa0570-8cea-4fbe-949c-0ddced09ebc7

📥 Commits

Reviewing files that changed from the base of the PR and between 0426b3b and 57afad7.

📒 Files selected for processing (30)

README.md
src/blob_tree/ingest.rs
src/blob_tree/mod.rs
src/compaction/flavour.rs
src/compression.rs
src/config/mod.rs
src/error.rs
src/lib.rs
src/table/block/mod.rs
src/table/block_index/two_level.rs
src/table/block_index/volatile.rs
src/table/inner.rs
src/table/iter.rs
src/table/meta.rs
src/table/mod.rs
src/table/multi_writer.rs
src/table/scanner.rs
src/table/tests.rs
src/table/util.rs
src/table/writer/filter/full.rs
src/table/writer/filter/partitioned.rs
src/table/writer/index/full.rs
src/table/writer/index/partitioned.rs
src/table/writer/mod.rs
src/tree/ingest.rs
src/tree/mod.rs
src/vlog/blob_file/meta.rs
src/vlog/blob_file/reader.rs
src/vlog/blob_file/writer.rs
tests/zstd_dict_roundtrip.rs

… decode test

Copilot

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated no new comments.

## 🤖 New release * `coordinode-lsm-tree`: 4.0.0 -> 4.1.0 <details><summary>Changelog</summary> <blockquote> ## [4.1.0](v4.0.0...v4.1.0) - 2026-03-24 ### Added - *(fs)* io_uring Fs implementation for high-throughput I/O ([#106](#106)) - *(compression)* zstd dictionary compression support ([#131](#131)) ### Documentation - add benchmark dashboard link and update badges ([#151](#151)) - add v4.0.0 fork epoch changelog (all changes since upstream v3.1.1) ### Fixed - *(version)* fsync version file before rewriting CURRENT pointer ([#152](#152)) - thread UserComparator through ingestion guards and range overlap ([#139](#139)) ### Performance - *(bench)* add multi-threaded support to all db_bench workloads ([#155](#155)) - *(merge)* replace IntervalHeap with sorted-vec heap + replace_min/replace_max ([#148](#148)) - *(compaction)* merge input ranges before L2 overlap query ([#146](#146)) ### Refactored - *(version)* comparator API cleanup — TransformContext + rename Run::push() ([#153](#153)) - add #[non_exhaustive] to CompressionType enum </blockquote> </details> --- This PR was generated with [release-plz](https://github.com/release-plz/release-plz/). Co-authored-by: sw-release-bot[bot] <255865126+sw-release-bot[bot]@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 23, 2026 15:15

Copilot started reviewing on behalf of polaz March 23, 2026 15:16 View session

sw-release-bot Bot reviewed Mar 23, 2026

View reviewed changes

Copilot AI reviewed Mar 23, 2026

View reviewed changes

Comment thread src/table/writer/index/partitioned.rs

Comment thread src/table/writer/filter/partitioned.rs

Comment thread src/table/scanner.rs

Comment thread src/table/writer/index/full.rs

Comment thread tests/zstd_dict_roundtrip.rs Outdated

polaz force-pushed the feat/#129-featcompression-zstd-dictionary-compression-suppor branch from 12bde97 to e3b3b1b Compare March 23, 2026 15:54

polaz requested a review from Copilot March 23, 2026 16:33

Copilot started reviewing on behalf of polaz March 23, 2026 16:33 View session

Copilot AI reviewed Mar 23, 2026

View reviewed changes

Comment thread src/error.rs Outdated

Comment thread src/vlog/blob_file/writer.rs Outdated

Comment thread src/vlog/blob_file/reader.rs

Comment thread tests/zstd_dict_roundtrip.rs Outdated

polaz force-pushed the feat/#129-featcompression-zstd-dictionary-compression-suppor branch from 5ba6d1f to f4f1b7c Compare March 23, 2026 17:33

polaz requested a review from Copilot March 23, 2026 17:51

Copilot started reviewing on behalf of polaz March 23, 2026 17:53 View session

Copilot AI reviewed Mar 23, 2026

View reviewed changes

Comment thread tests/zstd_dict_roundtrip.rs Outdated

Comment thread tests/zstd_dict_roundtrip.rs

polaz force-pushed the feat/#129-featcompression-zstd-dictionary-compression-suppor branch from 9439847 to df775a1 Compare March 23, 2026 18:11

polaz requested a review from Copilot March 23, 2026 18:11

coderabbitai Bot reviewed Mar 23, 2026

View reviewed changes

Comment thread README.md

Comment thread src/config/mod.rs

Comment thread src/table/mod.rs

Comment thread src/table/writer/filter/partitioned.rs

Comment thread src/table/writer/index/partitioned.rs

Copilot started reviewing on behalf of polaz March 23, 2026 18:16 View session

Copilot AI reviewed Mar 23, 2026

View reviewed changes

Comment thread tests/zstd_dict_roundtrip.rs Outdated

polaz force-pushed the feat/#129-featcompression-zstd-dictionary-compression-suppor branch 2 times, most recently from 0ff14dd to 0426b3b Compare March 23, 2026 18:37

polaz requested a review from Copilot March 23, 2026 18:37

Copilot started reviewing on behalf of polaz March 23, 2026 18:39 View session

Copilot AI reviewed Mar 23, 2026

View reviewed changes

Comment thread src/config/mod.rs Outdated

coderabbitai Bot reviewed Mar 23, 2026

View reviewed changes

Comment thread src/table/mod.rs

Comment thread tests/zstd_dict_roundtrip.rs

polaz requested a review from Copilot March 23, 2026 19:03

Copilot started reviewing on behalf of polaz March 23, 2026 19:03 View session

polaz force-pushed the feat/#129-featcompression-zstd-dictionary-compression-suppor branch from ffdc4d2 to 57afad7 Compare March 23, 2026 19:09

Copilot AI reviewed Mar 23, 2026

View reviewed changes

Comment thread tests/zstd_dict_roundtrip.rs Outdated

coderabbitai Bot reviewed Mar 23, 2026

View reviewed changes

polaz added 2 commits March 24, 2026 00:51

refactor(compression): explicit ref pattern in dict validation loop

dc27d0d

test(compression): assert exact wire bytes and use real serializer in…

b7d5b96

… decode test

polaz force-pushed the feat/#129-featcompression-zstd-dictionary-compression-suppor branch from 34411a5 to b7d5b96 Compare March 23, 2026 22:52

polaz requested a review from Copilot March 23, 2026 22:59

Copilot started reviewing on behalf of polaz March 23, 2026 23:01 View session

Copilot AI reviewed Mar 23, 2026

View reviewed changes

polaz merged commit ef6f4b3 into main Mar 23, 2026
19 checks passed

polaz deleted the feat/#129-featcompression-zstd-dictionary-compression-suppor branch March 23, 2026 23:24

This was referenced Mar 23, 2026

chore: release v5.0.0 #60

Closed

chore: release v5.0.0 #149

Closed

chore: release v4.1.0 #150

Merged

coderabbitai Bot mentioned this pull request Mar 24, 2026

feat: pure Rust zstd via CompressionProvider trait + ruzstd fork #157

Closed

6 tasks

Conversation

polaz commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Technical Details

Known Limitations

Test Plan

Uh oh!

sw-release-bot Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

⚠️ Performance Alert ⚠️

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Uh oh!

codecov Bot commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

polaz commented Mar 23, 2026 •

edited

Loading

sw-release-bot Bot left a comment •

edited

Loading

coderabbitai Bot commented Mar 23, 2026 •

edited

Loading

codecov Bot commented Mar 23, 2026 •

edited

Loading