
feat(compression): zstd dictionary compression support #131

Merged
polaz merged 18 commits into main from feat/#129-featcompression-zstd-dictionary-compression-suppor on Mar 23, 2026

Conversation

@polaz
Member

@polaz polaz commented Mar 23, 2026

Summary

  • Add CompressionType::ZstdDict { level, dict_id } variant for zstd dictionary-based block compression
  • Add ZstdDictionary struct (raw bytes + xxh3-based dict_id fingerprint)
  • Thread dictionary through Config → flush/compaction/ingestion/recovery → Block write/read
  • Add Error::ZstdDictMismatch { expected: u32, got: Option<u32> } for dict_id validation
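As a rough illustration of the "raw bytes + fingerprint" idea, the sketch below pairs shared dictionary bytes with a 32-bit id computed once at construction. `DictSketch` and `fnv1a_32` are hypothetical names, and FNV-1a is swapped in for xxh3 only so the example needs no external crate; it is not the PR's actual hashing or API.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the PR's `ZstdDictionary`:
/// shared raw bytes plus a 32-bit fingerprint.
#[derive(Clone, Debug)]
pub struct DictSketch {
    raw: Arc<[u8]>,
    id: u32,
}

/// FNV-1a (32-bit). Used here only to keep the sketch dependency-free;
/// the PR derives its dict_id from xxh3 instead.
fn fnv1a_32(bytes: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5; // FNV offset basis
    for &b in bytes {
        hash ^= u32::from(b);
        hash = hash.wrapping_mul(0x0100_0193); // FNV prime
    }
    hash
}

impl DictSketch {
    /// Copies the bytes into shared storage and fingerprints them once.
    pub fn new(raw: &[u8]) -> Self {
        DictSketch {
            raw: Arc::from(raw),
            id: fnv1a_32(raw),
        }
    }

    /// Fingerprint that would be persisted next to the compression type.
    pub fn id(&self) -> u32 {
        self.id
    }

    /// Raw dictionary bytes handed to the compressor.
    pub fn raw(&self) -> &[u8] {
        &self.raw
    }
}
```

Because the id is a pure function of the bytes, two handles built from the same dictionary agree on it, which is what makes a persisted id usable as a mismatch check on reopen.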

Technical Details

  • On-disk format: tag 4 (1B tag + 1B level + 4B dict_id = 6 bytes), backward compatible — old readers get InvalidTag
  • Dictionary parameter uses #[cfg(feature = "zstd")] gating to avoid any overhead when the feature is disabled
  • Compression uses zstd::bulk::Compressor::with_dictionary(), decompression uses zstd::bulk::Decompressor::with_dictionary()
  • Config::open() validation (fail-fast):
    • All ZstdDict entries in data block compression policies must match the provided dictionary's dict_id
    • KvSeparationOptions::compression set to ZstdDict is rejected (ErrorKind::Unsupported)
  • Table::recover() validates the persisted data_block_compression dict_id against the provided dictionary
  • Writer::use_index_block_compression() silently downgrades ZstdDict to plain Zstd — dictionaries are trained on data block content, not index/filter structures
  • Blob files return ErrorKind::Unsupported for ZstdDict at both config and runtime levels
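The 6-byte layout from the first bullet can be sketched as follows. Only the tag value 4 and the field widths come from the description above; the little-endian byte order, the `i8` level type, tag 0 for "no compression", and all type and function names are assumptions made for illustration.

```rust
/// Illustrative subset of a compression enum (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CompressionSketch {
    None,
    ZstdDict { level: i8, dict_id: u32 },
}

#[derive(Debug, PartialEq, Eq)]
pub enum DecodeError {
    InvalidTag(u8),
    Truncated,
}

const TAG_NONE: u8 = 0; // assumed value
const TAG_ZSTD_DICT: u8 = 4; // tag named in the PR description

/// Writes 1 byte for `None`, or tag + level + dict_id (6 bytes) for `ZstdDict`.
pub fn encode(c: CompressionSketch, out: &mut Vec<u8>) {
    match c {
        CompressionSketch::None => out.push(TAG_NONE),
        CompressionSketch::ZstdDict { level, dict_id } => {
            out.push(TAG_ZSTD_DICT);
            out.push(level.to_le_bytes()[0]);
            out.extend_from_slice(&dict_id.to_le_bytes()); // byte order assumed
        }
    }
}

/// Reads the tag first; unknown tags surface as `InvalidTag`.
pub fn decode(buf: &[u8]) -> Result<CompressionSketch, DecodeError> {
    match buf.first().copied() {
        None => Err(DecodeError::Truncated),
        Some(TAG_NONE) => Ok(CompressionSketch::None),
        Some(TAG_ZSTD_DICT) => {
            if buf.len() < 6 {
                return Err(DecodeError::Truncated);
            }
            let level = i8::from_le_bytes([buf[1]]);
            let dict_id = u32::from_le_bytes([buf[2], buf[3], buf[4], buf[5]]);
            Ok(CompressionSketch::ZstdDict { level, dict_id })
        }
        Some(other) => Err(DecodeError::InvalidTag(other)),
    }
}
```

A reader that predates tag 4 would take the final `Some(other)` arm for such data, which matches the "old readers get InvalidTag" behaviour described above.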

Known Limitations

  • Blob file (KV-separated large values) dictionary compression not yet supported
  • No built-in dictionary training API — users provide pre-trained dictionaries
  • Compressor/decompressor contexts are created per call (caching pre-built contexts is a future optimization)

Test Plan

  • Unit tests: serialization roundtrip, level validation, dict_id computation, mismatch detection
  • Block-level roundtrip: from_reader, from_file, large data, encrypted+dict (both branches)
  • Block error paths: missing dict, wrong dict, write-side missing dict
  • Integration: full tree write→flush→read, range scan with value verification, per-level policy (ZstdDict at L0)
  • Validation: config open with mismatch, config open with missing dict, reopen with wrong dict fails at recovery
  • Blob writer: ZstdDict returns ErrorKind::Unsupported
  • Full test suite passes with --all-features (800+ tests, 0 failures)
  • Compiles clean with --no-default-features, --features lz4, --features zstd, --all-features

Closes #129

Copilot AI review requested due to automatic review settings March 23, 2026 15:15
@sw-release-bot sw-release-bot Bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'lsm-tree db_bench'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.15.

| Benchmark suite | Current: 9439847 | Previous: e99ede9 | Ratio |
| --- | --- | --- | --- |
| overwrite | 938669.3796464545 ops/sec | 1233004.5504094583 ops/sec | 1.31 |
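The reported ratio is consistent with dividing the previous throughput by the current one, so values above 1 mean the new commit is slower. The sketch below reproduces that arithmetic; the function names are hypothetical and the formula is inferred from the figures in the alert, not taken from the benchmark tool's source.

```rust
/// Previous throughput divided by current throughput (ops/sec).
/// A value above 1.0 means the current commit is slower.
fn regression_ratio(previous_ops: f64, current_ops: f64) -> f64 {
    previous_ops / current_ops
}

/// True when the slowdown exceeds the configured alert threshold.
fn exceeds_threshold(previous_ops: f64, current_ops: f64, threshold: f64) -> bool {
    regression_ratio(previous_ops, current_ops) > threshold
}
```

With the alert's numbers, 1233004.55 / 938669.38 is roughly 1.31, which is above the 1.15 threshold and therefore triggers the comment.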

This comment was automatically generated by workflow using github-action-benchmark.

CC: @polaz

Copilot AI left a comment

Pull request overview

Adds zstd dictionary-based compression as a first-class CompressionType and threads an optional ZstdDictionary through configuration and (most) block write/read paths so SST data blocks can be compressed/decompressed with a caller-supplied dictionary, with explicit mismatch detection.

Changes:

  • Introduces CompressionType::ZstdDict { level, dict_id } plus ZstdDictionary (raw bytes + xxh3-derived fingerprint) and serialization support.
  • Threads an optional dictionary through config/tree/table writer+reader plumbing and enforces dict_id matching via Error::ZstdDictMismatch.
  • Adds integration/unit tests and README documentation; explicitly rejects ZstdDict for blob files (vlog) for now.
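The mismatch detection mentioned above can be pictured as a small guard that runs before a block is decompressed. This is a minimal sketch: the error variant's shape follows the PR description (`expected: u32, got: Option<u32>`), while `ErrorSketch` and `check_dictionary` are hypothetical names rather than the crate's API.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ErrorSketch {
    /// `expected` is the id persisted with the block or table;
    /// `got` is the id of the dictionary the caller supplied, if any.
    ZstdDictMismatch { expected: u32, got: Option<u32> },
}

/// Passes only when a dictionary was supplied and its id matches.
pub fn check_dictionary(expected: u32, provided: Option<u32>) -> Result<(), ErrorSketch> {
    match provided {
        Some(id) if id == expected => Ok(()),
        got => Err(ErrorSketch::ZstdDictMismatch { expected, got }),
    }
}
```

Keeping `got` optional lets one error variant distinguish "no dictionary configured" from "wrong dictionary configured".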

Reviewed changes

Copilot reviewed 30 out of 30 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
tests/zstd_dict_roundtrip.rs New integration tests for tree roundtrip + mismatch behavior under zstd dict compression.
src/vlog/blob_file/writer.rs Rejects CompressionType::ZstdDict for blob-file writes.
src/vlog/blob_file/reader.rs Rejects CompressionType::ZstdDict for blob-file reads.
src/vlog/blob_file/meta.rs Updates block read/write calls for new (cfg’d) zstd dict argument.
src/tree/mod.rs Threads configured dictionary into table writer/table open paths (cfg zstd).
src/tree/ingest.rs Threads configured dictionary into ingestion writer/table open paths (cfg zstd).
src/table/writer/mod.rs Stores optional zstd dictionary in table writer and passes it to data-block writes.
src/table/writer/index/partitioned.rs Updates index block writes with new zstd dict argument (currently None).
src/table/writer/index/full.rs Updates index block writes with new zstd dict argument (currently None).
src/table/writer/filter/partitioned.rs Updates filter/TLI block writes with new zstd dict argument (currently None).
src/table/writer/filter/full.rs Updates filter block writes with new zstd dict argument.
src/table/util.rs Extends load_block API to accept optional zstd dictionary for block decoding (cfg zstd).
src/table/tests.rs Updates table tests to pass the new (cfg’d) zstd dict parameter.
src/table/scanner.rs Updates sequential table scanner to call Block::from_reader with new argument (currently None).
src/table/multi_writer.rs Threads optional zstd dictionary through multi-writer rotation (cfg zstd).
src/table/mod.rs Threads dictionary into table reads/iterators and table construction (cfg zstd).
src/table/meta.rs Updates meta block reads with new zstd dict argument.
src/table/iter.rs Stores optional zstd dictionary in table iterator and passes it to data-block loads (cfg zstd).
src/table/inner.rs Adds optional zstd dictionary field to table inner state (cfg zstd).
src/table/block_index/volatile.rs Updates index-block loads with new zstd dict argument.
src/table/block_index/two_level.rs Updates index-block loads with new zstd dict argument.
src/table/block/mod.rs Implements zstd dict compress/decompress in block read/write and enforces dict_id matching.
src/lib.rs Re-exports ZstdDictionary behind the zstd feature.
src/error.rs Adds Error::ZstdDictMismatch { expected, got }.
src/config/mod.rs Adds optional zstd_dictionary to config and a builder method (cfg zstd).
src/compression.rs Adds ZstdDictionary, CompressionType::ZstdDict, and encoding/decoding + tests.
src/compaction/flavour.rs Threads configured dictionary into compaction table-writer setup and table open path (cfg zstd).
src/blob_tree/mod.rs Threads configured dictionary into blob-tree index table writer/open paths (cfg zstd).
src/blob_tree/ingest.rs Threads configured dictionary into blob-tree ingestion table open path (cfg zstd).
README.md Documents the zstd feature and mentions ZstdDict support.

Comment thread src/table/writer/index/partitioned.rs
Comment thread src/table/writer/filter/partitioned.rs
Comment thread src/table/scanner.rs
Comment thread src/table/writer/index/full.rs
Comment thread tests/zstd_dict_roundtrip.rs Outdated
@polaz polaz force-pushed the feat/#129-featcompression-zstd-dictionary-compression-suppor branch from 12bde97 to e3b3b1b Compare March 23, 2026 15:54
@polaz polaz requested a review from Copilot March 23, 2026 16:33
Copilot AI left a comment

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 4 comments.

Comment thread src/error.rs Outdated
Comment thread src/vlog/blob_file/writer.rs Outdated
Comment thread src/vlog/blob_file/reader.rs
Comment thread tests/zstd_dict_roundtrip.rs Outdated
@coderabbitai

coderabbitai Bot commented Mar 23, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds feature-gated Zstd dictionary support: new ZstdDictionary type and CompressionType::ZstdDict, plumbing to thread an optional dictionary through writers, MultiWriter, Table recovery/load, scanners/iterators, and block IO, config validation and error variant, blob-file rejection for ZstdDict, tests, and README docs.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Compression core | src/compression.rs | Add ZstdDictionary (Arc-backed, id computation), new CompressionType::ZstdDict variant, encode/decode/Display, and unit tests. |
| Config, errors, crate export | src/config/mod.rs, src/error.rs, src/lib.rs | Add Config::zstd_dictionary(...), open-time validation validate_zstd_dictionary, new Error::ZstdDictMismatch, and cfg-gated re-export ZstdDictionary. |
| Writer & MultiWriter plumbing | src/table/writer/mod.rs, src/table/multi_writer.rs, src/compaction/flavour.rs, src/blob_tree/mod.rs, src/tree/*.rs, src/tree/ingest.rs, src/blob_tree/ingest.rs | Thread optional zstd_dictionary into Writer/MultiWriter/prepare_table_writer/ingestion/flush paths; add builder APIs to set/clear dictionary and apply it to active writers; pass dictionary into Table::recover under cfg(zstd). |
| Table, Inner, scanning & iterators | src/table/mod.rs, src/table/inner.rs, src/table/iter.rs, src/table/scanner.rs, src/table/util.rs | Add zstd_dictionary storage on Inner/Table, extend Table::recover and Table::load_block to accept/forward optional dict, pass dict into Scanner/Iter, and forward into load_block/block loading. |
| Block IO & index/filter/meta blocks | src/table/block/mod.rs, src/table/block_index/..., src/table/writer/**, src/table/writer/filter/*, src/table/writer/index/*, src/table/writer/mod.rs, src/table/writer/index/..., src/table/writer/filter/... | Extend Block::write_into / Block::from_reader / Block::from_file signatures (cfg zstd) to accept zstd_dict; implement ZstdDict compress/decompress with dict validation and errors; explicitly disable dict for index/filter/meta blocks by passing None. |
| Table recovery/tests | src/table/tests.rs, src/table/tests.rs | Update test callsites to include cfg-gated None zstd arg in Table::recover and load_block calls. |
| Vlog / blob files | src/vlog/blob_file/writer.rs, src/vlog/blob_file/reader.rs, src/vlog/blob_file/meta.rs | Mark CompressionType::ZstdDict unsupported for blob files: writer/reader return Unsupported io error; metadata read/write adjusted to pass conditional dict arg. |
| High-level ingestion/flush | src/blob_tree/ingest.rs, src/tree/ingest.rs, src/tree/mod.rs | Apply configured dictionary to MultiWriter creation and pass dictionary into Table::recover during ingestion/flush/recovery when cfg(zstd). |
| Integration tests & docs | tests/zstd_dict_roundtrip.rs, README.md | Add new zstd-only integration tests covering roundtrips, mismatches, and recovery; document zstd feature and mark it disabled by default in README. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Config
    participant MultiWriter
    participant Table
    participant BlockIO as Block::write_into/from_file
    participant Storage

    rect rgba(200, 220, 255, 0.5)
    Client->>Config: open(config with zstd_dictionary)
    Config->>Config: validate_zstd_dictionary()
    end

    rect rgba(200, 255, 200, 0.5)
    Client->>MultiWriter: create and use_zstd_dictionary(dictionary)
    MultiWriter->>MultiWriter: store and apply zstd_dictionary to new Writer(s)
    end

    rect rgba(255, 230, 200, 0.5)
    Client->>Table: flush/write data blocks
    Table->>BlockIO: Block::write_into(data, CompressionType::ZstdDict, zstd_dict)
    BlockIO->>Storage: write compressed block + metadata(dict_id)
    end

    rect rgba(255, 200, 200, 0.5)
    Client->>Table: reopen / recover
    Table->>Table: Table::recover(zstd_dictionary)
    Table->>BlockIO: Block::from_file(..., zstd_dict)
    BlockIO->>Table: validate dict_id == zstd_dict.id() or return ZstdDictMismatch
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The PR title 'feat(compression): zstd dictionary compression support' accurately and concisely summarizes the main change—adding zstd dictionary-based compression support as a new feature.
Description check ✅ Passed The PR description is well-detailed and directly related to the changeset, explaining the summary, technical details, known limitations, and test plan for the zstd dictionary compression feature.
Linked Issues check ✅ Passed The PR fully addresses the objectives from issue #129: introduces CompressionType::ZstdDict with dict_id, implements ZstdDictionary type, threads dictionary through Config/IO paths, adds validation, maintains backward compatibility, and defers training API.
Out of Scope Changes check ✅ Passed All changes are directly related to implementing zstd dictionary compression support as specified in issue #129. No unrelated or extraneous changes were introduced.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@codecov

codecov Bot commented Mar 23, 2026

Codecov Report

❌ Patch coverage is 90.76923% with 42 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/table/block/mod.rs | 86.17% | 30 Missing ⚠️ |
| src/config/mod.rs | 89.47% | 4 Missing ⚠️ |
| src/vlog/blob_file/reader.rs | 0.00% | 4 Missing ⚠️ |
| src/table/mod.rs | 96.87% | 1 Missing ⚠️ |
| src/table/scanner.rs | 94.44% | 1 Missing ⚠️ |
| src/table/util.rs | 87.50% | 1 Missing ⚠️ |
| src/table/writer/mod.rs | 93.75% | 1 Missing ⚠️ |
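For readers checking the report's arithmetic: the per-file missing counts sum to 42, and a 90.76923% patch coverage with 42 uncovered lines is consistent with 455 changed lines in total. The total of 455 is an inference from those two figures, not a number stated in the report, and the helper name is hypothetical.

```rust
/// Percentage of changed lines that were exercised by tests.
fn patch_coverage_percent(total_lines: u32, missing_lines: u32) -> f64 {
    let covered = total_lines - missing_lines;
    f64::from(covered) * 100.0 / f64::from(total_lines)
}

/// Missing-line counts per file, in the order the report lists them.
const MISSING_PER_FILE: [u32; 7] = [30, 4, 4, 1, 1, 1, 1];
```

Calling `patch_coverage_percent(455, 42)` gives 413 covered lines out of 455, which matches the reported percentage to the digits shown.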


@polaz polaz force-pushed the feat/#129-featcompression-zstd-dictionary-compression-suppor branch from 5ba6d1f to f4f1b7c Compare March 23, 2026 17:33
@polaz polaz requested a review from Copilot March 23, 2026 17:51
Copilot AI left a comment

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 2 comments.

Comment thread tests/zstd_dict_roundtrip.rs Outdated
Comment thread tests/zstd_dict_roundtrip.rs
@polaz polaz force-pushed the feat/#129-featcompression-zstd-dictionary-compression-suppor branch from 9439847 to df775a1 Compare March 23, 2026 18:11
@polaz polaz requested a review from Copilot March 23, 2026 18:11
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@README.md`:
- Around line 69-72: Update the README wording around Zstd to clarify that
CompressionType::ZstdDict (dictionary compression) currently only applies to
table blocks and is not supported for blob files; change the sentences that
imply universal support to explicitly state "dictionary compression
(CompressionType::ZstdDict) improves ratios for table blocks (4–64 KiB) but
blob-file dictionary compression is not yet supported." Also add a short note or
TODO calling out future support for blob-file dictionary compression so readers
know it's a known limitation.

In `@src/config/mod.rs`:
- Around line 283-291: Config currently holds a single zstd_dictionary
(Config::zstd_dictionary) but CompressionPolicy and CompressionType::ZstdDict {
dict_id, .. } can request different dict_id values (used in tree flushing and
compaction paths in src/tree/mod.rs and src/compaction/flavour.rs), which will
cause Error::ZstdDictMismatch when a different dict_id is selected; fix by
either (A) validating at Config initialization that every
CompressionPolicy/level and both data/index policies only reference the same
dict_id and fail early if multiple ids are configured, or (B) replace
Config::zstd_dictionary with a registry (e.g., HashMap<dict_id,
Arc<ZstdDictionary>>) and update call sites in tree::flush/recover and
compaction::flavour to look up the dictionary by dict_id before cloning/using
it.

In `@src/table/mod.rs`:
- Around line 453-460: read_tli() currently hardcodes None when calling
Block::from_file so the zstd_dictionary argument added later never reaches the
index/TLI block reader, allowing CompressionType::ZstdDict to be selected via
Writer::use_index_block_compression() but not decoded on read; fix by either
plumbing the zstd_dictionary through read_tli() into the Block::from_file call
(propagate the zstd_dictionary parameter to read_tli and pass it to
Block::from_file) or enforce rejection up front (modify
Writer::use_index_block_compression() to disallow CompressionType::ZstdDict for
index/filter TLI blocks and return an error), and add corresponding
validation/tests so the chosen approach prevents decode failures at recovery
time.

In `@src/table/writer/filter/partitioned.rs`:
- Around line 140-142: The current filter TLI write passes None for the zstd
dictionary regardless of self.compression which will cause a runtime
ZstdDictMismatch when CompressionType::ZstdDict is selected; update the call
site in partitioned.rs (the filter TLI write path) to explicitly handle
self.compression: match on CompressionType::ZstdDict and either supply the
proper dictionary (if available) or normalize the compression to the
non-dictionary variant (e.g., map ZstdDict -> Zstd) before calling the writer,
and keep passing None for other compression variants so filter blocks never
silently mismatch.

In `@src/table/writer/index/partitioned.rs`:
- Around line 70-72: The partitioned index write paths currently pass a
hardcoded None dictionary to the index writer even when self.compression may be
CompressionType::ZstdDict, which can cause ZstdDictMismatch; update both call
sites in partitioned.rs (the index block write calls that pass None for the dict
at the spots referenced around lines ~70 and ~135) to inspect self.compression
and only pass None when not CompressionType::ZstdDict—when it is ZstdDict,
obtain and pass the appropriate dictionary (or propagate an error if the dict is
unavailable) so the writer receives the matching Zstd dictionary.
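Option (A) from the src/config/mod.rs comment, failing early when a policy references a dictionary id other than the configured one, could look roughly like this. It is a sketch under stated assumptions: `Compression`, `ConfigError`, and `validate_policy` are hypothetical names, and the policy is modelled as a plain slice rather than the crate's `CompressionPolicy` type.

```rust
/// Illustrative subset of compression choices (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Compression {
    None,
    Zstd(i8),
    ZstdDict { level: i8, dict_id: u32 },
}

#[derive(Debug, PartialEq, Eq)]
pub enum ConfigError {
    ZstdDictMismatch { expected: u32, got: Option<u32> },
}

/// Checks every per-level entry against the single configured dictionary id.
/// `configured` is `None` when no dictionary was supplied.
pub fn validate_policy(
    policy: &[Compression],
    configured: Option<u32>,
) -> Result<(), ConfigError> {
    for entry in policy {
        if let Compression::ZstdDict { dict_id, .. } = *entry {
            if configured != Some(dict_id) {
                return Err(ConfigError::ZstdDictMismatch {
                    expected: dict_id,
                    got: configured,
                });
            }
        }
    }
    Ok(())
}
```

Running the check at open time turns what would otherwise be a late flush or compaction failure into an immediate configuration error.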

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f49c494d-ee29-4032-bc30-23f6aeb35f15

📥 Commits

Reviewing files that changed from the base of the PR and between e99ede9 and 9439847.

📒 Files selected for processing (30)
  • README.md
  • src/blob_tree/ingest.rs
  • src/blob_tree/mod.rs
  • src/compaction/flavour.rs
  • src/compression.rs
  • src/config/mod.rs
  • src/error.rs
  • src/lib.rs
  • src/table/block/mod.rs
  • src/table/block_index/two_level.rs
  • src/table/block_index/volatile.rs
  • src/table/inner.rs
  • src/table/iter.rs
  • src/table/meta.rs
  • src/table/mod.rs
  • src/table/multi_writer.rs
  • src/table/scanner.rs
  • src/table/tests.rs
  • src/table/util.rs
  • src/table/writer/filter/full.rs
  • src/table/writer/filter/partitioned.rs
  • src/table/writer/index/full.rs
  • src/table/writer/index/partitioned.rs
  • src/table/writer/mod.rs
  • src/tree/ingest.rs
  • src/tree/mod.rs
  • src/vlog/blob_file/meta.rs
  • src/vlog/blob_file/reader.rs
  • src/vlog/blob_file/writer.rs
  • tests/zstd_dict_roundtrip.rs

Comment thread README.md
Comment thread src/config/mod.rs
Comment thread src/table/mod.rs
Comment thread src/table/writer/filter/partitioned.rs
Comment thread src/table/writer/index/partitioned.rs
Copilot AI left a comment

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 1 comment.

Comment thread tests/zstd_dict_roundtrip.rs Outdated
@polaz polaz force-pushed the feat/#129-featcompression-zstd-dictionary-compression-suppor branch 2 times, most recently from 0ff14dd to 0426b3b Compare March 23, 2026 18:37
@polaz polaz requested a review from Copilot March 23, 2026 18:37
Copilot AI left a comment

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 1 comment.

Comment thread src/config/mod.rs Outdated
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/config/mod.rs (1)

414-439: ⚠️ Potential issue | 🟡 Minor

Normalize index policies before this dictionary check.

Writer::use_index_block_compression() downgrades CompressionType::ZstdDict { .. } to plain Zstd, but this loop still treats index_block_compression_policy as if it required a live dictionary. That makes open() reject configurations that can never write dict-compressed index/TLI blocks, including mixed data/index dict IDs that the writer would otherwise handle by downgrading the index side.

Based on learnings, Writer::use_index_block_compression() intentionally downgrades CompressionType::ZstdDict { .. } to CompressionType::Zstd(level) because index/filter/TLI blocks never carry a dictionary.
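The downgrade described here can be sketched as a pure mapping applied to the index-side policy before any dictionary check runs. The enum and function names are hypothetical and only mirror the behaviour attributed to `Writer::use_index_block_compression()` in this thread; they are not the crate's actual code.

```rust
/// Illustrative subset of compression choices (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Compression {
    None,
    Zstd(i8),
    ZstdDict { level: i8, dict_id: u32 },
}

/// Index/filter/TLI blocks never use a dictionary, so a dictionary
/// variant falls back to plain zstd at the same level.
pub fn index_block_compression(requested: Compression) -> Compression {
    match requested {
        Compression::ZstdDict { level, .. } => Compression::Zstd(level),
        other => other,
    }
}

/// Applies the downgrade to every level of an index-side policy.
pub fn normalize_index_policy(policy: &[Compression]) -> Vec<Compression> {
    policy.iter().copied().map(index_block_compression).collect()
}
```

After normalization no `ZstdDict` entry remains on the index side, so an open-time dictionary check only has data-block entries left to validate.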

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config/mod.rs` around lines 414 - 439, The loop in open() checks
index_block_compression_policy for CompressionType::ZstdDict and rejects
mismatched dict IDs, but Writer::use_index_block_compression() intentionally
downgrades index-side ZstdDict to CompressionType::Zstd, so you must normalize
index policies before this check; update the logic that iterates over
self.data_block_compression_policy and self.index_block_compression_policy to
first map or filter index_block_compression_policy through the same downgrading
used by Writer::use_index_block_compression() (or explicitly treat index entries
as Zstd without dict) so only true data-side ZstdDict entries are validated for
dict_id mismatches and index-side entries are ignored for dict requirements.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0a7c63f7-8aa0-4227-b2a6-d25802e5255e

📥 Commits

Reviewing files that changed from the base of the PR and between 9439847 and 0426b3b.

Comment thread src/table/mod.rs
Comment thread tests/zstd_dict_roundtrip.rs
@polaz polaz requested a review from Copilot March 23, 2026 19:03
@polaz polaz force-pushed the feat/#129-featcompression-zstd-dictionary-compression-suppor branch from ffdc4d2 to 57afad7 Compare March 23, 2026 19:09
Copilot AI left a comment

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 1 comment.

Comment thread tests/zstd_dict_roundtrip.rs Outdated
@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/table/mod.rs (1)

593-600: ⚠️ Potential issue | 🟠 Major

Validate the filter TLI block type before constructing IndexBlock.

This branch bypasses read_tli() and never checks block.header.block_type == BlockType::Index. A corrupted filter_tli handle can therefore be interpreted as a partition index and feed bogus handles into the bloom path, which risks false negatives and skipped reads.

🛠 Suggested fix
         let block = Block::from_file(
             file_handle.as_ref(),
             filter_tli_handle,
             metadata.index_block_compression,
             encryption.as_deref(),
             #[cfg(feature = "zstd")]
             None,
         )?;
+        if block.header.block_type != BlockType::Index {
+            return Err(crate::Error::InvalidTag((
+                "BlockType",
+                block.header.block_type.into(),
+            )));
+        }
         Some(IndexBlock::new(block))

As per coding guidelines, "Flag missing validation: unchecked block offset, unvalidated segment metadata from disk" and "When adding validation for on-disk data, add a test that tampers the relevant field and asserts the error."
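The "tamper the field and assert the error" guideline can be illustrated with a reduced model of the check. Everything here is assumed for the sketch: the tag values, the `BlockType` and `ReadError` shapes, and the `expect_index_block` helper are hypothetical stand-ins for the crate's block header types.

```rust
/// Block kinds with assumed on-disk tag values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BlockType {
    Data,
    Index,
    Filter,
}

impl BlockType {
    /// Maps a raw header byte to a block type, if the byte is known.
    pub fn from_tag(tag: u8) -> Option<Self> {
        match tag {
            0 => Some(BlockType::Data),
            1 => Some(BlockType::Index),
            2 => Some(BlockType::Filter),
            _ => None,
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum ReadError {
    InvalidTag(&'static str, u8),
}

/// Accepts the header only when it declares an index block;
/// any other known or unknown tag is rejected with the offending byte.
pub fn expect_index_block(header_tag: u8) -> Result<BlockType, ReadError> {
    match BlockType::from_tag(header_tag) {
        Some(BlockType::Index) => Ok(BlockType::Index),
        _ => Err(ReadError::InvalidTag("BlockType", header_tag)),
    }
}
```

A tamper test then flips the stored tag from the index value to another value and asserts that the reader returns `InvalidTag` instead of building an index from the block.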

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/table/mod.rs` around lines 593 - 600, The code constructs an IndexBlock
from a filter_tli without validating the on-disk block type; call the existing
read_tli() helper or, after Block::from_file(...) and before constructing
IndexBlock, assert that block.header.block_type == BlockType::Index and return
an error if not (use the same error type/path as read_tli), to avoid
interpreting corrupted filter_tli as a partition index; also add a
unit/integration test that tampers the filter_tli block_type on disk and
verifies the code returns the validation error rather than proceeding into the
bloom path.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c5fa0570-8cea-4fbe-949c-0ddced09ebc7

📥 Commits

Reviewing files that changed from the base of the PR and between 0426b3b and 57afad7.

@polaz polaz force-pushed the feat/#129-featcompression-zstd-dictionary-compression-suppor branch from 34411a5 to b7d5b96 Compare March 23, 2026 22:52
@polaz polaz requested a review from Copilot March 23, 2026 22:59
Copilot AI left a comment

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated no new comments.

@polaz polaz merged commit ef6f4b3 into main Mar 23, 2026
19 checks passed
@polaz polaz deleted the feat/#129-featcompression-zstd-dictionary-compression-suppor branch March 23, 2026 23:24
polaz pushed a commit that referenced this pull request Mar 24, 2026
## 🤖 New release

* `coordinode-lsm-tree`: 4.0.0 -> 4.1.0

<details><summary><i><b>Changelog</b></i></summary><p>

<blockquote>

## [4.1.0](v4.0.0...v4.1.0) - 2026-03-24

### Added

- *(fs)* io_uring Fs implementation for high-throughput I/O
([#106](#106))
- *(compression)* zstd dictionary compression support
([#131](#131))

### Documentation

- add benchmark dashboard link and update badges
([#151](#151))
- add v4.0.0 fork epoch changelog (all changes since upstream v3.1.1)

### Fixed

- *(version)* fsync version file before rewriting CURRENT pointer
([#152](#152))
- thread UserComparator through ingestion guards and range overlap
([#139](#139))

### Performance

- *(bench)* add multi-threaded support to all db_bench workloads
([#155](#155))
- *(merge)* replace IntervalHeap with sorted-vec heap +
replace_min/replace_max
([#148](#148))
- *(compaction)* merge input ranges before L2 overlap query
([#146](#146))

### Refactored

- *(version)* comparator API cleanup — TransformContext + rename
Run::push()
([#153](#153))
- add #[non_exhaustive] to CompressionType enum
</blockquote>


</p></details>

---
This PR was generated with
[release-plz](https://github.com/release-plz/release-plz/).

Co-authored-by: sw-release-bot[bot] <255865126+sw-release-bot[bot]@users.noreply.github.com>