
perf(compression): cache pre-compiled Dictionary across block decompress calls#227

Merged
polaz merged 4 commits into main from feat/#217-perfcompression-cache-pre-compiled-dictionary-acro on Apr 7, 2026

Conversation

@polaz
Member

@polaz polaz commented Apr 6, 2026

Summary

  • C FFI backend: DecoderDictionary<'static> (wraps ZSTD_DDict) is now cached in ZstdDictionary via Arc<OnceLock<...>> — parsed once per process, shared across all clones of the same dictionary handle, zero re-parsing on subsequent blocks
  • Pure Rust backend: FrameDecoder with dictionary pre-loaded is cached in thread-local storage keyed by dict_id — parsed once per thread, no mutex needed (FrameDecoder is !Send)
  • Correctness fix: latent bug in pure Rust decompress_with_dict — it called init(data) on a Copy slice, which only reads the frame header, so the decode buffer stayed empty and every call returned Ok([]); replaced with decode_all_to_vec(&mut input), which fully decodes the frame

Changes

  • src/compression/mod.rs: Add prepared: Arc<OnceLock<DecoderDictionary<'static>>> to ZstdDictionary; add decoder_dict() accessor; change decompress_with_dict signature to take &ZstdDictionary
  • src/compression/zstd_ffi.rs: Use Decompressor::with_prepared_dictionary(dict.decoder_dict()) — no more per-call ZSTD_createDDict
  • src/compression/zstd_pure.rs: TLS-cached FrameDecoder; fix correctness bug; add unit tests with pre-generated test vectors
  • src/table/block/mod.rs: Update 4 decompress_with_dict call sites to pass &dict instead of dict.raw()
  • benches/zstd_dict.rs: New warm/cold per-block latency benchmarks

Test Plan

  • cargo clippy --features zstd --all-targets -- -D warnings — clean
  • cargo clippy --features zstd-pure --all-targets -- -D warnings — clean
  • cargo nextest run --features zstd --workspace — 1168/1168 passed
  • cargo nextest run --features zstd-pure --workspace — 1157/1157 passed
  • cargo test --doc --workspace — 41/41 passed
  • cargo build --bench zstd_dict --features zstd — compiles
  • cargo build --bench zstd_dict --features zstd-pure — compiles

Closes #217

Summary by CodeRabbit

  • Tests

    • Added a benchmark to measure decompression performance using zstd dictionaries.
  • Refactor

    • Improved compression API to use dictionary objects and enable internal dictionary caching for better decompression efficiency.
    • Compression module is now hidden from generated public documentation.

perf(compression): cache pre-compiled Dictionary across block decompress calls

- C FFI backend: cache `DecoderDictionary<'static>` (ZSTD_DDict) in
  `ZstdDictionary` via `Arc<OnceLock<...>>` — parsed once per process,
  shared across all clones of the same dictionary handle
- Pure Rust backend: cache `FrameDecoder` with dictionary pre-loaded in
  thread-local storage keyed by dict ID — parsed once per thread
- Fix latent correctness bug in pure Rust `decompress_with_dict`: was
  calling `init(data)` on a Copy slice (reads frame header only, output
  buffer stays empty); replace with `decode_all_to_vec` which takes
  `&mut input` and fully decodes the frame
- Change `CompressionProvider::decompress_with_dict` signature from
  `dict_raw: &[u8]` to `dict: &ZstdDictionary` to give backends access
  to the cached prepared form; update all four call sites in block/mod.rs
- Add `ZstdDictionary::decoder_dict()` — lazily initialises ZSTD_DDict
  via `OnceLock::get_or_init` (C FFI only)
- Add unit tests for pure Rust backend with pre-generated test vectors
  (decompress + idempotent repeated calls exercising TLS cache path)
- Add `benches/zstd_dict.rs` with warm / cold per-block latency benchmarks
- Expose `#[doc(hidden)] pub mod compression` so benchmarks can reach
  `CompressionProvider` and `ZstdBackend` type alias

Closes #217
Copilot AI review requested due to automatic review settings April 6, 2026 21:47
@coderabbitai

coderabbitai Bot commented Apr 6, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: dadc0493-7596-44bb-b041-055e41a7e755

📥 Commits

Reviewing files that changed from the base of the PR and between e0d4113 and 76fb725.

📒 Files selected for processing (7)
  • Cargo.toml
  • benches/zstd_dict.rs
  • src/compression/mod.rs
  • src/compression/zstd_ffi.rs
  • src/compression/zstd_pure.rs
  • src/lib.rs
  • src/table/block/mod.rs

📝 Walkthrough

Walkthrough

Refactors zstd dictionary handling to pass a ZstdDictionary through the compression API, add a lazily-initialized prepared dictionary cache, update provider implementations and call sites to use the cached prepared dictionary, and add a Criterion benchmark measuring warm vs cold dictionary decompression.

Changes

  • Benchmark Infrastructure (Cargo.toml, benches/zstd_dict.rs): Add a new zstd_dict Criterion benchmark measuring warm (cached) and cold (fresh) dictionary decompressions.
  • Compression Trait & Dictionary Type (src/compression/mod.rs): Change CompressionProvider::decompress_with_dict to accept &ZstdDictionary; widen the ZstdDictionary id from u32 to u64; add a feature-gated prepared: Arc<OnceLock<...>> field, a decoder_dict() accessor, and a manual Clone; adjust Debug formatting.
  • Zstd Provider Implementations (src/compression/zstd_ffi.rs, src/compression/zstd_pure.rs): Providers now accept &ZstdDictionary and use prepared/cached decoder dictionaries; zstd_pure adds a thread-local FrameDecoder cache keyed by dict id and new error handling for oversized output; tests added for the pure implementation.
  • Block Decompression Call Sites (src/table/block/mod.rs): All CompressionType::ZstdDict call sites updated to pass ZstdDictionary references instead of raw bytes in both encrypted and unencrypted paths.
  • Module Visibility (src/lib.rs): Mark the compression module #[doc(hidden)] while keeping it public.

Sequence Diagram(s)

sequenceDiagram
    participant BlockReader as Block Reader
    participant ZstdDict as ZstdDictionary
    participant Cache as Prepared Dict Cache
    participant Provider as Compression Provider

    BlockReader->>ZstdDict: request decoder_dict()
    ZstdDict->>Cache: get_or_init()
    alt cache miss
        Cache->>Cache: prepare DecoderDictionary
        Cache-->>ZstdDict: return prepared dict (cached)
    else cache hit
        Cache-->>ZstdDict: return prepared dict
    end
    BlockReader->>Provider: decompress_with_dict(data, &ZstdDict)
    Provider->>ZstdDict: decoder_dict() (use cached prepared dict)
    Provider-->>BlockReader: decompressed data

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 I nibble at dictionaries prepared and neat,

OnceLock keeps them warm, no repeat,
Blocks unwind their zstd thread,
Fast hops forward — no more dread,
A cached crunch, my carrot treat!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check (✅ Passed): Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title Check (✅ Passed): The PR title clearly summarizes the main change: caching pre-compiled Zstandard dictionaries to improve decompression performance by avoiding repeated parsing.
  • Linked Issues Check (✅ Passed): The PR fully addresses issue #217 objectives: extends decompress_with_dict to use stateful ZstdDictionary handles, caches prepared dictionaries avoiding per-call decode_dict() overhead, and includes benchmarks measuring per-block latency improvements.
  • Out of Scope Changes Check (✅ Passed): All changes directly support the core objective of caching pre-compiled dictionaries. The hidden documentation module change is a minor organizational improvement. No unrelated changes detected.
  • Docstring Coverage (✅ Passed): Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.


@codecov

codecov Bot commented Apr 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


Copilot AI left a comment


Pull request overview

This PR improves zstd dictionary decompression performance by caching prepared dictionary state across block decompress calls (FFI backend via OnceLock, pure-Rust backend via TLS), and fixes a correctness bug in the pure-Rust dictionary decompression path. It also adds a Criterion benchmark to measure warm vs cold dictionary decompress latency.

Changes:

  • Cache pre-compiled zstd dictionaries: ZSTD_DDict (FFI) and a TLS FrameDecoder (pure Rust).
  • Update zstd dictionary decompression API to take &ZstdDictionary and adjust call sites.
  • Add a new zstd_dict benchmark for per-block dict decompression latency.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Summary per file:
  • src/compression/mod.rs: Extends ZstdDictionary with a lazily initialized prepared dictionary cache (FFI) and updates decompress_with_dict to take &ZstdDictionary.
  • src/compression/zstd_ffi.rs: Switches to with_prepared_dictionary(dict.decoder_dict()) to avoid per-call ZSTD_createDDict.
  • src/compression/zstd_pure.rs: Adds TLS caching for dict decompression and changes the decode path to fully decode frames.
  • src/table/block/mod.rs: Updates zstd dict decompression call sites to pass &ZstdDictionary instead of raw bytes.
  • src/lib.rs: Makes the compression module public (hidden from docs) to support benchmark access.
  • Cargo.toml: Registers the new zstd_dict benchmark target.
  • benches/zstd_dict.rs: Adds warm/cold Criterion benchmark for dictionary decompression latency.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@benches/zstd_dict.rs`:
- Around line 18-20: The unconditional import of
lsm_tree::compression::ZstdDictionary causes compilation failures when no zstd
backend is enabled; wrap the import with the same cfg used by the backend (e.g.
add #[cfg(zstd_any)] above the use of ZstdDictionary) and ensure any code that
references ZstdDictionary (the benchmark setup and the zstd-specific branch that
currently falls back to a no-op) is also gated behind #[cfg(zstd_any)] so the
file compiles when the feature is absent.

In `@src/compression/zstd_pure.rs`:
- Around line 147-149: Replace the #[allow(...)] Clippy suppressions on the test
module with #[expect(..., reason = "...")] attributes: remove
#[allow(clippy::unwrap_used, clippy::expect_used, reason = "...")] on the tests
mod and add #[expect(clippy::unwrap_used, reason = "...")] and
#[expect(clippy::expect_used, reason = "...")] (one per lint) above mod tests so
the new test code uses Clippy expect annotations compatible with MSRV 1.92.
- Around line 105-122: The TLS reuse currently keys cached decoder by dict.id()
(the 32-bit truncated fingerprint), which can collide; change the cache key in
TLS_DECODER to a collision-resistant identifier (e.g., store and compare the
full dictionary bytes or a full-width hash) and reinitialize when that
identifier differs: when building the decoder in the TLS_DECODER closure (where
state is an Option<(u32, FrameDecoder)>), replace the 32-bit id with a safe key
derived from dict.raw() (or store dict.raw().to_vec() alongside the
FrameDecoder) and compare that key instead of dict.id() before deciding to reuse
the FrameDecoder created via Dictionary::decode_dict and FrameDecoder::add_dict.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ae6953a0-786f-4b6d-b182-2bbb82736130

📥 Commits

Reviewing files that changed from the base of the PR and between e0d4113 and 4ee8981.

📒 Files selected for processing (7)
  • Cargo.toml
  • benches/zstd_dict.rs
  • src/compression/mod.rs
  • src/compression/zstd_ffi.rs
  • src/compression/zstd_pure.rs
  • src/lib.rs
  • src/table/block/mod.rs

…re Rust dict decompress

- Change TLS decoder cache key from truncated u32 to full u64 xxh3
  fingerprint; eliminates cross-dict aliasing when two distinct
  dictionaries share the same lower 32 bits
- Return DecompressedSizeTooLarge when decode_all_to_vec output exceeds
  capacity, matching the bounded behaviour of decompress() and the C backend
- Add regression test: decompress_with_dict_rejects_frame_exceeding_capacity
- Replace #[allow(clippy::...)] with two separate #[expect(..., reason)]
  attributes on the test module (MSRV 1.92 standard)
- Gate bench constants and imports behind #[cfg(zstd_any)] so the file
  compiles with default features (no zstd backend enabled)
- Document that cold bench measures TLS-hit path for pure Rust backend
  (same dict hash persists in TLS across iterations in the same thread)
@polaz polaz requested a review from Copilot April 6, 2026 23:37
@polaz
Member Author

polaz commented Apr 6, 2026

@coderabbitai full review


Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

… guard

The FrameDecoder::init + bounded_read approach does not work: FrameDecoder
processes the full frame at once and its Read impl returns 0 bytes after init
unless driven by decode_all_to_vec. StreamingDecoder supports streaming reads;
FrameDecoder does not.

Restore decode_all_to_vec with an explicit post-decode capacity check:
if output.len() > capacity return DecompressedSizeTooLarge, matching the
bounded behaviour of decompress() and the C FFI backend. Add detailed
comment explaining why bounded_read cannot be used for dict decompression.

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

…semantics

- Update ZstdDictionary::new doc: id stored as full 64-bit hash,
  id() truncates to u32 at call time (not at construction)
- Tighten prepared field comment: ZSTD_DDict is cached per handle
  (not globally per unique bytes) via Arc<OnceLock<...>>
- Strengthen decompress_with_dict_rejects_frame_exceeding_capacity:
  assert DecompressedSizeTooLarge variant specifically instead of
  is_err(); normalize FrameDecoderError::TargetTooSmall to
  DecompressedSizeTooLarge for a consistent public error API

Addresses Copilot review threads #10, #11, #12.

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.

@polaz
Member Author

polaz commented Apr 7, 2026

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Apr 7, 2026

✅ Actions performed

Full review triggered.

polaz pushed a commit that referenced this pull request Apr 9, 2026
## 🤖 New release

* `coordinode-lsm-tree`: 4.3.1 -> 4.4.0

<details><summary><i><b>Changelog</b></i></summary><p>

<blockquote>

## [4.4.0](v4.3.1...v4.4.0) - 2026-04-09

### Added

- *(compression)* enable dictionary compression in pure Rust backend
([#229](#229))

### Performance

- *(compression)* cache pre-compiled Dictionary across block decompress
calls
([#227](#227))
</blockquote>


</p></details>

---
This PR was generated with
[release-plz](https://github.com/release-plz/release-plz/).

Co-authored-by: sw-release-bot[bot] <255865126+sw-release-bot[bot]@users.noreply.github.com>
