perf(compression): cache pre-compiled Dictionary across block decompress calls #227
Conversation
…ess calls

- C FFI backend: cache `DecoderDictionary<'static>` (ZSTD_DDict) in `ZstdDictionary` via `Arc<OnceLock<...>>`; parsed once per process, shared across all clones of the same dictionary handle
- Pure Rust backend: cache a `FrameDecoder` with the dictionary pre-loaded in thread-local storage keyed by dict ID; parsed once per thread
- Fix latent correctness bug in pure Rust `decompress_with_dict`: it was calling `init(data)` on a `Copy` slice (reads the frame header only; the output buffer stays empty); replace with `decode_all_to_vec`, which takes `&mut input` and fully decodes the frame
- Change `CompressionProvider::decompress_with_dict` signature from `dict_raw: &[u8]` to `dict: &ZstdDictionary` to give backends access to the cached prepared form; update all four call sites in block/mod.rs
- Add `ZstdDictionary::decoder_dict()`, which lazily initialises the ZSTD_DDict via `OnceLock::get_or_init` (C FFI only)
- Add unit tests for the pure Rust backend with pre-generated test vectors (decompress + idempotent repeated calls exercising the TLS cache path)
- Add `benches/zstd_dict.rs` with warm/cold per-block latency benchmarks
- Expose `#[doc(hidden)] pub mod compression` so benchmarks can reach `CompressionProvider` and the `ZstdBackend` type alias

Closes #217
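The process-wide caching described for the C FFI backend follows the standard `Arc<OnceLock<T>>` lazy-init pattern. A minimal std-only sketch of the idea, with a hypothetical `PreparedDict` standing in for `DecoderDictionary<'static>` (field names and constructor are illustrative assumptions, not the crate's actual API):

```rust
use std::sync::{Arc, OnceLock};

// Stand-in for the expensive-to-build prepared form (ZSTD_DDict in the real code).
#[derive(Debug)]
struct PreparedDict {
    entropy_tables: Vec<u8>,
}

#[derive(Clone)]
struct ZstdDictionary {
    raw: Arc<[u8]>,
    // Shared across all clones of the handle: the first caller pays the parse
    // cost, every later call (on any clone) reuses the cached instance.
    prepared: Arc<OnceLock<PreparedDict>>,
}

impl ZstdDictionary {
    fn new(raw: &[u8]) -> Self {
        Self {
            raw: raw.into(),
            prepared: Arc::new(OnceLock::new()),
        }
    }

    // Lazily build the prepared dictionary exactly once per process.
    fn decoder_dict(&self) -> &PreparedDict {
        self.prepared.get_or_init(|| PreparedDict {
            entropy_tables: self.raw.to_vec(), // placeholder for real parsing
        })
    }
}
```

Because the `OnceLock` sits behind an `Arc`, all clones of the same handle observe pointer-equal `decoder_dict()` results; subsequent blocks never re-parse.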
No actionable comments were generated in the recent review.
Walkthrough: Refactors zstd dictionary handling to pass a `&ZstdDictionary` handle through the decompression path, caching the prepared dictionary form across block decompress calls.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant BlockReader as Block Reader
    participant ZstdDict as ZstdDictionary
    participant Cache as Prepared Dict Cache
    participant Provider as Compression Provider
    BlockReader->>ZstdDict: request decoder_dict()
    ZstdDict->>Cache: get_or_init()
    alt cache miss
        Cache->>Cache: prepare DecoderDictionary
        Cache-->>ZstdDict: return prepared dict (cached)
    else cache hit
        Cache-->>ZstdDict: return prepared dict
    end
    BlockReader->>Provider: decompress_with_dict(data, &ZstdDict)
    Provider->>ZstdDict: decoder_dict() (use cached prepared dict)
    Provider-->>BlockReader: decompressed data
```
Estimated code review effort: 4 (Complex), ~45 minutes
Codecov Report: All modified and coverable lines are covered by tests.
Pull request overview
This PR improves zstd dictionary decompression performance by caching prepared dictionary state across block decompress calls (FFI backend via OnceLock, pure-Rust backend via TLS), and fixes a correctness bug in the pure-Rust dictionary decompression path. It also adds a Criterion benchmark to measure warm vs cold dictionary decompress latency.
Changes:
- Cache pre-compiled zstd dictionaries: `ZSTD_DDict` (FFI) and a TLS `FrameDecoder` (pure Rust).
- Update the zstd dictionary decompression API to take `&ZstdDictionary` and adjust call sites.
- Add a new `zstd_dict` benchmark for per-block dict decompression latency.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Summary per file:

| File | Description |
|---|---|
| `src/compression/mod.rs` | Extends `ZstdDictionary` with a lazily initialized prepared dictionary cache (FFI) and updates `decompress_with_dict` to take `&ZstdDictionary`. |
| `src/compression/zstd_ffi.rs` | Switches to `with_prepared_dictionary(dict.decoder_dict())` to avoid per-call `ZSTD_createDDict`. |
| `src/compression/zstd_pure.rs` | Adds TLS caching for dict decompression and changes the decode path to fully decode frames. |
| `src/table/block/mod.rs` | Updates zstd dict decompression call sites to pass `&ZstdDictionary` instead of raw bytes. |
| `src/lib.rs` | Makes the compression module public (hidden from docs) to support benchmark access. |
| `Cargo.toml` | Registers the new `zstd_dict` benchmark target. |
| `benches/zstd_dict.rs` | Adds warm/cold Criterion benchmark for dictionary decompression latency. |
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@benches/zstd_dict.rs`:
- Around line 18-20: The unconditional import of
lsm_tree::compression::ZstdDictionary causes compilation failures when no zstd
backend is enabled; wrap the import with the same cfg used by the backend (e.g.
add #[cfg(zstd_any)] above the use of ZstdDictionary) and ensure any code that
references ZstdDictionary (the benchmark setup and the zstd-specific branch that
currently falls back to a no-op) is also gated behind #[cfg(zstd_any)] so the
file compiles when the feature is absent.
In `@src/compression/zstd_pure.rs`:
- Around line 147-149: Replace the #[allow(...)] Clippy suppressions on the test
module with #[expect(..., reason = "...")] attributes: remove
#[allow(clippy::unwrap_used, clippy::expect_used, reason = "...")] on the tests
mod and add #[expect(clippy::unwrap_used, reason = "...")] and
#[expect(clippy::expect_used, reason = "...")] (one per lint) above mod tests so
the new test code uses Clippy expect annotations compatible with MSRV 1.92.
- Around line 105-122: The TLS reuse currently keys cached decoder by dict.id()
(the 32-bit truncated fingerprint), which can collide; change the cache key in
TLS_DECODER to a collision-resistant identifier (e.g., store and compare the
full dictionary bytes or a full-width hash) and reinitialize when that
identifier differs: when building the decoder in the TLS_DECODER closure (where
state is an Option<(u32, FrameDecoder)>), replace the 32-bit id with a safe key
derived from dict.raw() (or store dict.raw().to_vec() alongside the
FrameDecoder) and compare that key instead of dict.id() before deciding to reuse
the FrameDecoder created via Dictionary::decode_dict and FrameDecoder::add_dict.
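The collision-resistant TLS keying suggested above can be sketched with std only. `CachedDecoder` is a stand-in for ruzstd's `FrameDecoder`, and `DefaultHasher` stands in for the full-width fingerprint (the project uses xxh3; that choice is an assumption here, the point is comparing a 64-bit key derived from the full dictionary bytes instead of the truncated 32-bit dict ID):

```rust
use std::cell::RefCell;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for a FrameDecoder with the dictionary pre-loaded.
struct CachedDecoder {
    dict_bytes: Vec<u8>,
}

thread_local! {
    // (full-width key, decoder): per-thread cache, so no locking is needed.
    static TLS_DECODER: RefCell<Option<(u64, CachedDecoder)>> = RefCell::new(None);
}

// Full 64-bit fingerprint of the dictionary bytes, unlike the 32-bit dict ID
// whose truncation can alias two distinct dictionaries.
fn dict_key(raw: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    raw.hash(&mut h);
    h.finish()
}

fn with_cached_decoder<R>(raw: &[u8], f: impl FnOnce(&mut CachedDecoder) -> R) -> R {
    let key = dict_key(raw);
    TLS_DECODER.with(|slot| {
        let mut slot = slot.borrow_mut();
        // Reuse only when the full-width key matches; otherwise rebuild.
        let reuse = matches!(slot.as_ref(), Some((k, _)) if *k == key);
        if !reuse {
            *slot = Some((key, CachedDecoder { dict_bytes: raw.to_vec() }));
        }
        let (_, dec) = slot.as_mut().expect("initialized above");
        f(dec)
    })
}
```

Repeated calls with the same dictionary on one thread hit the cached decoder; a different dictionary replaces it rather than silently reusing stale state.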
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: ae6953a0-786f-4b6d-b182-2bbb82736130
📒 Files selected for processing (7)
`Cargo.toml`, `benches/zstd_dict.rs`, `src/compression/mod.rs`, `src/compression/zstd_ffi.rs`, `src/compression/zstd_pure.rs`, `src/lib.rs`, `src/table/block/mod.rs`
…re Rust dict decompress

- Change the TLS decoder cache key from a truncated u32 to the full u64 xxh3 fingerprint; eliminates cross-dict aliasing when two distinct dictionaries share the same lower 32 bits
- Return `DecompressedSizeTooLarge` when `decode_all_to_vec` output exceeds capacity, matching the bounded behaviour of `decompress()` and the C backend
- Add regression test: `decompress_with_dict_rejects_frame_exceeding_capacity`
- Replace `#[allow(clippy::...)]` with two separate `#[expect(..., reason)]` attributes on the test module (MSRV 1.92 standard)
- Gate bench constants and imports behind `#[cfg(zstd_any)]` so the file compiles with default features (no zstd backend enabled)
- Document that the cold bench measures the TLS-hit path for the pure Rust backend (the same dict hash persists in TLS across iterations in the same thread)
@coderabbitai full review
… guard

The `FrameDecoder::init` + `bounded_read` approach does not work: `FrameDecoder` processes the full frame at once, and its `Read` impl returns 0 bytes after `init` unless driven by `decode_all_to_vec`. `StreamingDecoder` supports streaming reads; `FrameDecoder` does not. Restore `decode_all_to_vec` with an explicit post-decode capacity check: if `output.len() > capacity`, return `DecompressedSizeTooLarge`, matching the bounded behaviour of `decompress()` and the C FFI backend. Add a detailed comment explaining why `bounded_read` cannot be used for dict decompression.
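The post-decode bound described here can be sketched as follows. `decode_all` is a placeholder for the real frame decode (it just echoes its input); the names and error enum are assumptions for illustration, not the crate's actual definitions:

```rust
#[derive(Debug, PartialEq)]
enum DecompressError {
    DecompressedSizeTooLarge,
}

// Placeholder for the real full-frame decode. The relevant property is that
// the whole frame is materialized at once: FrameDecoder cannot be driven
// incrementally the way StreamingDecoder can.
fn decode_all(input: &[u8]) -> Vec<u8> {
    input.to_vec()
}

fn decompress_with_dict_bounded(
    input: &[u8],
    capacity: usize,
) -> Result<Vec<u8>, DecompressError> {
    let output = decode_all(input);
    // Since the size is only known after the full decode, the bound is
    // enforced after the fact, matching decompress() and the C FFI backend.
    if output.len() > capacity {
        return Err(DecompressError::DecompressedSizeTooLarge);
    }
    Ok(output)
}
```

The trade-off versus a streaming bound is that a hostile frame is fully decoded into memory before being rejected; the check still keeps the public error contract consistent across backends.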
…semantics

- Update `ZstdDictionary::new` doc: the id is stored as the full 64-bit hash, and `id()` truncates to u32 at call time (not at construction)
- Tighten the `prepared` field comment: the ZSTD_DDict is cached per handle (not globally per unique bytes) via `Arc<OnceLock<...>>`
- Strengthen `decompress_with_dict_rejects_frame_exceeding_capacity`: assert the `DecompressedSizeTooLarge` variant specifically instead of `is_err()`; normalize `FrameDecoderError::TargetTooSmall` to `DecompressedSizeTooLarge` for a consistent public error API

Addresses Copilot review threads #10, #11, #12.
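The store-full/truncate-at-call-time semantics in the first bullet amount to this small sketch (struct and accessor names are illustrative, not the crate's exact API):

```rust
struct ZstdDictionary {
    // Full 64-bit fingerprint, stored untruncated at construction.
    hash: u64,
}

impl ZstdDictionary {
    fn new(hash: u64) -> Self {
        Self { hash }
    }

    // The 32-bit dict ID is derived on demand by truncating the stored hash,
    // so the full-width value remains available for cache keying.
    fn id(&self) -> u32 {
        self.hash as u32
    }

    fn full_hash(&self) -> u64 {
        self.hash
    }
}
```

Keeping the untruncated hash around is what lets the TLS decoder cache key on the full 64 bits while the on-disk format still sees the conventional 32-bit ID.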
@coderabbitai full review
Actions performed: Full review triggered.
## 🤖 New release

* `coordinode-lsm-tree`: 4.3.1 -> 4.4.0

Changelog:

## [4.4.0](v4.3.1...v4.4.0) - 2026-04-09

### Added
- *(compression)* enable dictionary compression in pure Rust backend (#229)

### Performance
- *(compression)* cache pre-compiled Dictionary across block decompress calls (#227)

This PR was generated with [release-plz](https://github.com/release-plz/release-plz/).

Co-authored-by: sw-release-bot[bot] <255865126+sw-release-bot[bot]@users.noreply.github.com>
Summary
- `DecoderDictionary<'static>` (wraps `ZSTD_DDict`) is now cached in `ZstdDictionary` via `Arc<OnceLock<...>>`: parsed once per process, shared across all clones of the same dictionary handle, zero re-parsing on subsequent blocks
- A `FrameDecoder` with the dictionary pre-loaded is cached in thread-local storage keyed by `dict_id`: parsed once per thread, no mutex needed (`FrameDecoder` is `!Send`)
- `decompress_with_dict` was calling `init(data)` on a `Copy` slice (only read the frame header; the decode buffer stayed empty, always returning `Ok([])`); replaced with `decode_all_to_vec(&mut input)`, which fully decodes the frame

Changes
- `src/compression/mod.rs`: add `prepared: Arc<OnceLock<DecoderDictionary<'static>>>` to `ZstdDictionary`; add `decoder_dict()` accessor; change the `decompress_with_dict` signature to take `&ZstdDictionary`
- `src/compression/zstd_ffi.rs`: use `Decompressor::with_prepared_dictionary(dict.decoder_dict())`; no more per-call `ZSTD_createDDict`
- `src/compression/zstd_pure.rs`: thread-local `FrameDecoder`; fix correctness bug; add unit tests with pre-generated test vectors
- `src/table/block/mod.rs`: update `decompress_with_dict` call sites to pass `&dict` instead of `dict.raw()`
- `benches/zstd_dict.rs`: new dictionary decompression benchmark

Test Plan

- `cargo clippy --features zstd --all-targets -- -D warnings`: clean
- `cargo clippy --features zstd-pure --all-targets -- -D warnings`: clean
- `cargo nextest run --features zstd --workspace`: 1168/1168 passed
- `cargo nextest run --features zstd-pure --workspace`: 1157/1157 passed
- `cargo test --doc --workspace`: 41/41 passed
- `cargo build --bench zstd_dict --features zstd`: compiles
- `cargo build --bench zstd_dict --features zstd-pure`: compiles

Closes #217