perf(compression): pre-parse zstd dict once via OnceCell + no-std foundation#273
Conversation
Closes #232. Supersedes #231. ## Problem `decompress_with_dict` cached a per-thread `FrameDecoder` keyed by the 64-bit xxh3 fingerprint. On every cache miss (first call on a thread, or a dictionary change) it re-parsed the raw dictionary bytes via `Dictionary::decode_dict` / `Dictionary::from_raw_content` and rebuilt the Huffman + FSE entropy tables. For a server with N worker threads that all touch the same dictionary, total parses scaled as O(N) per dictionary. ## Fix structured-zstd 0.0.21 (closed structured-zstd#86) exposes `DictionaryHandle` — an `Arc<Dictionary>` wrapper that is `Send + Sync` on platforms with `target_has_atomic = "ptr"`, plus `FrameDecoder::add_dict_handle` that registers the dictionary via the shared `Arc` rather than cloning the entropy tables. `ZstdDictionary` now owns an `Arc<OnceLock<DictionaryHandle>>` field. First call to the new `prepared_handle()` helper parses the dictionary and seats the result in the OnceLock; every subsequent call across threads returns an Arc clone of the cached handle. The `decompress_with_dict` TLS-miss path now calls `add_dict_handle` with that shared handle — the entropy tables are built once globally and the per-thread `FrameDecoder` just bumps the Arc refcount. `ZstdDictionary::new` stays infallible. The OnceLock is empty after construction and populated lazily on first decompress, so parse-error surfacing keeps its current timing. ## Why this also closes #231 #231 proposed expanding the per-thread TLS cache to a multi-entry keyed map so a thread handling multiple distinct dictionaries didn't constantly re-parse the same ones. With #232 landed, the re-parse the multi-entry cache was supposed to amortise no longer happens: each `ZstdDictionary` carries its own parsed handle. A multi-entry TLS map would amortise nothing now and just add lookup overhead, so #231 is superseded. ## Tests Existing `decompress_with_dict` coverage (1397 lib tests + zstd-dict end-to-end suite) verifies behaviour unchanged. Four new unit tests pin the new contract: - `prepared_handle_raw_content_dict_parses_and_memoizes` — second call returns the same dict id (cache hit, no re-parse). - `prepared_handle_rejects_corrupted_finalized_magic` — finalized magic bytes followed by garbage surface a parse error rather than panicking, and the OnceLock is NOT populated on failure. - `prepared_handle_shared_across_clones` — `ZstdDictionary::clone` shares the inner `Arc<OnceLock<…>>`; parsing through one clone is visible to the other. - `prepared_handle_finalized_dict_parses_from_test_vector` — pin laziness: `new()` does NOT eagerly parse; the OnceLock is empty until the first `prepared_handle()` call. 1447/1447 lib tests pass; 42/42 doc tests pass; clippy clean (`--all-features --all-targets -- -D warnings`).
📝 WalkthroughWalkthroughAdds a lazily-initialized, Arc-shared ChangesLazy zstd dictionary parsing cache in ZstdDictionary
no_std opt-in, manifest, and CI
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Pull request overview
This PR improves zstd-dictionary decompression performance by moving dictionary parsing out of the per-thread TLS decoder miss path and into a lazily-initialized, Arc-shared handle stored on ZstdDictionary. This aligns with the compression module’s goal of keeping hot-path block (de)compression fast and minimizing repeated work.
Changes:
- Add
ZstdDictionary::prepared_handle()that lazily parses and memoizes a sharedstructured_zstd::decoding::DictionaryHandleviaArc<OnceLock<…>>. - Update zstd backend decompression to register the shared prepared handle via
FrameDecoder::add_dict_handle()on TLS cache misses (instead of re-parsing per thread). - Add unit tests in
src/compression/mod.rsto pin the lazy + shared-cache contract (with some naming/doc fixes needed).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/compression/mod.rs | Adds per-dictionary lazy prepared-handle cache and tests for memoization/laziness/clone-sharing. |
| src/compression/zstd_backend.rs | Switches TLS miss path to use ZstdDictionary::prepared_handle() and FrameDecoder::add_dict_handle(). |
Comments suppressed due to low confidence (1)
src/compression/mod.rs:716
- The block comment above this test discusses “canned test-vector dictionaries” and finalized-dict coverage, but the test only checks laziness of the
OnceLock. Update the comment to match the assertions (or, if intended, actually use a finalized dict test vector here).
// Use one of the canned test-vector dictionaries (validity is
// guaranteed by the existing decompress tests).
//
// We cannot easily build a "good" finalized dict here without
// re-implementing the zstd dict-builder; the existing
// zstd_backend tests cover the finalized path end-to-end via
// real compressed frames. This test confirms the
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/compression/mod.rs`:
- Around line 662-675: The test
prepared_handle_rejects_corrupted_finalized_magic only checks that
prepared_handle() returns an Err but doesn't verify the OnceLock cache wasn't
populated; after asserting result.is_err(), add an assertion that the
ZstdDictionary's prepared OnceLock is still empty by calling
dict.prepared.get().is_none() to ensure failed parses do not populate prepared
(refer to ZstdDictionary, prepared_handle, and the prepared OnceLock).
- Around line 174-198: Currently the code parses the dictionary eagerly on each
racing thread then calls self.prepared.set(handle.clone()), causing N threads to
do the expensive parse; instead move the parse logic into the Once/OnceLock
initialization so only one thread performs parsing and others wait and reuse
that result. Replace the manual check-and-set flow around self.prepared,
DICT_MAGIC, DictionaryHandle::decode_dict and Dictionary::from_raw_content with
a single call to the Once/OnceLock "get_or_init" / "get_or_try_init" style API
(use get_or_try_init if available to preserve fallible init and
retry-on-failure), performing the DICT_MAGIC branch and calls to
DictionaryHandle::decode_dict or Dictionary::from_raw_content inside that
closure and returning/propagating any error; then return a clone/reference to
the prepared handle from the OnceLock rather than parsing first and then set().
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: fd77e7b9-472a-46af-8edf-f8d9dc0bd87a
📒 Files selected for processing (2)
src/compression/mod.rssrc/compression/zstd_backend.rs
…parse, not N The original `OnceLock`-only `prepared_handle` lets N concurrent first-callers each run the full dictionary parse before reaching `OnceLock::set`. The losers' duplicate `DictionaryHandle`s then propagate into their TLS `FrameDecoder`s, leaving N parsed-table copies alive — defeating the main contract of "O(1) parse per dict, not O(N threads)". Add a `prepare_lock: Arc<Mutex<()>>` and a double-check inside the critical section. Fast path remains lock-free (`OnceLock::get` before the mutex), so steady-state decompress calls keep the previous zero- overhead behaviour. The mutex is contended ONLY on the first cold start of each `ZstdDictionary`. Tests: - Rename `prepared_handle_finalized_dict_parses_from_test_vector` to `prepared_handle_is_lazy_and_populated_after_first_call`. The test body never used a finalized dict or a test vector; it asserts the lazy-init / first-call-populates-cache contract, and the new name says so. - Extend `prepared_handle_rejects_corrupted_finalized_magic` to also assert `dict.prepared.get().is_none()` after the failing call. The previous test would have silently passed if a failed parse had cached a poisoned value; the retry-on-failure contract is now pinned. 1447/1447 lib tests pass; clippy clean under `--all-features --all-targets -- -D warnings`.
…nit + no-std foundation The earlier Mutex-based single-parse guarantee was equivalent to but strictly worse than `once_cell::sync::OnceCell::get_or_try_init` — same one-shot synchronisation primitive (winner parses, losers wait on the cell's internal state), but without the auxiliary `Arc<Mutex<()>>` field, the double-check pattern, and the std-specific Mutex dependency. `OnceCell::get_or_try_init` from the `once_cell` crate is used here because `std::sync::OnceLock::get_or_try_init` is still gated behind the unstable `once_cell_try` feature on our MSRV (1.92). The `once_cell` crate also provides a no-std + alloc compatible variant (`once_cell::race::OnceBox`) we can swap to during the no-std migration, so this code path is now compile-time-trait-friendly rather than std-locked. Foundation for no-std + alloc support: - Cargo features: `default = ["std"]`, new `std` and `alloc` features. `io-uring` is now gated on `std`. Existing consumers see no behaviour change (default keeps `std` on). - `src/lib.rs`: `#![cfg_attr(not(feature = "std"), no_std)]` and `extern crate alloc;`. - New CI job `no-std-check`: runs `cargo check --target thumbv7em-none-eabihf --no-default-features --features alloc`. Currently `continue-on-error: true` — the job is the canonical migration progress meter (compile-error count must monotonically decrease across PRs). When it reaches zero we drop the `continue-on-error` flag and it becomes a gating check. The full per-module migration is multi-week and tracked separately in the new tracking issue; this PR delivers only the foundation + the dictionary-handle perf fix that originally triggered the no-std audit. 1447 lib tests pass; clippy clean under `--all-features --all-targets`.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/coordinode-ci.yml:
- Around line 106-111: Update the GitHub Actions steps that reference action
tags (actions/checkout@v6, dtolnay/rust-toolchain@stable,
Swatinem/rust-cache@v2) to use pinned commit SHAs instead of mutable tags;
locate the three uses in the workflow file and replace the tag versions with the
corresponding commit SHA for each action (ensure the SHAs match the intended tag
versions), keep the same with: inputs (toolchain/targets) intact for
dtolnay/rust-toolchain, and verify the workflow still runs in CI after updating
the three references.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5b6dfed5-8559-43ed-9acb-9e9932bd4d7c
📒 Files selected for processing (4)
.github/workflows/coordinode-ci.ymlCargo.tomlsrc/compression/mod.rssrc/lib.rs
…feature swap Cargo.toml now has `default = ["std"]`, so `cargo --no-default-features --features zstd,lz4` activates `no_std` — the crate is not yet no-std-clean (#274 tracks the migration), and CI's `test-zstd` and `codecov` jobs that ran exactly that invocation surfaced ~1075 unresolved-`std` errors as a consequence. These two jobs are scoped to "test the zstd backend without the lz4 default" or similar — they were NOT meant to test no-std. Spell out `std,zstd,lz4` so the std-implicit baseline is preserved. The no-std-only target build lives in the new `no-std-check` job (also in this file), which explicitly uses `--target thumbv7em-none-eabihf --no-default-features --features alloc` and stays `continue-on-error: true` until #274 lands.
…ce_cell Without these gates, `cargo --no-default-features --features zstd,lz4` flipped the crate into `no_std` (because `--no-default-features` implicitly turns off the `std` feature now that it lives in defaults) while still pulling in modules that use `std::fs` / `std::sync` / `thread_local!` — surfacing as ~1000 unresolved-`std` errors at build time. Two related changes: - Every feature whose modules currently call `std::*` directly (`zstd`, `lz4`, `encryption`, `io-uring`) now depends on `std` in Cargo.toml's feature table. `--no-default-features --features zstd` therefore activates `std` transitively. The CI `test-zstd` job can go back to its original `--features zstd,lz4` invocation; the explicit `std,` prefix I added in the earlier CI fix is now redundant and removed. - `once_cell` becomes optional and is rolled into the `zstd` feature alongside `structured-zstd`. The crate is used only by `ZstdDictionary::prepared_handle` under `#[cfg(feature = "zstd")]`; pulling it unconditionally added a direct edge to a dep that non-zstd builds never need (transitive copies via `enum_dispatch` / `tempfile` remain, but expressing the direct edge correctly keeps the manifest honest). As the no-std migration epic ports modules off `std::*`, the `std` transitive dep can be lifted on a per-feature basis. Default-feature build (`cargo check`) and `--no-default-features --features zstd,lz4` both compile clean; 940 lib tests pass; clippy clean under `--all-features --all-targets`.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/coordinode-ci.yml:
- Around line 101-107: The job currently inherits full GITHUB_TOKEN permissions
and leaves checkout credentials enabled; update the job to explicitly set
minimal permissions (e.g., add a job-level permissions mapping with contents:
read) and modify the checkout step (the uses: actions/checkout@v6 step) to
disable persistence of credentials by adding the with: persist-credentials:
false setting so the workflow only has read-only repo access and does not keep
the token in the workspace.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7cfe4968-9b18-45ed-9bf7-bc988a7fb8aa
📒 Files selected for processing (2)
.github/workflows/coordinode-ci.ymlCargo.toml
Summary
Closes #232. Supersedes #231. Lays the foundation for #274 (full no-std migration).
Two related changes in this PR:
perf(compression): pre-parsed dictionary handle in ZstdDictionary (blocked on structured-zstd#86) #232 win —
ZstdDictionarycarries a lazily-populatedArc<OnceCell<DictionaryHandle>>. First decompress parses the dictionary once globally viaOnceCell::get_or_try_init; every subsequent decompress on every thread bumps the Arc refcount and registers the shared handle viaFrameDecoder::add_dict_handle. Eliminates the per-threadDictionaryre-parse the TLSFrameDecodercache used to do on every miss. Single-parse contract across all racers is guaranteed byOnceCell's internal one-shot synchronisation — no explicitMutexneeded, no extra struct field.no-std foundation —
Cargo.tomlnow hasstd(default-on) andallocfeatures;lib.rscarries#![cfg_attr(not(feature = "std"), no_std)]andextern crate alloc. Default builds keepstdenabled, so existing consumers see no behavior change. A new CI jobno-std-checkrunscargo check --target thumbv7em-none-eabihf --no-default-features --features allocand tracks migration progress (currentlycontinue-on-error: trueuntil the full migration in no-std + alloc migration: full crate #274 completes). Per-module migration is tracked in no-std + alloc migration: full crate #274.Changes
src/compression/mod.rs—ZstdDictionary.prepared: Arc<OnceCell<DictionaryHandle>>;prepared_handle()usesget_or_try_init. New unit tests pin: memoization, corrupted-magic returns Err and leaves the cell empty,Cloneshares the cell, lazy-init contract (new()does not parse).src/compression/zstd_backend.rs— TLS-miss arm callsdict.prepared_handle()?plusFrameDecoder::add_dict_handle, replacing the inlineDictionary::decode_dict/from_raw_contentblock.Cargo.toml—default = ["std"], newstdandallocfeatures,io-uringgated onstd.once_celldep added (replacesstd::sync::OnceLockwhich lacks stableget_or_try_initon our MSRV).src/lib.rs—#![cfg_attr(not(feature = "std"), no_std)]andextern crate alloc..github/workflows/coordinode-ci.yml— newno-std-checkjob (cross-checks onthumbv7em-none-eabihf, currentlycontinue-on-error: true).Why this also closes #231
#231 proposed a multi-entry TLS-keyed map to amortise per-thread re-parse when a thread handled multiple distinct dictionaries. With pre-parsed handles each
ZstdDictionarycarries its own parsed entropy tables — the re-parse the multi-entry cache was supposed to amortise no longer happens. A multi-entry TLS map would amortise nothing and just add lookup cost.Why no Mutex
The first commit on this branch introduced an explicit
Mutexfor the single-parse guarantee. Replaced withonce_cell::sync::OnceCell::get_or_try_init— same guarantee (winner parses, losers wait on the cell's internal one-shot sync, then read the cached Arc clone), no auxiliary field, cleaner code, and aligns with the project's no-std-first design rule (Mutex specifically blocks no-std builds; this primitive has a no-std-compatibleonce_cell::race::OnceBoxvariant we can swap to during #274).Testing
prepared_handle)cargo clippy --all-features --all-targets -- -D warningscleancargo check --features zstd) compilescargo check --no-default-features --features alloccorrectly activates no_std (currently ~1075 compile errors; that is the migration backlog — see no-std + alloc migration: full crate #274)Related
Summary by CodeRabbit
New Features
Performance Improvements
Robustness
Testing & CI