Skip to content

perf(compression): pre-parse zstd dict once via OnceCell + no-std foundation#273

Merged
polaz merged 5 commits into
mainfrom
perf/#232-prepared-dict-handle
May 19, 2026
Merged

perf(compression): pre-parse zstd dict once via OnceCell + no-std foundation#273
polaz merged 5 commits into
mainfrom
perf/#232-prepared-dict-handle

Conversation

@polaz
Copy link
Copy Markdown
Member

@polaz polaz commented May 19, 2026

Summary

Closes #232. Supersedes #231. Lays the foundation for #274 (full no-std migration).

Two related changes in this PR:

  1. perf(compression): pre-parsed dictionary handle in ZstdDictionary (blocked on structured-zstd#86) #232 winZstdDictionary carries a lazily-populated Arc<OnceCell<DictionaryHandle>>. First decompress parses the dictionary once globally via OnceCell::get_or_try_init; every subsequent decompress on every thread bumps the Arc refcount and registers the shared handle via FrameDecoder::add_dict_handle. Eliminates the per-thread Dictionary re-parse the TLS FrameDecoder cache used to do on every miss. Single-parse contract across all racers is guaranteed by OnceCell's internal one-shot synchronisation — no explicit Mutex needed, no extra struct field.

  2. no-std foundationCargo.toml now has std (default-on) and alloc features; lib.rs carries #![cfg_attr(not(feature = "std"), no_std)] and extern crate alloc. Default builds keep std enabled, so existing consumers see no behavior change. A new CI job no-std-check runs cargo check --target thumbv7em-none-eabihf --no-default-features --features alloc and tracks migration progress (currently continue-on-error: true until the full migration in no-std + alloc migration: full crate #274 completes). Per-module migration is tracked in no-std + alloc migration: full crate #274.

Changes

  • src/compression/mod.rsZstdDictionary.prepared: Arc<OnceCell<DictionaryHandle>>; prepared_handle() uses get_or_try_init. New unit tests pin: memoization, corrupted-magic returns Err and leaves the cell empty, Clone shares the cell, lazy-init contract (new() does not parse).
  • src/compression/zstd_backend.rs — TLS-miss arm calls dict.prepared_handle()? plus FrameDecoder::add_dict_handle, replacing the inline Dictionary::decode_dict / from_raw_content block.
  • Cargo.tomldefault = ["std"], new std and alloc features, io-uring gated on std. once_cell dep added (replaces std::sync::OnceLock which lacks stable get_or_try_init on our MSRV).
  • src/lib.rs#![cfg_attr(not(feature = "std"), no_std)] and extern crate alloc.
  • .github/workflows/coordinode-ci.yml — new no-std-check job (cross-checks on thumbv7em-none-eabihf, currently continue-on-error: true).

Why this also closes #231

#231 proposed a multi-entry TLS-keyed map to amortise per-thread re-parse when a thread handled multiple distinct dictionaries. With pre-parsed handles each ZstdDictionary carries its own parsed entropy tables — the re-parse the multi-entry cache was supposed to amortise no longer happens. A multi-entry TLS map would amortise nothing and just add lookup cost.

Why no Mutex

The first commit on this branch introduced an explicit Mutex for the single-parse guarantee. Replaced with once_cell::sync::OnceCell::get_or_try_init — same guarantee (winner parses, losers wait on the cell's internal one-shot sync, then read the cached Arc clone), no auxiliary field, cleaner code, and aligns with the project's no-std-first design rule (Mutex specifically blocks no-std builds; this primitive has a no-std-compatible once_cell::race::OnceBox variant we can swap to during #274).

Testing

  • 1447 lib tests pass (4 new for prepared_handle)
  • 42 doc tests pass
  • cargo clippy --all-features --all-targets -- -D warnings clean
  • Default-feature build (cargo check --features zstd) compiles
  • cargo check --no-default-features --features alloc correctly activates no_std (currently ~1075 compile errors; that is the migration backlog — see no-std + alloc migration: full crate #274)

Related

Summary by CodeRabbit

  • New Features

    • Added no-std support infrastructure for embedded and resource-constrained environments.
  • Performance Improvements

    • Improved zstd decompression by introducing a lazily-initialized, shared dictionary cache to avoid repetitive parsing and speed repeated decompression.
  • Robustness

    • Dictionary parsing failures now surface as errors without polluting the cache; clones share cache state and memoization is strictly lazy.
  • Testing & CI

    • Added an expected-to-fail no-std compatibility CI job to track migration progress.

Review Change Stack

Closes #232. Supersedes #231.

## Problem

`decompress_with_dict` cached a per-thread `FrameDecoder` keyed by the
64-bit xxh3 fingerprint. On every cache miss (first call on a thread,
or a dictionary change) it re-parsed the raw dictionary bytes via
`Dictionary::decode_dict` / `Dictionary::from_raw_content` and rebuilt
the Huffman + FSE entropy tables. For a server with N worker threads
that all touch the same dictionary, total parses scaled as O(N) per
dictionary.

## Fix

structured-zstd 0.0.21 (closed structured-zstd#86) exposes
`DictionaryHandle` — an `Arc<Dictionary>` wrapper that is `Send + Sync`
on platforms with `target_has_atomic = "ptr"`, plus
`FrameDecoder::add_dict_handle` that registers the dictionary via the
shared `Arc` rather than cloning the entropy tables.

`ZstdDictionary` now owns an `Arc<OnceLock<DictionaryHandle>>` field.
First call to the new `prepared_handle()` helper parses the dictionary
and seats the result in the OnceLock; every subsequent call across
threads returns an Arc clone of the cached handle. The
`decompress_with_dict` TLS-miss path now calls `add_dict_handle` with
that shared handle — the entropy tables are built once globally and
the per-thread `FrameDecoder` just bumps the Arc refcount.

`ZstdDictionary::new` stays infallible. The OnceLock is empty after
construction and populated lazily on first decompress, so parse-error
surfacing keeps its current timing.

## Why this also closes #231

#231 proposed expanding the per-thread TLS cache to a multi-entry
keyed map so a thread handling multiple distinct dictionaries didn't
constantly re-parse the same ones. With #232 landed, the re-parse the
multi-entry cache was supposed to amortise no longer happens: each
`ZstdDictionary` carries its own parsed handle. A multi-entry TLS map
would amortise nothing now and just add lookup overhead, so #231 is
superseded.

## Tests

Existing `decompress_with_dict` coverage (1397 lib tests + zstd-dict
end-to-end suite) verifies behaviour unchanged. Four new unit tests
pin the new contract:

- `prepared_handle_raw_content_dict_parses_and_memoizes` — second
  call returns the same dict id (cache hit, no re-parse).
- `prepared_handle_rejects_corrupted_finalized_magic` — finalized
  magic bytes followed by garbage surface a parse error rather than
  panicking, and the OnceLock is NOT populated on failure.
- `prepared_handle_shared_across_clones` — `ZstdDictionary::clone`
  shares the inner `Arc<OnceLock<…>>`; parsing through one clone is
  visible to the other.
- `prepared_handle_finalized_dict_parses_from_test_vector` — pin
  laziness: `new()` does NOT eagerly parse; the OnceLock is empty
  until the first `prepared_handle()` call.

1447/1447 lib tests pass; 42/42 doc tests pass; clippy clean
(`--all-features --all-targets -- -D warnings`).
Copilot AI review requested due to automatic review settings May 19, 2026 19:25
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 19, 2026

📝 Walkthrough

Walkthrough

Adds a lazily-initialized, Arc-shared OnceCell cache for a parsed zstd DictionaryHandle inside ZstdDictionary (feature-gated) and uses it in the decompress path; separately introduces crate no_std opt-in, alloc/std features, once_cell dependency, and a CI no-std-check job.

Changes

Lazy zstd dictionary parsing cache in ZstdDictionary

Layer / File(s) Summary
Dictionary cache contract and lazy initialization
src/compression/mod.rs
Adds a feature-gated prepared: Arc<OnceCell<structured_zstd::decoding::DictionaryHandle>> field, updates Clone and new() to initialize/share the cache without parsing, and implements prepared_handle() which lazily parses finalized or raw-content dictionaries and caches the DictionaryHandle on success.
Decompress path integration with prepared handle
src/compression/zstd_backend.rs
ZstdProvider::decompress_with_dict now calls dict.prepared_handle()? on TLS decoder miss and initializes the FrameDecoder with add_dict_handle(handle) instead of reparsing bytes per miss.
Cache behavior validation tests
src/compression/mod.rs
Adds zstd-feature tests covering raw-content memoization, corrupted finalized-magic error propagation without cache population, cache sharing across clones, and strict laziness (no parse before first prepared_handle()).

no_std opt-in, manifest, and CI

Layer / File(s) Summary
Crate attributes and alloc availability
src/lib.rs
Adds conditional #![cfg_attr(not(feature = "std"), no_std)] and extern crate alloc; to enable incremental no_std migration while preserving alloc heap types.
Cargo features and dependency
Cargo.toml
Makes std the default feature, introduces std and alloc features, updates io-uring and lz4 features to depend on std, extends zstd and encryption features to include std, and adds optional once_cell = \"1\".
CI no-std check job
.github/workflows/coordinode-ci.yml
Adds a no-std-check GitHub Actions job that installs Rust 1.92.0, adds thumbv7em-none-eabihf target, enables Cargo caching, and runs cargo check --no-default-features --features alloc with continue-on-error: true.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

  • #274: Changes overlap in src/compression and the OnceCell/Arc prepared-handle pattern; both modify zstd dictionary preparation and TLS decoder use.

Possibly related PRs

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately captures both main changes: lazy dict parsing via OnceCell and no-std foundation setup, matching the core objectives.
Linked Issues check ✅ Passed The PR fully addresses #232 (pre-parsed dict handle via OnceCell + lazy init), partially addresses #231 (supersedes by eliminating TLS memoizer need), and establishes no-std groundwork (#274).
Out of Scope Changes check ✅ Passed All changes align with stated objectives: ZstdDictionary caching, zstd_backend integration, feature-gating for no-std, and CI setup for build validation are all in scope.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch perf/#232-prepared-dict-handle

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves zstd-dictionary decompression performance by moving dictionary parsing out of the per-thread TLS decoder miss path and into a lazily-initialized, Arc-shared handle stored on ZstdDictionary. This aligns with the compression module’s goal of keeping hot-path block (de)compression fast and minimizing repeated work.

Changes:

  • Add ZstdDictionary::prepared_handle() that lazily parses and memoizes a shared structured_zstd::decoding::DictionaryHandle via Arc<OnceLock<…>>.
  • Update zstd backend decompression to register the shared prepared handle via FrameDecoder::add_dict_handle() on TLS cache misses (instead of re-parsing per thread).
  • Add unit tests in src/compression/mod.rs to pin the lazy + shared-cache contract (with some naming/doc fixes needed).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/compression/mod.rs Adds per-dictionary lazy prepared-handle cache and tests for memoization/laziness/clone-sharing.
src/compression/zstd_backend.rs Switches TLS miss path to use ZstdDictionary::prepared_handle() and FrameDecoder::add_dict_handle().
Comments suppressed due to low confidence (1)

src/compression/mod.rs:716

  • The block comment above this test discusses “canned test-vector dictionaries” and finalized-dict coverage, but the test only checks laziness of the OnceLock. Update the comment to match the assertions (or, if intended, actually use a finalized dict test vector here).
            // Use one of the canned test-vector dictionaries (validity is
            // guaranteed by the existing decompress tests).
            //
            // We cannot easily build a "good" finalized dict here without
            // re-implementing the zstd dict-builder; the existing
            // zstd_backend tests cover the finalized path end-to-end via
            // real compressed frames. This test confirms the

Comment thread src/compression/mod.rs Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented May 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/compression/mod.rs`:
- Around line 662-675: The test
prepared_handle_rejects_corrupted_finalized_magic only checks that
prepared_handle() returns an Err but doesn't verify the OnceLock cache wasn't
populated; after asserting result.is_err(), add an assertion that the
ZstdDictionary's prepared OnceLock is still empty by calling
dict.prepared.get().is_none() to ensure failed parses do not populate prepared
(refer to ZstdDictionary, prepared_handle, and the prepared OnceLock).
- Around line 174-198: Currently the code parses the dictionary eagerly on each
racing thread then calls self.prepared.set(handle.clone()), causing N threads to
do the expensive parse; instead move the parse logic into the Once/OnceLock
initialization so only one thread performs parsing and others wait and reuse
that result. Replace the manual check-and-set flow around self.prepared,
DICT_MAGIC, DictionaryHandle::decode_dict and Dictionary::from_raw_content with
a single call to the Once/OnceLock "get_or_init" / "get_or_try_init" style API
(use get_or_try_init if available to preserve fallible init and
retry-on-failure), performing the DICT_MAGIC branch and calls to
DictionaryHandle::decode_dict or Dictionary::from_raw_content inside that
closure and returning/propagating any error; then return a clone/reference to
the prepared handle from the OnceLock rather than parsing first and then set().
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: fd77e7b9-472a-46af-8edf-f8d9dc0bd87a

📥 Commits

Reviewing files that changed from the base of the PR and between 4239a3d and 5a55d77.

📒 Files selected for processing (2)
  • src/compression/mod.rs
  • src/compression/zstd_backend.rs

Comment thread src/compression/mod.rs Outdated
Comment thread src/compression/mod.rs
…parse, not N

The original `OnceLock`-only `prepared_handle` lets N concurrent
first-callers each run the full dictionary parse before reaching
`OnceLock::set`. The losers' duplicate `DictionaryHandle`s then propagate
into their TLS `FrameDecoder`s, leaving N parsed-table copies alive —
defeating the main contract of "O(1) parse per dict, not O(N threads)".

Add a `prepare_lock: Arc<Mutex<()>>` and a double-check inside the
critical section. Fast path remains lock-free (`OnceLock::get` before
the mutex), so steady-state decompress calls keep the previous zero-
overhead behaviour. The mutex is contended ONLY on the first cold
start of each `ZstdDictionary`.

Tests:

- Rename `prepared_handle_finalized_dict_parses_from_test_vector` to
  `prepared_handle_is_lazy_and_populated_after_first_call`. The test
  body never used a finalized dict or a test vector; it asserts the
  lazy-init / first-call-populates-cache contract, and the new name
  says so.
- Extend `prepared_handle_rejects_corrupted_finalized_magic` to also
  assert `dict.prepared.get().is_none()` after the failing call. The
  previous test would have silently passed if a failed parse had cached
  a poisoned value; the retry-on-failure contract is now pinned.

1447/1447 lib tests pass; clippy clean under
`--all-features --all-targets -- -D warnings`.
@polaz polaz changed the title perf(compression): pre-parse zstd dictionary once via Arc handle perf(compression): pre-parse zstd dict once via OnceCell + no-std foundation May 19, 2026
…nit + no-std foundation

The earlier Mutex-based single-parse guarantee was equivalent to but
strictly worse than `once_cell::sync::OnceCell::get_or_try_init` —
same one-shot synchronisation primitive (winner parses, losers wait
on the cell's internal state), but without the auxiliary `Arc<Mutex<()>>`
field, the double-check pattern, and the std-specific Mutex dependency.

`OnceCell::get_or_try_init` from the `once_cell` crate is used here
because `std::sync::OnceLock::get_or_try_init` is still gated behind
the unstable `once_cell_try` feature on our MSRV (1.92). The
`once_cell` crate also provides a no-std + alloc compatible variant
(`once_cell::race::OnceBox`) we can swap to during the no-std
migration, so this code path is now compile-time-trait-friendly
rather than std-locked.

Foundation for no-std + alloc support:

- Cargo features: `default = ["std"]`, new `std` and `alloc` features.
  `io-uring` is now gated on `std`. Existing consumers see no
  behaviour change (default keeps `std` on).
- `src/lib.rs`: `#![cfg_attr(not(feature = "std"), no_std)]` and
  `extern crate alloc;`.
- New CI job `no-std-check`: runs
  `cargo check --target thumbv7em-none-eabihf --no-default-features
  --features alloc`. Currently `continue-on-error: true` — the job is
  the canonical migration progress meter (compile-error count must
  monotonically decrease across PRs). When it reaches zero we drop
  the `continue-on-error` flag and it becomes a gating check.

The full per-module migration is multi-week and tracked separately
in the new tracking issue; this PR delivers only the foundation +
the dictionary-handle perf fix that originally triggered the no-std
audit.

1447 lib tests pass; clippy clean under `--all-features --all-targets`.
Copilot AI review requested due to automatic review settings May 19, 2026 20:05
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/coordinode-ci.yml:
- Around line 106-111: Update the GitHub Actions steps that reference action
tags (actions/checkout@v6, dtolnay/rust-toolchain@stable,
Swatinem/rust-cache@v2) to use pinned commit SHAs instead of mutable tags;
locate the three uses in the workflow file and replace the tag versions with the
corresponding commit SHA for each action (ensure the SHAs match the intended tag
versions), keep the same with: inputs (toolchain/targets) intact for
dtolnay/rust-toolchain, and verify the workflow still runs in CI after updating
the three references.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5b6dfed5-8559-43ed-9acb-9e9932bd4d7c

📥 Commits

Reviewing files that changed from the base of the PR and between 5a55d77 and 75dfe3d.

📒 Files selected for processing (4)
  • .github/workflows/coordinode-ci.yml
  • Cargo.toml
  • src/compression/mod.rs
  • src/lib.rs

Comment thread .github/workflows/coordinode-ci.yml
…feature swap

Cargo.toml now has `default = ["std"]`, so `cargo --no-default-features
--features zstd,lz4` activates `no_std` — the crate is not yet
no-std-clean (#274 tracks the migration), and CI's `test-zstd` and
`codecov` jobs that ran exactly that invocation surfaced ~1075
unresolved-`std` errors as a consequence.

These two jobs are scoped to "test the zstd backend without the lz4
default" or similar — they were NOT meant to test no-std. Spell out
`std,zstd,lz4` so the std-implicit baseline is preserved.

The no-std-only target build lives in the new `no-std-check` job (also
in this file), which explicitly uses
`--target thumbv7em-none-eabihf --no-default-features --features alloc`
and stays `continue-on-error: true` until #274 lands.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comment thread Cargo.toml Outdated
Comment thread Cargo.toml Outdated
…ce_cell

Without these gates, `cargo --no-default-features --features zstd,lz4`
flipped the crate into `no_std` (because `--no-default-features`
implicitly turns off the `std` feature now that it lives in defaults)
while still pulling in modules that use `std::fs` / `std::sync` /
`thread_local!` — surfacing as ~1000 unresolved-`std` errors at build
time. Two related changes:

- Every feature whose modules currently call `std::*` directly
  (`zstd`, `lz4`, `encryption`, `io-uring`) now depends on `std` in
  Cargo.toml's feature table. `--no-default-features --features zstd`
  therefore activates `std` transitively. The CI `test-zstd` job can
  go back to its original `--features zstd,lz4` invocation; the
  explicit `std,` prefix I added in the earlier CI fix is now
  redundant and removed.
- `once_cell` becomes optional and is rolled into the `zstd` feature
  alongside `structured-zstd`. The crate is used only by
  `ZstdDictionary::prepared_handle` under `#[cfg(feature = "zstd")]`;
  pulling it unconditionally added a direct edge to a dep that
  non-zstd builds never need (transitive copies via `enum_dispatch` /
  `tempfile` remain, but expressing the direct edge correctly keeps
  the manifest honest).

As the no-std migration epic ports modules off `std::*`, the `std`
transitive dep can be lifted on a per-feature basis.

Default-feature build (`cargo check`) and `--no-default-features
--features zstd,lz4` both compile clean; 940 lib tests pass;
clippy clean under `--all-features --all-targets`.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/coordinode-ci.yml:
- Around line 101-107: The job currently inherits full GITHUB_TOKEN permissions
and leaves checkout credentials enabled; update the job to explicitly set
minimal permissions (e.g., add a job-level permissions mapping with contents:
read) and modify the checkout step (the uses: actions/checkout@v6 step) to
disable persistence of credentials by adding the with: persist-credentials:
false setting so the workflow only has read-only repo access and does not keep
the token in the workspace.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7cfe4968-9b18-45ed-9bf7-bc988a7fb8aa

📥 Commits

Reviewing files that changed from the base of the PR and between 75dfe3d and d54899a.

📒 Files selected for processing (2)
  • .github/workflows/coordinode-ci.yml
  • Cargo.toml

Comment thread .github/workflows/coordinode-ci.yml
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

2 participants