Deterministic Entropy Coding Cross-Architecture by jbrough · Pull Request #95 · facebookresearch/encodec

jbrough · 2025-09-08T22:26:21Z

Deterministic cross-platform LM entropy coding, acv=4 CRC chunk framing, and `_counts_from_pdf` bug fix

Summary

This PR hardens the LM-backed entropy coding path for cross-platform correctness and adds per-segment failure isolation. The neural network weights and audio quality are unchanged. All existing .ecdc files decode correctly.

Motivation

Three problems with the current LM entropy path:

Non-deterministic across hardware. torch.softmax can differ by a ULP between CPU, MPS, and CUDA. The arithmetic coder amplifies these differences — a single wrong probability pushes the decode state off track, producing EOFError or silent garbage. Payloads encoded on an Apple Silicon Mac reliably fail to decode on Linux CPU or CUDA.
Silent corrupt decode at tau=1.0. In _counts_from_pdf, the near-integer perturbation uses an alternating sign. When a token's probability is exactly 0.0 (common at tau=1.0 due to float underflow of exp(-large)), the negative perturbation gives x = -ε, then floor(-ε) = -1. A negative count makes the CDF non-monotonic; the decoder produces wrong symbols with no error raised.
No failure isolation. A single corrupt byte anywhere in the payload desynchronises the arithmetic decoder and destroys the rest of the file.
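
To make the second problem concrete, here is a minimal pure-Python sketch of the failure mode, not the actual `_counts_from_pdf` code. The function name `counts`, the perturbation size `EPS`, the total of 1024 and the toy PDF are all assumptions chosen for illustration; only the alternating-sign nudge, the floor, and the clamp come from the description above.

```python
import math
from itertools import accumulate

EPS = 1e-6  # hypothetical perturbation size, not taken from the PR


def counts(pdf, total, clamp):
    """Scale a PDF to integer counts, nudging near-integer values first."""
    out = []
    for i, p in enumerate(pdf):
        x = p * total
        # Alternating-sign nudge for values that sit on an integer.
        if abs(x - round(x)) < EPS:
            x += EPS if i % 2 == 0 else -EPS
        if clamp:
            x = max(x, 0.0)  # the fix: never let x drop below zero
        out.append(math.floor(x))
    return out


pdf = [0.5, 0.0, 0.5, 0.0]  # two tokens with probability exactly 0.0
buggy = counts(pdf, 1024, clamp=False)  # [512, -1, 512, -1]
fixed = counts(pdf, 1024, clamp=True)   # [512, 0, 512, 0]
buggy_cdf = list(accumulate(buggy))     # [512, 511, 1023, 1022]
fixed_cdf = list(accumulate(fixed))     # [512, 512, 1024, 1024]
```

Without the clamp the cumulative counts step backwards (512 to 511), which is the non-monotonic CDF described above. With the clamp they never decrease.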

Changes

`encodec/compress.py`

Deterministic CDF construction

_stable_softmax: computes softmax in float64 using a sequential cumsum denominator rather than torch.softmax. Cross-architecture bit-reproducibility was verified from Mac CPU/MPS to Linux CPU/CUDA.
_quantize_logits_: rounds logits to a 1/128 grid before softmax. Tiny floating-point differences that don't change the quantised logit produce identical CDFs.
_counts_from_pdf: adds clamp_min(0) after the near-integer perturbation step, fixing the negative-count bug at tau=1.0.
_deterministic_cdf / _deterministic_cdf_multi: integer floor + priority allocation CDF construction at FP_SCALE=65536 precision. Replaces float-based CDF that was sensitive to platform differences.
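
The three steps above (logit quantisation, a softmax with a fixed summation order, and integer floor plus priority allocation) can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the code in `encodec/compress.py`: the function names, the largest-remainder tie-breaking rule, and the way `MIN_RANGE` is reserved per symbol are all hypothetical, and Python floats stand in for the float64 tensors the PR uses. It shows the structure only and does not by itself guarantee bit-identical results across platforms.

```python
import math

QSTEP = 1 / 128    # logit grid from the PR description
FP_SCALE = 65536   # CDF total from the PR description
MIN_RANGE = 1      # minimum count reserved per symbol (assumed usage)


def quantize_logits(logits, qstep=QSTEP):
    """Snap each logit to the nearest multiple of qstep."""
    return [round(x / qstep) * qstep for x in logits]


def stable_softmax(logits):
    """Softmax with the denominator summed in a fixed left-to-right order."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    denom = 0.0
    for e in exps:
        denom += e
    return [e / denom for e in exps]


def deterministic_cdf(pdf, total=FP_SCALE, min_range=MIN_RANGE):
    """Integer CDF: floor each share, then hand out the leftover by priority."""
    n = len(pdf)
    budget = total - n * min_range
    scaled = [p * budget for p in pdf]
    counts = [math.floor(s) for s in scaled]
    leftover = budget - sum(counts)
    # Priority: largest fractional remainder first, ties go to the lower index.
    order = sorted(range(n), key=lambda i: (-(scaled[i] - counts[i]), i))
    k = 0
    while leftover > 0:
        counts[order[k % n]] += 1
        leftover -= 1
        k += 1
    while leftover < 0:  # only reachable if the pdf sums slightly above 1
        j = max(range(n), key=lambda i: counts[i])
        counts[j] -= 1
        leftover += 1
    cdf, acc = [], 0
    for c in counts:
        acc += c + min_range
        cdf.append(acc)
    return cdf


logits_a = [2.0, 1.0, 0.1, -3.0]
logits_b = [x + 1e-4 for x in logits_a]  # small platform-style drift
cdf_a = deterministic_cdf(stable_softmax(quantize_logits(logits_a)))
cdf_b = deterministic_cdf(stable_softmax(quantize_logits(logits_b)))
```

In this example both inputs land on the same grid points, so `cdf_a` and `cdf_b` are identical. Drift large enough to cross a grid boundary would still change the result.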

Bitstream version acv=4 with CRC chunk framing

Each model segment is wrapped in [chunk_len: u32 BE][crc32: u32 BE][payload].
A corrupt chunk is replaced with silence for that segment; the rest of the file decodes normally.
tau is stored in the header so encoder and decoder are always in sync without out-of-band configuration.
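
A minimal sketch of the chunk layout, using only the Python standard library. The assumptions here are mine: `chunk_len` counts payload bytes only, the CRC covers the payload only, and the CRC-32 variant is the one `zlib.crc32` implements. The helper names `frame_chunk` and `iter_chunks` are hypothetical.

```python
import struct
import zlib

HEADER = struct.Struct(">II")  # chunk_len, crc32: both u32 big-endian


def frame_chunk(payload: bytes) -> bytes:
    """Wrap one segment as [chunk_len][crc32][payload]."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def iter_chunks(stream: bytes):
    """Yield each payload, or None where the CRC check fails."""
    pos = 0
    while pos + HEADER.size <= len(stream):
        length, crc = HEADER.unpack_from(stream, pos)
        pos += HEADER.size
        payload = stream[pos:pos + length]
        pos += length
        ok = len(payload) == length and zlib.crc32(payload) == crc
        yield payload if ok else None


segments = [b"seg-0", b"seg-1", b"seg-2"]
stream = bytearray(b"".join(frame_chunk(s) for s in segments))
# Flip one byte inside the body of the second chunk.
stream[HEADER.size + len(segments[0]) + HEADER.size + 1] ^= 0xFF
decoded = list(iter_chunks(bytes(stream)))  # [b"seg-0", None, b"seg-2"]
```

A caller would substitute silence for each `None` entry. This sketch does not handle a corrupted length field, which would shift every later chunk boundary.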

GPU reliability

compress_to_file detects the model device and moves the waveform there automatically (wav[None].to(model_device)). Previously, compress_to_file crashed when the model was on MPS or CUDA.
LM and arithmetic coder always run on CPU for cross-platform determinism regardless of model device.

Tunable defaults (via env vars; existing behaviour unchanged if not set):

Variable	Default
`ENCODEC_LM_TAU`	`1.0`
`ENCODEC_LOGIT_QSTEP`	`1/128`
`ENCODEC_AC_FP_SCALE`	`65536`
`ENCODEC_AC_MIN_RANGE`	`1`
`ENCODEC_DETERMINISTIC_LM_DTYPE`	`float32`
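
For illustration, the defaults in this table could be read along the following lines. This is a sketch rather than the parsing code in the PR: `load_settings` and the returned keys are hypothetical, and whether the real code accepts a fraction string such as `1/128` is an assumption.

```python
import os
from fractions import Fraction

# Defaults copied from the table above, kept as strings like env values.
DEFAULTS = {
    "ENCODEC_LM_TAU": "1.0",
    "ENCODEC_LOGIT_QSTEP": "1/128",
    "ENCODEC_AC_FP_SCALE": "65536",
    "ENCODEC_AC_MIN_RANGE": "1",
    "ENCODEC_DETERMINISTIC_LM_DTYPE": "float32",
}


def load_settings(env=None):
    """Return the tunables, falling back to the defaults when unset."""
    env = os.environ if env is None else env

    def get(key):
        return env.get(key, DEFAULTS[key])

    return {
        # Fraction accepts both "1.0" and "1/128" style strings.
        "tau": float(Fraction(get("ENCODEC_LM_TAU"))),
        "qstep": float(Fraction(get("ENCODEC_LOGIT_QSTEP"))),
        "fp_scale": int(get("ENCODEC_AC_FP_SCALE")),
        "min_range": int(get("ENCODEC_AC_MIN_RANGE")),
        "lm_dtype": get("ENCODEC_DETERMINISTIC_LM_DTYPE"),
    }


defaults = load_settings({})
overridden = load_settings({"ENCODEC_LM_TAU": "2.0"})
```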

`encodec/model.py`

LMModel.forward_logits: factored out from forward so the deterministic and legacy paths share the transformer forward pass.
LMModel.forward_legacy: raw softmax with no quantisation, used for decoding acv < 3 streams.
LMModel.__init__: accepts tau parameter.
EncodecModel.get_lm_model: accepts device and dtype parameters for explicit LM placement.

`scripts/`

precision_eval.py: CLI for benchmarking bitrate, SNR, encode/decode wall time, CPU vs MPS, LM vs non-LM, and single-byte corruption behaviour (targets chunk bodies, not headers/CRC).
payload_decode_matrix.py: decodes a payload across CPU and CUDA and compares results; intended for cross-host determinism validation.

Backwards compatibility

Reading old streams: fully preserved. The decoder reads the acv field from the stream header and routes accordingly:

`acv`	Path	Notes
`0`	Raw bitpacking, no LM	Unchanged
`1` / `2`	Legacy LM via `forward_legacy()`	Original `torch.softmax`, no quantisation — decodes exactly as before
`4`	New deterministic path	This PR
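
The routing in this table amounts to a small dispatch on the header's acv field. The sketch below is hypothetical: `decode_path` and the returned labels are illustrative names, and any acv value not listed in the table is treated as unsupported because this description does not say how it is handled.

```python
def decode_path(acv: int) -> str:
    """Map the stream's acv field to a decode path label."""
    if acv == 0:
        return "raw"               # raw bitpacking, no LM
    if acv in (1, 2):
        return "legacy_lm"         # forward_legacy(), original torch.softmax
    if acv == 4:
        return "deterministic_lm"  # new path from this PR
    raise ValueError(f"unsupported acv={acv}")
```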

Writing: compress(..., use_lm=False) still produces acv=0 raw streams identical to before. compress(..., use_lm=True) now produces acv=4; old decoders will reject acv=4 streams with an unsupported-version error (the version field exists for this purpose).

API surface: no breaking changes. compress, decompress, compress_to_file, decompress_from_file retain the same signatures. The EncodecModel public API is unchanged.

Test results

Benchmarked on 7 stereo 48 kHz music tracks (10 s clips) with encodec_48khz. All 7 tracks decoded without error on every device:

Bandwidth	Device	Avg actual kbps	LM gain vs raw	Encode RTF	Decode RTF
6 kbps	CPU	4.34	27.7%	0.26×	0.27×
6 kbps	MPS	4.34	27.7%	0.33×	0.27×
24 kbps	CPU	19.3	19.9%	0.39×	0.41×
24 kbps	MPS	19.3	19.9%	0.47×	0.40×

CPU and MPS produce byte-identical payloads and identical decoded audio (same kbps, same SNR). Zero decode failures across all tracks, bandwidths, and devices.

Cross-device decode matrix (payloads encoded on Apple Silicon Mac):

Encode	Decode	Before	After
Mac CPU	Linux CPU	`EOFError`	✓
Mac CPU	Linux CUDA	`EOFError`	✓
Mac MPS	Linux CPU	`EOFError`	✓
Mac MPS	Linux CUDA	`EOFError`	✓

meta-cla · 2025-09-08T22:26:27Z

Hi @jbrough!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

meta-cla · 2025-09-09T12:05:15Z

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

- Fix _counts_from_pdf negative-count bug (clamp_min before floor after near-integer perturbation); triggered at tau=1.0 with float underflow on zero-probability tokens → non-monotonic CDF → corrupt AC decode
- Add acv=4 per-segment CRC chunk framing for blast-radius isolation: corrupt segment → silence substitution, rest of stream intact
- Deterministic LM path: float32 weights, float64 softmax via cumsum denominator, logit quantisation to 1/128 grid (LOGIT_QSTEP), integer arithmetic coder; LM always on CPU for cross-platform determinism
- Tighter defaults: FP_SCALE=65536, MIN_RANGE=1, LM_TAU=1.0 (~34% gain over raw vs ~29% with tau=2.0)
- GPU reliability: model auto-moves wav to device, LM stays on CPU; validated MPS↔CPU↔CUDA cross-device decode
- Add legacy decode path (forward_legacy / forward_logits split) for reading acv<3 streams from original Facebook implementation
- Add model.get_lm_model(device, dtype) for explicit LM placement
- Add scripts/precision_eval.py and scripts/payload_decode_matrix.py for benchmarking, corruption simulation, and cross-host validation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Brings all precision and robustness improvements into main:

- Fix _counts_from_pdf negative-count bug at tau=1.0 (clamp_min before floor after near-integer perturbation; triggered by float underflow of exp(-large) on zero-probability tokens → non-monotonic CDF → corrupt decode)
- acv=4 per-segment CRC chunk framing: corrupt segment → silence, rest intact
- Deterministic LM: float32 weights, float64 softmax (cumsum denominator), logit quantisation to 1/128 grid, integer arithmetic coder; LM on CPU always
- Defaults: FP_SCALE=65536, MIN_RANGE=1, LM_TAU=1.0, LOGIT_QSTEP=1/128
- tau stored in header so streams are self-describing
- GPU reliability: model auto-moves wav to its device, LM stays on CPU; validated MPS↔CPU↔CUDA cross-device decode
- Legacy decode path for acv<3 streams from original Facebook implementation
- scripts/precision_eval.py and scripts/payload_decode_matrix.py for benchmarking, corruption simulation, and cross-host validation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Merges original Facebook README with research notes from both the wavey-ai fork and the codex-precision-review branch: deterministic LM path, acv=4 chunk framing, _counts_from_pdf bug fix, GPU reliability, tuned defaults, compression benchmarks, chunk size tradeoffs, and usage examples.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

jbrough · 2026-04-22T01:03:56Z

Superseded by a pure Rust + onnx version of Encodec: https://github.com/wavey-ai/encodec-rs

jbrough added 8 commits January 27, 2025 19:54

switch from floating point arithmetic to scaled integers

f8eda55

Revert "switch from floating point arithmetic to scaled integers"

5b7b181

This reverts commit f8eda55.

restrict changes to entropy coding paths

21308e2

quantisation changes

f0267cf

segment boundaries

267e5a8

Revert "segment boundaries"

f05261d

This reverts commit 267e5a8.

quantisation improvements

a485dc6

reinstate fb comments

708087d

meta-cla Bot added the CLA Signed label Sep 9, 2025

partly address performance degradations

2a45ecb

jbrough changed the title ~~WIP (Please collab): Deterministic Entropy Coding Cross-Architecture~~ [WIP]: Deterministic Entropy Coding Cross-Architecture Sep 9, 2025

jbrough and others added 3 commits March 18, 2026 13:16

jbrough changed the title ~~[WIP]: Deterministic Entropy Coding Cross-Architecture~~ Deterministic Entropy Coding Cross-Architecture Mar 18, 2026

jbrough added 12 commits March 19, 2026 14:16

Improve deterministic LM bitstream controls

c84f6cb

Parallelize chunked LM segment encoding

f9da4cf

Tighten cross-host deterministic LM defaults

b00c5bd

Checkpoint native entropy coding and CUDA decode LM

8782578

Speed up CUDA decode LM inference

d3d0776

Document Ada benchmarks and decode tradeoffs

b17da3e

Restructure README for fork-first docs

1301c36

Quantify CPU decode tradeoff in README

ebbb6d1

Default CPU decode workers to auto headroom

c7b089c

Tighten fork README tone

9075227

Add frame-level ONNX export bundle

76995ee

Export ONNX frame bundles with dynamic batch

e5c7ffd

Save local work

4757b5a

jbrough wants to merge 25 commits into facebookresearch:main from wavey-ai:main
