Deterministic Entropy Coding Cross-Architecture #95
- Fix _counts_from_pdf negative-count bug (clamp_min before floor after near-integer perturbation); triggered at tau=1.0 with float underflow on zero-probability tokens → non-monotonic CDF → corrupt AC decode
- Add acv=4 per-segment CRC chunk framing for blast-radius isolation: corrupt segment → silence substitution, rest of stream intact
- Deterministic LM path: float32 weights, float64 softmax via cumsum denominator, logit quantisation to 1/128 grid (LOGIT_QSTEP), integer arithmetic coder; LM always on CPU for cross-platform determinism
- Tighter defaults: FP_SCALE=65536, MIN_RANGE=1, LM_TAU=1.0 (~34% gain over raw vs ~29% with tau=2.0)
- GPU reliability: model auto-moves wav to device, LM stays on CPU; validated MPS↔CPU↔CUDA cross-device decode
- Add legacy decode path (forward_legacy / forward_logits split) for reading acv<3 streams from original Facebook implementation
- Add model.get_lm_model(device, dtype) for explicit LM placement
- Add scripts/precision_eval.py and scripts/payload_decode_matrix.py for benchmarking, corruption simulation, and cross-host validation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Brings all precision and robustness improvements into main:

- Fix _counts_from_pdf negative-count bug at tau=1.0 (clamp_min before floor after near-integer perturbation; triggered by float underflow of exp(-large) on zero-probability tokens → non-monotonic CDF → corrupt decode)
- acv=4 per-segment CRC chunk framing: corrupt segment → silence, rest intact
- Deterministic LM: float32 weights, float64 softmax (cumsum denominator), logit quantisation to 1/128 grid, integer arithmetic coder; LM on CPU always
- Defaults: FP_SCALE=65536, MIN_RANGE=1, LM_TAU=1.0, LOGIT_QSTEP=1/128
- tau stored in header so streams are self-describing
- GPU reliability: model auto-moves wav to its device, LM stays on CPU; validated MPS↔CPU↔CUDA cross-device decode
- Legacy decode path for acv<3 streams from original Facebook implementation
- scripts/precision_eval.py and scripts/payload_decode_matrix.py for benchmarking, corruption simulation, and cross-host validation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Merges original Facebook README with research notes from both the wavey-ai fork and the codex-precision-review branch: deterministic LM path, acv=4 chunk framing, _counts_from_pdf bug fix, GPU reliability, tuned defaults, compression benchmarks, chunk size tradeoffs, and usage examples.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
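The deterministic LM path named in the commits above (logit quantisation to a 1/128 grid, then a float64 softmax whose denominator is accumulated in a fixed order) can be illustrated with a minimal, stdlib-only sketch. The function names and the sample logits here are hypothetical, and the real code operates on tensors rather than Python lists. The sketch only shows why snapping logits to a grid absorbs tiny input differences; it does not by itself guarantee bit-identical `exp` results across platforms.

```python
import math

LOGIT_QSTEP = 1.0 / 128.0  # grid spacing named in the commits above


def quantize_logits(logits, qstep=LOGIT_QSTEP):
    """Snap each logit to the nearest multiple of qstep."""
    return [round(x / qstep) * qstep for x in logits]


def stable_softmax(logits):
    """Softmax in float64 with a fixed, left-to-right summation order."""
    peak = max(logits)                      # subtract the max to avoid overflow
    exps = [math.exp(x - peak) for x in logits]
    total = 0.0
    for e in exps:                          # sequential accumulation, never reordered
        total += e
    return [e / total for e in exps]


# Two logit vectors that differ only by float32-scale noise (illustrative values).
a = [2.5, -1.25, 0.75, -8.0]
b = [x + 1e-7 for x in a]

pdf_a = stable_softmax(quantize_logits(a))
pdf_b = stable_softmax(quantize_logits(b))
```

Because both inputs land on the same grid points, `pdf_a` and `pdf_b` are identical, so any CDF built from them is identical too. A difference large enough to cross a rounding boundary would still change the quantised logit.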
Superseded by a pure Rust + onnx version of Encodec: https://github.com/wavey-ai/encodec-rs |
Deterministic cross-platform LM entropy coding, acv=4 CRC chunk framing, and `_counts_from_pdf` bug fix

Summary

This PR hardens the LM-backed entropy coding path for cross-platform correctness and adds per-segment failure isolation. The neural network weights and audio quality are unchanged. All existing `.ecdc` files decode correctly.

Motivation
Three problems with the current LM entropy path:
- Non-deterministic across hardware. `torch.softmax` can differ by a ULP between CPU, MPS, and CUDA. The arithmetic coder amplifies these differences: a single wrong probability pushes the decode state off track, producing `EOFError` or silent garbage. Payloads encoded on an Apple Silicon Mac reliably fail to decode on Linux CPU or CUDA.
- Silent corrupt decode at `tau=1.0`. In `_counts_from_pdf`, the near-integer perturbation uses an alternating sign. When a token's probability is exactly `0.0` (common at `tau=1.0` due to float underflow of `exp(-large)`), the negative perturbation gives `x = -ε`, then `floor(-ε) = -1`. A negative count makes the CDF non-monotonic; the decoder produces wrong symbols with no error raised.
- No failure isolation. A single corrupt byte anywhere in the payload desynchronises the arithmetic decoder and destroys the rest of the file.
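The negative-count failure in the second item can be reproduced in a few lines. This is a sketch under stated assumptions, not the PR's `_counts_from_pdf`: the function signature, the `eps` value, and the sample distribution are all hypothetical, and the real function works on tensors. It shows only the mechanism, that flooring a slightly negative value yields `-1` and that clamping to zero before the floor keeps the cumulative counts monotonic.

```python
import math
from itertools import accumulate


def counts_from_pdf(pdf, total, eps=1e-6, clamp=True):
    """Scale probabilities to integer counts after a small alternating nudge."""
    counts = []
    for i, p in enumerate(pdf):
        sign = 1.0 if i % 2 == 0 else -1.0   # alternating-sign perturbation
        x = p * total + sign * eps
        if clamp:
            x = max(x, 0.0)                  # clamp before floor (the fix)
        counts.append(math.floor(x))
    return counts


def is_monotonic(cdf):
    """True when the cumulative counts never decrease."""
    return all(lo <= hi for lo, hi in zip(cdf, cdf[1:]))


# Zero-probability tokens sit at the odd positions, where the nudge is negative.
pdf = [0.5, 0.0, 0.5, 0.0]

buggy = counts_from_pdf(pdf, total=1 << 16, clamp=False)
fixed = counts_from_pdf(pdf, total=1 << 16, clamp=True)

buggy_cdf = list(accumulate(buggy))   # dips after each zero-probability token
fixed_cdf = list(accumulate(fixed))   # flat there, never decreasing
```

Without the clamp, each zero-probability token at an odd position receives a count of `-1`, so the running sum steps backwards and a decoder searching that CDF can select the wrong symbol without raising an error.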
Changes
encodec/compress.py

Deterministic CDF construction:

- `_stable_softmax`: computes softmax in float64 using a sequential cumsum denominator rather than `torch.softmax`. Cross-architecture bit-reproducibility verified Mac CPU/MPS → Linux CPU/CUDA.
- `_quantize_logits_`: rounds logits to a 1/128 grid before softmax. Tiny floating-point differences that don't change the quantised logit produce identical CDFs.
- `_counts_from_pdf`: adds `clamp_min(0)` after the near-integer perturbation step, fixing the negative-count bug at `tau=1.0`.
- `_deterministic_cdf` / `_deterministic_cdf_multi`: integer floor + priority allocation CDF construction at `FP_SCALE=65536` precision. Replaces the float-based CDF that was sensitive to platform differences.

Bitstream version `acv=4` with CRC chunk framing `[chunk_len: u32 BE][crc32: u32 BE][payload]`. `tau` is stored in the header so encoder and decoder are always in sync without out-of-band configuration.

GPU reliability: `compress_to_file` detects the model device and moves the waveform there automatically (`wav[None].to(model_device)`). Previously it crashed when the model was on MPS or CUDA.

Tunable defaults (via env vars; existing behaviour unchanged if not set):

| Variable | Default |
|---|---|
| `ENCODEC_LM_TAU` | `1.0` |
| `ENCODEC_LOGIT_QSTEP` | `1/128` |
| `ENCODEC_AC_FP_SCALE` | `65536` |
| `ENCODEC_AC_MIN_RANGE` | `1` |
| `ENCODEC_DETERMINISTIC_LM_DTYPE` | `float32` |

encodec/model.py

- `LMModel.forward_logits`: factored out from `forward` so the deterministic and legacy paths share the transformer forward pass.
- `LMModel.forward_legacy`: raw softmax with no quantisation, used for decoding `acv < 3` streams.
- `LMModel.__init__`: accepts a `tau` parameter.
- `EncodecModel.get_lm_model`: accepts `device` and `dtype` parameters for explicit LM placement.

scripts/

- `precision_eval.py`: CLI for benchmarking bitrate, SNR, encode/decode wall time, CPU vs MPS, LM vs non-LM, and single-byte corruption behaviour (targets chunk bodies, not headers/CRC).
- `payload_decode_matrix.py`: decodes a payload across CPU and CUDA and compares results; intended for cross-host determinism validation.

Backwards compatibility
Reading old streams: fully preserved. The decoder reads the `acv` field from the stream header and routes accordingly:

- `acv` `0`
- `acv` `1/2`: `forward_legacy()` with `torch.softmax`, no quantisation; decodes exactly as before
- `acv` `4`

Writing: `compress(..., use_lm=False)` still produces `acv=0` raw streams identical to before. `compress(..., use_lm=True)` now produces `acv=4`; old decoders will reject `acv=4` streams with an unsupported-version error (the version field exists for this purpose).

API surface: no breaking changes. `compress`, `decompress`, `compress_to_file`, `decompress_from_file` retain the same signatures. The `EncodecModel` public API is unchanged.

Test results
Benchmarked on 7 stereo 48 kHz music tracks, 10 s clips, `encodec_48khz`; all 7 tracks decoded without error on every device.

CPU and MPS produce byte-identical payloads and identical decoded audio (same kbps, same SNR). Zero decode failures across all tracks, bandwidths, and devices.
Cross-device decode matrix (payloads encoded on Apple Silicon Mac):
(Decode matrix table not preserved; the only surviving cells are four `EOFError` entries.)
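The `acv=4` chunk framing listed under Changes, `[chunk_len: u32 BE][crc32: u32 BE][payload]`, and the single-byte corruption test that targets chunk bodies can be sketched with the standard library. Several details here are assumptions rather than facts from the PR: that `chunk_len` counts payload bytes only, that the CRC covers only the payload, and that the CRC is the zlib CRC-32. The helper names are hypothetical. Silence substitution for a failed chunk would happen in the decoder and is not shown.

```python
import struct
import zlib

# Assumed header layout: chunk_len then crc32, both big-endian unsigned 32-bit.
HEADER = struct.Struct(">II")


def frame_chunk(payload: bytes) -> bytes:
    """Prefix a payload with its length and CRC-32."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def read_chunks(stream: bytes):
    """Yield (payload, ok) per chunk; ok is False when the CRC does not match."""
    offset = 0
    while offset + HEADER.size <= len(stream):
        length, crc = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        payload = stream[offset:offset + length]
        offset += length
        yield payload, zlib.crc32(payload) == crc


segments = [b"segment-0", b"segment-1", b"segment-2"]
stream = bytearray(b"".join(frame_chunk(s) for s in segments))

# Flip one byte inside the body of the second chunk (not its header or CRC).
second_body = HEADER.size + len(segments[0]) + HEADER.size
stream[second_body] ^= 0xFF

results = [ok for _, ok in read_chunks(bytes(stream))]
```

Only the damaged chunk fails its check; the chunks before and after it still verify, which is the isolation property the framing is meant to provide. Corruption in a length field is a different case, since it would shift every later chunk boundary, and this sketch does not attempt to resynchronise.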