From 0619c0a981264f01fa0c5072b06301f17b6bd80c Mon Sep 17 00:00:00 2001 From: Oli Larkin Date: Thu, 21 May 2026 23:45:45 +0200 Subject: [PATCH 1/3] Trim duplicated JS API docs from root README The npm/README.md already covers the full JS/WASM API. Replace the parallel section in the root README with a pointer to it. --- README.md | 72 ++----------------------------------------------------- 1 file changed, 2 insertions(+), 70 deletions(-) diff --git a/README.md b/README.md index 7271989..2d19007 100644 --- a/README.md +++ b/README.md @@ -159,77 +159,9 @@ int width = paulstretch::fft_simd_size(); // 4 ## Node.js / WASM usage -```js -import createPaulstretchModule from "@olilarkin/paulstretch-wasm"; +The Emscripten build is published as [`@olilarkin/paulstretch-wasm`](https://github.com/olilarkin/libpaulstretch/pkgs/npm/paulstretch-wasm). See [`npm/README.md`](npm/README.md) for installation, the full JS API (offline, streaming, envelope, spectral processing, binaural beats), and bundler notes. Type definitions live in [`npm/index.d.ts`](npm/index.d.ts). -const Module = await createPaulstretchModule(); -const renderer = new Module.OfflineRenderer(8.0, 4096, 48000, Module.Window.Hann, 0.0); - -const output = renderer.renderMono(input); -const { left, right } = renderer.renderStereo(leftChannel, rightChannel); - -renderer.delete(); // embind objects are not GC'd -``` - -### Streaming - -```js -const s = new Module.StreamingStretcher(8.0, 4096, 48000, Module.Window.Hann, 0.0); - -while (rendering) { - const want = s.nextInputSize(); - const input = gatherFrames(want); // Float32Array, zero-pad if needed - const { output, onset } = s.step(input, positionPct); - writeFrames(output); - inputCursor += want + s.skipAfterStep(); -} -s.delete(); -``` - -For multichannel hosts that need synchronized onsets across channels, use `stepWithoutOnsetFeedback()` on every channel, take the max onset, then call `applyOnset()` on every channel before the next iteration. - -### Stretch envelope - -Pass parallel arrays of positions (0–1) and multiplier values: - -```js -renderer.setStretchEnvelope( - new Float32Array([0, 0.5, 1.0]), - new Float32Array([1.0, 4.0, 1.0]), -); -const output = renderer.renderMono(input); -renderer.clearStretchEnvelope(); -``` - -### Spectral processing - -`setProcessOptions` accepts a plain JS object with camelCase keys (e.g. `pitchShiftEnabled`, `pitchShiftCents`, `filterEnabled`, `filterLowHz`); unspecified fields keep their defaults: - -```js -renderer.setProcessOptions({ - pitchShiftEnabled: true, - pitchShiftCents: 700, - filterEnabled: true, - filterLowHz: 200, - filterHighHz: 4000, -}); -``` - -See `npm/index.d.ts` for the full `ProcessOptions` shape. - -### Binaural beats - -```js -const bb = new Module.BinauralBeatsProcessor(48000); -bb.setOptions({ - enabled: true, - stereoMode: Module.BinauralStereoMode.LeftRight, - mono: 0.5, - beatFrequencyHz: 8, -}); -const { left, right } = bb.process(leftIn, rightIn, positionPct); -bb.delete(); -``` +The C++ and JS APIs are 1:1 — JS methods use camelCase versions of the C++ names, and `setProcessOptions` accepts a plain object instead of a `ProcessOptions` struct. ## Notes From 06f61c2d145e28456af68c9f0cecd34934d3d96f Mon Sep 17 00:00:00 2001 From: Oli Larkin Date: Sun, 7 Jun 2026 12:52:12 +0200 Subject: [PATCH 2/3] Add chunked offline render to bound WASM memory on long outputs renderMono/renderStereo materialise the entire output as one contiguous vector in WASM linear memory and then copy it again into the returned Float32Array, so peak memory is ~2x the output (and both channels at once for stereo). A large stretch (e.g. a few seconds stretched several hundred times into an hour-plus of audio) exceeds the wasm32 heap and aborts. The module was also linked with ALLOW_MEMORY_GROWTH but no MAXIMUM_MEMORY, so the cap was Emscripten's 2 GiB default. - Add OfflineRenderer::render_mono_chunked / render_stereo_chunked, which run the same DSP loop (refactored into shared stream_channel / stream_stereo cores) but deliver the result one bufsize() chunk at a time. render_mono/render_stereo output is unchanged. - Expose renderMonoChunked / renderStereoChunked in the WASM bindings; each chunk is copied out to a fresh JS-heap Float32Array, so peak WASM memory stays ~ input + one chunk regardless of output length. - Raise MAXIMUM_MEMORY to the 4 GiB wasm32 maximum for immediate headroom. - Add TypeScript declarations and an npm README section. - Add tests/chunked_test.cpp: structural parity between the chunked and whole-buffer paths, plus an allocation-free check that a representative extreme render overflows the known wasm32 caps while the chunked working set stays tiny. Extend the browser smoke test to exercise renderMonoChunked. --- CLAUDE.md | 2 +- CMakeLists.txt | 9 ++ README.md | 13 ++ include/paulstretch/paulstretch.h | 17 +++ npm/README.md | 23 +++ npm/index.d.ts | 17 +++ src/paulstretch.cpp | 168 +++++++++++++-------- src/wasm_bindings.cpp | 42 +++++- tests/browser/index.html | 25 +++ tests/chunked_test.cpp | 243 ++++++++++++++++++++++++++++++ 10 files changed, 495 insertions(+), 64 deletions(-) create mode 100644 tests/chunked_test.cpp diff --git a/CLAUDE.md b/CLAUDE.md index 1370347..06c8179 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -53,7 +53,7 @@ FFT-based extreme time-stretching library (Paulstretch algorithm). Single transl ### Public classes - **`StreamingStretcher`** — Block-based push/pull primitive for realtime use (AudioWorklet / Web Worker). The host calls `next_input_size()` to learn how many input frames `step()` wants next, gathers exactly that many (zero-padding if the source ran out), calls `step()` to produce `bufsize()` output frames, then advances its input cursor by an additional `skip_after_step()` frames. The first `step()` call expects `max_input_chunk()` (= 3 × bufsize) frames for the initial fill; subsequent requests alternate between 0 and `bufsize()` depending on stretch factor and onset detection. `step_without_onset_feedback()` + `apply_onset()` exists for hosts that need to coordinate onsets across channels. -- **`OfflineRenderer`** — Convenience wrapper around `StreamingStretcher` for whole-buffer rendering (`render_mono`, `render_stereo`). Stereo rendering runs two independent `StreamingStretcher`s but synchronizes onset detection (`max` of both channels) so the channels stay aligned. +- **`OfflineRenderer`** — Convenience wrapper around `StreamingStretcher` for whole-buffer rendering (`render_mono`, `render_stereo`). Stereo rendering runs two independent `StreamingStretcher`s but synchronizes onset detection (`max` of both channels) so the channels stay aligned. `render_mono_chunked` / `render_stereo_chunked` run the same DSP loop but deliver the output one `bufsize()`-frame chunk at a time via a `ChunkSink` callback, so peak memory stays bounded for very long outputs (the buffered path holds the whole result — and on WASM a second JS-side copy of it — in linear memory at once, which can exceed the heap and abort). The buffered and chunked paths share a common `stream_channel` / `stream_stereo` core in `src/paulstretch.cpp`. - **`BinauralBeatsProcessor`** — Independent post-processor that mixes the stretched signal toward mono and adds a sub-audio LFO-style beat between L/R (`set_options`, optional frequency envelope, `process(left, right, nframes, position_pct)`). ### Key data flow diff --git a/CMakeLists.txt b/CMakeLists.txt index 6c83c82..dc1c6ab 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -101,6 +101,10 @@ if (PAULSTRETCH_BUILD_TESTS AND NOT EMSCRIPTEN) COMMAND paulstretch_streaming_test ${CMAKE_CURRENT_SOURCE_DIR}/tests/plenty_of_unknown.wav ) + + add_executable(paulstretch_chunked_test tests/chunked_test.cpp) + target_link_libraries(paulstretch_chunked_test PRIVATE paulstretch_core) + add_test(NAME paulstretch_chunked_test COMMAND paulstretch_chunked_test) endif() if (PAULSTRETCH_BUILD_EXAMPLES AND NOT EMSCRIPTEN) @@ -125,6 +129,11 @@ if (EMSCRIPTEN AND PAULSTRETCH_BUILD_WASM) -O3 --bind -sALLOW_MEMORY_GROWTH=1 + # Raise the growth ceiling to the wasm32 maximum (4 GiB). Emscripten + # defaults to 2 GiB, which an hour-plus offline render can exceed and + # abort. For unbounded output lengths, prefer renderMonoChunked / + # renderStereoChunked, which keep linear memory bounded regardless. + -sMAXIMUM_MEMORY=4294967296 # Default ENVIRONMENT (web,webview,worker,node) — works in browsers, # bundlers (Vite/Webpack), Node, and Web Workers. -sEXPORT_ES6=1 diff --git a/README.md b/README.md index 2d19007..05a9329 100644 --- a/README.md +++ b/README.md @@ -66,6 +66,19 @@ auto [left, right] = renderer.render_stereo(left_in, right_in); Stereo rendering runs two independent stretchers but synchronizes onset detection across channels so they stay phase-aligned. +#### Very long outputs (chunked rendering) + +`render_mono` returns the whole result in one `std::vector`. For an extreme stretch — a few seconds blown up several hundred times into an hour-plus of audio — that buffer can be enormous, and on the WebAssembly build it has to live in linear memory twice over (the C++ vector plus the returned `Float32Array`), which can exceed the WASM heap and abort. `render_mono_chunked` / `render_stereo_chunked` run the identical algorithm but hand each `bufsize()`-frame chunk to a sink as it is produced, so peak memory stays bounded regardless of output length: + +```cpp +renderer.render_mono_chunked(input, [&](const float *data, int frames) { + // Consume the chunk — append to a buffer, write to disk, feed an encoder. +}); + +renderer.render_stereo_chunked(left_in, right_in, + [&](const float *left, const float *right, int frames) { /* ... */ }); +``` + ### Streaming (realtime) rendering `StreamingStretcher` is a block-based push/pull primitive for realtime hosts (audio callback, AudioWorklet, Web Worker). The host gathers exactly the number of input frames the stretcher asks for, calls `step()` to produce one output chunk, then advances its input cursor by the additional skip distance: diff --git a/include/paulstretch/paulstretch.h b/include/paulstretch/paulstretch.h index cf716cf..ae553d5 100644 --- a/include/paulstretch/paulstretch.h +++ b/include/paulstretch/paulstretch.h @@ -1,6 +1,7 @@ #pragma once #include +#include #include #include #include @@ -190,6 +191,22 @@ class OfflineRenderer { StereoBuffer render_stereo(const std::vector &left, const std::vector &right) const; + // Chunked offline rendering. Identical algorithm to render_mono/render_stereo, + // but instead of materialising the whole output in one buffer the result is + // delivered to `sink` one chunk at a time (each chunk is `bufsize()` frames). + // This keeps peak memory bounded regardless of stretch factor / output length + // — essential for the WASM build, whose linear memory is capped well below the + // size an hour-plus render would otherwise need. The pointers passed to `sink` + // are only valid for the duration of the call; copy out anything you keep. + using ChunkSink = std::function; + using StereoChunkSink = + std::function; + void render_mono_chunked(const std::vector &input, + const ChunkSink &sink) const; + void render_stereo_chunked(const std::vector &left, + const std::vector &right, + const StereoChunkSink &sink) const; + std::size_t estimate_output_frames(std::size_t input_frames) const; private: diff --git a/npm/README.md b/npm/README.md index cdb7c8b..f86b402 100644 --- a/npm/README.md +++ b/npm/README.md @@ -54,6 +54,29 @@ renderer.delete(); const { left, right } = renderer.renderStereo(leftIn, rightIn); ``` +### Very long outputs (chunked rendering) + +`renderMono` returns the whole result in one `Float32Array`, which has to live in +WebAssembly memory twice over (the internal buffer plus the returned copy). A large +stretch — e.g. a few seconds stretched several hundred times into an hour-plus of +audio — can exceed the WASM heap and abort. For those cases use `renderMonoChunked` +(or `renderStereoChunked`): same algorithm, but the output is delivered one chunk at +a time so peak WASM memory stays bounded regardless of length. Accumulate the chunks +on the JS heap, or stream them straight to disk / an encoder. + +```js +const chunks = []; +const totalFrames = renderer.renderMonoChunked(input, (chunk) => { + // `chunk` is a fresh Float32Array you may keep (~fftSize frames). + chunks.push(chunk); +}); + +// Stereo: callback receives (left, right) per chunk. +// const totalFrames = renderer.renderStereoChunked(leftIn, rightIn, (l, r) => { ... }); + +renderer.delete(); +``` + ### Time-varying stretch (breakpoint envelope) Positions are normalized `0..1` over the input. Values multiply the `stretch` you passed to the constructor. diff --git a/npm/index.d.ts b/npm/index.d.ts index a2b7d8e..4813f62 100644 --- a/npm/index.d.ts +++ b/npm/index.d.ts @@ -68,6 +68,23 @@ export interface OfflineRenderer { renderMono(input: Float32Array): Float32Array; renderStereo(left: Float32Array, right: Float32Array): StereoBuffer; + /** + * Chunked offline render — use this instead of `renderMono` for very long + * outputs (large stretch factors). The whole result of `renderMono` must live + * in WASM linear memory twice over (the C++ buffer plus the returned copy), + * so an hour-plus render can exceed the heap cap and abort. `renderMonoChunked` + * instead invokes `onChunk` with each ~`bufsize` slice; copy/accumulate it on + * the JS heap (or stream it to disk / an encoder) as it arrives. Peak WASM + * memory stays bounded regardless of output length. The Float32Array passed to + * `onChunk` is a fresh JS-heap copy you may keep. Returns the total frame count. + */ + renderMonoChunked(input: Float32Array, onChunk: (chunk: Float32Array) => void): number; + renderStereoChunked( + left: Float32Array, + right: Float32Array, + onChunk: (left: Float32Array, right: Float32Array) => void, + ): number; + /** * Set a time-varying stretch multiplier. Positions are normalized 0..1 * over the input duration; values are multipliers on the constructor's diff --git a/src/paulstretch.cpp b/src/paulstretch.cpp index 0b22b0a..5b5cc5f 100644 --- a/src/paulstretch.cpp +++ b/src/paulstretch.cpp @@ -991,13 +991,17 @@ namespace { // ── Offline render glue (built on StreamingStretcher) ────────────────────── -std::vector render_channel( +// Core streaming loop: drives a StreamingStretcher over `input` and delivers +// each `bufsize` output chunk to `sink` (never materialising the full output). +// Both render_channel (whole-buffer) and render_mono_chunked build on this. +void stream_channel( const std::vector &input, const RenderOptions &options, const std::vector &envelope, const ProcessOptions &process_options, - const std::vector &arbitrary_filter) { - if (input.empty()) return {}; + const std::vector &arbitrary_filter, + const OfflineRenderer::ChunkSink &sink) { + if (input.empty()) return; StreamingStretcher stretch(options); if (!envelope.empty()) stretch.set_stretch_envelope(envelope); @@ -1007,8 +1011,6 @@ std::vector render_channel( const int bufsize = stretch.bufsize(); std::vector out_chunk(bufsize, 0.0f); std::vector in_chunk(stretch.max_input_chunk(), 0.0f); - std::vector output; - output.reserve(static_cast(std::ceil(input.size() * options.stretch)) + bufsize); std::size_t cursor = 0; bool first = true; @@ -1030,14 +1032,98 @@ std::vector render_channel( } stretch.step(want > 0 ? in_chunk.data() : nullptr, pos_pct, out_chunk.data()); - output.insert(output.end(), out_chunk.begin(), out_chunk.end()); + sink(out_chunk.data(), bufsize); cursor = clamp_advance(cursor, stretch.skip_after_step(), input.size()); first = false; } +} + +std::vector render_channel( + const std::vector &input, + const RenderOptions &options, + const std::vector &envelope, + const ProcessOptions &process_options, + const std::vector &arbitrary_filter) { + if (input.empty()) return {}; + + std::vector output; + output.reserve(static_cast(std::ceil(input.size() * options.stretch)) + options.fft_size); + + stream_channel(input, options, envelope, process_options, arbitrary_filter, + [&output](const float *data, int frames) { + output.insert(output.end(), data, data + frames); + }); return output; } +// Core streaming loop for stereo. Runs two StreamingStretchers in lockstep and +// delivers each `bufsize` L/R chunk pair to `sink`. Both render_stereo and +// render_stereo_chunked build on this. +void stream_stereo( + const std::vector &left, + const std::vector &right, + const RenderOptions &options, + const std::vector &envelope, + const ProcessOptions &process_options, + const std::vector &arbitrary_filter, + const OfflineRenderer::StereoChunkSink &sink) { + StreamingStretcher stretch_left(options); + StreamingStretcher stretch_right(options); + if (!envelope.empty()) { + stretch_left.set_stretch_envelope(envelope); + stretch_right.set_stretch_envelope(envelope); + } + stretch_left.set_process_options(process_options); + stretch_right.set_process_options(process_options); + if (!arbitrary_filter.empty()) { + stretch_left.set_arbitrary_filter(arbitrary_filter); + stretch_right.set_arbitrary_filter(arbitrary_filter); + } + + const int bufsize = stretch_left.bufsize(); + std::vector in_l(stretch_left.max_input_chunk(), 0.0f); + std::vector in_r(stretch_right.max_input_chunk(), 0.0f); + std::vector out_l(bufsize, 0.0f); + std::vector out_r(bufsize, 0.0f); + + bool first = true; + std::size_t cursor = 0; + + while (true) { + const float pos_pct = 100.0f * static_cast(cursor) / static_cast(left.size()); + const int want = first ? stretch_left.max_input_chunk() : stretch_left.next_input_size(); + + if (want > 0 && cursor >= left.size() && !first) break; + + if (want > 0) { + const std::size_t avail = left.size() - cursor; + const std::size_t take = std::min(avail, static_cast(want)); + std::copy_n(left.data() + cursor, take, in_l.data()); + std::copy_n(right.data() + cursor, take, in_r.data()); + if (take < static_cast(want)) { + std::fill_n(in_l.data() + take, want - take, 0.0f); + std::fill_n(in_r.data() + take, want - take, 0.0f); + } + cursor += take; + } + + // NB: we drop the per-channel onset coordination the previous offline + // path did (combining onset_l/onset_r with std::max and feeding both + // sides) because the StreamingStretcher's step() applies its own onset + // internally. For independent stereo channels with onset detection on, + // this means each side reacts to its own transients — usually the more + // correct behavior anyway. + stretch_left.step(want > 0 ? in_l.data() : nullptr, pos_pct, out_l.data()); + stretch_right.step(want > 0 ? in_r.data() : nullptr, pos_pct, out_r.data()); + + sink(out_l.data(), out_r.data(), bufsize); + + cursor = clamp_advance(cursor, stretch_left.skip_after_step(), left.size()); + first = false; + } +} + } // anonymous namespace // ── Public API ───────────────────────────────────────────────────────────── @@ -1298,68 +1384,30 @@ StereoBuffer OfflineRenderer::render_stereo(const std::vector &left, cons if (left.size() != right.size()) throw std::invalid_argument("left and right channel lengths must match"); if (left.empty()) return {}; - StreamingStretcher stretch_left(options_); - StreamingStretcher stretch_right(options_); - if (!envelope_.empty()) { - stretch_left.set_stretch_envelope(envelope_); - stretch_right.set_stretch_envelope(envelope_); - } - stretch_left.set_process_options(process_options_); - stretch_right.set_process_options(process_options_); - if (!arbitrary_filter_.empty()) { - stretch_left.set_arbitrary_filter(arbitrary_filter_); - stretch_right.set_arbitrary_filter(arbitrary_filter_); - } - - const int bufsize = stretch_left.bufsize(); - std::vector in_l(stretch_left.max_input_chunk(), 0.0f); - std::vector in_r(stretch_right.max_input_chunk(), 0.0f); - std::vector out_l(bufsize, 0.0f); - std::vector out_r(bufsize, 0.0f); - StereoBuffer output; const std::size_t reserve = estimate_output_frames(left.size()); output.left.reserve(reserve); output.right.reserve(reserve); - bool first = true; - std::size_t cursor = 0; - - while (true) { - const float pos_pct = 100.0f * static_cast(cursor) / static_cast(left.size()); - const int want = first ? stretch_left.max_input_chunk() : stretch_left.next_input_size(); + stream_stereo(left, right, options_, envelope_, process_options_, arbitrary_filter_, + [&output](const float *l, const float *r, int frames) { + output.left.insert(output.left.end(), l, l + frames); + output.right.insert(output.right.end(), r, r + frames); + }); - if (want > 0 && cursor >= left.size() && !first) break; - - if (want > 0) { - const std::size_t avail = left.size() - cursor; - const std::size_t take = std::min(avail, static_cast(want)); - std::copy_n(left.data() + cursor, take, in_l.data()); - std::copy_n(right.data() + cursor, take, in_r.data()); - if (take < static_cast(want)) { - std::fill_n(in_l.data() + take, want - take, 0.0f); - std::fill_n(in_r.data() + take, want - take, 0.0f); - } - cursor += take; - } - - // NB: we drop the per-channel onset coordination the previous offline - // path did (combining onset_l/onset_r with std::max and feeding both - // sides) because the StreamingStretcher's step() applies its own onset - // internally. For independent stereo channels with onset detection on, - // this means each side reacts to its own transients — usually the more - // correct behavior anyway. - stretch_left.step(want > 0 ? in_l.data() : nullptr, pos_pct, out_l.data()); - stretch_right.step(want > 0 ? in_r.data() : nullptr, pos_pct, out_r.data()); - - output.left.insert(output.left.end(), out_l.begin(), out_l.end()); - output.right.insert(output.right.end(), out_r.begin(), out_r.end()); + return output; +} - cursor = clamp_advance(cursor, stretch_left.skip_after_step(), left.size()); - first = false; - } +void OfflineRenderer::render_mono_chunked(const std::vector &input, const ChunkSink &sink) const { + stream_channel(input, options_, envelope_, process_options_, arbitrary_filter_, sink); +} - return output; +void OfflineRenderer::render_stereo_chunked(const std::vector &left, + const std::vector &right, + const StereoChunkSink &sink) const { + if (left.size() != right.size()) throw std::invalid_argument("left and right channel lengths must match"); + if (left.empty()) return; + stream_stereo(left, right, options_, envelope_, process_options_, arbitrary_filter_, sink); } std::size_t OfflineRenderer::estimate_output_frames(std::size_t input_frames) const { diff --git a/src/wasm_bindings.cpp b/src/wasm_bindings.cpp index ee49034..0b29e8c 100644 --- a/src/wasm_bindings.cpp +++ b/src/wasm_bindings.cpp @@ -17,12 +17,16 @@ std::vector from_js_array(const emscripten::val &input) { return samples; } -emscripten::val to_js_float32_array(const std::vector &input) { - emscripten::val output = emscripten::val::global("Float32Array").new_(input.size()); - output.call("set", emscripten::val(emscripten::typed_memory_view(input.size(), input.data()))); +emscripten::val to_js_float32_array(const float *data, std::size_t n) { + emscripten::val output = emscripten::val::global("Float32Array").new_(n); + output.call("set", emscripten::val(emscripten::typed_memory_view(n, data))); return output; } +emscripten::val to_js_float32_array(const std::vector &input) { + return to_js_float32_array(input.data(), input.size()); +} + template T get_or(const emscripten::val &obj, const char *key, T fallback) { emscripten::val v = obj[key]; @@ -115,6 +119,36 @@ class WasmOfflineRenderer { return result; } + // Chunked offline render. Instead of returning one huge Float32Array (which + // for an hour-plus output would need to live in WASM linear memory twice over + // — the C++ vector plus the JS copy — and can blow the memory cap and abort), + // each ~bufsize() chunk is copied out to a fresh JS-heap Float32Array and + // handed to `onChunk`. Peak WASM memory stays ≈ input + one chunk; the full + // output accumulates on the JS side (or is streamed to disk / an encoder). + // Returns the total number of output frames delivered. + std::size_t renderMonoChunked(const emscripten::val &input, emscripten::val onChunk) const { + const std::vector in = from_js_array(input); + std::size_t total = 0; + renderer_.render_mono_chunked(in, [&onChunk, &total](const float *data, int frames) { + onChunk(to_js_float32_array(data, static_cast(frames))); + total += static_cast(frames); + }); + return total; + } + + std::size_t renderStereoChunked(const emscripten::val &left, const emscripten::val &right, + emscripten::val onChunk) const { + const std::vector l = from_js_array(left); + const std::vector r = from_js_array(right); + std::size_t total = 0; + renderer_.render_stereo_chunked(l, r, [&onChunk, &total](const float *lc, const float *rc, int frames) { + onChunk(to_js_float32_array(lc, static_cast(frames)), + to_js_float32_array(rc, static_cast(frames))); + total += static_cast(frames); + }); + return total; + } + void setStretchEnvelope(const emscripten::val &xs, const emscripten::val &ys) { const int n = std::min(xs["length"].as(), ys["length"].as()); std::vector envelope(n); @@ -301,6 +335,8 @@ EMSCRIPTEN_BINDINGS(paulstretch) { .function("estimateOutputFrames", &WasmOfflineRenderer::estimateOutputFrames) .function("renderMono", &WasmOfflineRenderer::renderMono) .function("renderStereo", &WasmOfflineRenderer::renderStereo) + .function("renderMonoChunked", &WasmOfflineRenderer::renderMonoChunked) + .function("renderStereoChunked", &WasmOfflineRenderer::renderStereoChunked) .function("setStretchEnvelope", &WasmOfflineRenderer::setStretchEnvelope) .function("clearStretchEnvelope", &WasmOfflineRenderer::clearStretchEnvelope) .function("setProcessOptions", &WasmOfflineRenderer::setProcessOptions) diff --git a/tests/browser/index.html b/tests/browser/index.html index 6756e8c..3aa8335 100644 --- a/tests/browser/index.html +++ b/tests/browser/index.html @@ -41,6 +41,31 @@ if (env.length === flat.length) throw new Error('envelope did not change length'); r.delete(); + + // Chunked render: same algorithm as renderMono, delivered incrementally so + // huge outputs never need the whole buffer in WASM memory at once. Accumulate + // the chunks JS-side. Per-instance PRNG seed means the samples won't be + // bit-identical to `flat`, but the total length must match and the callback + // must fire repeatedly with finite data (proving the bounded-memory path). + const rc = new M.OfflineRenderer(4, 1024, 44100, M.Window.Hann, 0); + let chunkCount = 0; + let chunkFrames = 0; + let allFinite = true; + const total = rc.renderMonoChunked(input, (chunk) => { + chunkCount++; + chunkFrames += chunk.length; + for (let i = 0; i < chunk.length; i++) { + if (!Number.isFinite(chunk[i])) allFinite = false; + } + }); + log('chunked 4x: output', total, 'in', chunkCount, 'chunks'); + if (chunkCount < 2) throw new Error('expected multiple chunks, got ' + chunkCount); + if (!allFinite) throw new Error('chunked output had non-finite samples'); + if (total !== chunkFrames || total !== flat.length) { + throw new Error('chunked length ' + total + ' != flat length ' + flat.length); + } + rc.delete(); + out.dataset.status = 'ok'; } catch (e) { fail((e && e.message) + '\n' + (e && e.stack)); diff --git a/tests/chunked_test.cpp b/tests/chunked_test.cpp new file mode 100644 index 0000000..aa76cc0 --- /dev/null +++ b/tests/chunked_test.cpp @@ -0,0 +1,243 @@ +// Tests for the chunked offline render API (render_mono_chunked / +// render_stereo_chunked). +// +// These exist so the WebAssembly build can produce very long outputs without +// materialising the whole result in linear memory. renderMono returns the +// output in one buffer that must live in WASM memory twice over (the internal +// vector plus the returned copy), so a large render can exceed the wasm32 heap +// and abort. The chunked variants run the same DSP loop but deliver the result +// one bufsize() chunk at a time, keeping peak memory bounded. +// +// Part 1 checks behavioural parity with the whole-buffer path at a modest size. +// Part 2 checks the thing that actually matters — that a representative extreme +// render would blow the known wasm32 memory limits if buffered, while the +// chunked path's working set stays a tiny fraction of them. (The native host is +// 64-bit and can't reproduce the abort, so Part 2 reasons about the sizes via +// estimate_output_frames rather than allocating them.) +// +// We can't assert bit-identity against render_mono: each Stretcher instance +// bumps a static PRNG seed, so two renders get different phase randomization +// (same convention as streaming_test.cpp). Hence structural comparison. + +#include +#include +#include +#include +#include +#include + +#include "paulstretch/paulstretch.h" + +namespace { + +int failures = 0; + +void check(bool cond, const std::string &label) { + if (!cond) { + ++failures; + std::cerr << "FAIL: " << label << '\n'; + } else { + std::cout << " ok: " << label << '\n'; + } +} + +std::vector make_sine(std::size_t frames, float sample_rate, float frequency) { + std::vector result(frames, 0.0f); + const float tau = 6.28318530717958647692f; + for (std::size_t i = 0; i < frames; i++) { + float phase = tau * frequency * static_cast(i) / sample_rate; + result[i] = 0.5f * std::sin(phase); + } + return result; +} + +struct Stats { + std::size_t frames = 0; + float peak = 0.0f; + double rms = 0.0; + bool all_finite = true; +}; + +Stats analyze(const std::vector &v) { + Stats s; + s.frames = v.size(); + double sumsq = 0.0; + for (float x : v) { + if (!std::isfinite(x)) { s.all_finite = false; continue; } + float a = std::fabs(x); + if (a > s.peak) s.peak = a; + sumsq += static_cast(x) * static_cast(x); + } + s.rms = v.empty() ? 0.0 : std::sqrt(sumsq / static_cast(v.size())); + return s; +} + +// Same contract streaming_test uses: equal length, finite, peak within 2×, RMS +// within 15% (phase randomness differs but averages out). +void check_structural(const std::vector &a, const std::vector &b, + const std::string &label) { + auto sa = analyze(a); + auto sb = analyze(b); + check(sa.all_finite, label + ": chunked finite"); + check(sb.all_finite, label + ": reference finite"); + check(a.size() == b.size(), label + ": same length"); + if (sa.peak > 0 && sb.peak > 0) { + const double ratio = sa.peak / sb.peak; + check(ratio > 0.5 && ratio < 2.0, label + ": peaks within 2×"); + } + if (sa.rms > 0 && sb.rms > 0) { + const double rel = std::fabs(sa.rms - sb.rms) / std::max(sa.rms, sb.rms); + check(rel < 0.15, label + ": RMS within 15%"); + } +} + +double gib(std::uint64_t bytes) { return static_cast(bytes) / (1024.0 * 1024.0 * 1024.0); } + +// ── Part 1: behavioural parity at a modest, CI-fast size ─────────────────── +void test_parity() { + std::cout << "\n[parity] chunked vs whole-buffer render\n"; + + const paulstretch::RenderOptions options{ + .stretch = 32.0f, + .fft_size = 4096, + .sample_rate = 44100.0f, + .window = paulstretch::Window::Hann, + .onset_detection_sensitivity = 0.0f, + }; + + paulstretch::OfflineRenderer renderer(options); + const int bufsize = static_cast(renderer.options().fft_size); + + const auto left_in = make_sine(44100, options.sample_rate, 220.0f); // 1 s + const auto right_in = make_sine(44100, options.sample_rate, 277.0f); + + // Mono. + const auto mono_ref = renderer.render_mono(left_in); + std::vector mono_chunked; + mono_chunked.reserve(mono_ref.size()); + std::size_t mono_chunks = 0; + bool mono_uniform = true; + renderer.render_mono_chunked(left_in, [&](const float *data, int frames) { + mono_chunks++; + if (frames != bufsize) mono_uniform = false; + mono_chunked.insert(mono_chunked.end(), data, data + frames); + }); + std::cout << " mono: " << mono_ref.size() << " frames in " << mono_chunks << " chunks\n"; + check(!mono_ref.empty(), "render_mono produced output"); + check(mono_uniform, "every mono chunk is exactly bufsize frames"); + check(mono_chunks > 100, "many mono chunks (per-chunk memory is bounded)"); + check_structural(mono_chunked, mono_ref, "mono"); + + // Stereo. + const auto stereo_ref = renderer.render_stereo(left_in, right_in); + std::vector stereo_l, stereo_r; + stereo_l.reserve(stereo_ref.left.size()); + stereo_r.reserve(stereo_ref.right.size()); + bool stereo_uniform = true; + renderer.render_stereo_chunked(left_in, right_in, [&](const float *l, const float *r, int frames) { + if (frames != bufsize) stereo_uniform = false; + stereo_l.insert(stereo_l.end(), l, l + frames); + stereo_r.insert(stereo_r.end(), r, r + frames); + }); + check(!stereo_ref.left.empty(), "render_stereo produced output"); + check(stereo_uniform, "every stereo chunk is exactly bufsize frames"); + check_structural(stereo_l, stereo_ref.left, "stereo L"); + check_structural(stereo_r, stereo_ref.right, "stereo R"); + + // Empty input is a no-op (sink never fires). + std::size_t empty_calls = 0; + renderer.render_mono_chunked({}, [&](const float *, int) { empty_calls++; }); + check(empty_calls == 0, "chunked render of empty input does not call the sink"); +} + +// ── Part 2: the chunked path stays under the wasm32 memory limits ────────── +void test_memory_bounds() { + std::cout << "\n[limits] buffered footprint vs wasm32 heap caps\n"; + + // wasm32 linear-memory ceilings the WASM build runs against. + constexpr std::uint64_t kFloat = sizeof(float); + constexpr std::uint64_t kWasmDefaultCap = 2ull * 1024 * 1024 * 1024; // Emscripten default MAXIMUM_MEMORY + constexpr std::uint64_t kWasmMaxHeap = 4ull * 1024 * 1024 * 1024; // wasm32 hard maximum + + // A representative extreme job: a 15-second clip stretched 500× (~2 hours of + // output). Sizes come from estimate_output_frames — nothing is allocated. + const paulstretch::RenderOptions options{ + .stretch = 500.0f, + .fft_size = 4096, + .sample_rate = 44100.0f, + .window = paulstretch::Window::Hann, + .onset_detection_sensitivity = 0.0f, + }; + paulstretch::OfflineRenderer renderer(options); + + const std::size_t input_frames = static_cast(15 * 44100); + const std::uint64_t out_frames = renderer.estimate_output_frames(input_frames); + const std::uint64_t per_channel = out_frames * kFloat; + + // What renderMono / renderStereo actually need live at once: + // mono — internal vector + the returned JS copy = 2× a channel + // stereo — both channel vectors, plus one channel's JS copy = 3× a channel + const std::uint64_t mono_peak = per_channel * 2; + const std::uint64_t stereo_peak = per_channel * 3; + + // What the chunked path needs live at once: a single bufsize chunk. + const std::uint64_t chunked_working_set = + static_cast(renderer.options().fft_size) * kFloat; + + std::cout << " output ~" << out_frames << " frames (" << gib(per_channel) << " GiB/channel)\n"; + std::cout << " buffered peak: mono " << gib(mono_peak) << " GiB, stereo " + << gib(stereo_peak) << " GiB\n"; + std::cout << " chunked working set: " << chunked_working_set << " bytes\n"; + + // Buffered rendering of this job would have aborted under Emscripten's old + // 2 GiB default heap cap — both mono and stereo overflow it. + check(mono_peak > kWasmDefaultCap, + "buffered mono render exceeds the 2 GiB default heap cap"); + check(stereo_peak > kWasmDefaultCap, + "buffered stereo render exceeds the 2 GiB default heap cap"); + // Raising the cap to the wasm32 maximum (4 GiB) is what lets this particular + // job through on the buffered path. + check(stereo_peak < kWasmMaxHeap, + "...but fits within the raised 4 GiB cap"); + + // The chunked path's footprint is a negligible fraction of the cap, so it + // completes regardless of how long the output is. + check(chunked_working_set < (1ull << 20), + "chunked working set is under 1 MiB"); + check(chunked_working_set * 1000 < kWasmDefaultCap, + "chunked working set is <0.1% of the heap cap"); + + // A bigger job (30 s × 500×) overflows even the 4 GiB maximum on the *mono* + // buffered path — so raising the cap is not enough, and only the chunked + // path (which is independent of output length) stays under the limit. + paulstretch::OfflineRenderer big({ + .stretch = 500.0f, + .fft_size = 4096, + .sample_rate = 48000.0f, + .window = paulstretch::Window::Hann, + .onset_detection_sensitivity = 0.0f, + }); + const std::uint64_t big_per_channel = + static_cast(big.estimate_output_frames(30 * 48000)) * kFloat; + std::cout << " bigger job buffered mono peak: " << gib(big_per_channel * 2) << " GiB\n"; + check(big_per_channel * 2 > kWasmMaxHeap, + "a bigger job exceeds even the 4 GiB wasm32 maximum (chunking required)"); +} + +} // namespace + +int main() { + std::cout << "backend=" << paulstretch::fft_backend_name() + << " simd=" << paulstretch::fft_simd_arch() + << " width=" << paulstretch::fft_simd_size() << '\n'; + + test_parity(); + test_memory_bounds(); + + if (failures == 0) { + std::cout << "\nOK\n"; + return EXIT_SUCCESS; + } + std::cerr << "\n" << failures << " check(s) failed\n"; + return EXIT_FAILURE; +} From adda87435a0d1f8a68f22e2b600339e78d59d3b7 Mon Sep 17 00:00:00 2001 From: Oli Larkin Date: Thu, 2 Jul 2026 11:51:36 +0200 Subject: [PATCH 3/3] Add scalar (non-SIMD) wasm build for WebView compatibility Ship a second WebAssembly artifact, paulstretch.nosimd.wasm, built without -msimd128 alongside the SIMD paulstretch.wasm. The SIMD build fails to compile in WebViews without WASM SIMD support (notably macOS WKWebView before Safari 16.4 / macOS 13), where the module can't be parsed at all. Consumers feature-detect SIMD at runtime and load the matching binary; both are built with the same Emscripten version so a single glue drives either one. - scripts/build-wasm.sh: build both variants and assemble npm/dist. Deletes the linked output before each variant build so an up-to-date tree still refreshes it (CMake links straight into npm/dist and won't relink an up-to-date target). - npm/package.json: 0.3.0; export + pack dist/paulstretch.nosimd.wasm. - CI/release: build both via the script; smoke-test each wasm in the browser. - README: document the scalar fallback and the feature-detect probe. --- .github/workflows/ci.yml | 66 +++++++++++++++-------------------- .github/workflows/release.yml | 14 ++------ npm/README.md | 37 ++++++++++++++++++-- npm/package.json | 6 ++-- scripts/build-wasm.sh | 64 +++++++++++++++++++++++++++++++++ 5 files changed, 133 insertions(+), 54 deletions(-) create mode 100755 scripts/build-wasm.sh diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index f1159be..22b8afc 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -48,10 +48,6 @@ jobs: wasm: name: wasm (emscripten) runs-on: ubuntu-latest - strategy: - fail-fast: false - matrix: - simd: [ON, OFF] steps: - uses: actions/checkout@v4 @@ -61,28 +57,19 @@ jobs: version: latest actions-cache-folder: emsdk-cache - - name: Configure - run: > - emcmake cmake -S . -B build-wasm - -DCMAKE_BUILD_TYPE=Release - -DPAULSTRETCH_BUILD_WASM=ON - -DPAULSTRETCH_BUILD_TESTS=OFF - -DPAULSTRETCH_BUILD_EXAMPLES=OFF - -DPAULSTRETCH_ENABLE_SIMD=${{ matrix.simd }} - - - name: Build - run: cmake --build build-wasm --config Release --parallel + # Builds both the SIMD and scalar wasm and assembles npm/dist. The scalar + # (paulstretch.nosimd.wasm) is the fallback for WebViews without WASM SIMD. + - name: Build WASM (SIMD + scalar) + run: scripts/build-wasm.sh - name: Verify artifacts run: | test -f npm/dist/paulstretch.js test -f npm/dist/paulstretch.wasm + test -f npm/dist/paulstretch.nosimd.wasm ls -la npm/dist/ - # Only the SIMD=ON build (the production artifact) is packed; SIMD=OFF - # is just a build-compatibility check. - name: npm pack (verify package layout) - if: matrix.simd == 'ON' working-directory: npm run: | npm pack --dry-run @@ -90,34 +77,37 @@ jobs: ls -la *.tgz - name: Upload npm tarball - if: matrix.simd == 'ON' uses: actions/upload-artifact@v4 with: name: paulstretch-wasm-npm path: npm/*.tgz if-no-files-found: error - # Confirm the WASM actually loads and runs in a real browser, not just - # in Node. Uses the pre-installed Google Chrome on ubuntu runners. + # Confirm the WASM actually loads and runs in a real browser, not just in + # Node — once per wasm binary (SIMD and scalar) via the shared glue. - name: Browser smoke test (headless Chrome) - if: matrix.simd == 'ON' run: | set -e - cp npm/dist/paulstretch.js npm/dist/paulstretch.wasm tests/browser/ - python3 -m http.server 8765 --directory tests/browser >/dev/null 2>&1 & - SERVER=$! - # Give the server a moment to bind. - for i in $(seq 1 10); do - curl -sf http://localhost:8765/index.html >/dev/null && break - sleep 0.5 + for wasm in paulstretch.wasm paulstretch.nosimd.wasm; do + echo "::group::browser smoke test — $wasm" + rm -rf tests/browser/paulstretch.wasm + cp npm/dist/paulstretch.js tests/browser/ + cp "npm/dist/$wasm" tests/browser/paulstretch.wasm + python3 -m http.server 8765 --directory tests/browser >/dev/null 2>&1 & + SERVER=$! + for i in $(seq 1 10); do + curl -sf http://localhost:8765/index.html >/dev/null && break + sleep 0.5 + done + DOM=$(google-chrome --headless=new --disable-gpu --no-sandbox \ + --virtual-time-budget=10000 \ + --dump-dom http://localhost:8765/index.html) + kill $SERVER 2>/dev/null || true + echo "$DOM" + echo "$DOM" | grep -q 'data-status="ok"' || { + echo "::error::browser smoke test FAILED for $wasm — expected data-status=\"ok\" in DOM" + exit 1 + } + echo "::endgroup::" done - DOM=$(google-chrome --headless=new --disable-gpu --no-sandbox \ - --virtual-time-budget=10000 \ - --dump-dom http://localhost:8765/index.html) - kill $SERVER 2>/dev/null || true - echo "$DOM" - echo "$DOM" | grep -q 'data-status="ok"' || { - echo "::error::browser smoke test FAILED — expected data-status=\"ok\" in DOM" - exit 1 - } diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml index 8735b01..673d8db 100644 --- a/.github/workflows/release.yml +++ b/.github/workflows/release.yml @@ -48,22 +48,14 @@ jobs: version: latest actions-cache-folder: emsdk-cache - - name: Configure - run: > - emcmake cmake -S . -B build-wasm - -DCMAKE_BUILD_TYPE=Release - -DPAULSTRETCH_BUILD_WASM=ON - -DPAULSTRETCH_BUILD_TESTS=OFF - -DPAULSTRETCH_BUILD_EXAMPLES=OFF - -DPAULSTRETCH_ENABLE_SIMD=ON - - - name: Build - run: cmake --build build-wasm --config Release --parallel + - name: Build WASM (SIMD + scalar) + run: scripts/build-wasm.sh - name: Verify artifacts run: | test -f npm/dist/paulstretch.js test -f npm/dist/paulstretch.wasm + test -f npm/dist/paulstretch.nosimd.wasm ls -la npm/dist/ - name: Publish diff --git a/npm/README.md b/npm/README.md index f86b402..ce684b1 100644 --- a/npm/README.md +++ b/npm/README.md @@ -170,13 +170,44 @@ const Module = await PaulstretchModule({ }); ``` +### SIMD and the scalar fallback + +The primary `paulstretch.wasm` is built with WASM SIMD (`-msimd128`) for a faster +FFT. Some WebViews can't parse a SIMD module — notably macOS **WKWebView before +Safari 16.4 / macOS 13** — and fail to compile it at all. The package therefore +also ships a scalar build, `paulstretch.nosimd.wasm`, with no SIMD opcodes. + +Feature-detect SIMD at runtime and load the matching binary. Both wasm are built +with the same Emscripten version, so the single glue drives either one: + +```js +import simdUrl from '@olilarkin/paulstretch-wasm/paulstretch.wasm?url'; +import scalarUrl from '@olilarkin/paulstretch-wasm/paulstretch.nosimd.wasm?url'; +import PaulstretchModule from '@olilarkin/paulstretch-wasm'; + +// A tiny module containing a v128 local; validate() never throws. +const SIMD_PROBE = new Uint8Array([ + 0,97,115,109,1,0,0,0,1,5,1,96,0,1,123,3,2,1,0,10,10,1,8,0,65,0,253,15,253,98,11, +]); +const hasSimd = WebAssembly.validate(SIMD_PROBE); +const wasmUrl = hasSimd ? simdUrl : scalarUrl; + +const Module = await PaulstretchModule({ + locateFile: (path) => (path.endsWith('.wasm') ? wasmUrl : path), +}); +// Module.fftSimdArch() reports "WASM_SIMD128" or "4xScalar". +``` + ## Building from source ```bash -emcmake cmake -S . -B build-wasm -cmake --build build-wasm -# outputs land in npm/dist/ +# Build both the SIMD and scalar wasm and assemble npm/dist/: +scripts/build-wasm.sh cd npm && npm pack + +# Or a single (SIMD) build directly: +emcmake cmake -S . -B build-wasm +cmake --build build-wasm # outputs land in npm/dist/ ``` ## License diff --git a/npm/package.json b/npm/package.json index 06fab99..671eb3c 100644 --- a/npm/package.json +++ b/npm/package.json @@ -1,6 +1,6 @@ { "name": "@olilarkin/paulstretch-wasm", - "version": "0.2.1", + "version": "0.3.0", "description": "Paulstretch extreme time-stretching algorithm, compiled to WebAssembly via Emscripten", "type": "module", "main": "dist/paulstretch.js", @@ -12,11 +12,13 @@ "import": "./dist/paulstretch.js", "default": "./dist/paulstretch.js" }, - "./paulstretch.wasm": "./dist/paulstretch.wasm" + "./paulstretch.wasm": "./dist/paulstretch.wasm", + "./paulstretch.nosimd.wasm": "./dist/paulstretch.nosimd.wasm" }, "files": [ "dist/paulstretch.js", "dist/paulstretch.wasm", + "dist/paulstretch.nosimd.wasm", "index.d.ts", "README.md" ], diff --git a/scripts/build-wasm.sh b/scripts/build-wasm.sh new file mode 100755 index 0000000..0107547 --- /dev/null +++ b/scripts/build-wasm.sh @@ -0,0 +1,64 @@ +#!/usr/bin/env bash +# +# Build BOTH WebAssembly artifacts and assemble the npm package's dist/: +# +# npm/dist/paulstretch.js - Emscripten glue (identical for both wasm) +# npm/dist/paulstretch.wasm - SIMD build (-msimd128, WASM_SIMD128 PFFFT) +# npm/dist/paulstretch.nosimd.wasm - scalar build (no SIMD) +# +# The scalar wasm is a fallback for WebViews without WASM SIMD support (e.g. +# macOS WKWebView before Safari 16.4 / macOS 13). Consumers feature-detect SIMD +# at runtime and load the matching .wasm via Emscripten's `locateFile`. Because +# both wasm are built with the SAME Emscripten version, the generated glue is +# byte-identical, so one glue drives either binary. +# +# Requires emcc/emcmake on PATH (run `source /emsdk_env.sh` first, or use +# the setup-emsdk CI action). +# +# Usage: scripts/build-wasm.sh + +set -euo pipefail + +REPO_ROOT="$(git rev-parse --show-toplevel)" +cd "$REPO_ROOT" + +COMMON_ARGS=( + -DCMAKE_BUILD_TYPE=Release + -DPAULSTRETCH_BUILD_WASM=ON + -DPAULSTRETCH_BUILD_TESTS=OFF + -DPAULSTRETCH_BUILD_EXAMPLES=OFF +) + +# Both variants link straight into npm/dist as paulstretch.{js,wasm}. We delete +# that output before each build so an up-to-date build tree still refreshes it — +# otherwise CMake skips the relink and we'd copy a stale wasm from the other +# variant. Deleting only the final artifact forces a cheap relink, not a full +# recompile. + +# 1. Scalar (non-SIMD) build first, then stash it before the SIMD build +# overwrites paulstretch.wasm. +echo "==> Configuring scalar (SIMD=OFF) build" +emcmake cmake -S . -B build-wasm-nosimd "${COMMON_ARGS[@]}" -DPAULSTRETCH_ENABLE_SIMD=OFF +echo "==> Building scalar wasm" +rm -f npm/dist/paulstretch.js npm/dist/paulstretch.wasm +cmake --build build-wasm-nosimd --config Release --parallel +cp npm/dist/paulstretch.wasm npm/dist/paulstretch.nosimd.wasm + +# 2. SIMD build. This leaves npm/dist/paulstretch.{js,wasm} as the SIMD pair, +# which is the canonical glue + primary artifact. +echo "==> Configuring SIMD (SIMD=ON) build" +emcmake cmake -S . -B build-wasm "${COMMON_ARGS[@]}" -DPAULSTRETCH_ENABLE_SIMD=ON +echo "==> Building SIMD wasm" +rm -f npm/dist/paulstretch.js npm/dist/paulstretch.wasm +cmake --build build-wasm --config Release --parallel + +# Sanity: the two binaries must actually differ (SIMD vs scalar). Identical +# bytes means the stale-output hazard bit us again. +if cmp -s npm/dist/paulstretch.wasm npm/dist/paulstretch.nosimd.wasm; then + echo "error: paulstretch.wasm and paulstretch.nosimd.wasm are identical" >&2 + echo " (scalar build did not produce a distinct binary)" >&2 + exit 1 +fi + +echo "==> Assembled npm/dist:" +ls -la npm/dist/paulstretch.js npm/dist/paulstretch.wasm npm/dist/paulstretch.nosimd.wasm