Add SenseVoice local STT model support by waibiwaibig · Pull Request #1634 · getpaseo/paseo

waibiwaibig · 2026-06-20T13:06:53Z

Summary

Closes #1633.

Adds daemon-side local STT support for a sherpa-onnx SenseVoice int8 model:

- adds sense-voice-zh-en-ja-ko-yue-int8-2025-09-09 to the local speech model catalog
- downloads the SenseVoice model files directly from a Hugging Face mirror, falling back to the GitHub release archive if that fails
- extends the offline recognizer to initialize sherpa-onnx senseVoice configs in addition to the existing NeMo transducer configs
- builds STT recognizer configs from catalog metadata instead of assuming every STT model ships Parakeet transducer files
- lets the model resolve through the existing dictation and voice-mode local STT config paths
- documents the new mixed Chinese/English local STT option
- fixes speech:download --help so it no longer accidentally starts the default model downloads
- fixes the speech:transcribe:local provider setup so local WAV transcription can initialize the shared local runtime
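The mirror-first download in the second item can be sketched as follows. This is a minimal TypeScript sketch under stated assumptions, not the PR's code: the name downloadFirstAvailable and the injectable fetch parameter are hypothetical, and it assumes Node 18+ (global fetch and Response). It tries each candidate URL in order and writes to a temporary file that is renamed into place only once complete, matching how the review below describes the direct-file path (per-URL fallback, atomic temp-file rename).

```typescript
import { promises as fs } from "node:fs";
import * as path from "node:path";

// Hypothetical injectable fetch so the logic can be exercised without the network.
type FetchLike = (url: string) => Promise<Response>;

// Try each candidate URL in order (e.g. a hf-mirror.com URL, then the
// huggingface.co original). Resolves to the URL that succeeded.
async function downloadFirstAvailable(
  urls: readonly string[],
  destPath: string,
  fetchImpl: FetchLike = (url) => fetch(url),
): Promise<string> {
  await fs.mkdir(path.dirname(destPath), { recursive: true });
  // Temp file sits next to the target so rename() stays on one filesystem.
  const tmpPath = `${destPath}.${process.pid}.tmp`;
  const failures: string[] = [];

  for (const url of urls) {
    try {
      const res = await fetchImpl(url);
      if (!res.ok) {
        throw new Error(`HTTP ${res.status}`);
      }
      // Buffered in memory for brevity (see the note below).
      const body = Buffer.from(await res.arrayBuffer());
      await fs.writeFile(tmpPath, body);
      // Only a complete file ever appears under destPath.
      await fs.rename(tmpPath, destPath);
      return url;
    } catch (err) {
      // Drop any partial temp file, record the failure, try the next URL.
      await fs.rm(tmpPath, { force: true });
      failures.push(`${url}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all URLs failed for ${path.basename(destPath)}\n${failures.join("\n")}`);
}
```

Buffering the whole response keeps the sketch short; for a file the size of model.int8.onnx (241M in the manual test below) a real implementation would more likely stream the body to disk.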

Scope

This keeps the current mobile/iPad audio streaming architecture unchanged. STT still runs on the daemon host.

Out of scope for this PR:

- mobile-side local inference
- UI redesign or model-management UI
- OpenAI speech changes
- Chinese TTS
- changing the default STT model for all users

Testing

ELECTRON_SKIP_BINARY_DOWNLOAD=1 npm install --workspaces --include-workspace-root
npm run build:server-deps
npx vitest run packages/server/src/server/speech/speech-config-resolver.test.ts packages/server/src/server/speech/providers/local/sherpa/model-downloader.test.ts packages/server/src/server/speech/providers/local/sherpa/sherpa-offline-recognizer.test.ts
npm run typecheck --workspace=@getpaseo/server
npm run build --workspace=@getpaseo/server
npm run lint -- packages/server/src/server/speech/providers/local/sherpa/model-catalog.ts packages/server/src/server/speech/providers/local/sherpa/model-downloader.ts packages/server/src/server/speech/providers/local/sherpa/sherpa-offline-recognizer.ts packages/server/src/server/speech/providers/local/models.ts packages/server/src/server/speech/providers/local/worker-process.ts packages/server/src/server/speech/speech-config-resolver.test.ts packages/server/src/server/speech/providers/local/sherpa/model-downloader.test.ts packages/server/src/server/speech/providers/local/sherpa/sherpa-offline-recognizer.test.ts packages/server/scripts/download-speech-models.ts packages/server/scripts/transcribe-local-wav.ts
npm run format:check:files -- public-docs/voice.md packages/server/src/server/speech/providers/local/sherpa/model-catalog.ts packages/server/src/server/speech/providers/local/sherpa/model-downloader.ts packages/server/src/server/speech/providers/local/sherpa/sherpa-offline-recognizer.ts packages/server/src/server/speech/providers/local/models.ts packages/server/src/server/speech/providers/local/worker-process.ts packages/server/src/server/speech/speech-config-resolver.test.ts packages/server/src/server/speech/providers/local/sherpa/model-downloader.test.ts packages/server/src/server/speech/providers/local/sherpa/sherpa-offline-recognizer.test.ts packages/server/scripts/download-speech-models.ts packages/server/scripts/transcribe-local-wav.ts
npm run speech:download --workspace=@getpaseo/server -- --help

Manual model download test:

npm run speech:download --workspace=@getpaseo/server -- --models-dir /tmp/paseo-sensevoice-direct-test --model sense-voice-zh-en-ja-ko-yue-int8-2025-09-09
Downloaded via https://hf-mirror.com/.../model.int8.onnx and tokens.txt
Completed in about 15 seconds on this network
Downloaded files:
- model.int8.onnx: 241M
- tokens.txt: 312K

Manual local inference test:

Downloaded test_wavs/zh.wav from the same HF mirror repo
npm run speech:transcribe:local --workspace=@getpaseo/server -- /tmp/paseo-sensevoice-zh.wav --models-dir /tmp/paseo-sensevoice-direct-test --model sense-voice-zh-en-ja-ko-yue-int8-2025-09-09
Output: 放时间早上九点至下午五点 (English gloss: "[opening] hours, 9 a.m. to 5 p.m.")

greptile-apps · 2026-06-20T13:27:17Z

Greptile Summary

Adds sense-voice-zh-en-ja-ko-yue-int8-2025-09-09 as a local STT option supporting Chinese, English, Japanese, Korean, and Cantonese, and wires it through the full daemon-side pipeline: catalog metadata, direct-file downloads (hf-mirror.com first, then Hugging Face) with a fallback to the release archive, recognizer config dispatch, and the config resolver.

Catalog + downloader: model-catalog.ts gains a discriminated union SherpaOnnxCatalogEntry that attaches a recognizer spec per STT model; model-downloader.ts adds a downloadDirectFiles path with per-URL fallback and atomic renames before falling back to the archive.
Recognizer + worker: sherpa-offline-recognizer.ts now builds either a nemo_transducer or senseVoice config from the spec; worker-process.ts reads file paths from catalog metadata instead of hardcoding Parakeet filenames.
Bug fixes: speech:download --help no longer triggers default model downloads; transcribe-local-wav.ts now satisfies the voiceTurnDetection field required by the shared runtime config shape.
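A sketch of the catalog-driven recognizer config described above. The type and field names RecognizerSpec, SttCatalogEntry, and extractedDir are hypothetical, not the PR's actual identifiers. The sherpa-onnx field names (transducer, senseVoice, modelType, tokens, useInverseTextNormalization, language) are assumed from sherpa-onnx's Node.js offline-recognizer examples and should be checked against the sherpa-onnx documentation; the value language: "auto" is likewise an assumption.

```typescript
import * as path from "node:path";

// Hypothetical per-model recognizer spec; `kind` is the discriminant.
type RecognizerSpec =
  | {
      kind: "nemo_transducer";
      encoder: string;
      decoder: string;
      joiner: string;
      tokens: string;
    }
  | {
      kind: "sense_voice";
      model: string;
      tokens: string;
      language: string; // assumed value, e.g. "auto"
      useInverseTextNormalization: boolean;
    };

// Hypothetical STT catalog entry; file names are relative to extractedDir.
interface SttCatalogEntry {
  id: string;
  extractedDir: string;
  recognizer: RecognizerSpec;
}

// Build a sherpa-onnx-style modelConfig fragment with absolute paths.
function buildSttRecognizerModel(modelsDir: string, entry: SttCatalogEntry) {
  const abs = (file: string): string => path.join(modelsDir, entry.extractedDir, file);
  const spec = entry.recognizer;
  switch (spec.kind) {
    case "nemo_transducer":
      return {
        transducer: {
          encoder: abs(spec.encoder),
          decoder: abs(spec.decoder),
          joiner: abs(spec.joiner),
        },
        modelType: "nemo_transducer",
        tokens: abs(spec.tokens),
      };
    case "sense_voice":
      return {
        senseVoice: {
          model: abs(spec.model),
          language: spec.language,
          // Assumption: sherpa-onnx Node examples pass this flag as 0/1.
          useInverseTextNormalization: spec.useInverseTextNormalization ? 1 : 0,
        },
        tokens: abs(spec.tokens),
      };
    default: {
      // Compile-time exhaustiveness check: a new kind without a branch
      // here becomes a type error.
      const unhandled: never = spec;
      throw new Error(`unknown recognizer kind: ${JSON.stringify(unhandled)}`);
    }
  }
}
```

The discriminated union lets the compiler reject a spec that mixes transducer and SenseVoice fields, and the never check fails the build if a third model family is added to the catalog without a matching branch.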

Confidence Score: 5/5

Safe to merge — no runtime correctness issues introduced.

The new model type flows cleanly through every layer: catalog spec, direct-file downloader, recognizer config builder, and the worker engine. The direct-download path uses atomic renames and falls back to the archive on any failure, leaving no corrupt state. The two script fixes are narrow and clearly correct. Remaining findings are naming and structural style concerns.

No files require special attention for correctness; sherpa-offline-recognizer.test.ts carries the test-pattern concerns noted in the previous review round.

Important Files Changed

Filename	Overview
packages/server/src/server/speech/providers/local/sherpa/model-catalog.ts	Adds SenseVoice catalog entry with directFiles for Hugging Face mirror downloads; refactors SherpaOnnxCatalogEntry to a discriminated union carrying a recognizer spec per STT model. Clean structural extension.
packages/server/src/server/speech/providers/local/sherpa/sherpa-offline-recognizer.ts	Extends the recognizer engine to support sense_voice configs alongside nemo_transducer. Logic is correct; two private interface aliases duplicate the union member shapes unnecessarily.
packages/server/src/server/speech/providers/local/sherpa/model-downloader.ts	Adds downloadDirectFiles with per-URL fallback; correctly falls back to archive on failure and uses atomic temp-file rename. Download logic is sound.
packages/server/src/server/speech/providers/local/worker-process.ts	Replaces hardcoded Parakeet paths with catalog-driven buildSttRecognizerModel; now uses path.join via localModelPath. The Parakeet-named provider classes now serve both model families without renaming.
packages/server/src/server/speech/providers/local/sherpa/sherpa-offline-recognizer.test.ts	New test file verifying SenseVoice recognizer config shape; uses vi.mock and vi.hoisted patterns previously flagged as banned by project test rules.
packages/server/src/server/speech/speech-config-resolver.test.ts	Adds acceptance test for SenseVoice as dictation and voice-mode STT; clean module-interface test.
packages/server/src/server/speech/providers/local/sherpa/model-downloader.test.ts	Adds coverage for SenseVoice direct-file path; uses vi.stubGlobal for fetch and asserts both observable outcomes (files on disk) and internal URLs (previously flagged).
packages/server/src/server/speech/providers/local/models.ts	Adds getLocalSpeechModelSpec passthrough consistent with existing delegation pattern in the module facade.
packages/server/scripts/download-speech-models.ts	Adds --help/-h guard before arg parsing to fix accidental default-model downloads; correct fix.
packages/server/scripts/transcribe-local-wav.ts	Adds voiceTurnDetection to the required RequestedSpeechProviders shape; fixes local WAV transcription initialization.
public-docs/voice.md	Documents SenseVoice model ID, language support, config snippet, and HF-mirror download strategy; accurate and consistent with implementation.
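The --help fix in download-speech-models.ts amounts to checking for help before any argument parsing or defaulting. A minimal sketch: the names parseDownloadArgs, runDownloadCli, and DownloadArgs and the usage text are hypothetical; only the --models-dir and --model flags come from the manual test in the PR description.

```typescript
// Hypothetical option shape for a model download script.
interface DownloadArgs {
  modelsDir?: string;
  models: string[];
}

type ParsedArgs = { kind: "help" } | { kind: "run"; args: DownloadArgs };

// Hypothetical usage text.
const USAGE = "Usage: speech:download [--models-dir <dir>] [--model <id>]...";

function parseDownloadArgs(argv: readonly string[]): ParsedArgs {
  // Check for help before any parsing or defaulting, so --help can never
  // fall through to "no --model given, so download the defaults".
  if (argv.includes("--help") || argv.includes("-h")) {
    return { kind: "help" };
  }
  const args: DownloadArgs = { models: [] };
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    if (flag !== "--models-dir" && flag !== "--model") {
      throw new Error(`unknown argument: ${flag}`);
    }
    const value = argv[i + 1];
    if (value === undefined || value.startsWith("-")) {
      throw new Error(`${flag} requires a value`);
    }
    i++; // consume the value
    if (flag === "--models-dir") {
      args.modelsDir = value;
    } else {
      args.models.push(value);
    }
  }
  return { kind: "run", args };
}

// The download step is injected so the help path provably never reaches it.
async function runDownloadCli(
  argv: readonly string[],
  download: (args: DownloadArgs) => Promise<void>,
): Promise<void> {
  const parsed = parseDownloadArgs(argv);
  if (parsed.kind === "help") {
    console.log(USAGE);
    return;
  }
  await download(parsed.args);
}
```

Injecting the download function makes it straightforward to test that the help path never triggers a download, which is the failure the PR fixes.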

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[ensureSherpaOnnxModel] --> B{spec.directFiles?}
    B -- yes --> C[downloadDirectFiles\nhf-mirror URLs then HF fallback]
    C --> D{requiredFiles present?}
    D -- yes --> E[return modelDir]
    D -- no --> F[warn: fall back to archive]
    C -- throws --> F
    B -- no --> G[download .tar.bz2 archive]
    F --> G
    G --> H[extractTarArchive to modelsDir]
    H --> I{requiredFiles present?}
    I -- yes --> J[clean up archive, return modelDir]
    I -- no --> K[throw: required files missing]

    subgraph workerProcess [worker-process.ts — engine init]
        L[buildSttRecognizerModel] --> M{spec.recognizer.kind}
        M -- nemo_transducer --> N[absolute paths for\nencoder/decoder/joiner/tokens]
        M -- sense_voice --> O[absolute paths for\nmodel/tokens + language config]
        N --> P[SherpaOfflineRecognizerEngine\nnemo_transducer config]
        O --> Q[SherpaOfflineRecognizerEngine\nsenseVoice config]
    end
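The download branch of the flowchart can be expressed as a small orchestration function. In this sketch the PR's real helpers (downloadDirectFiles, the archive download, extractTarArchive) are replaced by injected callbacks, and ensureModel, ModelSteps, and hasRequiredFiles are hypothetical names; archive cleanup is assumed to happen inside downloadAndExtractArchive.

```typescript
import { promises as fs } from "node:fs";
import * as path from "node:path";

// Hypothetical stand-ins for the PR's real download/extract helpers.
interface ModelSteps {
  // Present only for catalog entries that list direct files.
  downloadDirectFiles?: (modelDir: string) => Promise<void>;
  // Assumed to download the .tar.bz2, extract it under modelsDir, and clean up.
  downloadAndExtractArchive: (modelsDir: string) => Promise<void>;
  warn: (message: string) => void;
}

async function hasRequiredFiles(modelDir: string, required: readonly string[]): Promise<boolean> {
  for (const file of required) {
    try {
      await fs.access(path.join(modelDir, file));
    } catch {
      return false;
    }
  }
  return true;
}

async function ensureModel(
  modelsDir: string,
  dirName: string,
  required: readonly string[],
  steps: ModelSteps,
): Promise<string> {
  const modelDir = path.join(modelsDir, dirName);
  if (steps.downloadDirectFiles) {
    try {
      await steps.downloadDirectFiles(modelDir);
      if (await hasRequiredFiles(modelDir, required)) {
        return modelDir;
      }
      steps.warn("direct download incomplete; falling back to archive");
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      steps.warn(`direct download failed (${reason}); falling back to archive`);
    }
  }
  // No direct files, or the direct path failed: use the release archive.
  await steps.downloadAndExtractArchive(modelsDir);
  if (await hasRequiredFiles(modelDir, required)) {
    return modelDir;
  }
  throw new Error(`required files missing in ${modelDir}: ${required.join(", ")}`);
}
```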

Reviews (2): last reviewed commit "Use path.join for local speech model pat..."

waibiwaibig added 2 commits June 20, 2026 21:06

Add SenseVoice local STT model support

ef4e3cb

Use direct SenseVoice model mirrors for local downloads

60cdbc1

waibiwaibig marked this pull request as ready for review June 20, 2026 13:22

greptile-apps Bot reviewed Jun 20, 2026


Commented on packages/server/src/server/speech/providers/local/worker-process.ts

Use path.join for local speech model paths

88de37e

waibiwaibig wants to merge 3 commits into getpaseo:main from waibiwaibig:feat/sensevoice-local-stt