fix: audio export quality and Whisper language selection by ScreepCode · Pull Request #4 · thehwang/Scripta

ScreepCode · 2026-06-15T16:33:12Z

Summary

Audio too quiet (Audio export is too quiet #2): Raises system audio capture from 16 kHz to 48 kHz and updates the AAC encoder to 48 kHz / 128 kbps. The exported audio file now receives the original 48 kHz PCM instead of the already-downsampled 16 kHz buffer that was previously written. Before this change the Nyquist limit cut everything above 8 kHz, making recordings sound thin and quiet.
Whisper language hardcoded (Language selection not respected for mic transcription (Whisper) and summary #3): Adds a language property to WhisperEngine and wires it up from MeetingRecorder.startRecording() using the 2-letter ISO prefix of recognitionLanguage (e.g. "de-DE" → "de"). Previously transcribeChunk had the language hardcoded, so the mic channel always transcribed in one language regardless of the UI selection.
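To make the Nyquist argument in the first item concrete, here is a minimal sketch of the arithmetic. The helper name `nyquistLimit` is hypothetical and not part of the PR.

```swift
/// Highest frequency (in Hz) that a given sample rate can represent.
/// Hypothetical helper, for illustration only.
func nyquistLimit(sampleRate: Double) -> Double {
    return sampleRate / 2
}

let oldLimit = nyquistLimit(sampleRate: 16_000)  // old capture rate
let newLimit = nyquistLimit(sampleRate: 48_000)  // new capture rate

print("16 kHz capture keeps content up to \(oldLimit) Hz")
print("48 kHz capture keeps content up to \(newLimit) Hz")
```

With the old rate everything above 8 kHz is lost before the file is written; at 48 kHz the limit moves to 24 kHz.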

Changes

File	Change
`SystemAudioCapture.swift`	`config.sampleRate`: 16 000 → 48 000
`MeetingRecorder.swift`	`audioFileSettings`: 48 kHz / 128 kbps; write original PCM to file; fix `writeMicAudio` memcpy condition
`WhisperEngine.swift`	Add `language` property, use `self.language` in `transcribeChunk`

Test plan

Record a short meeting in German (or another non-English language): the mic channel should now transcribe in the selected language instead of the previously hardcoded one
Remote channel (SFSpeech) continues to work as before
Exported audio-mic.m4a and audio-system.m4a sound noticeably louder/fuller than before
Switching language in UI between sessions changes Whisper transcription language correctly

- SystemAudioCapture: raise sample rate from 16 kHz to 48 kHz so exported audio captures the full voice frequency range (0–24 kHz) instead of being limited to 8 kHz (Nyquist of 16 kHz)
- MeetingRecorder: update audio file settings to 48 kHz / 128 kbps AAC; write original 48 kHz PCM to the audio file in handleSystemAudioBuffer instead of the already-downsampled 16 kHz buffer that was fed to SFSpeech; fix writeMicAudio memcpy fast-path to also trigger for stereo hardware input (was gated on channelCount == 1 unnecessarily)
- WhisperEngine: add `language` property (default "en"), use it in transcribeChunk instead of a hardcoded language string; set it from MeetingRecorder.startRecording() via the 2-letter ISO prefix of recognitionLanguage (e.g. "de-DE" → "de")

Fixes thehwang#2, fixes thehwang#3
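A note on the memcpy fast-path change: copying channel 0 directly only yields one clean channel when the buffer layout is non-interleaved (planar). The sketch below uses plain Swift arrays as stand-ins for audio buffers to show the difference. It is an illustration under that assumption, not the code from `writeMicAudio`.

```swift
// The same four stereo frames in two layouts.
let left: [Float] = [1, 2, 3, 4]
let right: [Float] = [-1, -2, -3, -4]

// Non-interleaved (planar): each channel has its own contiguous buffer.
let planar: [[Float]] = [left, right]

// Interleaved: L R L R ... packed into a single buffer.
let interleaved: [Float] = zip(left, right).flatMap { pair in [pair.0, pair.1] }

let frameCount = left.count

// "Fast path" stand-in: copy the first frameCount samples of the first buffer.
let fromPlanar = Array(planar[0].prefix(frameCount))
let fromInterleaved = Array(interleaved.prefix(frameCount))

print(fromPlanar)       // [1.0, 2.0, 3.0, 4.0]   -> clean left channel
print(fromInterleaved)  // [1.0, -1.0, 2.0, -2.0] -> left and right mixed together
```

With a planar buffer the copy gives exactly the left channel; with an interleaved buffer the same copy would interleave both channels into what is supposed to be a mono track.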

thehwang

Reviewed the diff against the code in context. This is a clean, correct bugfix — happy to approve. Both goals land well: higher-quality audio export, and selectable Whisper language.

Quality path is self-consistent

SystemAudioCapture.sampleRate = 48_000 is SCK's native rate; capturing at 16k was effectively asking SCK to downsample for us, so 48k is cleaner.
handleSystemAudioBuffer now writes the original pcm to disk and only downsamples a separate buffer for SFSpeech. writeSystemAudio already has a format-conversion fallback, and the file's processing format is now 48k mono, so the original buffer matches and no extra conversion happens.
audioFileSettings (48k / 128 kbps mono) is shared by both the mic and system writers, so sample rates stay consistent when the two tracks are merged.
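The "write the original, downsample a separate copy" flow can be sketched roughly as follows. This is a simplified model using plain arrays: `handleSystemAudio` and `downsampleByThree` are hypothetical stand-ins, and the averaging decimator is deliberately naive. A real resampler (for example `AVAudioConverter`) applies proper filtering.

```swift
/// Naive 3:1 decimation (48 kHz -> 16 kHz) by averaging each group of three samples.
/// Illustration only; not a production-quality resampler.
func downsampleByThree(_ samples: [Float]) -> [Float] {
    var out: [Float] = []
    out.reserveCapacity(samples.count / 3)
    var i = 0
    while i + 2 < samples.count {
        out.append((samples[i] + samples[i + 1] + samples[i + 2]) / 3)
        i += 3
    }
    return out
}

/// Hypothetical stand-in for the buffer handler: the file writer receives the
/// untouched 48 kHz samples, the recognizer receives a separate 16 kHz copy.
func handleSystemAudio(_ pcm48k: [Float],
                       writeToFile: ([Float]) -> Void,
                       feedRecognizer: ([Float]) -> Void) {
    writeToFile(pcm48k)
    feedRecognizer(downsampleByThree(pcm48k))
}

var fileSamples: [Float] = []
var recognizerSamples: [Float] = []
let input: [Float] = (0..<480).map { Float($0) }  // 10 ms of audio at 48 kHz

handleSystemAudio(input,
                  writeToFile: { fileSamples += $0 },
                  feedRecognizer: { recognizerSamples += $0 })

print(fileSamples.count, recognizerSamples.count)  // 480 160
```

The point is that the downsampled data never reaches the file writer, so the exported track keeps its full bandwidth.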

Language path is correct

strdup(self.language) is still released by the existing free(langStr) — no leak introduced.
recognitionLanguage.components(separatedBy: "-").first?.lowercased() maps en-US -> en, zh-CN -> zh, matching Whisper's two-letter codes.
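For reference, the prefix mapping can be exercised in isolation. The sketch below wraps the same expression in a hypothetical helper, `whisperLanguageCode(from:fallback:)`. The `"en"` fallback for an empty identifier is an assumption that mirrors the property's default, not something the diff necessarily does.

```swift
import Foundation

/// Derives a Whisper language code from a locale identifier such as "de-DE".
/// Hypothetical helper; the fallback behaviour is an assumption.
func whisperLanguageCode(from recognitionLanguage: String,
                         fallback: String = "en") -> String {
    guard let prefix = recognitionLanguage
            .components(separatedBy: "-")
            .first?
            .lowercased(),
          !prefix.isEmpty else {
        return fallback
    }
    return prefix
}

print(whisperLanguageCode(from: "de-DE"))        // de
print(whisperLanguageCode(from: "en-US"))        // en
print(whisperLanguageCode(from: "zh-CN"))        // zh
print(whisperLanguageCode(from: "yue-Hant-HK"))  // yue
```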

Suggestions (non-blocking)

Multilingual model is a prerequisite. Language selection only works when a multilingual model (ggml-base.bin) is loaded; with an English-only *.en model, setting e.g. zh will produce garbage. Worth a guard/warning when a non-en language is chosen but the loaded model is .en.
Dropping && buffer.format.channelCount == 1 in writeMicAudio is fine here — the slow path also reads only ch0, so the result is identical. It's only safe because our buffers are non-interleaved (interleaved: false); a one-line comment noting that assumption would prevent a future interleaved-stereo memcpy(ch0) foot-gun.
Multi-subtag locales like yue-Hant-HK -> yue parse fine, but yue is only supported by larger Whisper models, not base. Edge case, just flagging.
self.language is read on processingQueue. It isn't mutated during recording so it's safe in practice; capturing let lang = self.language before the async block would make that explicit.
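A rough sketch of two of the suggestions above: the model/language guard and capturing the language before hopping queues. All names here (`isEnglishOnlyModel`, `resolveLanguage`, `EngineSketch`) are hypothetical. The check assumes English-only models follow the `*.en.bin` file naming (e.g. `ggml-base.en.bin`), and falling back to `"en"` is just one possible policy.

```swift
import Foundation

/// Assumption: English-only models are named "*.en.bin", e.g. "ggml-base.en.bin".
func isEnglishOnlyModel(fileName: String) -> Bool {
    return fileName.lowercased().hasSuffix(".en.bin")
}

/// Returns the language to use. Warns and falls back to "en" when a
/// non-English language is requested but an English-only model is loaded.
func resolveLanguage(requested: String, modelFileName: String) -> String {
    if requested != "en" && isEnglishOnlyModel(fileName: modelFileName) {
        print("warning: '\(requested)' requested, but \(modelFileName) is English-only; using 'en'")
        return "en"
    }
    return requested
}

/// Minimal stand-in for the engine, showing the capture-before-async pattern.
final class EngineSketch {
    var language = "en"
    let processingQueue = DispatchQueue(label: "sketch.processing")

    func transcribeChunk(completion: @escaping (String) -> Void) {
        let lang = language  // snapshot taken before hopping to the queue
        processingQueue.async {
            // The real transcription call would go here, using `lang`
            // rather than reading `self.language` from the queue.
            completion(lang)
        }
    }
}

let engine = EngineSketch()
engine.language = resolveLanguage(requested: "zh", modelFileName: "ggml-base.en.bin")
engine.transcribeChunk { lang in print("transcribing with language: \(lang)") }
engine.processingQueue.sync {}  // wait for the queued work before exiting
```

Capturing `lang` up front means a later change to `language` cannot affect a chunk that is already queued.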

Not affected (double-checked)

Whisper's 16 kHz input requirement: the mic→whisper feed is unchanged; only the saved file's sample rate changed.
Mic/system merge: both share the 48k settings, so they stay aligned.

ScreepCode force-pushed the fix/audio-quality-and-whisper-language branch from c211121 to 0c22e3d on June 15, 2026 16:35

thehwang approved these changes Jun 27, 2026

ScreepCode wants to merge 1 commit into thehwang:main from ScreepCode:fix/audio-quality-and-whisper-language
