feat(smallestai): add Pulse STT with real-time streaming and batch transcription #5312
harshitajain165 wants to merge 8 commits into livekit:main
Conversation
```python
else:
    self._event_ch.send_nowait(
        stt.SpeechEvent(
            type=stt.SpeechEventType.INTERIM_TRANSCRIPT,
            request_id=self._session_id,
            alternatives=alts,
        )
    )
```
🔴 STT capability declares interim_results=False but code emits INTERIM_TRANSCRIPT events
The STT constructor at line 139 declares interim_results=False in STTCapabilities, but _process_stream_event at lines 517-524 emits stt.SpeechEventType.INTERIM_TRANSCRIPT events whenever the server returns a non-final transcript (is_final=False). The Smallest AI Pulse API does return partial transcripts (the schema comment at line 475 says transcript is "partial or final text"), so the capability should be True. This mismatch causes incorrect behavior in the FallbackAdapter (livekit-agents/livekit/agents/stt/fallback_adapter.py:80) which uses all(t.capabilities.interim_results for t in stt) to compose capabilities — it would incorrectly report that the combined STT doesn't support interim results even if the other STT does.
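The capability-composition behavior the comment describes can be illustrated with a small self-contained sketch. The class and function below are simplified stand-ins for the real `STTCapabilities` and `FallbackAdapter` logic in livekit-agents, not the actual source:

```python
from dataclasses import dataclass


@dataclass
class STTCapabilities:
    streaming: bool
    interim_results: bool


def compose(caps: list[STTCapabilities]) -> STTCapabilities:
    # FallbackAdapter-style composition: the combined STT only advertises
    # interim results if every underlying STT declares support for them.
    return STTCapabilities(
        streaming=all(c.streaming for c in caps),
        interim_results=all(c.interim_results for c in caps),
    )


combined = compose([
    STTCapabilities(streaming=True, interim_results=False),  # mis-declared Pulse STT
    STTCapabilities(streaming=True, interim_results=True),   # another provider
])
assert combined.interim_results is False  # one False flag masks the others
```

This is why a single plugin declaring `interim_results=False` while actually emitting interim events degrades the whole fallback chain.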
Force-pushed 49ec8de to 619df8f
tinalenguyen left a comment
hi, thank you for the PR! i have a few notes, could you:
- address all of the devin comments, especially the one regarding interim transcripts
- remove smallest ai from the test files, as we do not have a smallestai api key for testing as of yet
- sign the CLA if possible
Hey @tinalenguyen Thanks for the comment. I'm addressing the devin comments, removing smallest ai from the test files, and signing the CLA. Will keep you posted once all are done.
Force-pushed fc0b7bb to 26ba81c
recheck

Hey @tinalenguyen The devin comments have been incorporated, smallest ai has been removed from the test files, and I have signed the CLA too. Please feel free to re-review/take this forward.
…support

Adds speech-to-text support to the existing Smallest AI plugin via the Waves Pulse API, covering both real-time WebSocket streaming and pre-recorded HTTP batch transcription.
- Add lightning-v3.1 as the new default model (80+ voices, ~100ms latency)
- Remove deprecated lightning and lightning-large models
- Update base URL to api.smallest.ai/waves/v1
- Simplify endpoint to get_speech for all models (removes get_speech_long_text)
- Add alaw encoding support (v3.1)
- Restrict consistency/similarity/enhancement params to lightning-v2 only
- Remove unused `interim_results` option from STT (constructor, options dataclass, and update_options). The Pulse API does not support server-side interim filtering and the plugin never honoured the flag. STTCapabilities now declares interim_results=False.
- Remove smallestai from test_stt.py and test_tts.py since there is no Smallest AI API key available in CI.
- Remove spurious TTS warning about consistency/similarity/enhancement params that fired on every default TTS() instantiation. The downstream _to_smallest_options already correctly excludes those params for non-v2 models.

Made-with: Cursor
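The param-filtering behavior described in the last bullet can be sketched as follows. This is a hypothetical stand-in for the plugin's `_to_smallest_options` helper, written only to illustrate the described behavior, not the PR's actual code:

```python
def to_smallest_options(model: str, **params: float) -> dict[str, float]:
    # Per the commit message, consistency/similarity/enhancement are
    # lightning-v2-only tuning params and are dropped for other models,
    # so no warning is needed at construction time.
    V2_ONLY = {"consistency", "similarity", "enhancement"}
    if model != "lightning-v2":
        return {k: v for k, v in params.items() if k not in V2_ONLY}
    return dict(params)
```

Because filtering happens downstream at request-build time, a default `TTS()` instantiation never needs to warn about unsupported params.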
Force-pushed 26ba81c to 0edd0e0
@harshitajain165 Thank you for iterating on the feedback! For the STT, I printed out the received events and it does seem that interim results are emitted. Is there a setting to pass to the API, or does the API always send interim results? If that is always the case, I would set …

Also, when testing the TTS, I keep facing this error: …
```python
encoding: STTEncoding | str = "linear16",
word_timestamps: bool = True,
diarize: bool = False,
eou_timeout_ms: int = 800,
```
Should this be 0?
With our end-of-turn detection model, we should prioritize minimizing latency to receive transcripts.
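The trade-off behind this suggestion can be sketched with a toy end-of-utterance check. This is an illustrative model of how such a timeout typically behaves, not the Pulse API's actual semantics (which would need to be confirmed against its docs):

```python
def is_end_of_utterance(last_transcript_at: float, now: float, eou_timeout_ms: int) -> bool:
    # Server-side EOU: if no new transcript arrived within eou_timeout_ms,
    # the turn is treated as finished. Setting it to 0 (assumed here to mean
    # "disabled") would leave end-of-turn detection entirely to a downstream
    # model, so finals arrive as fast as possible.
    if eou_timeout_ms <= 0:
        return False
    return (now - last_transcript_at) * 1000 >= eou_timeout_ms


assert is_end_of_utterance(0.0, 1.0, 800) is True    # 1000ms silence >= 800ms timeout
assert is_end_of_utterance(0.0, 0.5, 800) is False   # 500ms silence, still listening
assert is_end_of_utterance(0.0, 10.0, 0) is False    # timeout disabled
```

With LiveKit's own end-of-turn model active, waiting an extra 800ms server-side only adds latency on top of the client-side decision.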
Summary
This PR adds speech-to-text support to the existing `livekit-plugins-smallestai` package via the Smallest AI Pulse STT API, complementing the Lightning TTS integration that already exists.

- Streaming (`SpeechStream`): real-time transcription over WebSocket with interim and final transcripts, ~64ms TTFT
- Batch (`_recognize_impl`): pre-recorded transcription via HTTP POST
- Word-level `start`/`end`/`confidence` timestamps included by default (`word_timestamps=True`)
- Speaker diarization via `diarize=True`
- Configurable `eou_timeout_ms` (100–10,000ms, default 800ms)

Implementation notes
- Message schema (`transcript`, `is_final`, `is_last`, `finalize` message) verified against docs.smallest.ai
- `START_OF_SPEECH` is inferred from the first non-empty transcript since the Pulse API does not emit a dedicated speech-start event

Test plan
- `test_recognize[livekit.plugins.smallestai]` passes
- `test_stream[livekit.plugins.smallestai]` passes
- `ruff format` and `ruff check` pass
- `mypy --strict` passes
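The event-mapping behavior described in the implementation notes (speech-start inferred from the first non-empty transcript; interim vs. final selected by `is_final`) can be sketched roughly like this. Field names come from the schema note above; the rest is an illustrative reconstruction, not the PR's actual `_process_stream_event`:

```python
import json
from enum import Enum, auto


class EventType(Enum):
    START_OF_SPEECH = auto()
    INTERIM_TRANSCRIPT = auto()
    FINAL_TRANSCRIPT = auto()


def map_pulse_message(raw: str, speech_started: bool) -> tuple[list[EventType], bool]:
    """Map one Pulse WebSocket message to STT events.

    Returns the events to emit and the updated speech_started flag.
    """
    msg = json.loads(raw)
    events: list[EventType] = []
    transcript = msg.get("transcript", "")
    if transcript and not speech_started:
        # Pulse emits no dedicated speech-start event, so infer it
        # from the first non-empty transcript.
        events.append(EventType.START_OF_SPEECH)
        speech_started = True
    if transcript:
        events.append(
            EventType.FINAL_TRANSCRIPT if msg.get("is_final") else EventType.INTERIM_TRANSCRIPT
        )
    return events, speech_started


events, started = map_pulse_message('{"transcript": "hello", "is_final": false}', False)
# events == [START_OF_SPEECH, INTERIM_TRANSCRIPT], started == True
```

Keeping the mapping pure like this (message in, events out) makes the interim/final behavior straightforward to unit-test without a live WebSocket.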