fix: remove krisp as a smart endpointing provider from docs
Krisp is not a valid smartEndpointingPlan provider. The API only supports
'vapi', 'livekit', and 'custom-endpointing-model'. The docs incorrectly
listed krisp as an audio-based smart endpointing provider, causing users
to get API validation errors when trying to use it.
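For reference, a minimal valid request body mirrors the corrected example later in this diff; only the `provider` value changes between the three supported options:

```json
{
  "startSpeakingPlan": {
    "smartEndpointingPlan": {
      "provider": "vapi"
    }
  }
}
```

Swapping `"vapi"` for `"livekit"` or `"custom-endpointing-model"` selects the other supported providers; `"krisp"` is rejected by API validation.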
Changes:
- Remove krisp from the smart endpointing providers list
- Remove the 'Krisp threshold configuration' section
- Replace the 'Audio-based endpointing (Krisp example)' config example
with a 'Non-English smart endpointing (Vapi example)'
- Remove krisp from the speech-configuration.mdx providers list
Note: Krisp references for background speech denoising (a separate
feature) are intentionally left unchanged.
Co-authored-by: Sahil Suman <sahilsuman933@users.noreply.github.com>
File changed: fern/customization/speech-configuration.mdx (1 addition, 9 deletions)
````diff
@@ -39,15 +39,7 @@ This plan defines the parameters for when the assistant begins speaking after th
 -**End-of-turn prediction** - predicting when the current speaker is likely to finish their turn.
 -**Backchannel prediction** - detecting moments where a listener may provide short verbal acknowledgments like "uh-huh", "yeah", etc. to show engagement, without intending to take over the speaking turn. This is better handled by the assistant's stopSpeakingPlan.
 
-We offer different providers that can be audio-based, text-based, or audio-text based:
-
-**Audio-based providers:**
-
--**Krisp**: Audio-based model that analyzes prosodic and acoustic features such as changes in intonation, pitch, and rhythm to detect when users finish speaking. Since it's audio-based, it always notifies when the user is done speaking, even for brief acknowledgments. Vapi offers configurable acknowledgement words and a well-configured stop speaking plan to handle this properly.
-
-Configure Krisp with a threshold between 0 and 1 (default 0.5), where 1 means the user definitely stopped speaking and 0 means they're still speaking. Use lower values for snappier conversations and higher values for more conservative detection.
-
-When interacting with an AI agent, users may genuinely want to interrupt to ask a question or shift the conversation, or they might simply be using backchannel cues like "right" or "okay" to signal they're actively listening. The core challenge lies in distinguishing meaningful interruptions from casual acknowledgments. Since the audio-based model signals end-of-turn after each word, configure the stop speaking plan with the right number of words to interrupt, interruption settings, and acknowledgement phrases to handle backchanneling properly.
+We offer different providers that can be text-based or audio-text based:
@@ -307,30 +303,4 @@
 - Fallback option when other smart endpointing providers aren't suitable
-### Krisp threshold configuration
-
-Krisp's audio-base model returns a probability between 0 and 1, where 1 means the user definitely stopped speaking and 0 means they're still speaking.
-
-**Threshold settings:**
-
--**0.0-0.3:** Very aggressive detection - responds quickly but may interrupt users mid-sentence
--**0.4-0.6:** Balanced detection (default: 0.5) - good balance between responsiveness and accuracy
--**0.7-1.0:** Conservative detection - waits longer to ensure users have finished speaking
-
-**Configuration example:**
-
-```json
-{
-  "startSpeakingPlan": {
-    "smartEndpointingPlan": {
-      "provider": "krisp",
-      "threshold": 0.5
-    }
-  }
-}
-```
-
-**Important considerations:**
-Since Krisp is audio-based, it always notifies when the user is done speaking, even for brief acknowledgments. Configure the stop speaking plan with appropriate `acknowledgementPhrases` and `numWords` settings to handle backchanneling properly.
-
 ### Assembly turn detection
 
 AssemblyAI's turn detection model uses a neural network to detect when someone has finished speaking. The model understands the meaning and flow of speech to make better decisions about when a turn has ended.
````
````diff
@@ -613,15 +583,19 @@ User Interrupts → Assistant Audio Stopped → backoffSeconds Blocks All Output
 
 **Optimized for:** Text-based endpointing with longer timeouts for different speech patterns and international support.
 
-### Audio-based endpointing (Krisp example)
+### Non-English smart endpointing (Vapi example)
 
 ```json
 {
   "startSpeakingPlan": {
     "waitSeconds": 0.4,
     "smartEndpointingPlan": {
-      "provider": "krisp",
-      "threshold": 0.5
+      "provider": "vapi"
+    },
+    "transcriptionEndpointingPlan": {
+      "onPunctuationSeconds": 0.1,
+      "onNoPunctuationSeconds": 1.5,
+      "onNumberSeconds": 0.5
     }
   },
   "stopSpeakingPlan": {
@@ -640,7 +614,7 @@ User Interrupts → Assistant Audio Stopped → backoffSeconds Blocks All Output
   }
 }
 ```
 
-**Optimized for:** Non-English conversations with robust backchanneling configuration to handle audio-based detection limitations.
+**Optimized for:** Non-English conversations with Vapi's heuristic endpointing and robust backchanneling configuration.
 
 ### Audio-text based endpointing (Assembly example)
````