Overview
SambaNovaSTTService provides speech-to-text capabilities using SambaNova’s hosted Whisper API with Voice Activity Detection (VAD) for optimized processing. It efficiently processes speech segments to deliver accurate transcription with SambaNova’s high-performance inference platform.
SambaNova STT API Reference
Pipecat’s API methods for SambaNova STT integration
Example Implementation
Complete example with function calling
SambaNova Documentation
Official SambaNova API documentation and features
SambaNova Cloud
Access API keys and Whisper models
Installation
To use SambaNova services, install the required dependency:

Prerequisites
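The dependency can be installed with pip. The `sambanova` extra name below is an assumption based on Pipecat's usual `pipecat-ai[<provider>]` extras convention; check the Pipecat installation docs for the exact name:

```shell
pip install "pipecat-ai[sambanova]"
```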
SambaNova Account Setup
Before using SambaNova STT services, you need:
- SambaNova Account: Sign up at SambaNova Cloud
- API Key: Generate an API key from your account dashboard
- Model Access: Ensure access to Whisper transcription models
Required Environment Variables
SAMBANOVA_API_KEY: Your SambaNova API key for authentication
Configuration
SambaNovaSTTService
- model: Whisper model to use for transcription. Deprecated in v0.0.105; use settings=SambaNovaSTTService.Settings(...) instead.
- api_key: SambaNova API key. Falls back to the SAMBANOVA_API_KEY environment variable.
- base_url: API base URL.
- language: Language of the audio input. Deprecated in v0.0.105; use settings=SambaNovaSTTService.Settings(...) instead.
- prompt: Optional text to guide the model's style or continue a previous segment. Deprecated in v0.0.105; use settings=SambaNovaSTTService.Settings(...) instead.
- temperature: Sampling temperature between 0 and 1. Lower values produce more deterministic results. Deprecated in v0.0.105; use settings=SambaNovaSTTService.Settings(...) instead.
- settings: Runtime-configurable settings for the STT service. See Settings below.
- Transcription latency override: P99 latency from speech end to final transcript in seconds. Override for your deployment. See stt-benchmark.
- Empty-frame passthrough: If true, allows empty TranscriptionFrame frames to be pushed downstream instead of discarding them. This is intended for situations where VAD fires even though the user did not speak; in these cases, it is useful to know that nothing was transcribed so that the agent can resume speaking instead of waiting longer for a transcription.
Settings
Runtime-configurable settings passed via the settings constructor argument using SambaNovaSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | "Whisper-Large-v3" | Whisper model to use. (Inherited from base STT settings.) |
| language | Language \| str | Language.EN | Language of the audio input. (Inherited from base STT settings.) |
| prompt | str | None | Optional text to guide the model's style or continue a previous segment. |
| temperature | float | None | Sampling temperature between 0 and 1. |
Usage
Basic Setup
With Custom Configuration
Notes
- Segmented transcription: SambaNovaSTTService extends SegmentedSTTService (via BaseWhisperSTTService), processing complete audio segments after VAD detects that the user has stopped speaking.
- Whisper API compatible: Uses the OpenAI-compatible Whisper API interface hosted on SambaNova’s infrastructure.
- Probability metrics not supported: SambaNova’s Whisper API does not support probability metrics. The include_prob_metrics parameter has no effect.