Overview
GradiumSTTService provides real-time speech recognition using Gradium’s WebSocket API with support for multilingual transcription, semantic voice activity detection for smart turn-taking, and robust performance in noisy environments.
Gradium STT API Reference
Pipecat’s API methods for Gradium STT integration
Example Implementation
Complete example with interruption handling
Gradium Documentation
Official Gradium STT API documentation
Gradium Platform
Access API keys and speech models
Installation
To use Gradium services, install Pipecat with the required Gradium dependency.
Prerequisites
Gradium Account Setup
Before using Gradium STT services, you need:
- Gradium Account: Sign up at Gradium
- API Key: Generate an API key from your account dashboard
- Region Selection: Choose your preferred region (EU or US)
Required Environment Variables
GRADIUM_API_KEY: Your Gradium API key for authentication
Configuration
GradiumSTTService
Gradium API key for authentication.
WebSocket endpoint URL. Override for different regions or custom deployments.
Configuration parameters for language and delay settings. Deprecated in v0.0.105. Use settings=GradiumSTTService.Settings(...) instead.
Optional JSON configuration string for additional model settings. Deprecated in favor of params.
Runtime-configurable settings for the STT service. See Settings below.
P99 latency from speech end to final transcript in seconds. Override for your deployment. See stt-benchmark.
Settings
Runtime-configurable settings passed via the settings constructor argument using GradiumSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | None | STT model identifier. (Inherited from base STT settings.) |
| language | Language \| str | None | Expected language of the audio. (Inherited from base STT settings.) Helps ground the model to a specific language and improve transcription quality. |
| delay_in_frames | int | None | Delay in audio frames (80ms each) before text is generated. Higher delays allow more context but increase latency. Allowed values: 7, 8, 10, 12, 14, 16, 20, 24, 36, 48. Default is 10 (800ms). |
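
The relationship between delay_in_frames and added latency is simple arithmetic: each frame is 80ms, so the delay in milliseconds is the frame count times 80. The following is a minimal sketch of that conversion using only the allowed values listed in the table. The helper names (delay_to_ms, closest_allowed_delay) are hypothetical and are not part of Pipecat or Gradium.

```python
# Allowed delay_in_frames values and frame duration, as listed in the Settings table.
ALLOWED_DELAYS = (7, 8, 10, 12, 14, 16, 20, 24, 36, 48)
FRAME_MS = 80


def delay_to_ms(delay_in_frames: int) -> int:
    """Convert a delay in frames to milliseconds, rejecting unsupported values."""
    if delay_in_frames not in ALLOWED_DELAYS:
        raise ValueError(
            f"delay_in_frames must be one of {ALLOWED_DELAYS}, got {delay_in_frames}"
        )
    return delay_in_frames * FRAME_MS


def closest_allowed_delay(target_ms: int) -> int:
    """Pick the allowed frame count whose delay is nearest to target_ms.

    On a tie, the smaller (lower-latency) value wins because min() keeps
    the first match and ALLOWED_DELAYS is sorted ascending.
    """
    return min(ALLOWED_DELAYS, key=lambda d: abs(d * FRAME_MS - target_ms))


if __name__ == "__main__":
    for frames in ALLOWED_DELAYS:
        print(f"delay_in_frames={frames:>2} -> {delay_to_ms(frames)} ms")
```

A helper like closest_allowed_delay can be useful when a latency budget is expressed in milliseconds and needs to be mapped onto one of the supported frame counts.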
Usage
Basic Setup
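
A minimal sketch of a basic setup, assuming the API key is stored in the GRADIUM_API_KEY environment variable described above. The import path pipecat.services.gradium.stt is an assumption based on Pipecat's usual package layout and should be checked against your installed version. The helper names load_gradium_api_key and create_stt are hypothetical.

```python
import os


def load_gradium_api_key(env=None) -> str:
    """Return the Gradium API key from GRADIUM_API_KEY, or raise if it is unset."""
    env = os.environ if env is None else env
    api_key = env.get("GRADIUM_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("GRADIUM_API_KEY is not set")
    return api_key


def create_stt(env=None):
    """Create the STT service with default settings."""
    # Assumed import path (not confirmed on this page); imported lazily so the
    # key-loading helper can be used without Pipecat installed.
    from pipecat.services.gradium.stt import GradiumSTTService

    return GradiumSTTService(api_key=load_gradium_api_key(env))
```

Failing early on a missing key gives a clearer error than a rejected WebSocket connection later in the pipeline.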
With Language and Delay Configuration
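
A sketch of passing language and delay through settings=GradiumSTTService.Settings(...), using the language and delay_in_frames fields from the Settings table. The import path is an assumption, and the service_cls parameter is a hypothetical hook added only so the sketch can be exercised with a stand-in class and no network connection.

```python
def create_stt(api_key, language, delay_in_frames, service_cls=None):
    """Build a Gradium STT service with an explicit language and delay."""
    if service_cls is None:
        # Assumed import path; verify against your installed Pipecat version.
        from pipecat.services.gradium.stt import GradiumSTTService as service_cls

    # Settings fields as documented in the Settings table.
    settings = service_cls.Settings(
        language=language,
        delay_in_frames=delay_in_frames,
    )
    return service_cls(api_key=api_key, settings=settings)
```

Per the Settings table, language accepts either a Language value or a string, and delay_in_frames should be one of the allowed values. A larger delay such as 16 (1280ms) trades latency for more context.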
Notes
- Supported languages: German, English, Spanish, French, and Portuguese.
- Silence flushing: When VAD detects the user has stopped speaking, the service sends silence frames to flush the transcription buffer, resulting in faster final transcripts without closing the connection.
- Audio format: Sends audio as 24 kHz 16-bit PCM in 80ms chunks.
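
The audio format and silence flushing notes above can be made concrete with a little arithmetic. The sketch below works out the size of one 80ms chunk of 24 kHz 16-bit PCM and builds silence chunks of the same size. It assumes mono audio, which this page does not state. Zero-padding the final partial chunk is a choice made for this sketch rather than documented service behavior, and the number of silence chunks the service sends is not specified here, so it is left as a caller-chosen parameter.

```python
SAMPLE_RATE_HZ = 24_000   # 24 kHz, per the audio format note
BYTES_PER_SAMPLE = 2      # 16-bit PCM
CHUNK_MS = 80             # one chunk per 80ms frame

SAMPLES_PER_CHUNK = SAMPLE_RATE_HZ * CHUNK_MS // 1000   # 1920 samples
BYTES_PER_CHUNK = SAMPLES_PER_CHUNK * BYTES_PER_SAMPLE  # 3840 bytes (mono assumed)


def chunk_pcm(pcm: bytes):
    """Yield 80ms chunks of raw PCM, zero-padding the final partial chunk."""
    for start in range(0, len(pcm), BYTES_PER_CHUNK):
        chunk = pcm[start:start + BYTES_PER_CHUNK]
        yield chunk.ljust(BYTES_PER_CHUNK, b"\x00")


def silence_chunks(count: int) -> list[bytes]:
    """Return `count` chunks of digital silence (all-zero samples)."""
    return [bytes(BYTES_PER_CHUNK) for _ in range(count)]


if __name__ == "__main__":
    print(f"{SAMPLES_PER_CHUNK} samples / {BYTES_PER_CHUNK} bytes per 80ms chunk")
```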
Event Handlers
Gradium STT supports the standard service connection events:
| Event | Description |
|---|---|
| on_connected | Connected to Gradium WebSocket |
| on_disconnected | Disconnected from Gradium WebSocket |
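
A sketch of registering handlers for the two events above, assuming the service exposes Pipecat's event_handler decorator and that handlers are async functions receiving the service as their only argument. Both points should be checked against your Pipecat version. The register_connection_handlers helper and the log list are hypothetical.

```python
def register_connection_handlers(stt, log: list) -> None:
    """Attach connection event handlers to an STT service instance.

    `stt` is expected to provide an `event_handler(name)` decorator.
    `log` collects event names here; a real application would use a logger.
    """

    @stt.event_handler("on_connected")
    async def on_connected(service):
        log.append("connected")

    @stt.event_handler("on_disconnected")
    async def on_disconnected(service):
        log.append("disconnected")
```

Tracking these events can help surface dropped connections, for example by counting disconnects per session.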