Overview

GradiumSTTService provides real-time speech recognition using Gradium’s WebSocket API with support for multilingual transcription, semantic voice activity detection for smart turn-taking, and robust performance in noisy environments.

Installation

To use Gradium services, install the required dependency:
pip install "pipecat-ai[gradium]"

Prerequisites

Gradium Account Setup

Before using Gradium STT services, you need:
  1. Gradium Account: Sign up at Gradium
  2. API Key: Generate an API key from your account dashboard
  3. Region Selection: Choose your preferred region (EU or US)

Required Environment Variables

  • GRADIUM_API_KEY: Your Gradium API key for authentication
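
Because the API key is required, it can help to fail early with a clear message when the variable is missing, rather than passing None to the service. The following is a minimal sketch; the helper name load_gradium_api_key is hypothetical and is not part of Pipecat or Gradium.

```python
import os


def load_gradium_api_key(env=os.environ):
    """Return the Gradium API key, or raise a clear error if it is missing.

    `load_gradium_api_key` is a hypothetical helper for illustration only.
    `env` defaults to the process environment; pass a dict to test it.
    """
    key = env.get("GRADIUM_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GRADIUM_API_KEY is not set; export it before starting the bot."
        )
    return key
```

The returned value can then be passed as api_key when constructing GradiumSTTService.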

Configuration

GradiumSTTService

  • api_key (str, required): Gradium API key for authentication.
  • api_endpoint_base_url (str, default: "wss://eu.api.gradium.ai/api/speech/asr"): WebSocket endpoint URL. Override for different regions or custom deployments.
  • params (GradiumSTTService.InputParams, default: None, deprecated): Configuration parameters for language and delay settings. Deprecated in v0.0.105. Use settings=GradiumSTTService.Settings(...) instead.
  • json_config (str, default: None): Optional JSON configuration string for additional model settings. Deprecated in favor of params, which is itself deprecated; prefer settings.
  • settings (GradiumSTTService.Settings, default: None): Runtime-configurable settings for the STT service. See Settings below.
  • ttfs_p99_latency (float, default: GRADIUM_TTFS_P99): P99 latency from speech end to final transcript, in seconds. Override for your deployment. See stt-benchmark.

Settings

Runtime-configurable settings passed via the settings constructor argument using GradiumSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | None | STT model identifier. (Inherited from base STT settings.) |
| language | Language \| str | None | Expected language of the audio. (Inherited from base STT settings.) Helps ground the model to a specific language and improve transcription quality. |
| delay_in_frames | int | None | Delay in audio frames (80ms each) before text is generated. Higher delays allow more context but increase latency. Allowed values: 7, 8, 10, 12, 14, 16, 20, 24, 36, 48. When left as None, the default of 10 (800ms) applies. |
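
The arithmetic behind delay_in_frames can be sketched as follows, assuming only what is stated above: each frame is 80ms and only the listed values are accepted. The helper names delay_to_ms and nearest_allowed_delay are hypothetical and are not part of the Pipecat API.

```python
# Allowed values and frame duration, as listed for delay_in_frames.
ALLOWED_DELAYS = (7, 8, 10, 12, 14, 16, 20, 24, 36, 48)
FRAME_MS = 80


def delay_to_ms(delay_in_frames: int) -> int:
    """Convert a delay in frames to milliseconds, rejecting unsupported values."""
    if delay_in_frames not in ALLOWED_DELAYS:
        raise ValueError(
            f"delay_in_frames must be one of {ALLOWED_DELAYS}, got {delay_in_frames}"
        )
    return delay_in_frames * FRAME_MS


def nearest_allowed_delay(target_ms: float) -> int:
    """Pick the allowed frame count whose delay is closest to target_ms.

    On an exact tie, the smaller (lower-latency) value is returned.
    """
    return min(ALLOWED_DELAYS, key=lambda frames: abs(frames * FRAME_MS - target_ms))
```

For example, delay_to_ms(10) gives 800, matching the documented default, and nearest_allowed_delay(1000) gives 12 (960ms), which could then be passed as delay_in_frames in GradiumSTTService.Settings(...).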

Usage

Basic Setup

import os

from pipecat.services.gradium.stt import GradiumSTTService

stt = GradiumSTTService(
    api_key=os.getenv("GRADIUM_API_KEY"),
)

With Language and Delay Configuration

import os

from pipecat.services.gradium.stt import GradiumSTTService
from pipecat.transcriptions.language import Language

stt = GradiumSTTService(
    api_key=os.getenv("GRADIUM_API_KEY"),
    settings=GradiumSTTService.Settings(
        language=Language.EN,
        delay_in_frames=8,
    ),
)

Notes

  • Supported languages: German, English, Spanish, French, and Portuguese.
  • Silence flushing: When VAD detects the user has stopped speaking, the service sends silence frames to flush the transcription buffer, resulting in faster final transcripts without closing the connection.
  • Audio format: Sends audio as 24 kHz 16-bit PCM in 80ms chunks.

The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.
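
To make the audio format and silence flushing notes concrete, the sketch below works out the size of one 80ms chunk of 24 kHz 16-bit PCM and builds all-zero chunks of the same size. It illustrates the arithmetic only and is not the service's internal implementation; mono audio is an assumption, and the names split_into_chunks and silence_chunks are hypothetical.

```python
SAMPLE_RATE_HZ = 24_000   # 24 kHz, from the audio format note
BYTES_PER_SAMPLE = 2      # 16-bit PCM
CHUNK_MS = 80             # one chunk is 80ms
CHANNELS = 1              # assumption: mono audio

# 24_000 samples/s * 0.08 s = 1920 samples; 1920 * 2 bytes = 3840 bytes
CHUNK_BYTES = SAMPLE_RATE_HZ * CHUNK_MS // 1000 * BYTES_PER_SAMPLE * CHANNELS


def split_into_chunks(pcm: bytes):
    """Yield consecutive 80ms chunks; the last one may be shorter."""
    for start in range(0, len(pcm), CHUNK_BYTES):
        yield pcm[start:start + CHUNK_BYTES]


def silence_chunks(count: int) -> list[bytes]:
    """Return `count` chunks of digital silence (all-zero samples)."""
    return [bytes(CHUNK_BYTES)] * count
```

Under these assumptions, one second of audio is 48,000 bytes and splits into twelve full chunks plus one half-length chunk.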

Event Handlers

Gradium STT supports the standard service connection events:
| Event | Description |
| --- | --- |
| on_connected | Connected to Gradium WebSocket |
| on_disconnected | Disconnected from Gradium WebSocket |

@stt.event_handler("on_connected")
async def on_connected(service):
    print("Connected to Gradium")