Overview

GladiaSTTService provides real-time speech recognition over Gladia’s WebSocket API. It supports 99+ languages, custom vocabulary, translation, sentiment analysis, and audio pre-processing.

Installation

To use Gladia services, install the required dependency:
pip install "pipecat-ai[gladia]"

Prerequisites

Gladia Account Setup

Before using Gladia STT services, you need:
  1. Gladia Account: Sign up at Gladia
  2. API Key: Generate an API key from your account dashboard
  3. Region Selection: Choose your preferred region (EU-West or US-West)

Required Environment Variables

  • GLADIA_API_KEY: Your Gladia API key for authentication
  • GLADIA_REGION: Your preferred region (optional, defaults to “eu-west”)
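
As a minimal sketch, the two variables above could be read and validated before the service is constructed. The helper name load_gladia_env and the VALID_REGIONS tuple are hypothetical and not part of Pipecat; the "eu-west" fallback mirrors the default stated above.

```python
import os

# Regions listed in the prerequisites above.
VALID_REGIONS = ("eu-west", "us-west")


def load_gladia_env(environ=None):
    """Return (api_key, region) read from the environment.

    Hypothetical helper for illustration only; not part of Pipecat.
    """
    env = os.environ if environ is None else environ
    api_key = env.get("GLADIA_API_KEY")
    if not api_key:
        raise RuntimeError("GLADIA_API_KEY is not set")
    # GLADIA_REGION is optional; fall back to the documented default.
    region = env.get("GLADIA_REGION") or "eu-west"
    if region not in VALID_REGIONS:
        raise ValueError(f"Unsupported region: {region!r}")
    return api_key, region
```

The returned values can then be passed as api_key= and region= when creating GladiaSTTService.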

Configuration

GladiaSTTService

api_key
str
required
Gladia API key for authentication.
region
Literal['us-west', 'eu-west']
default:"None"
Region used to process audio. Defaults to "eu-west" when None.
url
str
default:"https://api.gladia.io/v2/live"
Gladia API URL for session initialization.
confidence
float
default:"None"
Formerly the minimum confidence threshold for transcriptions (0.0-1.0). Deprecated: the value is ignored and no confidence threshold is applied.
encoding
str
default:"wav/pcm"
Audio encoding format. Init-only — not part of runtime-updatable settings.
bit_depth
int
default:"16"
Audio bit depth. Init-only — not part of runtime-updatable settings.
channels
int
default:"1"
Number of audio channels. Init-only — not part of runtime-updatable settings.
sample_rate
int
default:"None"
Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
model
str
default:"None"
deprecated
Model to use for transcription. Deprecated in v0.0.105. Use settings=GladiaSTTService.Settings(...) instead.
params
GladiaInputParams
default:"None"
deprecated
Additional configuration parameters. Deprecated in v0.0.105. Use settings=GladiaSTTService.Settings(...) instead.
settings
GladiaSTTService.Settings
default:"None"
Runtime-configurable settings for the STT service. See Settings below.
max_buffer_size
int
default:"20971520"
Maximum size of the audio buffer in bytes (default 20 MiB, i.e. 20 × 1024 × 1024).
should_interrupt
bool
default:"True"
Whether the bot should be interrupted when Gladia VAD detects user speech.
ttfs_p99_latency
float
default:"1.49"
P99 latency from speech end to final transcript in seconds. Override for your deployment. See stt-benchmark.
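
To get a feel for what the default max_buffer_size means in practice, the following sketch converts a buffer size in bytes into seconds of raw PCM audio. The function name is hypothetical, and the 16000 Hz sample rate is an assumption for illustration (the service uses the pipeline's rate when sample_rate is None); bit_depth and channels use the defaults listed above.

```python
def buffer_capacity_seconds(
    max_buffer_size=20 * 1024 * 1024,  # default from the table above (20971520 bytes)
    sample_rate=16000,                 # assumed for illustration
    bit_depth=16,
    channels=1,
):
    """Return how many seconds of raw PCM audio fit in the buffer."""
    bytes_per_second = sample_rate * (bit_depth // 8) * channels
    return max_buffer_size / bytes_per_second


# Roughly 655 seconds (about 11 minutes) under these assumptions.
print(round(buffer_capacity_seconds(), 2))
```

Higher sample rates or more channels shrink this window proportionally, which may be worth considering before lowering max_buffer_size.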

Settings

Runtime-configurable settings passed via the settings constructor argument using GladiaSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | None | STT model identifier. (Inherited from base STT settings.) |
| language | Language \| str | None | Language for speech recognition. (Inherited from base STT settings.) |
| language_config | LanguageConfig | None | Detailed language configuration with code switching support. |
| custom_metadata | Dict[str, Any] | None | Additional metadata to include with requests. |
| endpointing | float | None | Silence duration in seconds to mark end of speech. |
| maximum_duration_without_endpointing | int | 5 | Maximum utterance duration (seconds) without silence. |
| pre_processing | PreProcessingConfig | None | Audio pre-processing options (audio enhancer, speech threshold). |
| realtime_processing | RealtimeProcessingConfig | None | Real-time processing features (custom vocabulary, translation, NER, sentiment). |
| messages_config | MessagesConfig | None | WebSocket message filtering options. |
| enable_vad | bool | False | Enable Gladia VAD for end-of-utterance detection. Use without other VAD in the agent. |

Usage

Basic Setup

import os

from pipecat.services.gladia.stt import GladiaSTTService

stt = GladiaSTTService(
    api_key=os.getenv("GLADIA_API_KEY"),
)

With Language Configuration

import os

from pipecat.services.gladia.stt import GladiaSTTService
from pipecat.services.gladia.config import LanguageConfig

stt = GladiaSTTService(
    api_key=os.getenv("GLADIA_API_KEY"),
    region="us-west",
    settings=GladiaSTTService.Settings(
        model="solaria-1",
        language_config=LanguageConfig(
            languages=["en", "es"],
            code_switching=True,
        ),
    ),
)

With Real-time Processing

import os

from pipecat.services.gladia.stt import GladiaSTTService
from pipecat.services.gladia.config import (
    RealtimeProcessingConfig,
    CustomVocabularyConfig,
    CustomVocabularyItem,
    TranslationConfig,
)

stt = GladiaSTTService(
    api_key=os.getenv("GLADIA_API_KEY"),
    settings=GladiaSTTService.Settings(
        realtime_processing=RealtimeProcessingConfig(
            custom_vocabulary=True,
            custom_vocabulary_config=CustomVocabularyConfig(
                vocabulary=[
                    CustomVocabularyItem(value="Pipecat", intensity=0.8),
                    "Gladia",
                ],
            ),
            translation=True,
            translation_config=TranslationConfig(
                target_languages=["fr", "de"],
                model="enhanced",
            ),
        ),
    ),
)

Notes

  • Session-based connection: Gladia uses a two-step connection process: first an HTTP POST to initialize a session, then a WebSocket connection to the returned session URL. The session URL and ID are managed automatically.
  • Audio buffering: The service buffers audio data locally and sends it when connected. If the connection drops and reconnects, buffered audio is automatically re-sent to minimize transcript gaps.
  • Keepalive: Empty audio chunks are sent periodically to keep the Gladia connection alive (keepalive interval: 5s, timeout: 20s).
  • Built-in VAD: Set enable_vad=True in Settings to use Gladia’s server-side VAD, which emits UserStartedSpeakingFrame and UserStoppedSpeakingFrame. When using this, do not enable another VAD in your pipeline.
  • Translation: Gladia supports real-time translation to multiple target languages. Translation results are pushed as TranslationFrames.

The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.
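
The first step of the two-step connection described in the notes above can be sketched with the standard library. This is an illustration only, since GladiaSTTService performs the handshake internally. The function name is hypothetical, and the X-Gladia-Key header, the region query parameter, and the payload field names are assumptions about Gladia's live API rather than details taken from this page. The request is built but not sent.

```python
import json
import urllib.request


def build_session_request(
    api_key,
    region="eu-west",
    sample_rate=16000,  # assumed for illustration
    url="https://api.gladia.io/v2/live",
):
    """Build (but do not send) the HTTP POST that initializes a session."""
    payload = {
        "encoding": "wav/pcm",
        "bit_depth": 16,
        "channels": 1,
        "sample_rate": sample_rate,
    }
    return urllib.request.Request(
        f"{url}?region={region}",  # assumed query parameter
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-Gladia-Key": api_key,  # assumed header name
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_session_request("YOUR_API_KEY", region="us-west")
print(request.get_method(), request.full_url)
```

In the second step, the service opens a WebSocket to the session URL returned in the response and streams audio over it.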

Event Handlers

Gladia STT supports the standard service connection events:
| Event | Description |
| --- | --- |
| on_connected | Connected to Gladia WebSocket |
| on_disconnected | Disconnected from Gladia WebSocket |

@stt.event_handler("on_connected")
async def on_connected(service):
    print("Connected to Gladia")