Overview

SonioxSTTService provides real-time speech-to-text transcription over Soniox’s WebSocket API. It supports more than 60 languages, custom context, multiple languages within the same conversation, and other features aimed at accurate multilingual transcription.

By default, the service uses the stt-rt-v4 model with vad_force_turn_endpoint=True, which disables Soniox’s native turn detection and relies on Pipecat’s local VAD to finalize transcripts. This configuration significantly reduces the time to final segment (~250ms median). Pipecat enables smart-turn detection by default using LocalSmartTurnAnalyzerV3. To use Soniox’s native turn detection instead, set vad_force_turn_endpoint=False.

Installation

To use Soniox services, install the required dependencies:
pip install "pipecat-ai[soniox]"

Prerequisites

Soniox Account Setup

Before using Soniox STT services, you need:
  1. Soniox Account: Sign up at Soniox Console
  2. API Key: Generate an API key from your console dashboard
  3. Language Selection: Choose from 60+ supported languages and models

Required Environment Variables

  • SONIOX_API_KEY: Your Soniox API key for authentication
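
A minimal sketch of failing fast when the key is missing, so that a misconfigured deployment surfaces at startup rather than on the first WebSocket connection. The helper name require_soniox_api_key is hypothetical and not part of Pipecat.

```python
import os


def require_soniox_api_key(env=None):
    """Return the Soniox API key, or raise a clear error if it is unset.

    Hypothetical helper, not part of Pipecat. `env` defaults to os.environ
    and can be replaced with a plain dict when testing.
    """
    env = os.environ if env is None else env
    key = env.get("SONIOX_API_KEY", "").strip()
    if not key:
        raise RuntimeError("SONIOX_API_KEY is not set")
    return key
```

The returned value can then be passed as api_key when constructing the service.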

Configuration

SonioxSTTService

api_key
str
required
Soniox API key for authentication.
url
str
default:"wss://stt-rt.soniox.com/transcribe-websocket"
Soniox WebSocket API URL.
sample_rate
int
default:"None"
Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
model
str
default:"None"
deprecated
Soniox model to use for transcription. Deprecated in v0.0.105. Use settings=SonioxSTTService.Settings(model=...) instead.
audio_format
str
default:"pcm_s16le"
Audio format for transcription. Init-only — not part of runtime-updatable settings.
num_channels
int
default:"1"
Number of audio channels. Init-only — not part of runtime-updatable settings.
params
SonioxInputParams
default:"None"
deprecated
Additional configuration parameters. Deprecated in v0.0.105. Use settings=SonioxSTTService.Settings(...) instead.
settings
SonioxSTTService.Settings
default:"None"
Runtime-configurable settings for the STT service. See Settings below.
ttfs_p99_latency
float
default:"0.35"
P99 latency from speech end to final transcript in seconds. Override for your deployment. See stt-benchmark.
vad_force_turn_endpoint
bool
default:"True"
When enabled, the service listens for VADUserStoppedSpeakingFrame and sends a finalize message to Soniox, so Pipecat’s local VAD triggers transcript finalization. When disabled, Soniox detects the end of speech natively.
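
Since ttfs_p99_latency is meant to be overridden per deployment, one way to choose a value is to take the 99th percentile of your own measured latencies from speech end to final transcript. The sketch below uses the nearest-rank method on a list of samples in seconds. The function name and the sample values are illustrative assumptions, not output from stt-benchmark.

```python
def p99_latency(samples_s):
    """Nearest-rank 99th percentile of latency samples, in seconds."""
    if not samples_s:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_s)
    # Nearest-rank: ceil(0.99 * n), computed with integers to avoid float rounding.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Made-up measurements for illustration: 98 fast turns and 2 slow ones.
samples = [0.20] * 98 + [0.33, 0.41]
ttfs_p99 = p99_latency(samples)
```

The result could then be supplied as ttfs_p99_latency=ttfs_p99 when constructing SonioxSTTService.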

Settings

Runtime-configurable settings passed via the settings constructor argument using SonioxSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | "stt-rt-v4" | Model to use for transcription. (Inherited from base STT settings.) |
| language | Language \| str | None | Language for speech recognition. (Inherited from base STT settings.) |
| language_hints | list[Language] | None | Language hints for transcription. Helps the model prioritize expected languages. |
| language_hints_strict | bool | None | If true, strictly enforce language hints (only transcribe in provided languages). |
| context | SonioxContextObject \| str | None | Customization for transcription. String for models with context_version 1, SonioxContextObject for context_version 2 (stt-rt-v3-preview and higher). |
| enable_speaker_diarization | bool | False | Enable speaker diarization. Tokens are annotated with speaker IDs. |
| enable_language_identification | bool | False | Enable language identification. Tokens are annotated with language IDs. |
| client_reference_id | str | None | Client reference ID for transcription tracking. |

Usage

Basic Setup

import os

from pipecat.services.soniox.stt import SonioxSTTService

stt = SonioxSTTService(
    api_key=os.getenv("SONIOX_API_KEY"),
)

With Language Hints and Language Identification

import os

from pipecat.services.soniox.stt import SonioxSTTService
from pipecat.transcriptions.language import Language

stt = SonioxSTTService(
    api_key=os.getenv("SONIOX_API_KEY"),
    settings=SonioxSTTService.Settings(
        model="stt-rt-v4",
        language_hints=[Language.EN, Language.ES],
        language_hints_strict=True,
        enable_language_identification=True,
    ),
)

With Context Object (v3+ models)

import os

from pipecat.services.soniox.stt import (
    SonioxSTTService,
    SonioxContextObject,
    SonioxContextGeneralItem,
)

stt = SonioxSTTService(
    api_key=os.getenv("SONIOX_API_KEY"),
    settings=SonioxSTTService.Settings(
        model="stt-rt-v4",
        context=SonioxContextObject(
            general=[
                SonioxContextGeneralItem(key="domain", value="medical"),
            ],
            terms=["Pipecat", "transcription"],
        ),
    ),
)

With Soniox Native Turn Detection

import os

from pipecat.services.soniox.stt import SonioxSTTService

stt = SonioxSTTService(
    api_key=os.getenv("SONIOX_API_KEY"),
    vad_force_turn_endpoint=False,
)

Notes

  • Turn finalization: By default (vad_force_turn_endpoint=True), when Pipecat’s VAD detects the user has stopped speaking, a finalize message is sent to Soniox to get the final transcript immediately. This significantly reduces latency.
  • Keepalive: The service automatically sends protocol-level keepalive messages to maintain the WebSocket connection.
  • Context versions: Use a string for context with older models (context_version 1) and SonioxContextObject for newer models (stt-rt-v3-preview and higher, context_version 2). See the Soniox context documentation for details.
  • Deprecation: The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.
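
To make the turn finalization and keepalive notes concrete, the sketch below builds the kind of JSON control messages involved. It assumes the message shapes {"type": "finalize"} and {"type": "keepalive"} as I understand them from Soniox's real-time WebSocket documentation. Pipecat sends these for you, and the function names here are hypothetical rather than Pipecat internals.

```python
import json


def control_message(kind):
    """Serialize a control message. The two accepted types are assumed shapes."""
    if kind not in ("finalize", "keepalive"):
        raise ValueError(f"unsupported control message: {kind}")
    return json.dumps({"type": kind})


def on_vad_stopped(send, vad_force_turn_endpoint=True):
    """Mimic the default flow: a local VAD stop event requests finalization.

    `send` is any callable that accepts the serialized message, for example
    a WebSocket send function. Returns True if a finalize request was sent.
    """
    if vad_force_turn_endpoint:
        send(control_message("finalize"))
        return True
    # With native turn detection, the server decides when speech has ended.
    return False
```

Passing send as a plain callable keeps the sketch independent of any particular WebSocket library.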

Event Handlers

Soniox STT supports the standard service connection events:
| Event | Description |
| --- | --- |
| on_connected | Connected to Soniox WebSocket |
| on_disconnected | Disconnected from Soniox WebSocket |

@stt.event_handler("on_connected")
async def on_connected(service):
    print("Connected to Soniox")