Soniox

Overview

SonioxSTTService provides real-time speech-to-text transcription over Soniox’s WebSocket API. It supports more than 60 languages, custom context, and multiple languages within the same conversation.

By default, the service uses the stt-rt-v4 model with vad_force_turn_endpoint=True. This setting disables Soniox’s native turn detection and relies on Pipecat’s local VAD to finalize transcripts, which significantly reduces the time to final segment (~250 ms median). Pipecat also enables smart-turn detection by default using LocalSmartTurnAnalyzerV3. To use Soniox’s native turn detection instead, set vad_force_turn_endpoint=False.

Soniox STT API Reference

Pipecat’s API methods for Soniox STT integration

Example Implementation

Complete example with interruption handling

Soniox Documentation

Official Soniox documentation and features

Soniox Console

Access multilingual models and API keys

Installation

To use Soniox services, install the required dependencies:

pip install "pipecat-ai[soniox]"

Prerequisites

Soniox Account Setup

Before using Soniox STT services, you need:

Soniox Account: Sign up at Soniox Console
API Key: Generate an API key from your console dashboard
Language Selection: Choose from 60+ supported languages and models

Required Environment Variables

SONIOX_API_KEY: Your Soniox API key for authentication
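
As a minimal sketch, the key can be read once at startup so a missing variable fails early instead of surfacing later as an authentication error. The helper name `require_env` is hypothetical and not part of Pipecat or Soniox.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

The returned string could then be passed as `api_key=require_env("SONIOX_API_KEY")` when constructing the service.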

Configuration

SonioxSTTService

Parameter	Type	Default	Description
`api_key`	`str`	required	Soniox API key for authentication.
`url`	`str`	`"wss://stt-rt.soniox.com/transcribe-websocket"`	Soniox WebSocket API URL.
`sample_rate`	`int`	`None`	Audio sample rate in Hz. When `None`, uses the pipeline’s configured sample rate.
`model`	`str`	`None`	Deprecated in v0.0.105. Soniox model to use for transcription. Use `settings=SonioxSTTService.Settings(model=...)` instead.
`audio_format`	`str`	`"pcm_s16le"`	Audio format for transcription. Init-only; not part of runtime-updatable settings.
`num_channels`	`int`	`1`	Number of audio channels. Init-only; not part of runtime-updatable settings.
`params`	`SonioxInputParams`	`None`	Deprecated in v0.0.105. Additional configuration parameters. Use `settings=SonioxSTTService.Settings(...)` instead.
`settings`	`SonioxSTTService.Settings`	`None`	Runtime-configurable settings for the STT service. See Settings below.
`ttfs_p99_latency`	`float`	`0.35`	P99 latency from speech end to final transcript in seconds. Override for your deployment. See stt-benchmark.
`vad_force_turn_endpoint`	`bool`	`True`	Listen to `VADUserStoppedSpeakingFrame` to send a finalize message to Soniox. When enabled, Pipecat’s local VAD triggers transcript finalization. When disabled, Soniox detects the end of speech natively.
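
If you override ttfs_p99_latency for your deployment, the value has to come from your own measurements. The sketch below shows one way to derive a P99 figure from a list of measured speech-end-to-final-transcript times, using the nearest-rank percentile method. The function name `p99_latency` is hypothetical, and how the samples are collected is outside the scope of this sketch.

```python
def p99_latency(samples_s: list[float]) -> float:
    """Return the 99th-percentile latency in seconds (nearest-rank method)."""
    if not samples_s:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_s)
    # Nearest rank = ceil(0.99 * n), computed with integers to avoid
    # floating-point rounding surprises.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]
```

The result could then be passed as `ttfs_p99_latency=p99_latency(samples)` when constructing the service. With few samples the nearest-rank P99 is simply the largest value, so a larger sample set gives a more meaningful estimate.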

Settings

Runtime-configurable settings passed via the settings constructor argument using SonioxSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.

Parameter	Type	Default	Description
`model`	`str`	`"stt-rt-v4"`	Model to use for transcription. (Inherited from base STT settings.)
`language`	`Language \| str`	`None`	Language for speech recognition. (Inherited from base STT settings.)
`language_hints`	`list[Language]`	`None`	Language hints for transcription. Helps the model prioritize expected languages.
`language_hints_strict`	`bool`	`None`	If true, strictly enforce language hints (only transcribe in provided languages).
`context`	`SonioxContextObject \| str`	`None`	Customization for transcription. String for models with context_version 1, `SonioxContextObject` for context_version 2 (stt-rt-v3-preview and higher).
`enable_speaker_diarization`	`bool`	`False`	Enable speaker diarization. Tokens are annotated with speaker IDs.
`enable_language_identification`	`bool`	`False`	Enable language identification. Tokens are annotated with language IDs.
`client_reference_id`	`str`	`None`	Client reference ID for transcription tracking.

Usage

Basic Setup

import os

from pipecat.services.soniox.stt import SonioxSTTService

# Uses the defaults: the stt-rt-v4 model and vad_force_turn_endpoint=True.
stt = SonioxSTTService(
    api_key=os.getenv("SONIOX_API_KEY"),
)

With Language Hints and Context

import os

from pipecat.services.soniox.stt import SonioxSTTService
from pipecat.transcriptions.language import Language

stt = SonioxSTTService(
    api_key=os.getenv("SONIOX_API_KEY"),
    settings=SonioxSTTService.Settings(
        model="stt-rt-v4",
        # Expected languages; strict mode transcribes only in these languages.
        language_hints=[Language.EN, Language.ES],
        language_hints_strict=True,
        # Annotate tokens with language IDs.
        enable_language_identification=True,
    ),
)

With Context Object (v3+ models)

import os

from pipecat.services.soniox.stt import (
    SonioxSTTService,
    SonioxContextObject,
    SonioxContextGeneralItem,
)

stt = SonioxSTTService(
    api_key=os.getenv("SONIOX_API_KEY"),
    settings=SonioxSTTService.Settings(
        model="stt-rt-v4",
        # SonioxContextObject requires a context_version 2 model
        # (stt-rt-v3-preview and higher); older models take a plain string.
        context=SonioxContextObject(
            general=[
                SonioxContextGeneralItem(key="domain", value="medical"),
            ],
            terms=["Pipecat", "transcription"],
        ),
    ),
)

With Soniox Native Turn Detection

import os

from pipecat.services.soniox.stt import SonioxSTTService

stt = SonioxSTTService(
    api_key=os.getenv("SONIOX_API_KEY"),
    # Let Soniox detect the end of speech instead of Pipecat's local VAD.
    vad_force_turn_endpoint=False,
)

Notes

Turn finalization: By default (vad_force_turn_endpoint=True), when Pipecat’s VAD detects the user has stopped speaking, a finalize message is sent to Soniox to get the final transcript immediately. This significantly reduces latency.
Keepalive: The service automatically sends protocol-level keepalive messages to maintain the WebSocket connection.
Context versions: Use a string for context with older models (context_version 1) and SonioxContextObject for newer models (stt-rt-v3-preview and higher, context_version 2). See the Soniox context documentation for details.
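
To make the turn finalization and keepalive notes more concrete, the sketch below builds the two control messages a client would send over the WebSocket and shows when a finalize message would be sent. The message shapes (`{"type": "finalize"}` and `{"type": "keepalive"}`) are assumptions based on Soniox’s real-time API documentation, and all function names are hypothetical. SonioxSTTService handles both messages for you, so application code does not need any of this.

```python
import json
from typing import Optional


def finalize_message() -> str:
    """Control message asking the server to finalize pending tokens (assumed shape)."""
    return json.dumps({"type": "finalize"})


def keepalive_message() -> str:
    """Control message that keeps an idle connection open (assumed shape)."""
    return json.dumps({"type": "keepalive"})


def message_for_vad_event(
    user_stopped_speaking: bool, vad_force_turn_endpoint: bool
) -> Optional[str]:
    """Return the message to send after a local VAD event, if any.

    Hypothetical helper mirroring the documented behavior: a finalize message
    is sent only when the user stopped speaking and vad_force_turn_endpoint
    is enabled. Otherwise nothing is sent and Soniox decides on its own when
    the speech has ended.
    """
    if user_stopped_speaking and vad_force_turn_endpoint:
        return finalize_message()
    return None
```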

The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.

Event Handlers

Soniox STT supports the standard service connection events:

Event	Description
`on_connected`	Connected to Soniox WebSocket
`on_disconnected`	Disconnected from Soniox WebSocket

@stt.event_handler("on_connected")
async def on_connected(service):
    print("Connected to Soniox")
