
Overview

ElevenLabs provides two STT service implementations:
  • ElevenLabsSTTService (HTTP) — File-based transcription using ElevenLabs’ Speech-to-Text API with segmented audio processing. Uploads audio files and receives transcription results directly.
  • ElevenLabsRealtimeSTTService (WebSocket) — Real-time streaming transcription with ultra-low latency, supporting both partial (interim) and committed (final) transcripts with manual or VAD-based commit strategies.

Installation

To use ElevenLabs STT services, install the required dependencies:
pip install "pipecat-ai[elevenlabs]"

Prerequisites

ElevenLabs Account Setup

Before using ElevenLabs STT services, you need:
  1. ElevenLabs Account: Sign up at ElevenLabs Platform
  2. API Key: Generate an API key from your account dashboard
  3. Model Access: Ensure access to the Scribe v2 transcription model (default: scribe_v2)

Required Environment Variables

  • ELEVENLABS_API_KEY: Your ElevenLabs API key for authentication
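
As a minimal sketch, the key can be read once at startup so that a missing variable fails early rather than at the first transcription request. The helper name `require_api_key` is hypothetical and is not part of Pipecat.

```python
import os


def require_api_key(env=os.environ, name="ELEVENLABS_API_KEY"):
    """Return the API key from the environment, or raise if it is unset or blank.

    Hypothetical helper for illustration; not part of Pipecat.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before starting the bot")
    return key
```

The returned value can then be passed as `api_key=` to either service in place of `os.getenv("ELEVENLABS_API_KEY")`.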

ElevenLabsSTTService

api_key
str
required
ElevenLabs API key for authentication.
aiohttp_session
aiohttp.ClientSession
required
An aiohttp session for HTTP requests. You must create and manage this yourself.
base_url
str
default:"https://api.elevenlabs.io"
Base URL for the ElevenLabs API.
model
str
default:"scribe_v2"
deprecated
Model ID for transcription. Deprecated in v0.0.105. Use settings=ElevenLabsSTTService.Settings(...) instead.
sample_rate
int
default:"None"
Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
settings
ElevenLabsSTTService.Settings
default:"None"
Runtime-configurable settings for the STT service. See Settings below.
params
ElevenLabsSTTService.InputParams
default:"None"
deprecated
Configuration parameters for the STT service. Deprecated in v0.0.105. Use settings=ElevenLabsSTTService.Settings(...) instead.
ttfs_p99_latency
float
default:"ELEVENLABS_TTFS_P99"
P99 latency from speech end to final transcript in seconds. Override for your deployment.
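
One way to pick a deployment-specific value for ttfs_p99_latency is to measure the delay from end of speech to final transcript over a number of turns and take the 99th percentile. The sketch below uses the nearest-rank method with the standard library only. The function name `p99` and the sample values are made up for illustration; how the latencies are collected is left to your own instrumentation.

```python
import math


def p99(latencies_secs):
    """Nearest-rank 99th percentile of a non-empty list of latencies in seconds."""
    if not latencies_secs:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_secs)
    # Nearest rank: the smallest sample with at least 99% of samples at or below it.
    rank = math.ceil(99 * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical measurements: seconds from end of speech to final transcript.
samples = [0.8, 0.9, 1.1, 0.7, 2.4, 1.0, 0.95, 1.2, 0.85, 1.05]
measured_p99 = p99(samples)
```

The result could then be passed to the constructor as `ttfs_p99_latency=measured_p99`. With few samples the nearest-rank P99 is simply the slowest observation, so a larger sample gives a more meaningful value.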

Settings

Runtime-configurable settings passed via the settings constructor argument using ElevenLabsSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
Parameter         Type            Default  Description
model             str             None     Model ID for transcription. (Inherited from base STT settings.)
language          Language | str  None     Target language for transcription. (Inherited from base STT settings.)
tag_audio_events  bool            True     Include audio events like (laughter), (coughing) in transcription.

Usage

import os

import aiohttp
from pipecat.services.elevenlabs.stt import ElevenLabsSTTService

# Run this inside an async function; the session must stay open while the service is in use.
async with aiohttp.ClientSession() as session:
    stt = ElevenLabsSTTService(
        api_key=os.getenv("ELEVENLABS_API_KEY"),
        aiohttp_session=session,
    )

With Language and Audio Events

import os

import aiohttp
from pipecat.services.elevenlabs.stt import ElevenLabsSTTService
from pipecat.transcriptions.language import Language

# Run this inside an async function; the session must stay open while the service is in use.
async with aiohttp.ClientSession() as session:
    stt = ElevenLabsSTTService(
        api_key=os.getenv("ELEVENLABS_API_KEY"),
        aiohttp_session=session,
        settings=ElevenLabsSTTService.Settings(
            language=Language.ES,  # transcribe Spanish
            tag_audio_events=False,  # omit tags such as (laughter)
        ),
    )

Notes

  • The HTTP service uploads complete audio segments and is best for VAD-segmented transcription.
  • The HTTP service does not emit connection events, because each transcription is a separate HTTP request rather than a persistent connection.
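
To illustrate what a "complete audio segment" might look like as an uploadable file, the sketch below wraps the raw 16-bit mono PCM of one VAD-delimited segment into an in-memory WAV file using only the standard library. This is an assumption for illustration: the function name `pcm_to_wav_bytes` is hypothetical, and this page does not state which container format the service actually uploads.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 16000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit PCM in a WAV container and return the file contents."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit audio (assumed)
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)
    return buf.getvalue()


# One second of silence at 16 kHz stands in for a VAD-delimited speech segment.
segment = b"\x00\x00" * 16000
wav_bytes = pcm_to_wav_bytes(segment)
```

Because the whole segment must exist before it can be wrapped and uploaded, this style of transcription only starts once the speaker has stopped, which is why it pairs naturally with VAD segmentation.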

ElevenLabsRealtimeSTTService

api_key
str
required
ElevenLabs API key for authentication.
base_url
str
default:"api.elevenlabs.io"
Base URL for the ElevenLabs WebSocket API.
model
str
default:"scribe_v2_realtime"
deprecated
Model ID for real-time transcription. Deprecated in v0.0.105. Use settings=ElevenLabsRealtimeSTTService.Settings(...) instead.
sample_rate
int
default:"None"
Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
settings
ElevenLabsRealtimeSTTService.Settings
default:"None"
Runtime-configurable settings for the Realtime STT service. See Settings below.
commit_strategy
CommitStrategy
default:"CommitStrategy.MANUAL"
How to segment speech. CommitStrategy.MANUAL uses Pipecat’s VAD to control when transcript segments are committed. CommitStrategy.VAD uses ElevenLabs’ built-in VAD for segment boundaries.
include_timestamps
bool
default:"False"
Whether to include word-level timestamps in transcripts.
enable_logging
bool
default:"False"
Whether to enable logging on ElevenLabs’ side.
include_language_detection
bool
default:"False"
Whether to include language detection in transcripts.
params
ElevenLabsRealtimeSTTService.InputParams
default:"None"
deprecated
Configuration parameters for the STT service. Deprecated in v0.0.105. Use settings=ElevenLabsRealtimeSTTService.Settings(...) instead.
ttfs_p99_latency
float
default:"ELEVENLABS_REALTIME_TTFS_P99"
P99 latency from speech end to final transcript in seconds. Override for your deployment.

Settings

Runtime-configurable settings passed via the settings constructor argument using ElevenLabsRealtimeSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
Parameter                   Type            Default  Description
model                       str             None     Model ID for transcription. (Inherited from base STT settings.)
language                    Language | str  None     Language for speech recognition. (Inherited from base STT settings.)
vad_silence_threshold_secs  float           None     Seconds of silence before VAD commits (0.3-3.0). Only used with VAD commit strategy.
vad_threshold               float           None     VAD sensitivity (0.1-0.9, lower is more sensitive). Only used with VAD commit strategy.
min_speech_duration_ms      int             None     Minimum speech duration for VAD (50-2000 ms). Only used with VAD commit strategy.
min_silence_duration_ms     int             None     Minimum silence duration for VAD (50-2000 ms). Only used with VAD commit strategy.
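
The ranges in the table can be checked before constructing the settings object. The following is a minimal sketch under the assumption that you want to validate values in your own code; the helper `check_vad_settings` and the `VAD_RANGES` mapping are hypothetical, and this page does not say whether Pipecat or ElevenLabs rejects out-of-range values.

```python
# Documented ranges from the Settings table; None means "use the service default".
VAD_RANGES = {
    "vad_silence_threshold_secs": (0.3, 3.0),
    "vad_threshold": (0.1, 0.9),
    "min_speech_duration_ms": (50, 2000),
    "min_silence_duration_ms": (50, 2000),
}


def check_vad_settings(**values):
    """Return a list of error strings for values outside the documented ranges.

    Hypothetical helper for illustration; not part of Pipecat.
    """
    errors = []
    for name, value in values.items():
        if name not in VAD_RANGES:
            errors.append(f"{name}: not a VAD setting")
            continue
        if value is None:
            continue  # unset values fall back to the service default
        low, high = VAD_RANGES[name]
        if not low <= value <= high:
            errors.append(f"{name}={value} is outside {low}-{high}")
    return errors
```

For example, `check_vad_settings(vad_threshold=0.95)` reports one error, while `check_vad_settings(vad_silence_threshold_secs=1.0)` returns an empty list.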

Usage

import os

from pipecat.services.elevenlabs.stt import ElevenLabsRealtimeSTTService

stt = ElevenLabsRealtimeSTTService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
)

With Timestamps and Custom Commit Strategy

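import os

from pipecat.services.elevenlabs.stt import CommitStrategy, ElevenLabsRealtimeSTTService
from pipecat.transcriptions.language import Language

stt = ElevenLabsRealtimeSTTService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    commit_strategy=CommitStrategy.VAD,  # let ElevenLabs' VAD decide segment boundaries
    include_timestamps=True,
    settings=ElevenLabsRealtimeSTTService.Settings(
        language=Language.EN,  # language is a Settings field, not a constructor argument
        vad_silence_threshold_secs=1.0,
    ),
)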

Notes

  • Commit strategies: Defaults to manual commit strategy, where Pipecat’s VAD controls when transcription segments are committed. Set commit_strategy=CommitStrategy.VAD to let ElevenLabs handle segment boundaries. When using MANUAL commit strategy, transcription frames are marked as finalized (TranscriptionFrame.finalized=True).
  • Keepalive: Sends silent audio chunks as keepalive to prevent idle disconnections (keepalive interval: 5s, timeout: 10s).
  • Auto-reconnect: Automatically reconnects if the WebSocket connection is closed when new audio arrives.
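
The keepalive behaviour can be pictured with a small sketch: a buffer of zero-valued samples is silent PCM audio, and it is sent once nothing has gone out for the keepalive interval. The 5-second interval comes from the note above; the function names, the 20 ms chunk length, and the 16-bit mono format are assumptions for illustration and may differ from what the service actually sends.

```python
def silence_chunk(sample_rate: int, duration_ms: int = 20, channels: int = 1) -> bytes:
    """Return zero-valued 16-bit PCM covering duration_ms of audio (assumed format)."""
    num_frames = sample_rate * duration_ms // 1000
    return b"\x00\x00" * (num_frames * channels)


def needs_keepalive(now: float, last_audio_at: float, interval_secs: float = 5.0) -> bool:
    """True once no audio has been sent for at least interval_secs."""
    return now - last_audio_at >= interval_secs
```

In a real loop, `now` would typically come from `time.monotonic()`, and the timestamp of the last sent audio would be updated on every outgoing chunk, including the keepalive chunks themselves.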

Event Handlers

Supports the standard service connection events:
Event            Description
on_connected     Connected to ElevenLabs Realtime STT WebSocket
on_disconnected  Disconnected from ElevenLabs Realtime STT WebSocket

@stt.event_handler("on_connected")
async def on_connected(service):
    print("Connected to ElevenLabs Realtime STT")

The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.