Overview
OpenAI provides two STT service implementations:
- OpenAISTTService (HTTP) — VAD-segmented speech recognition using OpenAI’s transcription API, supporting GPT-4o transcription and Whisper models.
- OpenAIRealtimeSTTService (WebSocket) — Real-time streaming speech-to-text using OpenAI’s Realtime API transcription sessions, with support for local VAD and server-side VAD modes.
OpenAI STT API Reference
Pipecat’s API methods for OpenAI STT integration
Example Implementation
Complete example with OpenAI ecosystem integration
OpenAI Documentation
Official OpenAI transcription documentation and features
OpenAI Platform
Access API keys and transcription models
Installation
To use OpenAI services, install Pipecat with its OpenAI dependencies.

Prerequisites
OpenAI Account Setup
Before using OpenAI STT services, you need:

- OpenAI Account: Sign up at OpenAI Platform
- API Key: Generate an API key from your account dashboard
- Model Access: Ensure access to GPT-4o transcription and Whisper models
Required Environment Variables
OPENAI_API_KEY: Your OpenAI API key for authentication
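A typical environment setup might look like the following. The openai package extra is an assumption based on Pipecat’s usual install extras; check it against your installed version.

```shell
# Install Pipecat with OpenAI support (extra name assumed)
pip install "pipecat-ai[openai]"

# Make the API key available to the service
export OPENAI_API_KEY=your-api-key
```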
OpenAISTTService
Uses VAD-based audio segmentation with HTTP transcription requests. Records speech segments detected by local VAD and sends them to OpenAI’s transcription API.

Constructor parameters:

- Model: Transcription model to use. Options include "gpt-4o-transcribe", "gpt-4o-mini-transcribe", and "whisper-1". Deprecated in v0.0.105; use settings=OpenAISTTService.Settings(...) instead.
- API key: OpenAI API key. Falls back to the OPENAI_API_KEY environment variable.
- Base URL: API base URL. Override for custom or proxied deployments.
- Language: Language of the audio input. Deprecated in v0.0.105; use settings=OpenAISTTService.Settings(...) instead.
- Prompt: Optional text to guide the model’s style or continue a previous segment. Deprecated in v0.0.105; use settings=OpenAISTTService.Settings(...) instead.
- Temperature: Sampling temperature between 0 and 1. Lower values produce more deterministic results. Deprecated in v0.0.105; use settings=OpenAISTTService.Settings(...) instead.
- Settings: Runtime-configurable settings for the STT service. See Settings below.
- P99 latency: P99 latency from speech end to final transcript, in seconds. Override for your deployment.
- Allow empty frames: If true, allow empty TranscriptionFrame frames to be pushed downstream instead of discarding them. This is intended for situations where VAD fires even though the user did not speak; in these cases it is useful to know that nothing was transcribed, so that the agent can resume speaking instead of waiting longer for a transcription.

Settings
Runtime-configurable settings are passed via the settings constructor argument using OpenAISTTService.Settings(...). They can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | "gpt-4o-transcribe" | Transcription model to use. (Inherited from base STT settings.) |
| language | Language \| str | Language.EN | Language of the audio input. (Inherited from base STT settings.) |
| prompt | str | None | Optional text to guide the model’s style or continue a previous segment. |
| temperature | float | None | Sampling temperature between 0 and 1. |
Usage
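A minimal configuration sketch. The import paths and pipeline wiring are assumptions based on Pipecat’s usual service layout, so check them against your installed version:

```python
import os

from pipecat.services.openai.stt import OpenAISTTService  # assumed import path
from pipecat.transcriptions.language import Language      # assumed import path

stt = OpenAISTTService(
    api_key=os.getenv("OPENAI_API_KEY"),
    settings=OpenAISTTService.Settings(
        model="gpt-4o-transcribe",
        language=Language.EN,
    ),
)

# Place the service after a VAD-equipped transport input in the pipeline,
# e.g. Pipeline([transport.input(), stt, ...]).
```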
Notes
- Segmented transcription: Processes complete audio segments (after VAD detects silence) via HTTP. Only produces final transcriptions, not interim results.
- No connection events: Uses per-request HTTP calls, so it does not expose WebSocket connection events.
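The segmented flow can be sketched as follows. The helper names are stand-ins, not Pipecat APIs: a local VAD yields complete speech segments, and each segment becomes one transcription request that produces a single final transcript.

```python
import asyncio

async def transcribe_segment(audio: bytes) -> str:
    # Placeholder for the HTTP call to the transcription endpoint.
    return f"<transcript of {len(audio)} bytes>"

async def transcribe_all(segments: list[bytes]) -> list[str]:
    finals = []
    for segment in segments:  # one entry per VAD-detected utterance
        # Final results only; no interim transcriptions are produced.
        finals.append(await transcribe_segment(segment))
    return finals

results = asyncio.run(transcribe_all([b"\x00" * 320, b"\x00" * 640]))
print(results)
```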
OpenAIRealtimeSTTService
Real-time streaming speech-to-text using OpenAI’s Realtime API WebSocket transcription sessions. Audio is streamed continuously over a WebSocket connection for lower latency compared to HTTP-based transcription.

Constructor parameters:

- API key: OpenAI API key for authentication.
- Model: Transcription model. Supported values are "gpt-4o-transcribe" and "gpt-4o-mini-transcribe". Deprecated in v0.0.105; use settings=OpenAIRealtimeSTTService.Settings(...) instead.
- Base URL: WebSocket base URL for the Realtime API.
- Language: Language of the audio input. Deprecated in v0.0.105; use settings=OpenAIRealtimeSTTService.Settings(...) instead.
- Prompt: Optional prompt text to guide transcription style or provide keyword hints. Deprecated in v0.0.105; use settings=OpenAIRealtimeSTTService.Settings(...) instead.
- Settings: Runtime-configurable settings for the Realtime STT service. See Settings below.
- Turn detection: Server-side VAD configuration. Defaults to False (disabled), which relies on a local VAD processor in the pipeline. Pass None to use server defaults (server_vad), or a dict with custom settings (e.g. {"type": "server_vad", "threshold": 0.5}).
- Noise reduction: Noise reduction mode. "near_field" for close microphones, "far_field" for distant microphones, or None to disable.
- Interrupt on speech: Whether to interrupt bot output when speech is detected by server-side VAD. Only applies when turn detection is enabled.
- P99 latency: P99 latency from speech end to final transcript, in seconds. Override for your deployment.
Settings
Runtime-configurable settings are passed via the settings constructor argument using OpenAIRealtimeSTTService.Settings(...). They can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | "gpt-4o-transcribe" | Transcription model to use. (Inherited from base STT settings.) |
| language | Language \| str | Language.EN | Language of the audio input. (Inherited from base STT settings.) |
| prompt | str | None | Optional prompt text to guide transcription style or keyword hints. |
Usage
With Local VAD
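A local-VAD configuration sketch. The import path is an assumption; turn_detection defaults to False, so no argument is needed for this mode:

```python
import os

from pipecat.services.openai.stt import OpenAIRealtimeSTTService  # assumed path

# Local VAD mode: a VAD processor elsewhere in the pipeline decides when
# audio is committed for transcription.
stt = OpenAIRealtimeSTTService(
    api_key=os.getenv("OPENAI_API_KEY"),
    settings=OpenAIRealtimeSTTService.Settings(
        model="gpt-4o-transcribe",
    ),
)
```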
With Server-Side VAD
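A server-side VAD sketch, using the turn_detection values described above (import path assumed):

```python
import os

from pipecat.services.openai.stt import OpenAIRealtimeSTTService  # assumed path

# Server-side VAD: pass None for server defaults (server_vad), or a dict
# with custom settings. Do not add a separate VAD processor to the
# pipeline in this mode.
stt = OpenAIRealtimeSTTService(
    api_key=os.getenv("OPENAI_API_KEY"),
    turn_detection={"type": "server_vad", "threshold": 0.5},
)
```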
Notes
- Local VAD vs Server-side VAD: Defaults to local VAD mode (turn_detection=False), where a local VAD processor in the pipeline controls when audio is committed for transcription. Set turn_detection=None for server-side VAD, but do not use a separate VAD processor in the pipeline in that mode.
- Automatic resampling: Automatically resamples audio to 24 kHz as required by the Realtime API, regardless of the pipeline’s sample rate.
- Interim transcriptions: Produces interim transcriptions via delta events for real-time feedback.
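Folding delta events into interim text can be sketched as follows. The event type names follow the Realtime API’s transcription events; the surrounding loop is illustrative, not Pipecat’s implementation:

```python
# Simulated events from a Realtime transcription session.
events = [
    {"type": "conversation.item.input_audio_transcription.delta", "delta": "Hello "},
    {"type": "conversation.item.input_audio_transcription.delta", "delta": "world"},
    {"type": "conversation.item.input_audio_transcription.completed", "transcript": "Hello world"},
]

interim = ""
final = None
for event in events:
    if event["type"].endswith(".delta"):
        interim += event["delta"]    # interim transcription for live feedback
    elif event["type"].endswith(".completed"):
        final = event["transcript"]  # final transcript for the segment

print(interim)
print(final)
```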
Event Handlers
Supports the standard service connection events:

| Event | Description |
|---|---|
| on_connected | Connected to the OpenAI Realtime WebSocket |
| on_disconnected | Disconnected from the OpenAI Realtime WebSocket |
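Handlers can be attached with Pipecat’s event-handler decorator. The callback signature shown here is an assumption; check it against your installed version:

```python
# Assumes `stt` is an already-constructed OpenAIRealtimeSTTService.
@stt.event_handler("on_connected")
async def on_connected(service):
    print("Connected to OpenAI Realtime WebSocket")

@stt.event_handler("on_disconnected")
async def on_disconnected(service):
    print("Disconnected from OpenAI Realtime WebSocket")
```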