Overview

OpenAITTSService provides high-quality text-to-speech synthesis using OpenAI’s TTS API, supporting both the traditional TTS models (tts-1, tts-1-hd) and the GPT-based gpt-4o-mini-tts model. The service outputs 24kHz PCM audio and supports streaming for real-time applications.

Installation

To use OpenAI services, install the required dependencies:
pip install "pipecat-ai[openai]"

Prerequisites

OpenAI Account Setup

Before using OpenAI TTS services, you need:
  1. OpenAI Account: Sign up at OpenAI Platform
  2. API Key: Generate an API key from your API keys page
  3. Voice Selection: Choose from available voice options (alloy, ash, ballad, cedar, coral, echo, fable, marin, nova, onyx, sage, shimmer, verse)

Required Environment Variables

  • OPENAI_API_KEY: Your OpenAI API key for authentication
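For local development, the key can be exported in your shell before starting your bot. A minimal sketch (the key value below is a hypothetical placeholder, not a real credential):

```shell
# Set the API key for the current shell session (hypothetical placeholder value).
export OPENAI_API_KEY="sk-your-key-here"

# Confirm it is visible to child processes such as your Pipecat bot.
echo "${OPENAI_API_KEY:+key is set}"
```

In production, prefer injecting the variable through your process manager or secrets store rather than a shell profile.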

Configuration

OpenAITTSService

api_key
str
default: None
OpenAI API key for authentication. If None, uses the OPENAI_API_KEY environment variable.

base_url
str
default: None
Custom base URL for the OpenAI API. If None, uses the default OpenAI endpoint.

voice
str
default: "alloy"
deprecated
Voice ID to use for synthesis. Options: alloy, ash, ballad, cedar, coral, echo, fable, marin, nova, onyx, sage, shimmer, verse. Deprecated in v0.0.105; use settings=OpenAITTSService.Settings(...) instead.

model
str
default: "gpt-4o-mini-tts"
deprecated
TTS model to use. Deprecated in v0.0.105; use settings=OpenAITTSService.Settings(...) instead.

sample_rate
int
default: None
Output audio sample rate in Hz. If None, uses OpenAI’s default of 24kHz. OpenAI TTS only supports 24kHz output.

params
InputParams
default: None
deprecated
Runtime-configurable voice and generation settings. Deprecated in v0.0.105; use settings=OpenAITTSService.Settings(...) instead.

settings
OpenAITTSService.Settings
default: None
Runtime-configurable settings. See Settings below.

Settings

Runtime-configurable settings passed via the settings constructor argument using OpenAITTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.
model
str
default: None
TTS model identifier. (Inherited from base settings.)

voice
str
default: None
Voice identifier. (Inherited from base settings.)

language
Language | str
default: None
Language for synthesis. (Inherited from base settings.)

instructions
str
default: NOT_GIVEN
Instructions to guide voice synthesis behavior (e.g. affect, tone, pacing).

speed
float
default: NOT_GIVEN
Voice speed control (0.25 to 4.0).
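As a rough illustration of how these settings relate to the underlying API, the sketch below shows the approximate JSON body of a request to OpenAI’s /v1/audio/speech endpoint. Field names follow OpenAI’s public API; this is not Pipecat’s internal code.

```python
# Illustrative sketch only: the approximate request body a call to
# OpenAI's /v1/audio/speech endpoint carries for these settings.
payload = {
    "model": "gpt-4o-mini-tts",       # Settings.model
    "voice": "coral",                 # Settings.voice
    "input": "Hello there!",          # text to synthesize
    "response_format": "pcm",         # raw 16-bit PCM at 24kHz
    "instructions": "Speak warmly, at a relaxed pace.",  # gpt-4o-mini-tts only
    "speed": 1.1,                     # documented valid range: 0.25 to 4.0
}

# Values outside the documented speed range are rejected by the API.
assert 0.25 <= payload["speed"] <= 4.0
```

Note that `instructions` is honored only by the GPT-based model; traditional TTS models ignore it.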

Usage

Basic Setup

import os

from pipecat.services.openai import OpenAITTSService

tts = OpenAITTSService(
    api_key=os.getenv("OPENAI_API_KEY"),
    settings=OpenAITTSService.Settings(
        voice="nova",
    ),
)

With Voice Customization

import os

from pipecat.services.openai import OpenAITTSService

tts = OpenAITTSService(
    api_key=os.getenv("OPENAI_API_KEY"),
    settings=OpenAITTSService.Settings(
        voice="coral",
        model="gpt-4o-mini-tts",
        instructions="Speak in a warm, friendly tone with moderate pacing.",
        speed=1.1,
    ),
)

Updating Settings at Runtime

Voice settings can be changed mid-conversation using TTSUpdateSettingsFrame:
from pipecat.frames.frames import TTSUpdateSettingsFrame
from pipecat.services.openai.tts import OpenAITTSSettings

await task.queue_frame(
    TTSUpdateSettingsFrame(
        delta=OpenAITTSSettings(
            instructions="Now speak more formally.",
            speed=0.9,
        )
    )
)
The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.

Notes

  • Fixed sample rate: OpenAI TTS always outputs audio at 24kHz. Configuring a different sample rate can produce audio that plays at the wrong pitch or speed; if your pipeline needs another rate, resample downstream instead.
  • Model selection: The gpt-4o-mini-tts model supports the instructions parameter for controlling voice affect and tone, which traditional TTS models do not support.
  • HTTP-based service: OpenAI TTS uses HTTP streaming, so it does not have WebSocket connection events.
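Because the output format is fixed, downstream bookkeeping is straightforward. The helper below is a small illustrative sketch (not part of Pipecat) that computes playback duration from raw PCM bytes at OpenAI TTS’s fixed output format:

```python
# Illustrative helper (not part of Pipecat): playback duration of raw PCM
# audio at OpenAI TTS's fixed output format (24kHz, 16-bit signed, mono).
SAMPLE_RATE_HZ = 24_000
BYTES_PER_SAMPLE = 2  # 16-bit PCM

def pcm_duration_seconds(pcm: bytes) -> float:
    """Return the playback duration of a raw PCM buffer in seconds."""
    return len(pcm) / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)

# 48,000 bytes of 16-bit mono audio at 24kHz is exactly one second.
print(pcm_duration_seconds(bytes(48_000)))  # → 1.0
```

The same arithmetic explains why a pipeline configured for a different sample rate mis-times this audio: the byte stream itself always encodes 24,000 samples per second.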