Overview

Hume provides expressive text-to-speech synthesis through its Octave models, which adapt pronunciation, pitch, speed, and emotional style to context. HumeTTSService offers real-time streaming with word-level timestamps, custom voice support, and synthesis controls including acting instructions, speed adjustment, and trailing silence configuration.

Installation

To use Hume services, install the required dependencies:
pip install "pipecat-ai[hume]"

Prerequisites

Hume Account Setup

Before using Hume TTS services, you need:
  1. Hume Account: Sign up at Hume AI
  2. API Key: Generate an API key from your account dashboard
  3. Voice Selection: Choose voice IDs from the voice library or create custom voices

Required Environment Variables

  • HUME_API_KEY: Your Hume API key for authentication
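
A minimal sketch of failing fast when the key is missing, so the problem surfaces at startup and not at the first synthesis request. The helper name `require_hume_api_key` is hypothetical and not part of Pipecat; it only reads the HUME_API_KEY variable named above.

```python
import os


def require_hume_api_key(env=None):
    """Return the Hume API key or raise a clear error (hypothetical helper)."""
    # Default to the real process environment; accept any mapping for testing.
    env = os.environ if env is None else env
    key = env.get("HUME_API_KEY", "").strip()
    if not key:
        raise RuntimeError("HUME_API_KEY is not set; export it before starting the bot")
    return key
```

The returned value can then be passed as the api_key argument, or the argument can be omitted so the service reads the variable itself.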

Configuration

HumeTTSService

api_key
str
default:"None"
Hume API key. If omitted, reads the HUME_API_KEY environment variable.
voice_id
str
required
deprecated
ID of the voice to use. Only voice IDs are supported; voice names are not. Deprecated in v0.0.105. Use settings=HumeTTSService.Settings(...) instead.
sample_rate
int
default:"48000"
Output sample rate for PCM frames. Hume TTS streams at 48kHz.
params
InputParams
default:"None"
deprecated
Runtime-configurable synthesis controls. See InputParams below. Deprecated in v0.0.105. Use settings=HumeTTSService.Settings(...) instead.
settings
HumeTTSService.Settings
default:"None"
Runtime-configurable settings. See Settings below.

Settings

Runtime-configurable settings passed via the settings constructor argument using HumeTTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.
Parameter          Type             Default     Description
model              str              None        Model identifier. (Inherited.)
voice              str              None        Voice identifier. (Inherited.)
language           Language | str   None        Language for synthesis. (Inherited.)
description        str              NOT_GIVEN   Description to guide voice synthesis.
speed              float            NOT_GIVEN   Speech rate control.
trailing_silence   float            NOT_GIVEN   Trailing silence duration in seconds.
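
To illustrate what a mid-conversation settings update means in practice, here is a rough sketch of merging a partial "delta" into the current settings, where fields left unset keep their previous values. `SketchSettings`, `apply_delta`, and the local `NOT_GIVEN` sentinel are hypothetical stand-ins written for this example; they are not Pipecat's classes, and Pipecat's actual merge logic may differ.

```python
from dataclasses import dataclass, fields, replace


class _NotGiven:
    """Sentinel meaning "field was not specified" (stand-in for NOT_GIVEN)."""

    def __repr__(self):
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()


@dataclass(frozen=True)
class SketchSettings:
    """Hypothetical stand-in for HumeTTSService.Settings; not Pipecat's class."""

    voice: object = None
    description: object = NOT_GIVEN
    speed: object = NOT_GIVEN
    trailing_silence: object = NOT_GIVEN


def apply_delta(current, delta):
    """Return a copy of `current` with every field that `delta` sets explicitly."""
    changes = {}
    for f in fields(delta):
        value = getattr(delta, f.name)
        # A field still holding its default (None or NOT_GIVEN) counts as unset.
        if value is f.default:
            continue
        changes[f.name] = value
    return replace(current, **changes)


base = SketchSettings(voice="your-voice-id", speed=1.0)
updated = apply_delta(base, SketchSettings(speed=1.3, description="Speak with excitement"))
```

In this sketch `updated` keeps the original voice while taking the new speed and description, which is the behavior a partial update is meant to give.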

Usage

Basic Setup

import os

from pipecat.services.hume import HumeTTSService

# The voice is selected through Settings; the key is read from the environment.
tts = HumeTTSService(
    api_key=os.getenv("HUME_API_KEY"),
    settings=HumeTTSService.Settings(
        voice="your-voice-id",
    ),
)

With Acting Directions

tts = HumeTTSService(
    api_key=os.getenv("HUME_API_KEY"),
    settings=HumeTTSService.Settings(
        voice="your-voice-id",
        description="Speak warmly and reassuringly",
        speed=1.1,
        trailing_silence=0.5,
    ),
)

Updating Settings at Runtime

Voice and synthesis parameters can be changed mid-conversation using TTSUpdateSettingsFrame:
from pipecat.frames.frames import TTSUpdateSettingsFrame
from pipecat.services.hume.tts import HumeTTSSettings

await task.queue_frame(
    TTSUpdateSettingsFrame(
        delta=HumeTTSSettings(
            speed=1.3,
            description="Speak with excitement",
        )
    )
)
The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.

Notes

  • Fixed sample rate: Hume TTS streams at 48kHz. Setting a different sample_rate will produce a warning.
  • Word timestamps: The service provides word-level timestamps for synchronized text display. Timestamps are tracked cumulatively across utterances within a turn.
  • Description versions: When description is provided, the service uses Hume API version "1". Without a description, it uses the newer version "2".
  • Audio buffering: Audio is buffered internally until a minimum chunk size is reached before being pushed as frames, reducing audio glitches.
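
The notes above describe behavior internal to the service. The following is a simplified sketch of three of those ideas (version selection, minimum-size audio buffering, and cumulative word timestamps), written only to make them concrete. All names here (`api_version_for`, `ChunkBuffer`, `offset_timestamps`) are hypothetical, the threshold is arbitrary, and none of this is Pipecat's actual implementation.

```python
def api_version_for(description):
    """Pick the API version per the note above: "1" with a description, else "2"."""
    # Assumption: any falsy value (None, empty string) counts as "no description".
    return "1" if description else "2"


class ChunkBuffer:
    """Accumulate audio bytes and release them only once min_bytes is reached."""

    def __init__(self, min_bytes):
        self.min_bytes = min_bytes
        self._pending = bytearray()

    def push(self, data):
        """Add bytes; return a chunk to emit, or None while still buffering."""
        self._pending.extend(data)
        if len(self._pending) < self.min_bytes:
            return None
        chunk = bytes(self._pending)
        self._pending.clear()
        return chunk

    def flush(self):
        """Return whatever remains at end of stream (may be empty)."""
        chunk = bytes(self._pending)
        self._pending.clear()
        return chunk


def offset_timestamps(words, turn_offset):
    """Shift per-utterance (word, seconds) pairs onto the turn's cumulative clock."""
    return [(word, start + turn_offset) for word, start in words]
```

For example, with a second utterance that begins 2.0 seconds into the turn, `offset_timestamps([("hi", 0.0), ("there", 0.5)], 2.0)` places its words at 2.0 and 2.5 seconds on the turn's timeline.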