Overview

MiniMaxHttpTTSService provides high-quality text-to-speech synthesis using MiniMax’s T2A (Text-to-Audio) API, with streaming output, emotional voice control, and support for multiple languages. It offers several models that trade latency against audio quality: turbo variants for lower latency and hd variants for higher-definition audio.

Installation

MiniMax services require no dependencies beyond the base installation:
pip install "pipecat-ai"

Prerequisites

MiniMax Account Setup

Before using MiniMax TTS services, you need:
  1. MiniMax Account: Sign up at MiniMax Platform
  2. API Credentials: Get your API key and Group ID from the platform
  3. Voice Selection: Choose from available voice models and emotional settings

Required Environment Variables

  • MINIMAX_API_KEY: Your MiniMax API key for authentication
  • MINIMAX_GROUP_ID: Your MiniMax group ID
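Both variables must be set before the service is constructed. As a minimal sketch, the helper below (a hypothetical name, not part of Pipecat) reads them and fails early with a clear message if either is missing, rather than passing None to the constructor:

```python
import os
from typing import Mapping, Optional, Tuple


def load_minimax_credentials(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str]:
    """Return (api_key, group_id); raise if either variable is missing or empty.

    Hypothetical helper for illustration only. `env` defaults to os.environ
    and can be replaced with a plain dict in tests.
    """
    env = os.environ if env is None else env
    required = ("MINIMAX_API_KEY", "MINIMAX_GROUP_ID")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["MINIMAX_API_KEY"], env["MINIMAX_GROUP_ID"]
```

The returned pair can then be passed as api_key and group_id when creating the service.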

Configuration

MiniMaxHttpTTSService

  • api_key (str, required): MiniMax API key for authentication.
  • group_id (str, required): MiniMax Group ID that identifies your project.
  • voice_id (str, default: "Calm_Woman", deprecated): Voice identifier for synthesis. Deprecated in v0.0.105. Use settings=MiniMaxHttpTTSService.Settings(...) instead.
  • model (str, default: "speech-02-turbo", deprecated): TTS model name. Options include speech-2.6-hd, speech-2.6-turbo, speech-02-hd, speech-02-turbo, speech-01-hd, and speech-01-turbo. Deprecated in v0.0.105. Use settings=MiniMaxHttpTTSService.Settings(...) instead.
  • base_url (str, default: "https://api.minimax.io/v1/t2a_v2"): API base URL. Use https://api.minimaxi.chat/v1/t2a_v2 for mainland China or https://api-uw.minimax.io/v1/t2a_v2 for the western United States.
  • aiohttp_session (aiohttp.ClientSession, required): An aiohttp session used for HTTP requests.
  • sample_rate (int, default: None): Output audio sample rate in Hz. When None, the pipeline’s configured sample rate is used.
  • params (InputParams, default: None, deprecated): Runtime-configurable voice and generation settings. Deprecated in v0.0.105. Use settings=MiniMaxHttpTTSService.Settings(...) instead.
  • settings (MiniMaxHttpTTSService.Settings, default: None): Runtime-configurable settings. See Settings below.
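The base_url parameter is how a deployment picks one of the three regional endpoints. A small lookup keeps that choice in one place. This is a sketch only: the region keys ("global", "china", "us-west") and the function name are assumptions made for illustration, while the URLs are the ones listed above.

```python
# Regional T2A endpoints as documented for base_url.
# The dictionary keys are hypothetical labels, not MiniMax or Pipecat identifiers.
MINIMAX_BASE_URLS = {
    "global": "https://api.minimax.io/v1/t2a_v2",
    "china": "https://api.minimaxi.chat/v1/t2a_v2",
    "us-west": "https://api-uw.minimax.io/v1/t2a_v2",
}


def minimax_base_url(region: str = "global") -> str:
    """Return the endpoint URL for a region label, or raise ValueError."""
    try:
        return MINIMAX_BASE_URLS[region]
    except KeyError:
        known = ", ".join(sorted(MINIMAX_BASE_URLS))
        raise ValueError(f"Unknown region {region!r}; expected one of: {known}") from None
```

The result would be passed as base_url=minimax_base_url("china") when constructing the service.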

Settings

Runtime-configurable settings passed via the settings constructor argument using MiniMaxHttpTTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.
Parameter           Type            Default    Description
model               str             None       Model identifier. (Inherited.)
voice               str             None       Voice identifier. (Inherited.)
language            Language | str  None       Language for synthesis. (Inherited.)
speed               float           NOT_GIVEN  Speech speed.
volume              float           NOT_GIVEN  Volume level.
pitch               int             NOT_GIVEN  Pitch adjustment.
emotion             str             NOT_GIVEN  Emotion for synthesis.
text_normalization  bool            NOT_GIVEN  Whether to apply text normalization.
latex_read          bool            NOT_GIVEN  Whether to read LaTeX formulas.
language_boost      str             NOT_GIVEN  Language boost setting.
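A NOT_GIVEN default usually means "leave this field alone", which is what lets a partial update change only the fields that were set. The sketch below illustrates that general sentinel pattern with stand-in classes. It is not Pipecat's implementation: NOT_GIVEN, VoiceSettingsSketch, given_fields, and apply_update are all hypothetical names defined here for the example.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


class _NotGiven:
    """Sentinel type: distinguishes 'not provided' from an explicit None."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()  # stand-in for illustration, not Pipecat's object


@dataclass
class VoiceSettingsSketch:
    """Hypothetical subset of the settings table above."""

    speed: Any = NOT_GIVEN
    volume: Any = NOT_GIVEN
    pitch: Any = NOT_GIVEN
    emotion: Any = NOT_GIVEN


def given_fields(settings: VoiceSettingsSketch) -> Dict[str, Any]:
    """Return only the fields the caller explicitly set."""
    return {
        f.name: getattr(settings, f.name)
        for f in fields(settings)
        if getattr(settings, f.name) is not NOT_GIVEN
    }


def apply_update(current: Dict[str, Any], update: VoiceSettingsSketch) -> Dict[str, Any]:
    """Merge an update into the current values without touching unset fields."""
    merged = dict(current)
    merged.update(given_fields(update))
    return merged
```

Under this pattern, an update that sets only speed leaves a previously chosen emotion in place.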

Usage

Basic Setup

import os

import aiohttp
from pipecat.services.minimax import MiniMaxHttpTTSService

# Run this inside an async function; `async with` is not valid at module level.
async with aiohttp.ClientSession() as session:
    tts = MiniMaxHttpTTSService(
        api_key=os.getenv("MINIMAX_API_KEY"),
        group_id=os.getenv("MINIMAX_GROUP_ID"),
        aiohttp_session=session,
    )

With Voice Customization

import os

import aiohttp
from pipecat.services.minimax import MiniMaxHttpTTSService
from pipecat.transcriptions.language import Language

# Run this inside an async function; `async with` is not valid at module level.
async with aiohttp.ClientSession() as session:
    tts = MiniMaxHttpTTSService(
        api_key=os.getenv("MINIMAX_API_KEY"),
        group_id=os.getenv("MINIMAX_GROUP_ID"),
        aiohttp_session=session,
        settings=MiniMaxHttpTTSService.Settings(
            voice="Calm_Woman",
            model="speech-02-hd",
            language=Language.ZH,
            speed=1.2,
            emotion="happy",
        ),
    )
The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.

Notes

  • HTTP-based streaming: MiniMax uses an HTTP streaming API, not WebSocket. Audio data is returned in hex-encoded PCM chunks.
  • Emotional voice control: The emotion parameter lets you adjust the emotional tone of the voice without changing the voice model itself.
  • Model selection: The speech-2.6-* models are the latest and support additional languages (Filipino, Tamil, Persian). Use turbo variants for lower latency or hd variants for higher quality.
  • The Python class is named MiniMaxHttpTTSService, not MiniMaxTTSService.
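The service handles the hex-encoded PCM chunks mentioned in the notes for you, but the decoding step itself is simple. This sketch is not the service's code: the function names are hypothetical, and the 16-bit mono sample format used in the duration helper is an assumption rather than something this page states.

```python
from typing import Iterable, Iterator


def decode_hex_pcm_chunks(hex_chunks: Iterable[str]) -> Iterator[bytes]:
    """Yield raw PCM bytes for each non-empty hex-encoded chunk."""
    for chunk in hex_chunks:
        if chunk:  # skip empty chunks
            yield bytes.fromhex(chunk)


def pcm_duration_seconds(
    pcm: bytes,
    sample_rate: int,
    channels: int = 1,      # assumption: mono
    sample_width: int = 2,  # assumption: 16-bit samples (2 bytes each)
) -> float:
    """Duration of a raw PCM buffer, given its sample format."""
    return len(pcm) / (sample_rate * channels * sample_width)
```

Joining the decoded chunks with b"".join(...) gives one continuous PCM buffer that can be written to a file or passed to an audio sink.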