Overview
SarvamTTSService provides text-to-speech synthesis specialized for Indian languages and voices. The service offers extensive voice customization options including pitch, pace, and loudness control, with support for multiple Indian languages and preprocessing for mixed-language content. The bulbul:v3-beta model adds temperature control and 25 new speaker voices.
Sarvam TTS API Reference
Pipecat’s API methods for Sarvam AI TTS integration
Example Implementation
Complete example with Indian language support
Sarvam Documentation
Official Sarvam AI text-to-speech API documentation
Sarvam Console
Access Indian language voices and API keys
Installation
To use Sarvam AI services, no additional dependencies are required beyond the base installation.
Prerequisites
Sarvam AI Account Setup
Before using Sarvam AI TTS services, you need:
- Sarvam AI Account: Sign up at Sarvam AI Console
- API Key: Generate an API key from your account dashboard
- Language Selection: Choose from available Indian language voices
Required Environment Variables
SARVAM_API_KEY: Your Sarvam AI API key for authentication
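A minimal sketch of reading the key at startup, so a missing variable fails early with a clear message. The helper name `load_sarvam_api_key` is hypothetical and not part of Pipecat or the Sarvam SDK.

```python
import os


def load_sarvam_api_key(env=None):
    """Return the Sarvam API key from the environment, or raise a clear error."""
    # `env` defaults to the process environment; passing a dict helps in tests.
    env = os.environ if env is None else env
    key = env.get("SARVAM_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SARVAM_API_KEY is not set; generate a key from your Sarvam AI dashboard."
        )
    return key
```

Calling this once during application startup keeps the failure close to its cause instead of surfacing later as an authentication error.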
Configuration
Sarvam offers two service implementations: SarvamTTSService (WebSocket) for real-time streaming and SarvamHttpTTSService (HTTP) for simpler batch synthesis.
SarvamTTSService
- Sarvam AI API subscription key.
- TTS model to use. Options: bulbul:v2, bulbul:v3-beta, bulbul:v3. Deprecated in v0.0.105. Use settings=SarvamTTSService.Settings(model=...) instead.
- Speaker voice ID. If None, uses the model-appropriate default (anushka for v2, shubh for v3). Deprecated in v0.0.105. Use settings=SarvamTTSService.Settings(voice=...) instead.
- WebSocket URL for the TTS backend.
- Controls how incoming text is aggregated before synthesis. SENTENCE (default) buffers text until sentence boundaries, producing more natural speech. TOKEN streams tokens directly for lower latency. Import from pipecat.services.tts_service.
- Deprecated in v0.0.104. Use text_aggregation_mode instead.
- Audio sample rate in Hz (8000, 16000, 22050, 24000). If None, uses the model-specific default (22050 for v2, 24000 for v3).
- Deprecated in v0.0.105. Use settings=SarvamTTSService.Settings(...) instead.
- Runtime-configurable settings. See SarvamTTSService Settings below.
SarvamHttpTTSService
- Sarvam AI API subscription key.
- An aiohttp session for HTTP requests.
- TTS model to use. Options: bulbul:v2, bulbul:v3-beta, bulbul:v3. Deprecated in v0.0.105. Use settings=SarvamHttpTTSService.Settings(model=...) instead.
- Speaker voice ID. If None, uses the model-appropriate default. Deprecated in v0.0.105. Use settings=SarvamHttpTTSService.Settings(voice=...) instead.
- Sarvam AI API base URL.
- Audio sample rate in Hz (8000, 16000, 22050, 24000). If None, uses the model-specific default.
- Deprecated in v0.0.105. Use settings=SarvamHttpTTSService.Settings(...) instead.
- Runtime-configurable settings. See SarvamHttpTTSService Settings below.
SarvamTTSService Settings
Runtime-configurable settings passed via the settings constructor argument using SarvamTTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | None | Model identifier. (Inherited.) |
| voice | str | None | Voice identifier. (Inherited.) |
| language | Language \| str | None | Language for synthesis. (Inherited.) |
| enable_preprocessing | bool | NOT_GIVEN | Enable text preprocessing. |
| pace | float | NOT_GIVEN | Pace of speech. |
| pitch | float | NOT_GIVEN | Pitch of speech. |
| loudness | float | NOT_GIVEN | Loudness of speech. |
| temperature | float | NOT_GIVEN | Temperature for speech synthesis. |
| min_buffer_size | int | NOT_GIVEN | Minimum buffer size for WebSocket. |
| max_chunk_length | int | NOT_GIVEN | Maximum chunk length for WebSocket. |
SarvamHttpTTSService Settings
Runtime-configurable settings passed via the settings constructor argument using SarvamHttpTTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | None | Model identifier. (Inherited.) |
| voice | str | None | Voice identifier. (Inherited.) |
| language | Language \| str | None | Language for synthesis. (Inherited.) |
| enable_preprocessing | bool | NOT_GIVEN | Enable text preprocessing. |
| pace | float | NOT_GIVEN | Pace of speech. |
| pitch | float | NOT_GIVEN | Pitch of speech. |
| loudness | float | NOT_GIVEN | Loudness of speech. |
| temperature | float | NOT_GIVEN | Temperature for speech synthesis. |
Usage
Basic Setup (WebSocket)
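A minimal sketch of constructing the WebSocket service with the v2 model and its default speaker. The import path `pipecat.services.sarvam.tts` and the `api_key` keyword are assumptions based on Pipecat's usual layout; the `settings=SarvamTTSService.Settings(...)` form follows the Configuration section. The names `BASIC_SETTINGS` and `create_sarvam_tts` are hypothetical.

```python
import os

# Settings fields taken from the SarvamTTSService Settings table.
BASIC_SETTINGS = {
    "model": "bulbul:v2",
    "voice": "anushka",  # documented default speaker for v2
}


def create_sarvam_tts(api_key=None):
    """Build a WebSocket-based Sarvam TTS service (sketch, not verified)."""
    # Imported inside the function so this module still loads where
    # pipecat is not installed. The import path is an assumption.
    from pipecat.services.sarvam.tts import SarvamTTSService

    return SarvamTTSService(
        api_key=api_key or os.environ["SARVAM_API_KEY"],  # assumed keyword
        settings=SarvamTTSService.Settings(**BASIC_SETTINGS),
    )
```

The returned service would then be placed in a pipeline like any other Pipecat TTS service.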
With v3 Model and Temperature Control
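A hedged sketch of selecting a v3 model with temperature control. The temperature and pace values are illustrative only, and the import path and `api_key` keyword are assumptions; `V3_SETTINGS` and `create_sarvam_v3_tts` are hypothetical names. Pitch and loudness are left out because the notes on this page state that v3 models do not support them.

```python
import os

# Illustrative values; only the field names come from the Settings table.
V3_SETTINGS = {
    "model": "bulbul:v3",
    "voice": "shubh",    # documented default speaker for v3 models
    "temperature": 0.7,  # hypothetical value, chosen for illustration
    "pace": 1.1,         # inside the documented v3 range of 0.5 to 2.0
}


def create_sarvam_v3_tts(api_key=None):
    """Build a WebSocket Sarvam TTS service on a v3 model (sketch)."""
    # Lazy import so the module loads without pipecat; path is an assumption.
    from pipecat.services.sarvam.tts import SarvamTTSService

    return SarvamTTSService(
        api_key=api_key or os.environ["SARVAM_API_KEY"],  # assumed keyword
        settings=SarvamTTSService.Settings(**V3_SETTINGS),
    )
```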
HTTP Service
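A sketch of the HTTP variant, which needs an aiohttp session in addition to the API key. The `aiohttp_session` and `api_key` keyword names and the import path are assumptions; `run_with_http_tts` and its `use_tts` callback are hypothetical and stand in for whatever code builds and runs your pipeline.

```python
import os

HTTP_SETTINGS = {
    "model": "bulbul:v2",
    "voice": "anushka",
}


async def run_with_http_tts(use_tts):
    """Create an HTTP Sarvam TTS service and pass it to `use_tts` (sketch)."""
    # Lazy imports so the module loads without these packages installed.
    import aiohttp
    from pipecat.services.sarvam.tts import SarvamHttpTTSService  # assumed path

    # The session is closed automatically once `use_tts` has finished.
    async with aiohttp.ClientSession() as session:
        tts = SarvamHttpTTSService(
            api_key=os.environ["SARVAM_API_KEY"],  # assumed keyword
            aiohttp_session=session,               # assumed keyword
            settings=SarvamHttpTTSService.Settings(**HTTP_SETTINGS),
        )
        await use_tts(tts)
```

Keeping the service's lifetime inside the `async with` block avoids using a session after it has been closed.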
Notes
- Model differences: bulbul:v2 supports pitch and loudness control; bulbul:v3-beta and bulbul:v3 add temperature control but do not support pitch or loudness. Setting unsupported parameters for a model logs a warning.
- Default speakers vary by model: v2 defaults to anushka; v3 models default to shubh.
- Default sample rates vary by model: v2 defaults to 22050 Hz; v3 models default to 24000 Hz.
- Indian language focus: Sarvam AI specializes in Indian languages, supporting Bengali, English (India), Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, and Telugu.
- Pace ranges differ: bulbul:v2 supports pace from 0.3 to 3.0, while v3 models support 0.5 to 2.0. Values outside the range are clamped automatically.
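The model-specific rules in these notes can be summarized in a small standalone sketch. This is not Pipecat's implementation: `MODEL_PROFILES` and `normalize_settings` are hypothetical names, and dropping an unsupported parameter after warning is an assumption of this sketch (the notes only say a warning is logged).

```python
import logging

logger = logging.getLogger("sarvam_settings_sketch")

_V3 = {
    "voice": "shubh",
    "sample_rate": 24000,
    "pace": (0.5, 2.0),
    "unsupported": {"pitch", "loudness"},
}

# Defaults and ranges transcribed from the notes above.
MODEL_PROFILES = {
    "bulbul:v2": {
        "voice": "anushka",
        "sample_rate": 22050,
        "pace": (0.3, 3.0),
        # The notes do not explicitly list parameters that v2 rejects.
        "unsupported": set(),
    },
    "bulbul:v3-beta": _V3,
    "bulbul:v3": _V3,
}


def normalize_settings(model, **params):
    """Apply per-model defaults, pace clamping, and unsupported-field warnings."""
    profile = MODEL_PROFILES[model]
    cleaned = {}
    for name, value in params.items():
        if name in profile["unsupported"]:
            logger.warning("%s is not supported by %s; ignoring it", name, model)
            continue
        if name == "pace":
            low, high = profile["pace"]
            value = min(max(value, low), high)  # clamp into the model's range
        cleaned[name] = value
    # Fall back to the model's default speaker when none was given.
    cleaned.setdefault("voice", profile["voice"])
    return cleaned
```

For example, `normalize_settings("bulbul:v3", pace=5.0, pitch=0.2)` clamps the pace to 2.0, drops the pitch with a warning, and fills in shubh as the voice.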
Event Handlers
Sarvam WebSocket TTS supports the standard service connection events:
| Event | Description |
|---|---|
| on_connected | Connected to Sarvam WebSocket |
| on_disconnected | Disconnected from Sarvam WebSocket |
| on_connection_error | WebSocket connection error occurred |
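A sketch of attaching handlers for these events, assuming the service exposes Pipecat's `event_handler` decorator. The handler argument lists (`service`, and `service, error` for the error event) are assumptions, and `register_connection_handlers` is a hypothetical helper name.

```python
import logging

logger = logging.getLogger("sarvam_events_sketch")


def register_connection_handlers(tts):
    """Attach logging handlers for the three connection events to `tts`."""

    @tts.event_handler("on_connected")
    async def on_connected(service):
        logger.info("Connected to Sarvam WebSocket")

    @tts.event_handler("on_disconnected")
    async def on_disconnected(service):
        logger.info("Disconnected from Sarvam WebSocket")

    @tts.event_handler("on_connection_error")
    async def on_connection_error(service, error):
        # Assumed signature: the error details arrive as a second argument.
        logger.error("Sarvam WebSocket connection error: %s", error)
```

Call `register_connection_handlers(tts)` once after constructing the WebSocket service and before starting the pipeline.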