Overview
Inworld provides high-quality, low-latency speech synthesis through two service implementations: InworldTTSService, for real-time, minimal-latency use cases over WebSockets, and InworldHttpTTSService, for streaming and non-streaming use cases over HTTP. Inworld TTS supports 12+ languages, timestamps, custom pronunciation, and instant voice cloning.
Inworld TTS API Reference
Pipecat’s API methods for Inworld TTS integration
Example Implementation (Websockets)
Complete example with Inworld TTS
Inworld Documentation
Official Inworld TTS API documentation
Inworld Portal
Create and manage voice models
Installation
To use Inworld services, no additional dependencies are required beyond the base installation.
Prerequisites
Inworld Account Setup
Before using Inworld TTS services, you need:
- Inworld Account: Sign up at Inworld Studio
- API Key: Generate an API key from your account dashboard
- Voice Selection: Choose from available voice models
Required Environment Variables
INWORLD_API_KEY: Your Inworld API key for authentication
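A minimal sketch for loading the key before constructing either service, assuming you read it from the process environment and want to fail fast when it is missing. The helper name `load_inworld_api_key` is hypothetical and is not part of Pipecat.

```python
import os
from typing import Mapping, Optional


def load_inworld_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return INWORLD_API_KEY, raising if it is unset or blank.

    Hypothetical helper for illustration; Pipecat does not provide it.
    """
    env = os.environ if env is None else env
    key = env.get("INWORLD_API_KEY", "").strip()
    if not key:
        # Failing here gives a clearer error than a rejected connection later.
        raise RuntimeError("INWORLD_API_KEY is not set")
    return key
```

The returned string is what you would pass as the Inworld API key when creating InworldTTSService or InworldHttpTTSService.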
Configuration
InworldTTSService
WebSocket-based service for lowest-latency streaming. Constructor parameters:
- Inworld API key.
- ID of the voice to use for synthesis. Deprecated in v0.0.105. Use settings=InworldTTSService.Settings(voice=...) instead.
- ID of the model to use for synthesis. Deprecated in v0.0.105. Use settings=InworldTTSService.Settings(model=...) instead.
- URL of the Inworld WebSocket API.
- Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
- Audio encoding format.
- Controls how incoming text is aggregated before synthesis. SENTENCE (default) buffers text until sentence boundaries, producing more natural speech. TOKEN streams tokens directly for lower latency. Import TextAggregationMode from pipecat.services.tts_service.
- Deprecated in v0.0.104. Use text_aggregation_mode instead.
- Whether to append a trailing space to text before sending to TTS.
- Deprecated in v0.0.105. Use settings=InworldTTSService.Settings(...) instead.
- Runtime-configurable settings. See InworldTTSService Settings below.
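To make the difference between the two aggregation modes concrete, here is a minimal sketch of what each would hand to the synthesizer for the same token stream. `Mode` and `aggregate` are hypothetical names, and the boundary check (buffered text ending in `.`, `!`, or `?`) is a deliberately naive assumption; this is not Pipecat's aggregator.

```python
from enum import Enum
from typing import Iterable, Iterator


class Mode(Enum):
    # Hypothetical stand-in for the SENTENCE and TOKEN behaviors described above.
    SENTENCE = "sentence"
    TOKEN = "token"


SENTENCE_END = (".", "!", "?")


def aggregate(tokens: Iterable[str], mode: Mode) -> Iterator[str]:
    """Yield the text chunks that would be sent to TTS under each mode."""
    if mode is Mode.TOKEN:
        # TOKEN: forward every token immediately for lower latency.
        yield from tokens
        return
    buffer = ""
    for token in tokens:
        buffer += token
        # SENTENCE: flush only when the buffer ends at a sentence boundary.
        if buffer.rstrip().endswith(SENTENCE_END):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        # Flush any trailing partial sentence when the stream ends.
        yield buffer.strip()
```

For the tokens `["Hello", " there.", " How", " are", " you?"]`, SENTENCE mode yields two chunks while TOKEN mode yields five; fewer, complete chunks give the voice more context, at the cost of waiting for each boundary.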
InworldTTSService Settings
Runtime-configurable settings passed via the settings constructor argument using InworldTTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | None | Model identifier. (Inherited.) |
| voice | str | None | Voice identifier. (Inherited.) |
| language | Language or str | None | Language for synthesis. (Inherited.) |
| speaking_rate | float | NOT_GIVEN | Speaking rate for speech synthesis. |
| temperature | float | NOT_GIVEN | Temperature for speech synthesis. |
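The NOT_GIVEN defaults distinguish a field that was never supplied from one set to an explicit value such as None. A minimal sketch of that sentinel pattern, assuming an update only touches the fields it explicitly gives; `SettingsSketch`, `apply_update`, and this local `NOT_GIVEN` are hypothetical and are not Pipecat's classes.

```python
from dataclasses import dataclass, replace
from typing import Any


class _NotGiven:
    """Sentinel meaning 'no value supplied', distinct from None."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()


@dataclass(frozen=True)
class SettingsSketch:
    # Hypothetical mirror of the table above, not Pipecat's Settings class.
    model: Any = None
    voice: Any = None
    language: Any = None
    speaking_rate: Any = NOT_GIVEN
    temperature: Any = NOT_GIVEN


def apply_update(current: SettingsSketch, **changes: Any) -> SettingsSketch:
    """Return a copy with only the explicitly given fields changed."""
    given = {name: value for name, value in changes.items() if value is not NOT_GIVEN}
    # dataclasses.replace raises TypeError for unknown field names.
    return replace(current, **given)
```

For example, `apply_update(current, speaking_rate=1.2)` changes the speaking rate and leaves voice, model, language, and temperature exactly as they were.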
InworldHttpTTSService
HTTP-based service supporting both streaming and non-streaming modes. Constructor parameters:
- Inworld API key.
- aiohttp ClientSession for HTTP requests.
- ID of the voice to use for synthesis. Deprecated in v0.0.105. Use settings=InworldHttpTTSService.Settings(voice=...) instead.
- ID of the model to use for synthesis. Deprecated in v0.0.105. Use settings=InworldHttpTTSService.Settings(model=...) instead.
- Whether to use streaming mode.
- Audio sample rate in Hz.
- Audio encoding format.
- Deprecated in v0.0.105. Use settings=InworldHttpTTSService.Settings(...) instead.
- Runtime-configurable settings. See InworldTTSService Settings above.
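The practical difference between the two HTTP modes is when the first audio becomes available. This is a hypothetical illustration, not the service's implementation: `deliver_audio` stands in for how audio is forwarded, and `response_body` simulates a response that arrives in three pieces while logging each arrival.

```python
from typing import Iterable, Iterator, List


def deliver_audio(chunks: Iterable[bytes], streaming: bool) -> Iterator[bytes]:
    """Forward audio the way each mode would (hypothetical illustration)."""
    if streaming:
        # Streaming: pass each chunk downstream as soon as it arrives.
        for chunk in chunks:
            yield chunk
    else:
        # Non-streaming: wait for the whole body, then emit one buffer.
        yield b"".join(chunks)


def response_body(log: List[str]) -> Iterator[bytes]:
    # Stand-in for an HTTP response body that arrives in three pieces.
    for piece in (b"ab", b"cd", b"ef"):
        log.append("received")
        yield piece
```

Pulling the first item from `deliver_audio` requires one arrival in streaming mode but all three without it, which is why streaming lowers time to first audio while non-streaming returns a single complete buffer.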
Usage
Basic Setup (WebSocket)
With Custom Settings
HTTP Service
Notes
- WebSocket vs HTTP: The WebSocket service (InworldTTSService) provides the lowest latency with bidirectional streaming and supports multiple independent audio contexts per connection (max 5). The HTTP service supports both streaming and non-streaming modes via the streaming parameter.
- Word timestamps: Both services provide word-level timestamps for synchronized text display. Timestamps are tracked cumulatively across utterances within a turn.
- Auto mode: When auto_mode=True (default), the server controls flushing of buffered text for optimal latency and quality. This is recommended when text is sent in full sentences or phrases (i.e., when using text_aggregation_mode=TextAggregationMode.SENTENCE).
- Keepalive: The WebSocket service sends keepalive messages every 60 seconds to maintain the connection.
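One way to keep word timestamps cumulative across utterances in a turn is to shift each utterance's word times by the audio duration already emitted. The sketch below assumes per-utterance times are relative to that utterance's first audio sample and that a new turn resets the offset; `TurnTimestampTracker` is hypothetical, not the services' internal tracker.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TurnTimestampTracker:
    """Hypothetical sketch: map per-utterance word times onto one turn timeline."""

    offset: float = 0.0  # seconds of audio already emitted in this turn

    def add_utterance(
        self, words: List[Tuple[str, float]], duration: float
    ) -> List[Tuple[str, float]]:
        # words: (word, start_seconds) relative to this utterance's audio start.
        shifted = [(word, start + self.offset) for word, start in words]
        # The next utterance begins where this one's audio ends.
        self.offset += duration
        return shifted

    def reset(self) -> None:
        # A new turn starts its timeline at zero again.
        self.offset = 0.0
```

With this scheme, a word at 0.0 s in the second utterance of a turn is reported after all of the first utterance's audio, so the displayed text stays in step with playback.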
Event Handlers
Inworld TTS supports the standard service connection events:
| Event | Description |
|---|---|
| on_connected | Connected to Inworld WebSocket |
| on_disconnected | Disconnected from Inworld WebSocket |
| on_connection_error | WebSocket connection error occurred |
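Handlers for these events are registered by event name and run when the connection state changes. The toy emitter below only sketches that pattern with async callbacks registered through a decorator; `ToyService`, `event_handler`, `emit`, and the handler argument lists are hypothetical stand-ins, so check the Pipecat reference for the real registration API and callback signatures.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, DefaultDict, List

Handler = Callable[..., Awaitable[None]]


class ToyService:
    """Hypothetical stand-in showing decorator-based event registration."""

    EVENTS = ("on_connected", "on_disconnected", "on_connection_error")

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def event_handler(self, name: str) -> Callable[[Handler], Handler]:
        if name not in self.EVENTS:
            raise ValueError(f"unknown event: {name}")

        def register(func: Handler) -> Handler:
            self._handlers[name].append(func)
            return func

        return register

    async def emit(self, name: str, *args: object) -> None:
        # Await every handler registered for this event, in registration order.
        for handler in self._handlers[name]:
            await handler(self, *args)


service = ToyService()
events: List[str] = []


@service.event_handler("on_connected")
async def on_connected(svc: ToyService) -> None:
    events.append("connected")


@service.event_handler("on_connection_error")
async def on_connection_error(svc: ToyService, error: str) -> None:
    events.append(f"error: {error}")


async def main() -> None:
    # Simulate a connection followed by an error.
    await service.emit("on_connected")
    await service.emit("on_connection_error", "timeout")


asyncio.run(main())
```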