Overview
FalSTTService provides speech-to-text using Fal’s Wizper API. It relies on Voice Activity Detection (VAD) to send only speech segments for transcription, which reduces API usage and improves response time.
Fal STT API Reference
Pipecat’s API methods for Fal Wizper integration
Example Implementation
Complete example with VAD integration
Fal Documentation
Official Fal Wizper documentation and features
Fal Platform
Access API keys and Wizper models
Installation
To use Fal services, install the required dependency.
Prerequisites
Fal Account Setup
Before using Fal STT services, you need:
- Fal Account: Sign up at Fal Platform
- API Key: Generate an API key from your account dashboard
- Model Access: Ensure access to the Wizper transcription model
Required Environment Variables
FAL_KEY: Your Fal API key for authentication
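A missing key is easier to diagnose at startup than in the middle of a pipeline run. The sketch below checks for an explicitly supplied key first and then falls back to FAL_KEY; resolve_fal_key is a hypothetical helper written for illustration, not part of Pipecat or the Fal client.

```python
import os


def resolve_fal_key(explicit_key=None, env=None):
    """Return the explicit key if given, otherwise the FAL_KEY value from env."""
    # env defaults to the real process environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    key = explicit_key or env.get("FAL_KEY")
    if not key:
        raise RuntimeError("No Fal API key found: pass one explicitly or set FAL_KEY")
    return key
```

Calling resolve_fal_key() before building the pipeline raises a clear error when neither source provides a key.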
Configuration
FalSTTService
- Fal API key. If not provided, uses the FAL_KEY environment variable.
- Optional aiohttp ClientSession for HTTP requests. If not provided, a session will be created and managed internally.
- Task to perform ("transcribe" or "translate").
- Level of chunking ("segment").
- Version of the Wizper model to use.
- Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
- Configuration parameters for the Wizper API. Deprecated in v0.0.105. Use settings=FalSTTService.Settings(...) instead.
- Runtime-configurable settings for the STT service. See Settings below.
- P99 latency from speech end to final transcript in seconds. Override for your deployment.
Settings
Runtime-configurable settings passed via the settings constructor argument using FalSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | None | STT model identifier. (Inherited from base STT settings.) |
| language | Language \| str | Language.EN | Language of the audio input. (Inherited from base STT settings.) |
Usage
Basic Setup
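A minimal sketch of a basic setup. It assumes the service is importable from pipecat.services.fal.stt and accepts an api_key keyword; neither name is confirmed by this page. The service_cls argument is hypothetical and exists only so the sketch can be exercised without Pipecat installed.

```python
import os


def create_fal_stt(service_cls=None, **options):
    """Create a Fal STT service with default options plus any overrides."""
    if service_cls is None:
        # Assumed import path; requires Pipecat with its Fal dependency installed.
        from pipecat.services.fal.stt import FalSTTService

        service_cls = FalSTTService

    # api_key is an assumed keyword name. Passing None lets the service fall
    # back to the FAL_KEY environment variable, as described under Configuration.
    options.setdefault("api_key", os.getenv("FAL_KEY"))
    return service_cls(**options)
```

With Pipecat installed, create_fal_stt() would return a service ready to be placed in a pipeline after the audio input and VAD.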
With Custom Parameters
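Custom parameters can be collected into keyword arguments before constructing the service. In this sketch, version is the parameter named in the Notes; sample_rate and chunk_level are assumed keyword names for the sample-rate and chunking options listed under Configuration, and fal_stt_options is a hypothetical helper.

```python
def fal_stt_options(version="3", sample_rate=None, chunk_level="segment"):
    """Collect custom FalSTTService options into a keyword-argument dict."""
    options = {"version": version, "chunk_level": chunk_level}
    # Leave sample_rate out when unset so the pipeline's configured rate applies.
    if sample_rate is not None:
        options["sample_rate"] = sample_rate
    return options
```

The result would be unpacked into the constructor, for example FalSTTService(**fal_stt_options(sample_rate=16000)).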
Translation Mode
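Translation mode is selected with task="translate". The sketch below validates the task against the two documented values before it reaches the service; task_options and VALID_TASKS are hypothetical names, not part of Pipecat.

```python
VALID_TASKS = ("transcribe", "translate")


def task_options(task):
    """Return keyword arguments selecting the Wizper task, rejecting unknown values."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    return {"task": task}


# Translation mode: audio in any input language is returned as English text.
translate_kwargs = task_options("translate")
```

Validating early keeps a typo such as "translation" from surfacing later as an API error.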
Notes
- Segmented processing: FalSTTService inherits from SegmentedSTTService, which buffers audio during speech (detected by VAD) and sends complete segments for transcription. This means it does not provide interim results, only final transcriptions after each speech segment.
- Translation support: Set task="translate" to translate audio into English, regardless of the input language.
- Wizper versions: The version parameter selects the underlying Whisper model version. Version "3" is the default and recommended for best accuracy.
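The segmented behavior in the first note can be illustrated with a toy buffer. This is a simplified model written for this page, not Pipecat's SegmentedSTTService; the class and method names are hypothetical, and the real service may differ, for example in how it treats audio just before speech is detected.

```python
class SegmentBuffer:
    """Toy model of segmented STT: collect audio during speech, emit once at the end."""

    def __init__(self):
        self._chunks = []
        self._speaking = False

    def on_speech_start(self):
        # VAD reported the start of speech: begin a fresh segment.
        self._speaking = True
        self._chunks = []

    def on_audio(self, chunk: bytes):
        # Audio outside a speech segment is dropped, so silence is never transcribed.
        if self._speaking:
            self._chunks.append(chunk)

    def on_speech_stop(self) -> bytes:
        # Only now is a complete segment available, which is why a segmented
        # service yields final transcriptions and no interim results.
        self._speaking = False
        segment = b"".join(self._chunks)
        self._chunks = []
        return segment
```

In this model, one transcription request would be made per call to on_speech_stop, using the returned bytes as the audio payload.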