
Overview

FalSTTService provides speech-to-text using Fal’s Wizper API. It relies on Voice Activity Detection (VAD) to send only speech segments for transcription, which reduces API usage and improves response time.

Installation

To use Fal services, install the required dependency:
pip install "pipecat-ai[fal]"

Prerequisites

Fal Account Setup

Before using Fal STT services, you need:
  1. Fal Account: Sign up at Fal Platform
  2. API Key: Generate an API key from your account dashboard
  3. Model Access: Ensure access to the Wizper transcription model

Required Environment Variables

  • FAL_KEY: Your Fal API key for authentication
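
A missing key often surfaces only when the first transcription request fails, so it can help to check for it at startup. The sketch below shows one way to do that. `require_env` is a hypothetical helper written for this example; it is not part of Pipecat or the Fal client.

```python
import os


def require_env(name: str, environ=None) -> str:
    """Return a non-empty environment variable or raise a readable error.

    Hypothetical helper for illustration; `environ` can be any mapping,
    which makes the function easy to exercise without touching os.environ.
    """
    env = os.environ if environ is None else environ
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting the bot")
    return value


# Typical use at startup (commented out so the snippet imports cleanly):
# fal_key = require_env("FAL_KEY")
```

The returned value can then be passed as api_key= when constructing FalSTTService, as in the usage examples further down.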

Configuration

FalSTTService

api_key (str, default: None)
    Fal API key. If not provided, uses FAL_KEY environment variable.

aiohttp_session (aiohttp.ClientSession, default: None)
    Optional aiohttp ClientSession for HTTP requests. If not provided, a session will be created and managed internally.

task (str, default: "transcribe")
    Task to perform ("transcribe" or "translate").

chunk_level (str, default: "segment")
    Level of chunking ("segment").

version (str, default: "3")
    Version of the Wizper model to use.

sample_rate (int, default: None)
    Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.

params (FalSTTService.InputParams, default: None, deprecated)
    Configuration parameters for the Wizper API. Deprecated in v0.0.105. Use settings=FalSTTService.Settings(...) instead.

settings (FalSTTService.Settings, default: None)
    Runtime-configurable settings for the STT service. See Settings below.

ttfs_p99_latency (float, default: FAL_TTFS_P99)
    P99 latency from speech end to final transcript in seconds. Override for your deployment.
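
An override for ttfs_p99_latency is only as good as the measurements behind it. As a rough sketch, assuming you have logged your own speech-end-to-final-transcript latencies in seconds, a nearest-rank P99 could be computed as below. The `p99` function is a hypothetical helper for this example; how Pipecat consumes the value is not shown here.

```python
import math


def p99(latencies_s):
    """Nearest-rank 99th percentile of latency samples, in seconds.

    Hypothetical helper: `latencies_s` is whatever you measured yourself.
    """
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_s)
    # 1-based nearest rank: the smallest rank covering 99% of the samples.
    rank = math.ceil(99 * len(ordered) / 100)
    return ordered[rank - 1]
```

With few samples the result is simply the slowest observation, so a larger sample gives a more meaningful value to pass as ttfs_p99_latency=.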

Settings

Runtime-configurable settings passed via the settings constructor argument using FalSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
Parameter   Type             Default       Description
model       str              None          STT model identifier. (Inherited from base STT settings.)
language    Language | str   Language.EN   Language of the audio input. (Inherited from base STT settings.)

Usage

Basic Setup

import os

from pipecat.services.fal.stt import FalSTTService

# Read the API key from the FAL_KEY environment variable.
stt = FalSTTService(
    api_key=os.getenv("FAL_KEY"),
)

With Custom Parameters

import os

from pipecat.services.fal.stt import FalSTTService
from pipecat.transcriptions.language import Language

stt = FalSTTService(
    api_key=os.getenv("FAL_KEY"),
    task="transcribe",
    version="3",
    # Runtime-configurable settings: transcribe Spanish audio.
    settings=FalSTTService.Settings(
        language=Language.ES,
    ),
)

Translation Mode

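import os

from pipecat.services.fal.stt import FalSTTService
from pipecat.transcriptions.language import Language

stt = FalSTTService(
    api_key=os.getenv("FAL_KEY"),
    task="translate",
    # French audio in, English text out.
    settings=FalSTTService.Settings(
        language=Language.FR,
    ),
)

The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.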

Notes

  • Segmented processing: FalSTTService inherits from SegmentedSTTService, which buffers audio while VAD detects speech and sends each complete segment for transcription. As a result, it does not provide interim results, only a final transcription after each speech segment.
  • Translation support: Set task="translate" to translate audio into English, regardless of the input language.
  • Wizper versions: The version parameter selects the underlying Whisper model version. Version "3" is the default and recommended for best accuracy.
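
The segmented-processing note above can be made concrete with a toy model. The sketch below is an illustration of VAD-gated buffering in general, not Pipecat's SegmentedSTTService implementation. `SegmentBuffer` and its `push` method are hypothetical names, and the per-chunk `is_speech` flag stands in for a real VAD decision.

```python
class SegmentBuffer:
    """Toy model of VAD-gated buffering (not Pipecat's implementation)."""

    def __init__(self):
        self._speaking = False
        self._chunks = []

    def push(self, chunk: bytes, is_speech: bool):
        """Feed one audio chunk together with its VAD decision.

        Returns the completed segment when speech ends, otherwise None.
        """
        if is_speech:
            # Accumulate audio for as long as speech continues.
            self._speaking = True
            self._chunks.append(chunk)
            return None
        if self._speaking:
            # Speech just ended: hand back the whole segment at once.
            segment = b"".join(self._chunks)
            self._speaking = False
            self._chunks = []
            return segment
        # Silence outside a segment is dropped and never sent anywhere.
        return None
```

Because nothing is returned until speech ends, a transcriber fed from such a buffer has nothing to produce interim results from, which matches the final-only behavior described above.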