Overview
WhisperSTTService provides offline speech recognition using OpenAI's Whisper models running locally. It supports multiple model sizes and hardware acceleration options (CPU, CUDA, and Apple Silicon via MLX), enabling privacy-focused transcription without external API calls.
- Whisper STT API Reference: Pipecat's API methods for Whisper STT integration
- Standard Whisper Example: complete example with standard Whisper
- Whisper Documentation: OpenAI's Whisper research paper and model details
- MLX Whisper Example: Apple Silicon optimized example
Installation
Choose your installation based on your hardware:

- Standard Whisper (CPU/CUDA)
- MLX Whisper (Apple Silicon)
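The installs can be sketched as follows; the extras names (`whisper`, `mlx-whisper`) follow Pipecat's optional-dependency convention and may differ in your release, so check the package metadata:

```shell
# Standard Whisper (CPU/CUDA), built on Faster Whisper
pip install "pipecat-ai[whisper]"

# MLX Whisper (Apple Silicon, macOS only)
pip install "pipecat-ai[mlx-whisper]"
```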
Prerequisites
Local Model Setup
Before using Whisper STT services, you need:

- Model Selection: Choose an appropriate Whisper model size (tiny, base, small, medium, large)
- Hardware Configuration: Set up CPU, CUDA, or Apple Silicon acceleration
- Storage Space: Ensure sufficient disk space for model downloads
Configuration Options
- Model Size: Balance between accuracy and performance based on your hardware
- Hardware Acceleration: Configure CUDA for NVIDIA GPUs or MLX for Apple Silicon
- Language Support: Whisper supports 99+ languages out of the box
Configuration
WhisperSTTService
Uses Faster Whisper for efficient local transcription on CPU or CUDA devices. Constructor parameters:

- `model`: Whisper model to use. Can be a `Model` enum value or a string. Available models: TINY, BASE, SMALL, MEDIUM, LARGE (large-v3), LARGE_V3_TURBO, DISTIL_LARGE_V2, DISTIL_MEDIUM_EN (English-only). Deprecated in v0.0.105; use `settings=WhisperSTTService.Settings(...)` instead.
- `device`: Device for inference. Options: `"cpu"`, `"cuda"`, or `"auto"` (auto-detect).
- `compute_type`: Compute type for inference. Options include `"default"`, `"int8"`, `"int8_float16"`, `"float16"`, etc.
- `no_speech_prob`: Probability threshold for filtering out non-speech segments. Segments with a no-speech probability above this value are excluded. Deprecated in v0.0.105; use `settings=WhisperSTTService.Settings(...)` instead.
- `language`: Default language for transcription. Deprecated in v0.0.105; use `settings=WhisperSTTService.Settings(...)` instead.
- `settings`: Runtime-configurable settings for the STT service. See WhisperSTTService Settings below.
WhisperSTTServiceMLX
Optimized for Apple Silicon using MLX Whisper. Models are loaded on demand. Constructor parameters:

- `model`: MLX Whisper model to use. Can be an `MLXModel` enum value or a string. Available models: TINY, MEDIUM, LARGE_V3, LARGE_V3_TURBO, DISTIL_LARGE_V3, LARGE_V3_TURBO_Q4 (quantized). Deprecated in v0.0.105; use `settings=WhisperSTTServiceMLX.Settings(...)` instead.
- `no_speech_prob`: Probability threshold for filtering out non-speech segments. Deprecated in v0.0.105; use `settings=WhisperSTTServiceMLX.Settings(...)` instead.
- `language`: Default language for transcription. Deprecated in v0.0.105; use `settings=WhisperSTTServiceMLX.Settings(...)` instead.
- `temperature`: Sampling temperature. Lower values produce more deterministic results. Deprecated in v0.0.105; use `settings=WhisperSTTServiceMLX.Settings(...)` instead.
- `settings`: Runtime-configurable settings for the MLX STT service. See WhisperSTTServiceMLX Settings below.
WhisperSTTService Settings
Runtime-configurable settings passed via the `settings` constructor argument using `WhisperSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | Model.DISTIL_MEDIUM_EN | Whisper model to use. (Inherited from base STT settings.) |
| language | Language \| str | Language.EN | Default language for transcription. (Inherited from base STT settings.) |
| no_speech_prob | float | 0.4 | Probability threshold for filtering out non-speech segments. |
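Updating a setting mid-conversation can be sketched as follows. This is an illustrative fragment, not a full pipeline: `task` stands in for an already-running `PipelineTask`, and the exact `STTUpdateSettingsFrame` payload shape should be checked against your Pipecat version:

```python
from pipecat.frames.frames import STTUpdateSettingsFrame

# Tighten no-speech filtering while the pipeline is running by
# pushing an update frame into the (hypothetical) pipeline task.
await task.queue_frame(
    STTUpdateSettingsFrame(settings={"no_speech_prob": 0.3})
)
```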
WhisperSTTServiceMLX Settings
Runtime-configurable settings passed via the `settings` constructor argument using `WhisperSTTServiceMLX.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | MLXModel.TINY | MLX Whisper model to use. (Inherited from base STT settings.) |
| language | Language \| str | Language.EN | Default language for transcription. (Inherited from base STT settings.) |
| no_speech_prob | float | 0.6 | Probability threshold for filtering out non-speech segments. |
| temperature | float | 0.0 | Sampling temperature. Lower values are more deterministic. |
| engine | str | "mlx" | Whisper engine identifier. |
Usage
Basic Faster Whisper Setup
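A minimal sketch of constructing the service with the `Settings` style described above. The import paths (`pipecat.services.whisper.stt`, `pipecat.transcriptions.language`) follow recent Pipecat releases and may differ in your version:

```python
from pipecat.services.whisper.stt import Model, WhisperSTTService
from pipecat.transcriptions.language import Language

# Runs fully offline; the model is downloaded from the
# Hugging Face hub on first use.
stt = WhisperSTTService(
    device="auto",  # auto-detect CPU vs. CUDA
    settings=WhisperSTTService.Settings(
        model=Model.DISTIL_MEDIUM_EN,  # English-only distilled model (default)
        language=Language.EN,
        no_speech_prob=0.4,
    ),
)
```

The service then slots into a pipeline like any other STT processor, e.g. `Pipeline([transport.input(), stt, ...])`.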
With CUDA Acceleration
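A sketch of the same service targeting an NVIDIA GPU, assuming the import path above and a CUDA-enabled install of Faster Whisper / CTranslate2:

```python
from pipecat.services.whisper.stt import Model, WhisperSTTService

# float16 compute on the GPU trades a little precision for
# substantially faster inference on larger models.
stt = WhisperSTTService(
    device="cuda",
    compute_type="float16",
    settings=WhisperSTTService.Settings(model=Model.LARGE),  # large-v3
)
```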
With Custom Language
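A sketch of selecting a non-English default language; use a multilingual model (not the English-only DISTIL_MEDIUM_EN) when doing so. Import paths as above, subject to your Pipecat version:

```python
from pipecat.services.whisper.stt import Model, WhisperSTTService
from pipecat.transcriptions.language import Language

# Multilingual model with French as the default transcription language.
stt = WhisperSTTService(
    device="auto",
    settings=WhisperSTTService.Settings(
        model=Model.LARGE_V3_TURBO,
        language=Language.FR,
    ),
)
```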
MLX Whisper on Apple Silicon
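A sketch of the MLX variant using the quantized model mentioned in the Notes below; the import path is an assumption that should be checked against your Pipecat version:

```python
from pipecat.services.whisper.stt import MLXModel, WhisperSTTServiceMLX

# 4-bit quantized large-v3-turbo: lower memory use on Apple
# Silicon with minimal quality loss. The model is loaded on demand.
stt = WhisperSTTServiceMLX(
    settings=WhisperSTTServiceMLX.Settings(
        model=MLXModel.LARGE_V3_TURBO_Q4,
    ),
)
```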
Notes
- First run downloads: If the selected model hasn’t been downloaded previously, the first run will download it from the Hugging Face model hub. This may take significant time depending on model size.
- Segmented transcription: Both `WhisperSTTService` and `WhisperSTTServiceMLX` extend `SegmentedSTTService`, meaning they process complete audio segments after VAD detects that the user has stopped speaking.
- No-speech filtering: The `no_speech_prob` threshold helps filter out hallucinations. Increase it to be more permissive; decrease it to filter more aggressively.
- MLX quantization: The `LARGE_V3_TURBO_Q4` model provides reduced memory usage with minimal quality loss on Apple Silicon.
- Language support: Whisper supports 99+ languages. Use the `Language` enum for type-safe language selection.