
Overview

GroqSTTService provides high-accuracy speech recognition using Groq’s hosted Whisper API with ultra-fast inference. It relies on Voice Activity Detection (VAD) to detect speech boundaries, buffering each utterance and transcribing it as a complete segment for better accuracy and efficiency.

Installation

To use Groq services, install the required dependency:
pip install "pipecat-ai[groq]"

Prerequisites

Groq Account Setup

Before using Groq STT services, you need:
  1. Groq Account: Sign up at Groq Console
  2. API Key: Generate an API key from your console dashboard
  3. Model Access: Ensure access to Whisper transcription models

Required Environment Variables

  • GROQ_API_KEY: Your Groq API key for authentication
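
For local development, the key can be exported in your shell before starting the app (the key value below is a placeholder):

export GROQ_API_KEY="gsk_your_api_key_here"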

Configuration

model (str, default: "whisper-large-v3-turbo") — deprecated in v0.0.105
Whisper model to use for transcription. Use settings=GroqSTTService.Settings(...) instead.

api_key (str, default: None)
Groq API key. If not provided, uses the GROQ_API_KEY environment variable.

base_url (str, default: "https://api.groq.com/openai/v1")
API base URL. Override for custom or proxied deployments.

language (Language, default: Language.EN) — deprecated in v0.0.105
Language of the audio input. Use settings=GroqSTTService.Settings(...) instead.

prompt (str, default: None) — deprecated in v0.0.105
Optional text to guide the model’s style or continue a previous segment. Use settings=GroqSTTService.Settings(...) instead.

temperature (float, default: None) — deprecated in v0.0.105
Sampling temperature between 0 and 1. Lower values are more deterministic; defaults to 0.0 when unset. Use settings=GroqSTTService.Settings(...) instead.

settings (GroqSTTService.Settings, default: None)
Runtime-configurable settings for the STT service. See Settings below.

ttfs_p99_latency (float, default: GROQ_TTFS_P99)
P99 latency from speech end to final transcript, in seconds. Override for your deployment.

push_empty_transcripts (bool, default: False)
If True, empty TranscriptionFrame frames are pushed downstream instead of being discarded. This is useful when VAD fires even though the user did not speak: knowing that nothing was transcribed lets the agent resume speaking instead of waiting for a transcription that will never arrive.

Settings

Runtime-configurable settings passed via the settings constructor argument using GroqSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
Parameter     Type            Default                    Description
model         str             "whisper-large-v3-turbo"   Whisper model to use. (Inherited from base STT settings.)
language      Language | str  Language.EN                Language of the audio input. (Inherited from base STT settings.)
prompt        str             None                       Optional text to guide the model’s style or continue a previous segment.
temperature   float           None                       Sampling temperature between 0 and 1.
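
As a sketch of a mid-conversation update, the settings above can be changed by queuing an STTUpdateSettingsFrame into the running pipeline (this assumes the frame accepts a plain settings mapping keyed by the parameter names in the table; the placeholder API key is only for illustration):

import os

from pipecat.frames.frames import STTUpdateSettingsFrame
from pipecat.services.groq.stt import GroqSTTService
from pipecat.transcriptions.language import Language

# Start in English with deterministic sampling.
stt = GroqSTTService(
    api_key=os.getenv("GROQ_API_KEY", "gsk_placeholder"),
    settings=GroqSTTService.Settings(language=Language.EN, temperature=0.0),
)

# Later, switch the transcription language to Spanish by pushing this
# frame into the running pipeline (e.g. via task.queue_frames(...)).
update = STTUpdateSettingsFrame(settings={"language": Language.ES})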

Usage

Basic Setup

import os

from pipecat.services.groq.stt import GroqSTTService

stt = GroqSTTService(
    api_key=os.getenv("GROQ_API_KEY"),
)

With Custom Model and Language

import os

from pipecat.services.groq.stt import GroqSTTService
from pipecat.transcriptions.language import Language

stt = GroqSTTService(
    api_key=os.getenv("GROQ_API_KEY"),
    settings=GroqSTTService.Settings(
        model="whisper-large-v3-turbo",
        language=Language.ES,
    ),
)

With Prompt and Temperature

import os

from pipecat.services.groq.stt import GroqSTTService

stt = GroqSTTService(
    api_key=os.getenv("GROQ_API_KEY"),
    settings=GroqSTTService.Settings(
        prompt="This is a conversation about artificial intelligence and machine learning.",
        temperature=0.0,
    ),
)
The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.

Notes

  • Segmented processing: GroqSTTService inherits from SegmentedSTTService (via BaseWhisperSTTService), which buffers audio during speech (detected by VAD) and sends complete segments for transcription. This means it does not provide interim results — only final transcriptions after each speech segment.
  • Whisper API compatible: Groq uses the OpenAI-compatible Whisper API format. The service sends audio in WAV format and receives JSON transcription responses.
  • Ultra-fast inference: Groq’s LPU (Language Processing Unit) infrastructure provides significantly faster inference than CPU/GPU-based Whisper deployments, making it suitable for real-time applications despite the segmented processing approach.
  • Prompt guidance: Use the prompt parameter to provide context that helps the model with domain-specific terminology or to maintain consistency across segments.
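
To illustrate the segmented behavior, a minimal downstream processor (a sketch assuming Pipecat's FrameProcessor API; TranscriptLogger is a hypothetical name) only ever sees final TranscriptionFrame frames from this service:

from pipecat.frames.frames import Frame, TranscriptionFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


class TranscriptLogger(FrameProcessor):
    """Logs each transcript segment. GroqSTTService emits no interim
    results, so every TranscriptionFrame seen here is a final segment."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, TranscriptionFrame):
            print(f"Final transcript: {frame.text!r}")
        await self.push_frame(frame, direction)

Placed after the STT service in a pipeline, this processor would log one line per speech segment detected by VAD.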