
Overview

GroqSTTService provides high-accuracy speech recognition using Groq’s hosted Whisper API with ultra-fast inference. It relies on Voice Activity Detection (VAD) to detect speech boundaries, buffering each utterance and transcribing it as a complete segment for better accuracy and efficiency.

Installation

To use Groq services, install the required dependency:
pip install "pipecat-ai[groq]"

Prerequisites

Groq Account Setup

Before using Groq STT services, you need:
  1. Groq Account: Sign up at Groq Console
  2. API Key: Generate an API key from your console dashboard
  3. Model Access: Ensure access to Whisper transcription models

Required Environment Variables

  • GROQ_API_KEY: Your Groq API key for authentication
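
For local development, the key can be exported in your shell before starting the app (the key value below is a placeholder):

export GROQ_API_KEY="gsk_your_api_key_here"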

Configuration

model (str, default: "whisper-large-v3-turbo") — deprecated in v0.0.105
Whisper model to use for transcription. Use settings=GroqSTTService.Settings(...) instead.

api_key (str, default: None)
Groq API key. If not provided, uses the GROQ_API_KEY environment variable.

base_url (str, default: "https://api.groq.com/openai/v1")
API base URL. Override for custom or proxied deployments.

language (Language, default: Language.EN) — deprecated in v0.0.105
Language of the audio input. Use settings=GroqSTTService.Settings(...) instead.

prompt (str, default: None) — deprecated in v0.0.105
Optional text to guide the model’s style or continue a previous segment. Use settings=GroqSTTService.Settings(...) instead.

temperature (float, default: None) — deprecated in v0.0.105
Sampling temperature between 0 and 1. Lower values are more deterministic; defaults to 0.0 when unset. Use settings=GroqSTTService.Settings(...) instead.

settings (GroqSTTService.Settings, default: None)
Runtime-configurable settings for the STT service. See Settings below.

ttfs_p99_latency (float, default: GROQ_TTFS_P99)
P99 latency from speech end to final transcript, in seconds. Override for your deployment.

push_empty_transcripts (bool, default: False)
If True, empty TranscriptionFrame frames are pushed downstream instead of being discarded. This is useful when VAD fires even though the user did not speak: knowing that nothing was transcribed lets the agent resume speaking instead of waiting for a transcription that will never arrive.

Settings

Runtime-configurable settings passed via the settings constructor argument using GroqSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
Parameter     Type            Default                    Description
model         str             "whisper-large-v3-turbo"   Whisper model to use. (Inherited from base STT settings.)
language      Language | str  Language.EN                Language of the audio input. (Inherited from base STT settings.)
prompt        str             None                       Optional text to guide the model’s style or continue a previous segment.
temperature   float           None                       Sampling temperature between 0 and 1.
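
As a sketch of a mid-conversation update, the settings above can be changed by queuing an STTUpdateSettingsFrame into the running pipeline (this assumes the frame accepts a plain settings mapping keyed by the parameter names in the table; the placeholder API key is only for illustration):

import os

from pipecat.frames.frames import STTUpdateSettingsFrame
from pipecat.services.groq.stt import GroqSTTService
from pipecat.transcriptions.language import Language

# Start in English with deterministic sampling.
stt = GroqSTTService(
    api_key=os.getenv("GROQ_API_KEY", "gsk_placeholder"),
    settings=GroqSTTService.Settings(language=Language.EN, temperature=0.0),
)

# Later, switch the transcription language to Spanish by pushing this
# frame into the running pipeline (e.g. via task.queue_frames(...)).
update = STTUpdateSettingsFrame(settings={"language": Language.ES})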

Usage

Basic Setup

import os

from pipecat.services.groq.stt import GroqSTTService

stt = GroqSTTService(
    api_key=os.getenv("GROQ_API_KEY"),
)

With Custom Model and Language

import os

from pipecat.services.groq.stt import GroqSTTService
from pipecat.transcriptions.language import Language

stt = GroqSTTService(
    api_key=os.getenv("GROQ_API_KEY"),
    settings=GroqSTTService.Settings(
        model="whisper-large-v3-turbo",
        language=Language.ES,
    ),
)

With Prompt and Temperature

import os

from pipecat.services.groq.stt import GroqSTTService

stt = GroqSTTService(
    api_key=os.getenv("GROQ_API_KEY"),
    settings=GroqSTTService.Settings(
        prompt="This is a conversation about artificial intelligence and machine learning.",
        temperature=0.0,
    ),
)
The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.

Notes

  • Segmented processing: GroqSTTService inherits from SegmentedSTTService (via BaseWhisperSTTService), which buffers audio during speech (detected by VAD) and sends complete segments for transcription. This means it does not provide interim results — only final transcriptions after each speech segment.
  • Whisper API compatible: Groq uses the OpenAI-compatible Whisper API format. The service sends audio in WAV format and receives JSON transcription responses.
  • Ultra-fast inference: Groq’s LPU (Language Processing Unit) infrastructure provides significantly faster inference than CPU/GPU-based Whisper deployments, making it suitable for real-time applications despite the segmented processing approach.
  • Prompt guidance: Use the prompt parameter to provide context that helps the model with domain-specific terminology or to maintain consistency across segments.
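
To illustrate the segmented behavior, a minimal downstream processor (a sketch assuming Pipecat's FrameProcessor API; TranscriptLogger is a hypothetical name) only ever sees final TranscriptionFrame frames from this service:

from pipecat.frames.frames import Frame, TranscriptionFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


class TranscriptLogger(FrameProcessor):
    """Logs each transcript segment. GroqSTTService emits no interim
    results, so every TranscriptionFrame seen here is a final segment."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, TranscriptionFrame):
            print(f"Final transcript: {frame.text!r}")
        await self.push_frame(frame, direction)

Placed after the STT service in a pipeline, this processor would log one line per speech segment detected by VAD.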