Overview

NVIDIA Riva provides two STT service implementations:
  • NvidiaSTTService — Real-time streaming transcription using Parakeet models with interim results and continuous audio processing.
  • NvidiaSegmentedSTTService — Segmented transcription using Canary models with advanced language support, word boosting, and enterprise-grade accuracy.

Installation

To use NVIDIA Riva services, install the required dependency:
pip install "pipecat-ai[nvidia]"

Prerequisites

NVIDIA Riva Setup

Before using NVIDIA Riva STT services, you need:
  1. NVIDIA Developer Account: Sign up at NVIDIA Developer Portal
  2. API Key: Generate an NVIDIA API key for Riva services
  3. Model Selection: Choose between Parakeet (streaming) and Canary (segmented) models

Required Environment Variables

  • NVIDIA_API_KEY: Your NVIDIA API key for authentication

NvidiaSTTService

Real-time streaming transcription using NVIDIA Riva’s Parakeet models.
  • api_key (str, required): NVIDIA API key for authentication.
  • server (str, default: "grpc.nvcf.nvidia.com:443"): NVIDIA Riva server address.
  • model_function_map (Mapping[str, str]): Mapping containing the function_id and model_name for the ASR model.
  • sample_rate (int, default: None): Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
  • params (NvidiaSTTService.InputParams, default: None, deprecated): Additional configuration parameters. Deprecated in v0.0.105. Use settings=NvidiaSTTService.Settings(...) instead.
  • settings (NvidiaSTTService.Settings, default: None): Runtime-configurable settings. See Settings below.
  • use_ssl (bool, default: True): Whether to use SSL for the gRPC connection.
  • ttfs_p99_latency (float, default: 1.0): P99 latency from speech end to final transcript, in seconds. Override for your deployment. See stt-benchmark.

Settings

Runtime-configurable settings passed via the settings constructor argument using NvidiaSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
  • model (str, default: None): STT model identifier. (Inherited from base STT settings.)
  • language (Language | str, default: Language.EN_US): Target language for transcription. (Inherited from base STT settings.)

Usage

import os

from pipecat.services.nvidia.stt import NvidiaSTTService

stt = NvidiaSTTService(
    api_key=os.getenv("NVIDIA_API_KEY"),
)

Notes

  • Model cannot be changed after initialization: Use the model_function_map parameter in the constructor to specify the model and function ID.
  • Streaming: Provides real-time interim and final results through continuous audio streaming.
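The shape of the model_function_map argument can be sketched as below. Both values are placeholders, not real identifiers: substitute the NVCF function ID and model name for your own Riva deployment.

```python
# Sketch of the model_function_map constructor argument. Both values
# are placeholders; use the function ID and model name for your own
# Riva deployment.
model_function_map = {
    "function_id": "<your-nvcf-function-id>",
    "model_name": "<your-asr-model-name>",
}

# Passed to the constructor, e.g.:
# stt = NvidiaSTTService(api_key=..., model_function_map=model_function_map)
```

Because the model cannot be changed after initialization, select the model here rather than via runtime settings.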

NvidiaSegmentedSTTService

Batch/segmented transcription using NVIDIA Riva’s Canary models. Processes complete audio segments after VAD detects speech boundaries.
  • api_key (str, required): NVIDIA API key for authentication.
  • server (str, default: "grpc.nvcf.nvidia.com:443"): NVIDIA Riva server address.
  • model_function_map (Mapping[str, str]): Mapping containing the function_id and model_name for the ASR model.
  • sample_rate (int, default: None): Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
  • params (NvidiaSegmentedSTTService.InputParams, default: None, deprecated): Additional configuration parameters. Deprecated in v0.0.105. Use settings=NvidiaSegmentedSTTService.Settings(...) instead.
  • settings (NvidiaSegmentedSTTService.Settings, default: None): Runtime-configurable settings. See Settings below.
  • use_ssl (bool, default: True): Whether to use SSL for the gRPC connection.
  • ttfs_p99_latency (float, default: 1.0): P99 latency from speech end to final transcript, in seconds. Override for your deployment. See stt-benchmark.

Settings

Runtime-configurable settings passed via the settings constructor argument using NvidiaSegmentedSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
  • model (str, default: None): STT model identifier. (Inherited from base STT settings.)
  • language (Language | str, default: Language.EN_US): Target language for transcription. (Inherited from base STT settings.)
  • profanity_filter (bool, default: False): Whether to filter profanity from results.
  • automatic_punctuation (bool, default: True): Whether to add automatic punctuation.
  • verbatim_transcripts (bool, default: False): Whether to return verbatim transcripts.
  • boosted_lm_words (list[str], default: None): List of words to boost in the language model.
  • boosted_lm_score (float, default: 4.0): Score boost applied to the specified words.

Usage

import os

from pipecat.services.nvidia.stt import NvidiaSegmentedSTTService
from pipecat.transcriptions.language import Language

stt = NvidiaSegmentedSTTService(
    api_key=os.getenv("NVIDIA_API_KEY"),
    settings=NvidiaSegmentedSTTService.Settings(
        language=Language.ES,
        automatic_punctuation=True,
        boosted_lm_words=["Pipecat", "NVIDIA"],
        boosted_lm_score=6.0,
    ),
)

Notes

  • Model cannot be changed after initialization: Use the model_function_map parameter in the constructor to specify the model and function ID.
  • Segmented processing: Processes complete audio segments for higher accuracy compared to streaming.
  • Language support: Supports Arabic, English (US/GB), French, German, Hindi, Italian, Japanese, Korean, Portuguese (BR), Russian, and Spanish (ES/US).
  • Word boosting: Use boosted_lm_words and boosted_lm_score to improve recognition of domain-specific terms.
The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.