Overview

NvidiaLLMService provides access to NVIDIA’s NIM language models through an OpenAI-compatible interface. It inherits from OpenAILLMService and supports streaming responses, function calling, and context management, with special handling for NVIDIA’s incremental token reporting and enterprise deployment.

Installation

To use NVIDIA NIM services, install the required dependencies:
pip install "pipecat-ai[nvidia]"

Prerequisites

NVIDIA NIM Setup

Before using NVIDIA NIM LLM services, you need:
  1. NVIDIA Developer Account: Sign up at NVIDIA Developer Portal
  2. API Key: Generate an NVIDIA API key for NIM services
  3. Model Selection: Choose from available NIM-hosted models
  4. Enterprise Setup: Configure NIM for on-premises deployment if needed

Required Environment Variables

  • NVIDIA_API_KEY: Your NVIDIA API key for authentication
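The key can be exported in your shell before starting your application (the value below is a placeholder, not a real key):

```shell
# Make the NVIDIA API key available to the service for authentication.
# Replace the placeholder with your key from the NVIDIA Developer Portal.
export NVIDIA_API_KEY="nvapi-your-key-here"
```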

Configuration

api_key
str
required
NVIDIA API key for authentication.
base_url
str
default:"https://integrate.api.nvidia.com/v1"
Base URL for NIM API endpoint.
model
str
default:"None"
deprecated
Model identifier to use. Deprecated in v0.0.105. Use settings=NvidiaLLMService.Settings(model=...) instead.
settings
NvidiaLLMService.Settings
default:"None"
Runtime-configurable settings. See Settings below.

Settings

Runtime-configurable settings passed via the settings constructor argument using NvidiaLLMService.Settings(...). These can be updated mid-conversation with LLMUpdateSettingsFrame. See Service Settings for details. This service uses the same settings as OpenAILLMService. See OpenAI LLM Settings for the full parameter reference.
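Conceptually, a mid-conversation update carries a partial mapping of setting names to new values, which the service merges into its current settings. A minimal sketch of that merge (the `merge_settings` helper is hypothetical, for illustration only, not pipecat API):

```python
def merge_settings(current: dict, update: dict) -> dict:
    """Merge a partial settings update into the current settings.

    Keys present in `update` override `current`; all other settings
    keep their existing values (hypothetical helper, not pipecat API).
    """
    merged = dict(current)
    merged.update(update)
    return merged


current = {
    "model": "nvidia/llama-3.1-nemotron-70b-instruct",
    "temperature": 0.7,
    "top_p": 0.9,
}
# Only temperature changes; model and top_p are untouched.
updated = merge_settings(current, {"temperature": 0.3})
```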

Usage

Basic Setup

import os
from pipecat.services.nvidia import NvidiaLLMService

llm = NvidiaLLMService(
    api_key=os.getenv("NVIDIA_API_KEY"),
    model="nvidia/llama-3.1-nemotron-70b-instruct",
)

With Custom Settings

import os
from pipecat.services.nvidia import NvidiaLLMService

llm = NvidiaLLMService(
    api_key=os.getenv("NVIDIA_API_KEY"),
    settings=NvidiaLLMService.Settings(
        model="nvidia/llama-3.1-nemotron-70b-instruct",
        temperature=0.7,
        top_p=0.9,
        max_completion_tokens=1024,
    ),
)

Notes

  • NVIDIA NIM uses incremental token reporting. The service accumulates token usage metrics during processing and reports the final totals at the end of each request.
  • The legacy NimLLMService import from pipecat.services.nim is deprecated. Use NvidiaLLMService from pipecat.services.nvidia instead.
  • NIM supports both cloud-hosted and on-premises deployments. For on-premises, override the base_url to point to your local NIM endpoint.
  • The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.
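The incremental token reporting described in the first note can be pictured as summing per-chunk usage deltas and emitting a single final total. A self-contained sketch (the chunk dictionaries below are illustrative, not NVIDIA's actual stream format):

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    prompt_tokens: int = 0
    completion_tokens: int = 0


def accumulate_usage(chunks) -> TokenUsage:
    """Sum incremental usage deltas from streamed chunks into one total.

    Each chunk is an illustrative dict that may carry partial token
    counts; the real NIM stream shape may differ.
    """
    total = TokenUsage()
    for chunk in chunks:
        usage = chunk.get("usage") or {}
        total.prompt_tokens += usage.get("prompt_tokens", 0)
        total.completion_tokens += usage.get("completion_tokens", 0)
    return total


# Simulated stream: prompt tokens arrive once, completion tokens
# are reported incrementally with each generated chunk.
stream = [
    {"delta": "Hello", "usage": {"prompt_tokens": 12, "completion_tokens": 1}},
    {"delta": " world", "usage": {"completion_tokens": 1}},
    {"delta": "!", "usage": {"completion_tokens": 1}},
]
totals = accumulate_usage(stream)
```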