Overview

GeminiLiveLLMService enables natural, real-time conversations with Google’s Gemini model. It provides built-in audio transcription, voice activity detection, and context management for creating interactive AI experiences with multimodal capabilities including audio, video, and text processing.
Want to start building? Check out our Gemini Live Guide.

Installation

To use Gemini Live services, install the required dependencies:
pip install "pipecat-ai[google]"

Prerequisites

Google AI Setup

Before using Gemini Live services, you need:
  1. Google Account: Set up at Google AI Studio
  2. API Key: Generate a Gemini API key from AI Studio
  3. Model Access: Ensure access to Gemini Live models
  4. Multimodal Configuration: Set up audio, video, and text modalities

Required Environment Variables

  • GOOGLE_API_KEY: Your Google Gemini API key for authentication

Key Features

  • Multimodal Processing: Handle audio, video, and text inputs simultaneously
  • Real-time Streaming: Low-latency audio and video processing
  • Voice Activity Detection: Automatic speech detection and turn management
  • Function Calling: Advanced tool integration and API calling capabilities
  • Context Management: Intelligent conversation history and system instruction handling

Configuration

GeminiLiveLLMService

api_key
str
required
Google AI API key for authentication.
model
str
deprecated
Gemini model identifier to use. Deprecated in v0.0.105. Use settings=GeminiLiveLLMService.Settings(model=...) instead.
voice_id
str
default:"Charon"
deprecated
TTS voice identifier for audio responses. Deprecated in v0.0.105. Use settings=GeminiLiveLLMService.Settings(voice=...) instead.
system_instruction
str
default:"None"
System prompt for the model. Can also be provided via the LLM context.
tools
List[dict] | ToolsSchema
default:"None"
Tools/functions available to the model. Can also be provided via the LLM context.
params
InputParams
default:"InputParams()"
deprecated
Runtime-configurable generation and session settings. See InputParams below. Deprecated in v0.0.105. Use settings=GeminiLiveLLMService.Settings(...) instead.
settings
GeminiLiveLLMService.Settings
default:"None"
Runtime-configurable settings. See Settings below.
start_audio_paused
bool
default:"False"
Whether to start with audio input paused.
start_video_paused
bool
default:"False"
Whether to start with video input paused.
inference_on_context_initialization
bool
default:"True"
Whether to generate a response when context is first set. Set to False to wait for user input before the model responds.
http_options
HttpOptions
default:"None"
HTTP options for the Google API client. Use this to set API version (e.g. HttpOptions(api_version="v1alpha")) or other request options.
file_api_base_url
str
Base URL for the Gemini File API.

Settings

Runtime-configurable settings passed via the settings constructor argument using GeminiLiveLLMService.Settings(...). These can be updated mid-conversation with LLMUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `model` | `str` | `NOT_GIVEN` | Model identifier. (Inherited from base settings.) |
| `system_instruction` | `str` | `NOT_GIVEN` | System instruction/prompt. (Inherited from base settings.) |
| `temperature` | `float` | `NOT_GIVEN` | Sampling temperature (0.0-2.0). (Inherited from base settings.) |
| `max_tokens` | `int` | `NOT_GIVEN` | Maximum tokens to generate. (Inherited from base settings.) |
| `top_k` | `int` | `NOT_GIVEN` | Top-k sampling parameter. (Inherited from base settings.) |
| `top_p` | `float` | `NOT_GIVEN` | Top-p (nucleus) sampling parameter (0.0-1.0). (Inherited from base settings.) |
| `frequency_penalty` | `float` | `NOT_GIVEN` | Frequency penalty for generation (0.0-2.0). (Inherited from base settings.) |
| `presence_penalty` | `float` | `NOT_GIVEN` | Presence penalty for generation (0.0-2.0). (Inherited from base settings.) |
| `voice` | `str` | `NOT_GIVEN` | TTS voice identifier (e.g. `"Charon"`, `"Puck"`). |
| `modalities` | `GeminiModalities` | `NOT_GIVEN` | Response modality: `GeminiModalities.AUDIO` or `GeminiModalities.TEXT`. |
| `language` | `Language \| str` | `NOT_GIVEN` | Language for generation and transcription. |
| `media_resolution` | `GeminiMediaResolution` | `NOT_GIVEN` | Media resolution for video input: `UNSPECIFIED`, `LOW`, `MEDIUM`, or `HIGH`. |
| `vad` | `GeminiVADParams` | `NOT_GIVEN` | Voice activity detection parameters. See GeminiVADParams below. |
| `context_window_compression` | `ContextWindowCompressionParams \| dict` | `NOT_GIVEN` | Context window compression settings. |
| `thinking` | `ThinkingConfig \| dict` | `NOT_GIVEN` | Thinking/reasoning configuration. Requires a model that supports it. |
| `enable_affective_dialog` | `bool` | `NOT_GIVEN` | Enable affective dialog for expression and tone adaptation. |
| `proactivity` | `ProactivityConfig \| dict` | `NOT_GIVEN` | Proactivity settings for model behavior. |

Parameters left as NOT_GIVEN are omitted, letting the service use its own defaults (e.g. "models/gemini-2.5-flash-native-audio-preview-12-2025" for model, "Charon" for voice, 4096 for max_tokens). Only parameters that are explicitly set are included.
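As a sketch of mid-conversation updates, the snippet below queues an LLMUpdateSettingsFrame into a running pipeline task. The `settings` mapping uses the field names from the table above; the surrounding `task` object and the exact point where you queue the frame depend on your pipeline setup.

```python
from pipecat.frames.frames import LLMUpdateSettingsFrame


async def switch_voice_and_cool_down(task):
    # `task` is assumed to be a running PipelineTask. Queueing the frame
    # applies the new values to the live session without reconnecting.
    await task.queue_frames(
        [LLMUpdateSettingsFrame(settings={"temperature": 0.3, "voice": "Puck"})]
    )
```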

GeminiVADParams

Voice activity detection configuration passed via the vad Settings field:
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `disabled` | `bool` | `None` | Whether to disable server-side VAD entirely. |
| `start_sensitivity` | `StartSensitivity` | `None` | Sensitivity for speech start detection. |
| `end_sensitivity` | `EndSensitivity` | `None` | Sensitivity for speech end detection. |
| `prefix_padding_ms` | `int` | `None` | Padding before speech starts, in milliseconds. |
| `silence_duration_ms` | `int` | `None` | Silence duration threshold in milliseconds to detect speech end. |
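A minimal sketch of tuning VAD for a patient turn-taking style, assuming the `StartSensitivity` and `EndSensitivity` enums come from `google.genai.types` (they are part of the Gemini Live API types):

```python
from google.genai.types import EndSensitivity, StartSensitivity

from pipecat.services.google.gemini_live import GeminiVADParams

# React quickly when the user starts speaking, but wait longer
# before deciding that they have finished their turn.
vad = GeminiVADParams(
    start_sensitivity=StartSensitivity.START_SENSITIVITY_HIGH,
    end_sensitivity=EndSensitivity.END_SENSITIVITY_LOW,
    silence_duration_ms=800,
)
```

Pass the resulting object as the `vad` field of `GeminiLiveLLMService.Settings(...)`.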

ContextWindowCompressionParams

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `enabled` | `bool` | `False` | Whether context window compression is enabled. |
| `trigger_tokens` | `int` | `None` | Token count that triggers compression. `None` uses the default (80% of the context window). |
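For example, to enable compression with an explicit trigger point (the 16,000-token threshold here is an arbitrary illustration, not a recommended value):

```python
from pipecat.services.google.gemini_live import (
    ContextWindowCompressionParams,
    GeminiLiveLLMService,
)

settings = GeminiLiveLLMService.Settings(
    context_window_compression=ContextWindowCompressionParams(
        enabled=True,
        trigger_tokens=16000,  # compress once roughly 16k tokens accumulate
    ),
)
```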

Usage

Basic Setup

import os
from pipecat.services.google.gemini_live import GeminiLiveLLMService

llm = GeminiLiveLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    settings=GeminiLiveLLMService.Settings(
        voice="Charon",
        system_instruction="You are a helpful assistant.",
    ),
)

With Settings

import os

from pipecat.services.google.gemini_live import (
    GeminiLiveLLMService,
    GeminiVADParams,
    ContextWindowCompressionParams,
)

llm = GeminiLiveLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    settings=GeminiLiveLLMService.Settings(
        model="models/gemini-2.5-flash-native-audio-preview-12-2025",
        system_instruction="You are a helpful assistant.",
        voice="Puck",
        temperature=0.7,
        max_tokens=2048,
        language="en-US",
        vad=GeminiVADParams(
            silence_duration_ms=500,
        ),
        context_window_compression=ContextWindowCompressionParams(enabled=True),
    ),
)

Text-Only Mode

import os

from pipecat.services.google.gemini_live import (
    GeminiLiveLLMService,
    GeminiModalities,
)

llm = GeminiLiveLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    settings=GeminiLiveLLMService.Settings(
        system_instruction="You are a helpful assistant.",
        modalities=GeminiModalities.TEXT,
    ),
)

With Thinking Enabled

import os

from google.genai.types import ThinkingConfig

from pipecat.services.google.gemini_live import GeminiLiveLLMService

llm = GeminiLiveLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    settings=GeminiLiveLLMService.Settings(
        model="models/gemini-2.5-flash-native-audio-preview-12-2025",
        system_instruction="You are a helpful assistant.",
        thinking=ThinkingConfig(include_thoughts=True),
    ),
)
The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.
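Function Calling

Function calling is listed among the key features above; the sketch below wires up a tool using Pipecat's generic function-calling interface (FunctionSchema, ToolsSchema, and llm.register_function). The weather tool itself is hypothetical, and the exact handler signature may vary across Pipecat versions.

```python
import os

from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.services.google.gemini_live import GeminiLiveLLMService
from pipecat.services.llm_service import FunctionCallParams

# Hypothetical tool definition; replace the body with a real API call.
weather_function = FunctionSchema(
    name="get_weather",
    description="Get the current weather for a location.",
    properties={"location": {"type": "string"}},
    required=["location"],
)


async def fetch_weather(params: FunctionCallParams):
    location = params.arguments["location"]
    # Return the tool result to the model via the provided callback.
    await params.result_callback({"location": location, "conditions": "sunny"})


llm = GeminiLiveLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    tools=ToolsSchema(standard_tools=[weather_function]),
)
llm.register_function("get_weather", fetch_weather)
```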

Notes

  • System instruction precedence: If a system instruction is provided both at init time and in the LLM context, the context-provided value takes precedence.
  • Tools precedence: Similarly, tools provided in the context override tools provided at init time.
  • Transcription aggregation: Gemini Live sends user transcriptions in small chunks. The service aggregates them into complete sentences using end-of-sentence detection with a 0.5-second timeout fallback.
  • Session resumption: The service automatically handles session resumption on reconnection using session resumption handles.
  • Connection resilience: The service will attempt up to 3 consecutive reconnections before treating a connection failure as fatal.
  • Video frame rate: Video frames are throttled to a maximum of one per second.
  • Affective dialog and proactivity: These features require both a supporting model and API version (v1alpha).