Overview

GoogleLLMService provides integration with Google’s Gemini models, supporting streaming responses, function calling, and multimodal inputs. It includes specialized context handling for Google’s message format while maintaining compatibility with OpenAI-style contexts.

Installation

To use Google Gemini services, install the required dependencies:
pip install "pipecat-ai[google]"

Prerequisites

Google Gemini Setup

Before using Google Gemini LLM services, you need:
  1. Google Account: Sign up at Google AI Studio
  2. API Key: Generate a Gemini API key from AI Studio
  3. Model Selection: Choose from available Gemini models (Gemini 2.5 Flash, Gemini 2.5 Pro, etc.)

Required Environment Variables

  • GOOGLE_API_KEY: Your Google Gemini API key for authentication
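
For local development, the key can be exported in your shell before starting your app (the value below is a placeholder; use your key from Google AI Studio):

```shell
# Export the Gemini API key so the service can read it at startup.
# Replace the placeholder with your actual key from Google AI Studio.
export GOOGLE_API_KEY="your-api-key-here"
```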

Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | required | Google AI API key for authentication. |
| model | str | None | Deprecated in v0.0.105; use settings=GoogleLLMService.Settings(...) instead. Gemini model name to use (e.g., "gemini-2.5-flash", "gemini-2.5-pro"). |
| settings | GoogleLLMService.Settings | None | Runtime-configurable model settings. See Settings below. |
| params | InputParams | None | Deprecated in v0.0.105; use settings=GoogleLLMService.Settings(...) instead. Runtime-configurable model settings. |
| system_instruction | str | None | Deprecated in v0.0.105; use settings=GoogleLLMService.Settings(system_instruction=...) instead. System instruction/prompt that sets the model's overall behavior and context. |
| tools | List[Dict[str, Any]] | None | List of available tools/functions for the model to call. |
| tool_config | Dict[str, Any] | None | Configuration for tool usage behavior. |
| http_options | HttpOptions | None | HTTP options for the Google API client. |

Settings

Runtime-configurable settings passed via the settings constructor argument using GoogleLLMService.Settings(...). These can be updated mid-conversation with LLMUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | None | Gemini model identifier. (Inherited from base settings.) |
| system_instruction | str | None | System instruction/prompt for the model. (Inherited from base settings.) |
| max_tokens | int | NOT_GIVEN | Maximum number of tokens to generate. |
| temperature | float | NOT_GIVEN | Sampling temperature (0.0 to 2.0). Lower values are more focused; higher values are more creative. |
| top_k | int | NOT_GIVEN | Top-k sampling parameter. Limits tokens to the top k most likely. |
| top_p | float | NOT_GIVEN | Top-p (nucleus) sampling (0.0 to 1.0). Controls diversity of output. |
| thinking | GoogleThinkingConfig | NOT_GIVEN | Thinking configuration. See GoogleThinkingConfig below. |
NOT_GIVEN values are omitted from the API request, letting the Gemini API use its own defaults. If thinking is not provided, Pipecat disables thinking for Gemini 2.5 Flash models (where possible) to reduce latency.
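
Conceptually, NOT_GIVEN acts as an "omit this field" sentinel rather than an explicit null. The sketch below illustrates the idea only; the sentinel class and helper name are hypothetical, not Pipecat's actual implementation:

```python
# Illustrative sketch of NOT_GIVEN semantics: fields left at the sentinel
# are dropped from the request, so the Gemini API applies its own defaults.
class _NotGiven:
    def __repr__(self):
        return "NOT_GIVEN"

NOT_GIVEN = _NotGiven()

def build_generation_config(**settings):
    """Return only the settings that were explicitly provided."""
    return {k: v for k, v in settings.items() if v is not NOT_GIVEN}

config = build_generation_config(
    temperature=0.7,
    max_tokens=NOT_GIVEN,  # omitted -> API default applies
    top_p=NOT_GIVEN,       # omitted -> API default applies
)
# config == {"temperature": 0.7}
```

This is why `None` and NOT_GIVEN differ: `None` would be sent to the API as an explicit value, while NOT_GIVEN leaves the field out of the request entirely.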

GoogleThinkingConfig

Configuration for controlling the model’s internal thinking process. Gemini 2.5 and 3 series models support this feature.
| Parameter | Type | Default | Description |
|---|---|---|---|
| thinking_budget | int | None | Token budget for thinking (Gemini 2.5 series). -1 for dynamic, 0 to disable, or a specific count (e.g., 128-32768). |
| thinking_level | str | None | Thinking level for Gemini 3 models. "low" or "high" for 3 Pro; "minimal", "low", "medium", or "high" for 3 Flash. |
| include_thoughts | bool | None | Whether to include thought summaries in the response. |
Gemini 2.5 series models use thinking_budget, while Gemini 3 models use thinking_level. Do not mix these parameters across model generations.
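
The rule above can be sketched as a small validation helper. This is purely illustrative (the function is hypothetical and not part of Pipecat), shown only to make the per-generation split concrete:

```python
# Illustrative check: thinking_budget belongs to Gemini 2.5 models,
# thinking_level to Gemini 3 models; mixing them is an error.
def check_thinking_config(model: str, thinking_budget=None, thinking_level=None):
    if thinking_budget is not None and thinking_level is not None:
        raise ValueError("Set either thinking_budget or thinking_level, not both")
    if model.startswith("gemini-2.5") and thinking_level is not None:
        raise ValueError("Gemini 2.5 models use thinking_budget, not thinking_level")
    if model.startswith("gemini-3") and thinking_budget is not None:
        raise ValueError("Gemini 3 models use thinking_level, not thinking_budget")

check_thinking_config("gemini-2.5-pro", thinking_budget=4096)   # ok
check_thinking_config("gemini-3-flash", thinking_level="high")  # ok
```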

Usage

Basic Setup

import os

from pipecat.services.google import GoogleLLMService

llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    model="gemini-2.5-flash",
)

With Custom Settings

import os

from pipecat.services.google import GoogleLLMService

llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    settings=GoogleLLMService.Settings(
        model="gemini-2.5-pro",
        system_instruction="You are a helpful assistant.",
        temperature=0.7,
        max_tokens=2048,
        top_p=0.9,
    ),
)

With Thinking Configuration

import os

from pipecat.services.google import GoogleLLMService

# Gemini 2.5 series (using thinking_budget)
llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    settings=GoogleLLMService.Settings(
        model="gemini-2.5-pro",
        max_tokens=8192,
        thinking=GoogleLLMService.GoogleThinkingConfig(
            thinking_budget=4096,
            include_thoughts=True,
        ),
    ),
)

# Gemini 3 series (using thinking_level)
llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    settings=GoogleLLMService.Settings(
        model="gemini-3-flash",
        max_tokens=8192,
        thinking=GoogleLLMService.GoogleThinkingConfig(
            thinking_level="high",
            include_thoughts=True,
        ),
    ),
)

Updating Settings at Runtime

Model settings can be changed mid-conversation using LLMUpdateSettingsFrame:
from pipecat.frames.frames import LLMUpdateSettingsFrame
from pipecat.services.google.llm import GoogleLLMSettings

# `task` is the running PipelineTask whose pipeline contains the LLM service
await task.queue_frame(
    LLMUpdateSettingsFrame(
        delta=GoogleLLMSettings(
            temperature=0.3,
            max_tokens=1024,
        )
    )
)
The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.

Notes

  • System instruction priority: The system_instruction set via the constructor or GoogleLLMSettings takes priority over any system message in the context. If both are set, a warning is logged and the constructor/settings value is used.
  • Thinking defaults: By default, Pipecat disables thinking for Gemini 2.5 Flash models to reduce latency. To enable it, explicitly pass a GoogleThinkingConfig via settings.
  • Multimodal support: Gemini models natively support image and audio inputs through Google’s Content/Part format. Images and audio are automatically converted from OpenAI-style contexts.
  • Grounding with Google Search: When grounding metadata is present in the response (e.g., from Google Search tool), the service emits LLMSearchResponseFrame with search results and source attributions.
  • Context format: The service automatically converts between OpenAI-style message formats and Google’s native Content/Part format, so you can use either.
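
The conversion described above can be sketched roughly as follows. This is a simplified illustration of the idea, not Pipecat's actual converter; real contexts also carry images, audio, tool calls, and system messages (which map to system_instruction rather than a Content entry):

```python
# Simplified sketch: map OpenAI-style chat messages to Google's
# Content/Part shape. Google uses the role "model" where OpenAI
# uses "assistant".
def to_google_contents(messages):
    contents = []
    for msg in messages:
        role = "model" if msg["role"] == "assistant" else "user"
        contents.append({"role": role, "parts": [{"text": msg["content"]}]})
    return contents

openai_messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi, how can I help?"},
]
google_contents = to_google_contents(openai_messages)
# google_contents[1]["role"] == "model"
```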

Event Handlers

GoogleLLMService supports the following event handlers, inherited from LLMService:
| Event | Description |
|---|---|
| on_completion_timeout | Called when an LLM completion request times out (Google DeadlineExceeded). |
| on_function_calls_started | Called when function calls are received and execution is about to start. |
@llm.event_handler("on_completion_timeout")
async def on_completion_timeout(service):
    print("LLM completion timed out")