Overview

GoogleLLMService provides integration with Google’s Gemini models, supporting streaming responses, function calling, and multimodal inputs. It includes specialized context handling for Google’s message format while maintaining compatibility with OpenAI-style contexts.

Installation

To use Google Gemini services, install the required dependencies:
pip install "pipecat-ai[google]"

Prerequisites

Google Gemini Setup

Before using Google Gemini LLM services, you need:
  1. Google Account: Sign up at Google AI Studio
  2. API Key: Generate a Gemini API key from AI Studio
  3. Model Selection: Choose from available Gemini models (Gemini 2.5 Flash, Gemini 2.5 Pro, etc.)

Required Environment Variables

  • GOOGLE_API_KEY: Your Google Gemini API key for authentication
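
For local development, the key can be exported in your shell before starting your app (the value below is a placeholder; use your key from Google AI Studio):

```shell
# Export the Gemini API key so the service can read it at startup.
# Replace the placeholder with your actual key from Google AI Studio.
export GOOGLE_API_KEY="your-api-key-here"
```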

Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | required | Google AI API key for authentication. |
| model | str | None | Deprecated in v0.0.105; use settings=GoogleLLMService.Settings(...) instead. Gemini model name to use (e.g., "gemini-2.5-flash", "gemini-2.5-pro"). |
| settings | GoogleLLMService.Settings | None | Runtime-configurable model settings. See Settings below. |
| params | InputParams | None | Deprecated in v0.0.105; use settings=GoogleLLMService.Settings(...) instead. Runtime-configurable model settings. |
| system_instruction | str | None | Deprecated in v0.0.105; use settings=GoogleLLMService.Settings(system_instruction=...) instead. System instruction/prompt that sets the model's overall behavior and context. |
| tools | List[Dict[str, Any]] | None | List of available tools/functions for the model to call. |
| tool_config | Dict[str, Any] | None | Configuration for tool usage behavior. |
| http_options | HttpOptions | None | HTTP options for the Google API client. |

Settings

Runtime-configurable settings passed via the settings constructor argument using GoogleLLMService.Settings(...). These can be updated mid-conversation with LLMUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | None | Gemini model identifier. (Inherited from base settings.) |
| system_instruction | str | None | System instruction/prompt for the model. (Inherited from base settings.) |
| max_tokens | int | NOT_GIVEN | Maximum number of tokens to generate. |
| temperature | float | NOT_GIVEN | Sampling temperature (0.0 to 2.0). Lower values are more focused; higher values are more creative. |
| top_k | int | NOT_GIVEN | Top-k sampling parameter. Limits tokens to the top k most likely. |
| top_p | float | NOT_GIVEN | Top-p (nucleus) sampling (0.0 to 1.0). Controls diversity of output. |
| thinking | GoogleThinkingConfig | NOT_GIVEN | Thinking configuration. See GoogleThinkingConfig below. |
NOT_GIVEN values are omitted from the API request, letting the Gemini API use its own defaults. If thinking is not provided, Pipecat disables thinking for Gemini 2.5 Flash models (where possible) to reduce latency.
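
Conceptually, NOT_GIVEN acts as an "omit this field" sentinel rather than an explicit null. The sketch below illustrates the idea only; the sentinel class and helper name are hypothetical, not Pipecat's actual implementation:

```python
# Illustrative sketch of NOT_GIVEN semantics: fields left at the sentinel
# are dropped from the request, so the Gemini API applies its own defaults.
class _NotGiven:
    def __repr__(self):
        return "NOT_GIVEN"

NOT_GIVEN = _NotGiven()

def build_generation_config(**settings):
    """Return only the settings that were explicitly provided."""
    return {k: v for k, v in settings.items() if v is not NOT_GIVEN}

config = build_generation_config(
    temperature=0.7,
    max_tokens=NOT_GIVEN,  # omitted -> API default applies
    top_p=NOT_GIVEN,       # omitted -> API default applies
)
# config == {"temperature": 0.7}
```

This is why `None` and NOT_GIVEN differ: `None` would be sent to the API as an explicit value, while NOT_GIVEN leaves the field out of the request entirely.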

GoogleThinkingConfig

Configuration for controlling the model’s internal thinking process. Gemini 2.5 and 3 series models support this feature.
| Parameter | Type | Default | Description |
|---|---|---|---|
| thinking_budget | int | None | Token budget for thinking (Gemini 2.5 series). -1 for dynamic, 0 to disable, or a specific count (e.g., 128-32768). |
| thinking_level | str | None | Thinking level for Gemini 3 models. "low" or "high" for 3 Pro; "minimal", "low", "medium", or "high" for 3 Flash. |
| include_thoughts | bool | None | Whether to include thought summaries in the response. |
Gemini 2.5 series models use thinking_budget, while Gemini 3 models use thinking_level. Do not mix these parameters across model generations.
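
The rule above can be sketched as a small validation helper. This is purely illustrative (the function is hypothetical and not part of Pipecat), shown only to make the per-generation split concrete:

```python
# Illustrative check: thinking_budget belongs to Gemini 2.5 models,
# thinking_level to Gemini 3 models; mixing them is an error.
def check_thinking_config(model: str, thinking_budget=None, thinking_level=None):
    if thinking_budget is not None and thinking_level is not None:
        raise ValueError("Set either thinking_budget or thinking_level, not both")
    if model.startswith("gemini-2.5") and thinking_level is not None:
        raise ValueError("Gemini 2.5 models use thinking_budget, not thinking_level")
    if model.startswith("gemini-3") and thinking_budget is not None:
        raise ValueError("Gemini 3 models use thinking_level, not thinking_budget")

check_thinking_config("gemini-2.5-pro", thinking_budget=4096)   # ok
check_thinking_config("gemini-3-flash", thinking_level="high")  # ok
```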

Usage

Basic Setup

import os

from pipecat.services.google import GoogleLLMService

llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    model="gemini-2.5-flash",
)

With Custom Settings

import os

from pipecat.services.google import GoogleLLMService

llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    settings=GoogleLLMService.Settings(
        model="gemini-2.5-pro",
        system_instruction="You are a helpful assistant.",
        temperature=0.7,
        max_tokens=2048,
        top_p=0.9,
    ),
)

With Thinking Configuration

import os

from pipecat.services.google import GoogleLLMService

# Gemini 2.5 series (using thinking_budget)
llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    settings=GoogleLLMService.Settings(
        model="gemini-2.5-pro",
        max_tokens=8192,
        thinking=GoogleLLMService.GoogleThinkingConfig(
            thinking_budget=4096,
            include_thoughts=True,
        ),
    ),
)

# Gemini 3 series (using thinking_level)
llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    settings=GoogleLLMService.Settings(
        model="gemini-3-flash",
        max_tokens=8192,
        thinking=GoogleLLMService.GoogleThinkingConfig(
            thinking_level="high",
            include_thoughts=True,
        ),
    ),
)

Updating Settings at Runtime

Model settings can be changed mid-conversation using LLMUpdateSettingsFrame:
from pipecat.frames.frames import LLMUpdateSettingsFrame
from pipecat.services.google.llm import GoogleLLMSettings

# `task` is the running PipelineTask whose pipeline contains the LLM service
await task.queue_frame(
    LLMUpdateSettingsFrame(
        delta=GoogleLLMSettings(
            temperature=0.3,
            max_tokens=1024,
        )
    )
)
The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.

Notes

  • System instruction priority: The system_instruction set via the constructor or GoogleLLMSettings takes priority over any system message in the context. If both are set, a warning is logged and the constructor/settings value is used.
  • Thinking defaults: By default, Pipecat disables thinking for Gemini 2.5 Flash models to reduce latency. To enable it, explicitly pass a GoogleThinkingConfig via settings.
  • Multimodal support: Gemini models natively support image and audio inputs through Google’s Content/Part format. Images and audio are automatically converted from OpenAI-style contexts.
  • Grounding with Google Search: When grounding metadata is present in the response (e.g., from Google Search tool), the service emits LLMSearchResponseFrame with search results and source attributions.
  • Context format: The service automatically converts between OpenAI-style message formats and Google’s native Content/Part format, so you can use either.
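
The conversion described above can be sketched roughly as follows. This is a simplified illustration of the idea, not Pipecat's actual converter; real contexts also carry images, audio, tool calls, and system messages (which map to system_instruction rather than a Content entry):

```python
# Simplified sketch: map OpenAI-style chat messages to Google's
# Content/Part shape. Google uses the role "model" where OpenAI
# uses "assistant".
def to_google_contents(messages):
    contents = []
    for msg in messages:
        role = "model" if msg["role"] == "assistant" else "user"
        contents.append({"role": role, "parts": [{"text": msg["content"]}]})
    return contents

openai_messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi, how can I help?"},
]
google_contents = to_google_contents(openai_messages)
# google_contents[1]["role"] == "model"
```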

Event Handlers

GoogleLLMService supports the following event handlers, inherited from LLMService:
| Event | Description |
|---|---|
| on_completion_timeout | Called when an LLM completion request times out (Google DeadlineExceeded). |
| on_function_calls_started | Called when function calls are received and execution is about to start. |
@llm.event_handler("on_completion_timeout")
async def on_completion_timeout(service):
    print("LLM completion timed out")