Overview

GrokRealtimeLLMService provides real-time, multimodal conversation capabilities using xAI’s Grok Voice Agent API. It supports speech-to-speech interactions with integrated LLM processing, function calling, and advanced conversation management with low-latency response times.

Installation

To use Grok Realtime services, install the required dependencies:
pip install "pipecat-ai[grok]"

Prerequisites

xAI Account Setup

Before using Grok Realtime services, you need:
  1. xAI Account: Sign up at xAI Console
  2. API Key: Generate a Grok API key from your account dashboard
  3. Model Access: Ensure access to Grok Voice Agent models
  4. Usage Limits: Configure appropriate usage limits and billing

Required Environment Variables

  • GROK_API_KEY: Your xAI API key for authentication
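
Because a missing key otherwise only surfaces when the WebSocket connection is attempted, it can help to check for it at startup. The following is a minimal sketch; `load_grok_api_key` is a hypothetical helper name and not part of Pipecat.

```python
import os
from typing import Mapping, Optional


def load_grok_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GROK_API_KEY, or raise a clear error if it is missing.

    Hypothetical helper for illustration; not part of Pipecat.
    `env` defaults to os.environ and exists mainly to make testing easy.
    """
    env = os.environ if env is None else env
    key = env.get("GROK_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GROK_API_KEY is not set; export it before starting the bot."
        )
    return key
```

The returned value can then be passed as the api_key argument described under Configuration.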

Key Features

  • Real-time Speech-to-Speech: Direct audio processing with low latency
  • Multilingual Support: Support for multiple languages
  • Voice Activity Detection: Server-side VAD for automatic speech detection
  • Function Calling: Seamless support for external functions and tool integration
  • Multiple Voice Options: Various voice personalities available
  • WebSocket Support: Real-time bidirectional audio streaming

Configuration

GrokRealtimeLLMService

  • api_key (str, required): xAI API key for authentication.
  • base_url (str, default: "wss://api.x.ai/v1/realtime"): WebSocket base URL for the Grok Realtime API. Override for custom deployments.
  • session_properties (SessionProperties, default: None, deprecated): Configuration properties for the realtime session. If None, uses default SessionProperties with voice "Ara" and server-side VAD enabled. See SessionProperties below. Deprecated in v0.0.105. Use settings=GrokRealtimeLLMService.Settings(session_properties=...) instead.
  • settings (GrokRealtimeLLMService.Settings, default: None): Runtime-configurable settings. See Settings below.
  • start_audio_paused (bool, default: False): Whether to start with audio input paused.

Settings

Runtime-configurable settings passed via the settings constructor argument using GrokRealtimeLLMService.Settings(...). These can be updated mid-conversation with LLMUpdateSettingsFrame. See Service Settings for details.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | NOT_GIVEN | Model identifier. (Inherited from base settings.) |
| system_instruction | str | NOT_GIVEN | System instruction/prompt. (Inherited from base settings.) |
| session_properties | SessionProperties | NOT_GIVEN | Session-level configuration (voice, audio config, tools, etc.). |

NOT_GIVEN values are omitted, letting the service use its own defaults. Only parameters that are explicitly set are included.
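
The sentinel behavior described above can be illustrated with a small standalone sketch. It does not use Pipecat's actual classes: `DemoSettings`, `given_fields`, `apply_delta`, and the local `NOT_GIVEN` object are hypothetical stand-ins, and the real merge logic may differ. The point is only that an unset field is skipped, while an explicitly set field (including None) overrides the current value.

```python
from dataclasses import dataclass, fields, replace
from typing import Any


class _NotGiven:
    """Sentinel type meaning "no value supplied" (distinct from None)."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()  # local stand-in for the library's sentinel


@dataclass(frozen=True)
class DemoSettings:
    """Hypothetical stand-in with the three fields from the table above."""

    model: Any = NOT_GIVEN
    system_instruction: Any = NOT_GIVEN
    session_properties: Any = NOT_GIVEN


def given_fields(settings: DemoSettings) -> dict:
    """Return only the fields that were explicitly set."""
    return {
        f.name: getattr(settings, f.name)
        for f in fields(settings)
        if getattr(settings, f.name) is not NOT_GIVEN
    }


def apply_delta(current: DemoSettings, delta: DemoSettings) -> DemoSettings:
    """Overlay the explicitly set fields of `delta` onto `current`."""
    return replace(current, **given_fields(delta))
```

Under these assumptions, a delta that sets only system_instruction leaves model and session_properties as they were, which is the behavior to expect when sending a partial update mid-conversation.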

SessionProperties

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| instructions | str | None | System instructions for the assistant. |
| voice | Literal["Ara", "Rex", "Sal", "Eve", "Leo"] | "Ara" | Voice the model uses to respond. |
| turn_detection | TurnDetection | TurnDetection(type="server_vad") | Turn detection configuration. Set to None for manual turn detection. |
| audio | AudioConfiguration | None | Configuration for input and output audio formats. |
| tools | List[GrokTool] | None | Available tools: web_search, x_search, file_search, or custom function tools. |

AudioConfiguration

The audio field in SessionProperties accepts an AudioConfiguration with input and output sub-configurations.

AudioInput (audio.input):

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| format | AudioFormat | None | Input audio format. Supports PCMAudioFormat (configurable rate), PCMUAudioFormat (8 kHz), or PCMAAudioFormat (8 kHz). |

AudioOutput (audio.output):

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| format | AudioFormat | None | Output audio format. Same format options as input. |

Grok PCM audio supports sample rates: 8000, 16000, 21050, 24000, 32000, 44100, and 48000 Hz.
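
Since an unsupported rate is easier to catch before a session is opened, a small pre-flight check can be useful. The sketch below copies the rate list from the sentence above and the fixed 8 kHz rule for the G.711 formats. `check_audio_rate` and the kind strings "pcm", "pcmu", and "pcma" are hypothetical names chosen for this example, not Pipecat or xAI API values.

```python
# Rates copied from the list above; G.711 formats are fixed at 8000 Hz.
SUPPORTED_PCM_RATES = (8000, 16000, 21050, 24000, 32000, 44100, 48000)
G711_RATE = 8000


def check_audio_rate(kind: str, rate: int) -> int:
    """Validate `rate` for a format kind and return it unchanged.

    Hypothetical helper for illustration. `kind` is one of "pcm",
    "pcmu", or "pcma"; these strings are labels for this sketch only.
    """
    if kind == "pcm":
        if rate not in SUPPORTED_PCM_RATES:
            raise ValueError(
                f"Unsupported PCM rate {rate}; choose one of {SUPPORTED_PCM_RATES}"
            )
        return rate
    if kind in ("pcmu", "pcma"):
        if rate != G711_RATE:
            raise ValueError(f"{kind} is fixed at {G711_RATE} Hz, got {rate}")
        return rate
    raise ValueError(f"Unknown format kind: {kind!r}")
```

A validated rate could then be passed to PCMAudioFormat(rate=...) as in the session configuration example under Usage.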

Built-in Tools

Grok provides several built-in tools in addition to custom function tools:

| Tool | Type | Description |
| --- | --- | --- |
| WebSearchTool | web_search | Search the web for current information. |
| XSearchTool | x_search | Search X (Twitter) for posts. Supports the allowed_x_handles filter. |
| FileSearchTool | file_search | Search uploaded document collections by vector_store_ids. |

Usage

Basic Setup

import os
from pipecat.services.grok.realtime import GrokRealtimeLLMService

llm = GrokRealtimeLLMService(
    api_key=os.getenv("GROK_API_KEY"),
)

With Session Configuration

import os

from pipecat.services.grok.realtime import GrokRealtimeLLMService
from pipecat.services.grok.realtime.events import (
    SessionProperties,
    TurnDetection,
    AudioConfiguration,
    AudioInput,
    AudioOutput,
    PCMAudioFormat,
)

session_properties = SessionProperties(
    instructions="You are a helpful assistant.",
    voice="Rex",
    turn_detection=TurnDetection(type="server_vad"),
    audio=AudioConfiguration(
        input=AudioInput(format=PCMAudioFormat(rate=16000)),
        output=AudioOutput(format=PCMAudioFormat(rate=16000)),
    ),
)

llm = GrokRealtimeLLMService(
    api_key=os.getenv("GROK_API_KEY"),
    settings=GrokRealtimeLLMService.Settings(
        session_properties=session_properties,
    ),
)

With Built-in Tools

import os

from pipecat.services.grok.realtime import GrokRealtimeLLMService
from pipecat.services.grok.realtime.events import (
    SessionProperties,
    WebSearchTool,
    XSearchTool,
)

llm = GrokRealtimeLLMService(
    api_key=os.getenv("GROK_API_KEY"),
    settings=GrokRealtimeLLMService.Settings(
        session_properties=SessionProperties(
            instructions="You are a helpful assistant with access to web search.",
            voice="Ara",
            tools=[
                WebSearchTool(),
                XSearchTool(allowed_x_handles=["@elonmusk"]),
            ],
        ),
    ),
)

Updating Settings at Runtime

from pipecat.frames.frames import LLMUpdateSettingsFrame
from pipecat.services.grok.realtime.llm import GrokRealtimeLLMSettings
from pipecat.services.grok.realtime.events import SessionProperties

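# Assumes `task` is your running pipeline task and that this call is made
# from inside an async function (for example, an event handler).
await task.queue_frame(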
    LLMUpdateSettingsFrame(
        delta=GrokRealtimeLLMSettings(
            session_properties=SessionProperties(
                instructions="Now speak in Spanish.",
                voice="Eve",
            ),
        )
    )
)

The session_properties constructor parameter is deprecated as of v0.0.105 in favor of Settings. Pass settings=GrokRealtimeLLMService.Settings(...) instead. See the Service Settings guide for migration details.

Notes

  • Audio format auto-configuration: If audio format is not specified in session_properties, the service automatically configures PCM input/output using the pipeline’s sample rates.
  • Server-side VAD: Enabled by default. When VAD is enabled, the server handles speech detection and turn management automatically. Set turn_detection to None to manage turns manually.
  • Audio before setup: Audio is not sent to Grok until the conversation setup is complete, preventing sample rate mismatches.
  • Available voices: Ara (default), Rex, Sal, Eve, and Leo.
  • G.711 support: PCMU and PCMA formats are supported at a fixed 8000 Hz rate, useful for telephony integrations.
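
The "audio before setup" note above describes a gating pattern: input audio is held back until the session has been configured. The sketch below shows the general idea in isolation. `SetupGate` is a hypothetical class written for this example and is not the service's internal code; in particular, whether the real service drops or buffers early audio is not stated here, and this sketch simply drops it.

```python
from typing import Callable


class SetupGate:
    """Drop audio until setup completes, then forward it.

    Hypothetical illustration of the gating idea described above;
    this is not how the service is implemented internally.
    """

    def __init__(self, send: Callable[[bytes], None]) -> None:
        self._send = send
        self._ready = False

    def mark_ready(self) -> None:
        """Call once the session configuration has been applied."""
        self._ready = True

    def push(self, chunk: bytes) -> bool:
        """Forward `chunk` if setup is done; return whether it was sent."""
        if not self._ready:
            return False
        self._send(chunk)
        return True
```

Holding audio until the negotiated format is known means no chunk is interpreted at the wrong sample rate, which is the mismatch the note refers to.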

Event Handlers

| Event | Description |
| --- | --- |
| on_conversation_item_created | Called when a new conversation item is created in the session. |
| on_conversation_item_updated | Called when a conversation item is updated or completed. |

@llm.event_handler("on_conversation_item_created")
async def on_item_created(service, item_id, item):
    print(f"New conversation item: {item_id}")

@llm.event_handler("on_conversation_item_updated")
async def on_item_updated(service, item_id, item):
    print(f"Conversation item updated: {item_id}")
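
Beyond printing, the two handlers above can be used to keep a local view of the conversation. The following is a minimal sketch under stated assumptions: `ItemTracker` and `demo` are hypothetical names, the item is treated as an opaque object because its structure is not described here, and the dictionaries passed in `demo` are made-up payloads used only to exercise the code without a live session.

```python
import asyncio
from typing import Any, Dict


class ItemTracker:
    """Keep the latest version of each conversation item, keyed by item ID.

    Hypothetical helper; the method signatures mirror the handlers
    shown above, and `item` is stored without inspecting it.
    """

    def __init__(self) -> None:
        self.items: Dict[str, Any] = {}
        self.update_counts: Dict[str, int] = {}

    async def on_created(self, service: Any, item_id: str, item: Any) -> None:
        self.items[item_id] = item
        self.update_counts[item_id] = 0

    async def on_updated(self, service: Any, item_id: str, item: Any) -> None:
        self.items[item_id] = item
        self.update_counts[item_id] = self.update_counts.get(item_id, 0) + 1


async def demo() -> ItemTracker:
    """Simulate one create and one update with made-up payloads."""
    tracker = ItemTracker()
    await tracker.on_created(None, "item_1", {"status": "in_progress"})
    await tracker.on_updated(None, "item_1", {"status": "completed"})
    return tracker
```

In a real bot, the decorated on_item_created and on_item_updated functions shown above could delegate to tracker.on_created and tracker.on_updated with the same arguments.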