Overview

AWSNovaSonicLLMService enables natural, real-time conversations with AWS Nova Sonic. It provides built-in audio transcription, voice activity detection, and context management for creating interactive AI experiences with bidirectional audio streaming, text generation, and function calling capabilities.

Installation

To use AWS Nova Sonic services, install the required dependencies:
pip install "pipecat-ai[aws-nova-sonic]"

Prerequisites

AWS Account Setup

Before using AWS Nova Sonic services, you need:
  1. AWS Account: Set up at AWS Console
  2. Bedrock Access: Enable AWS Bedrock service in your region
  3. Model Access: Request access to Nova Sonic models in Bedrock
  4. IAM Credentials: Configure AWS access keys with Bedrock permissions

Required Environment Variables

  • AWS_SECRET_ACCESS_KEY: Your AWS secret access key
  • AWS_ACCESS_KEY_ID: Your AWS access key ID
  • AWS_REGION: AWS region where Bedrock is available
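
A quick pre-flight check can catch missing credentials before the service is constructed. The sketch below is not part of Pipecat: `missing_aws_env_vars` is a hypothetical helper that only looks at the three variables listed above.

```python
import os

# The three variables this page lists as required.
REQUIRED_AWS_ENV_VARS = ("AWS_SECRET_ACCESS_KEY", "AWS_ACCESS_KEY_ID", "AWS_REGION")


def missing_aws_env_vars(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; pass a dict to check something
    other than the real process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_ENV_VARS if not env.get(name)]
```

Calling `missing_aws_env_vars()` at startup and failing fast when the list is non-empty tends to give a clearer error than an authentication failure later in the session.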

Key Features

  • Real-time Speech-to-Speech: Direct audio input to audio output processing
  • Built-in Transcription: Automatic speech-to-text with real-time streaming
  • Voice Activity Detection: Automatic detection of speech start/stop
  • Function Calling: Support for external function and API integration
  • Multiple Voices: Choose from matthew, tiffany, and amy voice options

Configuration

AWSNovaSonicLLMService

  • secret_access_key (str, required): AWS secret access key for authentication.
  • access_key_id (str, required): AWS access key ID for authentication.
  • session_token (str, default: None): AWS session token for temporary credentials (e.g., when using AWS STS).
  • region (str, required): AWS region where the service is hosted. Supported regions for Nova 2 Sonic (default): "us-east-1", "us-west-2", "ap-northeast-1". Supported regions for Nova Sonic (older model): "us-east-1", "ap-northeast-1".
  • model (str, default: "amazon.nova-2-sonic-v1:0", deprecated): Model identifier. Use "amazon.nova-2-sonic-v1:0" for the latest model or "amazon.nova-sonic-v1:0" for the older model. Deprecated in v0.0.105. Use settings=AWSNovaSonicLLMService.Settings(model=...) instead.
  • voice_id (str, default: "matthew", deprecated): Voice ID for speech synthesis. Some voices are designed for specific languages. See AWS Nova 2 Sonic voice support for available voices. Deprecated in v0.0.105. Use settings=AWSNovaSonicLLMService.Settings(voice=...) instead.
  • params (Params, default: Params(), deprecated): Model parameters for audio configuration and inference. See Params below. Deprecated in v0.0.105. Use settings=AWSNovaSonicLLMService.Settings(...) for inference settings and audio_config=AudioConfig(...) for audio configuration.
  • audio_config (AudioConfig, default: None): Audio configuration (sample rates, sample sizes, channel counts). If not provided, defaults are used (16kHz input, 24kHz output, 16-bit, mono). See AudioConfig below.
  • settings (AWSNovaSonicLLMService.Settings, default: None): Runtime-configurable settings. See Settings below.
  • system_instruction (str, default: None, deprecated): System-level instruction for the model. Deprecated in v0.0.105. Use settings=AWSNovaSonicLLMService.Settings(system_instruction=...) instead.
  • tools (ToolsSchema, default: None): Available tools/functions for the model to use.
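
Because the supported regions differ between the two models, it can help to validate the region and model pair before connecting. This is a minimal sketch, assuming the region lists given for the region parameter above are current; `SUPPORTED_REGIONS` and `is_region_supported` are hypothetical names, not Pipecat or AWS APIs.

```python
# Region lists copied from the region parameter description on this page.
SUPPORTED_REGIONS = {
    "amazon.nova-2-sonic-v1:0": {"us-east-1", "us-west-2", "ap-northeast-1"},
    "amazon.nova-sonic-v1:0": {"us-east-1", "ap-northeast-1"},
}


def is_region_supported(region, model="amazon.nova-2-sonic-v1:0"):
    """Return True if the documented region list for `model` contains `region`.

    Hypothetical helper; unknown model identifiers are treated as unsupported.
    """
    return region in SUPPORTED_REGIONS.get(model, set())
```

For example, "us-west-2" passes for the default model but not for the older "amazon.nova-sonic-v1:0".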

Settings

Runtime-configurable settings passed via the settings constructor argument using AWSNovaSonicLLMService.Settings(...). These can be updated mid-conversation with LLMUpdateSettingsFrame. See Service Settings for details.
Parameter                Type        Default    Description
model                    str         NOT_GIVEN  Model identifier. (Inherited from base settings.)
system_instruction       str         NOT_GIVEN  System instruction/prompt. (Inherited from base settings.)
temperature              float       NOT_GIVEN  Sampling temperature for text generation. (Inherited from base settings.)
max_tokens               int         NOT_GIVEN  Maximum number of tokens to generate. (Inherited from base settings.)
top_p                    float       NOT_GIVEN  Nucleus sampling parameter. (Inherited from base settings.)
voice                    str         NOT_GIVEN  Voice ID for speech synthesis.
endpointing_sensitivity  str | None  NOT_GIVEN  Controls how quickly Nova Sonic decides the user has stopped speaking. Values: "LOW", "MEDIUM", or "HIGH". Only supported with Nova 2 Sonic (default model).

NOT_GIVEN values are omitted, letting the service use its own defaults (e.g. "amazon.nova-2-sonic-v1:0" for model, "matthew" for voice, 0.7 for temperature, 1024 for max_tokens). Only parameters that are explicitly set are included.
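
The NOT_GIVEN behaviour described above can be illustrated with a small stand-alone sketch. It does not use Pipecat's own sentinel or its merging logic: `_NotGiven`, `SERVICE_DEFAULTS`, and `resolve_settings` are hypothetical stand-ins, and the default values are the ones quoted in the paragraph above.

```python
class _NotGiven:
    """Stand-in sentinel for illustration only; Pipecat ships its own NOT_GIVEN."""

    def __repr__(self):
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()

# Defaults as quoted on this page.
SERVICE_DEFAULTS = {
    "model": "amazon.nova-2-sonic-v1:0",
    "voice": "matthew",
    "temperature": 0.7,
    "max_tokens": 1024,
}


def resolve_settings(**overrides):
    """Drop NOT_GIVEN entries, then layer explicit values over the defaults."""
    given = {key: value for key, value in overrides.items() if value is not NOT_GIVEN}
    return {**SERVICE_DEFAULTS, **given}
```

The point of a dedicated sentinel is that None stays available as a real value: in this sketch, `resolve_settings(endpointing_sensitivity=None)` keeps the key with value None, while passing NOT_GIVEN leaves it out entirely.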

AudioConfig

Audio configuration passed via the audio_config constructor argument.
Parameter             Type  Default  Description
input_sample_rate     int   16000    Audio input sample rate in Hz.
input_sample_size     int   16       Audio input sample size in bits.
input_channel_count   int   1        Number of input audio channels.
output_sample_rate    int   24000    Audio output sample rate in Hz.
output_sample_size    int   16       Audio output sample size in bits.
output_channel_count  int   1        Number of output audio channels.
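
To make the defaults concrete, the sketch below works out how many bytes of uncompressed PCM a given duration occupies. `pcm_bytes` is a hypothetical helper written for this page, not a Pipecat function; it only applies the standard formula of sample rate times bytes per sample times channel count.

```python
def pcm_bytes(sample_rate, sample_size_bits=16, channels=1, duration_ms=1000):
    """Size in bytes of `duration_ms` of uncompressed PCM audio (hypothetical helper)."""
    bytes_per_sample = sample_size_bits // 8
    return sample_rate * bytes_per_sample * channels * duration_ms // 1000


# Default input (16 kHz, 16-bit, mono): 32000 bytes per second.
input_bytes_per_second = pcm_bytes(16000)
# Default output (24 kHz, 16-bit, mono): 48000 bytes per second.
output_bytes_per_second = pcm_bytes(24000)
```

With these defaults, a 20 ms chunk is 640 bytes on the input side and 960 bytes on the output side.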

Usage

Basic Setup

import os
from pipecat.services.aws.nova_sonic import AWSNovaSonicLLMService

llm = AWSNovaSonicLLMService(
    secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    region=os.getenv("AWS_REGION"),
    settings=AWSNovaSonicLLMService.Settings(
        voice="matthew",
        system_instruction="You are a helpful assistant.",
    ),
)

With Settings

import os

from pipecat.services.aws.nova_sonic import AWSNovaSonicLLMService, AudioConfig

llm = AWSNovaSonicLLMService(
    secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    region="us-east-1",
    audio_config=AudioConfig(
        input_sample_rate=16000,
        output_sample_rate=24000,
    ),
    settings=AWSNovaSonicLLMService.Settings(
        model="amazon.nova-2-sonic-v1:0",
        voice="tiffany",
        system_instruction="You are a helpful assistant.",
        temperature=0.5,
        max_tokens=2048,
        endpointing_sensitivity="MEDIUM",
    ),
)

With Function Calling

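import os

from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.services.aws.nova_sonic import AWSNovaSonicLLMService

# Describe the get_weather tool so the model knows when and how to call it.
weather_function = FunctionSchema(
    name="get_weather",
    description="Get the current weather for a location.",
    properties={"location": {"type": "string", "description": "City name"}},
    required=["location"],
)
tools = ToolsSchema(standard_tools=[weather_function])

llm = AWSNovaSonicLLMService(
    secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    region="us-east-1",
    settings=AWSNovaSonicLLMService.Settings(
        voice="matthew",
        system_instruction="You are a helpful assistant that can check the weather.",
    ),
    tools=tools,  # ToolsSchema instance defined above
)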

@llm.function("get_weather")
async def get_weather(function_name, tool_call_id, args, llm, context, result_callback):
    location = args.get("location", "unknown")
    await result_callback({"temperature": 72, "condition": "sunny", "location": location})

The Params / params= pattern is deprecated as of v0.0.105. Use Settings / settings= for inference settings and AudioConfig / audio_config= for audio configuration instead. See the Service Settings guide for migration details.

Notes

  • Model versions: Nova 2 Sonic (amazon.nova-2-sonic-v1:0) is the default and recommended model. The older Nova Sonic (amazon.nova-sonic-v1:0) has fewer features and requires an assistant response trigger mechanism.
  • Endpointing sensitivity: Only supported with Nova 2 Sonic. Controls how quickly the model decides the user has stopped speaking; "HIGH" makes the model respond soonest.
  • Transcription frames: User speech transcription frames are always emitted upstream.
  • Connection resilience: If a connection error occurs while the service is supposed to remain connected, it automatically resets the conversation and reconnects.
  • System instruction and tools precedence: Instructions and tools provided in the LLM context take precedence over those provided at initialization time.
  • Audio format: Uses LPCM (Linear PCM) audio format for both input and output. Input defaults to 16kHz and output defaults to 24kHz.
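
The model-version and endpointing notes above imply a simple compatibility rule that can be checked before building the settings. This is a sketch under the assumptions stated on this page (three allowed values, Nova 2 Sonic only); `check_endpointing` is a hypothetical helper and does not reflect how the service itself validates input.

```python
NOVA_2_SONIC = "amazon.nova-2-sonic-v1:0"
VALID_SENSITIVITIES = ("LOW", "MEDIUM", "HIGH")


def check_endpointing(model, sensitivity):
    """Return an error message for an invalid combination, or None if it is fine.

    Hypothetical helper based on the rules documented on this page.
    """
    if sensitivity is None:
        # Nothing requested, so there is nothing to validate.
        return None
    if sensitivity not in VALID_SENSITIVITIES:
        return f"endpointing_sensitivity must be one of {VALID_SENSITIVITIES}"
    if model != NOVA_2_SONIC:
        return f"endpointing_sensitivity is only supported with {NOVA_2_SONIC}"
    return None
```

Running this check before constructing the settings surfaces an unsupported combination early, for example "HIGH" paired with the older "amazon.nova-sonic-v1:0" model.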