Chat with a voice assistant built with LiveKit and the Gemini Live API
Overview
Google's Gemini Live API enables low-latency, two-way interactions that use text, audio, and video input, with audio and text output. LiveKit's Google plugin includes a RealtimeModel class that allows you to use this API to create agents with natural, human-like voice conversations.
Quick reference
This section includes a basic usage example and some reference material. For links to more detailed documentation, see Additional resources.
Installation
Install the Google plugin from PyPI:
pip install "livekit-agents[google]~=1.0"
Authentication
The Google plugin requires authentication based on your chosen service:
- For Vertex AI, you must set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the service account key file.
- For the Google Gemini API, set the GOOGLE_API_KEY environment variable.
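For example, you can set the appropriate variable in your shell before starting the agent. A minimal sketch with placeholder values:

# Vertex AI: point to your service account key file
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json

# Google Gemini API: set your API key
export GOOGLE_API_KEY=<your-api-key>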
Usage
Use the Gemini Live API within an AgentSession. For example, you can use it in the Voice AI quickstart.
from livekit.agents import AgentSession
from livekit.plugins import google

session = AgentSession(
    llm=google.beta.realtime.RealtimeModel(
        model="gemini-2.0-flash-exp",
        voice="Puck",
        temperature=0.8,
        instructions="You are a helpful assistant",
    ),
)
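The snippet above only constructs the session. In a complete agent, you typically create and start it inside a worker entrypoint. The following is a minimal sketch based on the Voice AI quickstart pattern; details such as room input options are omitted:

from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import google

async def entrypoint(ctx: agents.JobContext):
    # Configure the session with the Gemini Live API, as above
    session = AgentSession(
        llm=google.beta.realtime.RealtimeModel(
            model="gemini-2.0-flash-exp",
            voice="Puck",
        ),
    )
    # Start the session in the room this job was dispatched to
    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a helpful assistant"),
    )
    await ctx.connect()

if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))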
Parameters
This section describes some of the available parameters. For a complete reference of all parameters, see the plugin reference.

- instructions: System instructions to better control the model's output and specify tone and sentiment of responses. To learn more, see System instructions.
- model: Live API model to use.
- api_key: Google Gemini API key.
- voice: Name of the Gemini Live API voice. For a full list, see Voices.
- modalities: List of response modalities to use, such as ["TEXT", "AUDIO"]. Set to ["TEXT"] to use the model in text-only mode with a separate TTS plugin.
- vertexai: If set to True, use Vertex AI.
- project: Google Cloud project ID to use for the API (if vertexai=True). By default, it uses the project in the service account key file (set using the GOOGLE_APPLICATION_CREDENTIALS environment variable).
- location: Google Cloud location to use for the API (if vertexai=True). By default, it uses the location from the service account key file, or us-central1.
- _gemini_tools: List of built-in Google tools, such as Google Search. For more information, see Gemini tools, as shown in the sketch after this list.
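For example, a Vertex AI configuration might look like the following sketch (the project ID is a placeholder, and GOOGLE_APPLICATION_CREDENTIALS must be set as described in Authentication):

from livekit.agents import AgentSession
from livekit.plugins import google

session = AgentSession(
    llm=google.beta.realtime.RealtimeModel(
        model="gemini-2.0-flash-exp",
        vertexai=True,
        project="my-gcp-project",  # placeholder; defaults to the project in the key file
        location="us-central1",    # defaults to the key file's location, or us-central1
    ),
)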
Gemini tools
This integration is experimental and may change in a future SDK release.
The _gemini_tools parameter allows you to use built-in Google tools with the Gemini model. For example, you can use this feature to implement Grounding with Google Search:
from google.genai import types
from livekit.agents import AgentSession
from livekit.plugins import google

session = AgentSession(
    llm=google.beta.realtime.RealtimeModel(
        model="gemini-2.0-flash-exp",
        _gemini_tools=[types.GoogleSearch()],
    ),
)
Turn detection
The Gemini Live API includes built-in VAD-based turn detection, which is currently the only supported turn detection method.
Usage with separate TTS
To use the Gemini Live API with a different TTS provider, configure it with a text-only response modality and include a TTS plugin in your AgentSession configuration. This setup retains the benefits of realtime speech comprehension while giving you complete control over the speech output.
from google.genai.types import Modality
from livekit.agents import AgentSession
from livekit.plugins import cartesia, google

session = AgentSession(
    llm=google.beta.realtime.RealtimeModel(
        modalities=[Modality.TEXT],
    ),
    tts=cartesia.TTS(),
)
Additional resources
The following resources provide more information about using Gemini with LiveKit Agents.
- Python package: The livekit-plugins-google package on PyPI.
- Plugin reference: Reference for the Gemini Live API plugin.
- GitHub repo: View the source or contribute to the LiveKit Google plugin.
- Gemini docs: Gemini Live API documentation.
- Voice AI quickstart: Get started with LiveKit Agents and the Gemini Live API.
- Google AI ecosystem guide: Overview of the entire Google AI and LiveKit Agents integration.