Status: Open
Labels: bug (Something isn't working)
Description
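Sometimes, after the user finishes speaking, the LLM node returns nothing and the STT node goes back to waiting for the user to speak again. This happens with the basic agent example. In the log below, the end-of-utterance prediction fires for the transcript, but the LLM metrics that follow show ttft=-1.00 and zero input and output tokens, and no reply is generated.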
2025-07-17 15:21:43,188 - INFO livekit.agents - STT metrics: audio_duration=5.00 {"room": "playground-Nqt8-zYVy", "pid": 185, "job_id": "simulated-job-c4aa6d196dc2"}
2025-07-17 15:21:48,203 - INFO livekit.agents - STT metrics: audio_duration=5.05 {"room": "playground-Nqt8-zYVy", "pid": 185, "job_id": "simulated-job-c4aa6d196dc2"}
2025-07-17 15:21:50,019 - DEBUG livekit.agents - received user transcript {"room": "playground-Nqt8-zYVy", "user_transcript": "¿Me puedes decir el clima de Japón?", "language": "es", "pid": 185, "job_id": "simulated-job-c4aa6d196dc2"}
2025-07-17 15:21:50,198 - DEBUG livekit.plugins.turn_detector - eou prediction {"room": "playground-Nqt8-zYVy", "eou_probability": 0.8865267038345337, "input": "<|im_start|>assistant\nHey there! What can I help you with today?<|im_end|>\n<|im_start|>user\nHola, ¿me puedes escuchar<|im_end|>\n<|im_start|>assistant\n¡Hola! Sí, te escucho. ¿En qué puedo ayudarte hoy?<|im_end|>\n<|im_start|>user\n¿Me puedes decir el clima de Japón?", "duration": 0.088, "pid": 185, "job_id": "simulated-job-c4aa6d196dc2"}
2025-07-17 15:21:50,198 - INFO livekit.agents - EOU metrics: end_of_utterance_delay=0.67, transcription_delay=0.49 {"room": "playground-Nqt8-zYVy", "pid": 185, "job_id": "simulated-job-c4aa6d196dc2"}
2025-07-17 15:21:50,875 - INFO livekit.agents - LLM metrics: ttft=-1.00, input_tokens=0, cached_input_tokens=0, output_tokens=0, tokens_per_second=0.00 {"room": "playground-Nqt8-zYVy", "pid": 185, "job_id": "simulated-job-c4aa6d196dc2"}
2025-07-17 15:21:53,225 - INFO livekit.agents - STT metrics: audio_duration=5.00 {"room": "playground-Nqt8-zYVy", "pid": 185, "job_id": "simulated-job-c4aa6d196dc2"}
2025-07-17 15:21:58,283 - INFO livekit.agents - STT metrics: audio_duration=5.05 {"room": "playground-Nqt8-zYVy", "pid": 185, "job_id": "simulated-job-c4aa6d196dc2"}
2025-07-17 15:22:03,305 - INFO livekit.agents - STT metrics: audio_duration=5.05 {"room": "playground-Nqt8-zYVy", "pid": 185, "job_id": "simulated-job-c4aa6d196dc2"}
2025-07-17 15:22:08,333 - INFO livekit.agents - STT metrics: audio_duration=5.00 {"room": "playground-Nqt8-zYVy", "pid": 185, "job_id": "simulated-job-c4aa6d196dc2"}
2025-07-17 15:22:13,354 - INFO livekit.agents - STT metrics: audio_duration=5.05 {"room": "playground-Nqt8-zYVy", "pid": 185, "job_id": "simulated-job-c4aa6d196dc2"}
2025-07-17 15:23:48,929 - INFO livekit.agents - STT metrics: audio_duration=5.00 {"room": "playground-Nqt8-zYVy", "pid": 185, "job_id": "simulated-job-c4aa6d196dc2"}
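To find the affected turns in a longer log, the empty responses can be picked out by their metrics line. The following is a minimal sketch, assuming the log format shown above; `parse_llm_metrics` and `is_empty_llm_turn` are hypothetical helper names and not part of livekit-agents.

```python
import re
from typing import Dict, Optional

# Matches key=value pairs such as "ttft=-1.00" or "output_tokens=0".
_PAIR = re.compile(r"(\w+)=(-?\d+(?:\.\d+)?)")


def parse_llm_metrics(line: str) -> Optional[Dict[str, float]]:
    """Return the numeric fields of an "LLM metrics:" log line, or None for other lines."""
    marker = "LLM metrics:"
    if marker not in line:
        return None
    # Keep only the text between the marker and the trailing JSON context.
    payload = line.split(marker, 1)[1].split("{", 1)[0]
    return {key: float(value) for key, value in _PAIR.findall(payload)}


def is_empty_llm_turn(line: str) -> bool:
    """True when the line reports a negative ttft and no output tokens (assumed symptom)."""
    fields = parse_llm_metrics(line)
    if fields is None:
        return False
    return fields.get("ttft", 0.0) < 0 and fields.get("output_tokens", 1.0) == 0
```

Running `is_empty_llm_turn` over each line of the log flags the 15:21:50,875 entry above, which is the turn where no reply was produced.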
import logging

from dotenv import load_dotenv

from livekit.agents import (
    Agent,
    AgentSession,
    JobContext,
    JobProcess,
    RoomInputOptions,
    RoomOutputOptions,
    RunContext,
    WorkerOptions,
    cli,
    metrics,
)
from livekit.agents.llm import function_tool
from livekit.agents.voice import MetricsCollectedEvent

# noise_cancellation provides Krisp background voice/noise cancellation (enabled here)
from livekit.plugins import deepgram, noise_cancellation, openai, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

logger = logging.getLogger("basic-agent")

load_dotenv()


class MyAgent(Agent):
    def __init__(self) -> None:
        # adjacent string literals are concatenated, so each part ends with a space
        super().__init__(
            instructions="Your name is Kelly. You would interact with users via voice. "
            "With that in mind, keep your responses concise and to the point. "
            "You are curious and friendly, and have a sense of humor.",
        )

    async def on_enter(self):
        # when the agent is added to the session, it'll generate a reply
        # according to its instructions
        self.session.generate_reply()

    # all functions annotated with @function_tool will be passed to the LLM when this
    # agent is active
    @function_tool
    async def lookup_weather(
        self, context: RunContext, location: str, latitude: str, longitude: str
    ):
        """Called when the user asks for weather related information.
        Ensure the user's location (city or region) is provided.
        When given a location, please estimate the latitude and longitude of the location and
        do not ask the user for them.

        Args:
            location: The location they are asking for
            latitude: The latitude of the location, do not ask user for it
            longitude: The longitude of the location, do not ask user for it
        """
        logger.info(f"Looking up weather for {location}")
        return "sunny with a temperature of 70 degrees."


def prewarm(proc: JobProcess):
    # load the VAD model once per process so each job can reuse it
    proc.userdata["vad"] = silero.VAD.load()


async def entrypoint(ctx: JobContext):
    # each log entry will include these fields
    ctx.log_context_fields = {
        "room": ctx.room.name,
    }

    session = AgentSession(
        vad=ctx.proc.userdata["vad"],
        # any combination of STT, LLM, TTS, or realtime API can be used
        llm=openai.LLM(model="gpt-4o-mini"),
        stt=deepgram.STT(model="nova-2", language="es", detect_language=False),
        tts=openai.TTS(voice="ash"),
        # use LiveKit's turn detection model
        turn_detection=MultilingualModel(),
    )

    # log metrics as they are emitted, and total usage after session is over
    usage_collector = metrics.UsageCollector()

    @session.on("metrics_collected")
    def _on_metrics_collected(ev: MetricsCollectedEvent):
        metrics.log_metrics(ev.metrics)
        usage_collector.collect(ev.metrics)

    async def log_usage():
        summary = usage_collector.get_summary()
        logger.info(f"Usage: {summary}")

    # shutdown callbacks are triggered when the session is over
    ctx.add_shutdown_callback(log_usage)

    await session.start(
        agent=MyAgent(),
        room=ctx.room,
        room_input_options=RoomInputOptions(
            # Krisp BVC noise cancellation is enabled
            noise_cancellation=noise_cancellation.BVC(),
        ),
        room_output_options=RoomOutputOptions(transcription_enabled=True),
    )


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, prewarm_fnc=prewarm))