Is update_chat_ctx supported for AWS Realtime? #2977

@bbodien

I've migrated an application to LiveKit; it previously interacted directly with AWS NovaSonic's SDK.

One of my features is conversation resuming. My Next.js client fetches the previous text messages from my database and renders them. If the user clicks a resume button, the client fetches a LiveKit token and attempts to pass an array of those past messages to my LiveKit agent, which should use them as context in addition to the system prompt.

My full agent code is below.

from dotenv import load_dotenv
import json
import logging

from livekit import agents
from livekit.agents import (
  Agent,
  AgentSession,
  ChatContext,
  RoomInputOptions,
  RoomOutputOptions,
)
from livekit.plugins import aws, noise_cancellation

from constants import SYSTEM_PROMPT

load_dotenv()

logger = logging.getLogger("kyorla-agent")
logger.setLevel(logging.INFO)

class Kyorla(Agent):
  def __init__(self) -> None:
    super().__init__(instructions=SYSTEM_PROMPT)

async def entrypoint(ctx: agents.JobContext):

  logger.info(f"Connecting to room {ctx.room.name}")
  await ctx.connect()

  participant = await ctx.wait_for_participant()
  logger.info(f"Starting voice assistant for participant {participant.identity}")

  # Metadata may be an empty string when unset, so fall back to an empty JSON object
  history = json.loads(participant.metadata or "{}").get("chat_history", [])
  chat_ctx = ChatContext.empty()
  for msg in history:
    # logger.info(f"ADDING MESSAGE: {msg['role']}: {msg['content']}")
    chat_ctx.add_message(role=msg["role"], content=msg["content"])

  session = AgentSession(
    llm=aws.realtime.RealtimeModel(
      voice="tiffany",
      region="us-east-1",
      temperature=0.7,
      top_p=0.95,
    ),
  )

  agent = Kyorla()

  await session.start(
    room=ctx.room,
    agent=agent,
    room_input_options=RoomInputOptions(
      noise_cancellation=noise_cancellation.BVC(),
    ),
    room_output_options=RoomOutputOptions(
      sync_transcription=True,
    ),
  )

  # logger.info(f"INITIAL CONTEXT: {chat_ctx.items}")
  await agent.update_chat_ctx(chat_ctx)

if __name__ == "__main__":
  agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))

When I uncomment the logger lines, I can see the message roles and content being received and parsed correctly, so I suspect the issue is a lack of support for update_chat_ctx() in LiveKit's implementation of the AWS Realtime model.
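To rule out the parsing step independently of LiveKit, the metadata handling can be checked on its own. The helper below is only a sketch: `parse_chat_history` and `ALLOWED_ROLES` are hypothetical names, and the role set is an assumption about what the client stores, not something LiveKit defines.

```python
import json
from typing import Any, Dict, List, Optional

# Assumption: the roles the client sends; adjust to match the stored messages.
ALLOWED_ROLES = {"system", "user", "assistant"}


def parse_chat_history(metadata: Optional[str]) -> List[Dict[str, str]]:
    """Return the validated chat_history list from participant metadata JSON."""
    if not metadata:
        return []
    try:
        payload: Any = json.loads(metadata)
    except json.JSONDecodeError:
        return []
    if not isinstance(payload, dict):
        return []
    history = payload.get("chat_history", [])
    if not isinstance(history, list):
        return []

    cleaned: List[Dict[str, str]] = []
    for msg in history:
        if not isinstance(msg, dict):
            continue
        role, content = msg.get("role"), msg.get("content")
        # Skip entries with unknown roles or empty / non-string content.
        if role in ALLOWED_ROLES and isinstance(content, str) and content.strip():
            cleaned.append({"role": role, "content": content})
    return cleaned
```

Each returned entry can then be passed to chat_ctx.add_message(role=..., content=...) exactly as in the agent code above.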

I don't see any errors in the agent logs. However, if I have a topical conversation with the agent, stop, resume the conversation, and then ask it what we were talking about, it only gives generic responses.

I've also tried reordering the operations so that update_chat_ctx() is called before the session starts, but this seems to make no difference.

If there's a different function or LiveKit pattern I should be using, it'd be great to learn!

Previously, I did this with NovaSonic using a pattern from the AWS example repo, which similarly iterates over an array of "role" and "content" messages and sends each one to the model. Each message is sent as a textInput event, preceded by contentStart and followed by contentEnd, which can be seen here.
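For context, the per-message sequence described above can be sketched as follows. This is a rough reconstruction rather than the AWS sample itself: the three event names come from the description above, while `history_events`, `prompt_name`, and the inner field names (`promptName`, `contentName`, `interactive`, `textInputConfiguration`) are assumptions that should be checked against the AWS documentation.

```python
import uuid
from typing import Any, Dict, List


def history_events(prompt_name: str, history: List[Dict[str, str]]) -> List[Dict[str, Any]]:
    """Build contentStart / textInput / contentEnd events for each past message.

    Field names inside each event are assumptions, not a verified schema.
    """
    events: List[Dict[str, Any]] = []
    for msg in history:
        # One content block per message, tied together by a shared contentName.
        content_name = str(uuid.uuid4())
        events.append({"event": {"contentStart": {
            "promptName": prompt_name,
            "contentName": content_name,
            "type": "TEXT",
            "interactive": False,  # assumption: history is replayed, not live input
            "role": msg["role"].upper(),  # assumption: upper-case role names
            "textInputConfiguration": {"mediaType": "text/plain"},
        }}})
        events.append({"event": {"textInput": {
            "promptName": prompt_name,
            "contentName": content_name,
            "content": msg["content"],
        }}})
        events.append({"event": {"contentEnd": {
            "promptName": prompt_name,
            "contentName": content_name,
        }}})
    return events
```

Each dictionary would be serialized with json.dumps and written to the model's input stream in order, before any live audio is sent.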

This approach definitely worked; I could verify it with conversations such as:

user: "I'll go and find out about X and talk to you later."
agent: "Ok."

... disconnection + resumption ...

user: "I'm back."
agent: "Great, what did you find out about X?"

Labels: question (Further information is requested)
