Description
Language models use JSON Schema, MCP uses JSON Schema, and OpenAPI uses JSON Schema, but LiveKit uses Python functions. This creates a mismatch between the way tools actually interact with LLMs and the LiveKit API, and it makes it challenging to support things like MCP and OpenAPI without mapping the schema to a Python function and back again, as is done in this MCP sample:
https://github.com/livekit-examples/basic-mcp
While this sample gets the job done, it is full of complicated code that should not need to be written in the first place. It exists only because of the constraint that tools must be Python functions, which is not an LLM-native requirement; it is a LiveKit-imposed one.
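To make the cost of that constraint concrete, the round-trip the sample has to implement looks roughly like this. The helper below is purely illustrative (the names `schema_to_wrapper` and `call_remote` are not LiveKit or MCP APIs); it shows the kind of glue needed to turn a schema-native tool into a plain Python callable:

```python
import json


def schema_to_wrapper(name, description, schema, call_remote):
    """Build a plain-Python callable from a JSON Schema tool definition,
    so it can be registered with an API that only accepts functions.

    `call_remote` is a hypothetical callback that forwards the invocation
    back to the schema-native tool (e.g. an MCP server)."""

    def wrapper(**kwargs):
        # Serialize the arguments and hand them back to the real tool.
        return call_remote(name, json.dumps(kwargs))

    wrapper.__name__ = name
    wrapper.__doc__ = description
    # A framework that introspects signatures needs even more work here:
    # fabricating inspect.Signature objects and type annotations from
    # schema["properties"], which is exactly the lossy conversion at issue.
    return wrapper
```

Everything this function does is overhead: the schema already existed, and the model only ever sees the schema again on the other side.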
With the release of the Responses API, OpenAI has also added an alternative tool-calling model for built-in tools, and it is not clear how to plug those tools into a LiveKit voice agent because of the way tool calling is implemented.
In a perfect world, the tool-calling process would be overridable or customizable by the LLM adapter itself, or by a dedicated tool-calling adapter, so someone could create an adapter that integrates directly with native LLM capabilities or standards like MCP instead of managing a lossy conversion to Python functions and back again. At the moment, tool calling is implemented in a central set of functions that the LLM adapter is not even involved in, making it very hard to customize tool calling to use functionality LLMs already support without forking the entire SDK:
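One possible shape for such an extension point, purely illustrative and not an existing LiveKit interface, is a small protocol that an LLM adapter (or an MCP/OpenAPI bridge) could implement to own both the schemas it advertises and the execution of tool calls:

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ToolExecutor(Protocol):
    """Hypothetical hook: whoever implements this decides both what tool
    definitions the model sees and how a resulting tool call is executed."""

    def tool_schemas(self) -> list[dict[str, Any]]:
        """JSON Schema tool definitions to send to the model, unmodified."""
        ...

    async def execute(self, name: str, arguments: dict[str, Any]) -> Any:
        """Run a tool call however the backend supports it: an MCP request,
        a Responses API built-in tool, an HTTP call from an OpenAPI spec."""
        ...
```

With something like this, the agent core would delegate to the executor instead of running its own central tool-execution path, and no schema-to-function conversion would be needed at all.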
```python
exe_task, tool_output = perform_tool_executions(
```