Releases: livekit/agents
livekit-plugins-google@0.11.5
Patch Changes
- backporting fix to agents 0.x to ignore Gemini LLM responses with no candidates (#2898) - 73e5384c85ea9b29fa4c946f29c66bef80d5d160 (@davidzhao)
livekit-agents@1.2.0
New Features
Evals & Testing:
You can now perform turn-by-turn evaluations on your agent interactions. Here's an example of how to validate expected behaviors:
result = await sess.run(user_input="Can I book an appointment? What's your availability for the next two weeks?")
result.expect.skip_next_event_if(type="message", role="assistant")
result.expect.next_event().is_function_call(name="list_available_slots")
result.expect.next_event().is_function_call_output()
await result.expect.next_event().is_message(role="assistant").judge(llm, intent="must confirm no availability")
Check out these practical examples: drive-thru, frontdesk
Documentation: https://docs.livekit.io/agents/build/testing/
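The assertions above read like a cursor moving through the events a run produced. Judging by the names, skip_next_event_if consumes an event only when it matches, and next_event always consumes one. The pure-Python sketch below models that cursor behavior so it can be reasoned about without an LLM. Event, EventCursor and this next_event(type, **attrs) signature are hypothetical illustrations, not the LiveKit evals API, and the LLM judge step is left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    # Hypothetical, simplified record of one thing that happened during a run
    type: str  # "message", "function_call", or "function_call_output"
    role: Optional[str] = None
    name: Optional[str] = None
    content: str = ""


class EventCursor:
    """Toy model of turn-by-turn expectations; not the LiveKit evals API."""

    def __init__(self, events: List[Event]) -> None:
        self._events = events
        self._pos = 0

    def _matches(self, ev: Event, type: str, attrs: dict) -> bool:
        return ev.type == type and all(getattr(ev, k) == v for k, v in attrs.items())

    def skip_next_event_if(self, type: str, **attrs: object) -> bool:
        # Consume the next event only if it matches; otherwise leave the cursor alone
        if self._pos < len(self._events) and self._matches(self._events[self._pos], type, attrs):
            self._pos += 1
            return True
        return False

    def next_event(self, type: str, **attrs: object) -> Event:
        # Always consume the next event and fail loudly if it is not what we expect
        if self._pos >= len(self._events):
            raise AssertionError(f"expected {type!r}, but no events remain")
        ev = self._events[self._pos]
        self._pos += 1
        if not self._matches(ev, type, attrs):
            raise AssertionError(f"expected {type!r} {attrs}, got {ev}")
        return ev


# Usage, mirroring the appointment example (the LLM judge step is left out)
events = [
    Event("message", role="assistant", content="Let me check the calendar."),
    Event("function_call", name="list_available_slots"),
    Event("function_call_output", content="[]"),
    Event("message", role="assistant", content="Sorry, nothing is free in the next two weeks."),
]
cursor = EventCursor(events)
cursor.skip_next_event_if(type="message", role="assistant")
cursor.next_event("function_call", name="list_available_slots")
cursor.next_event("function_call_output")
reply = cursor.next_event("message", role="assistant")
print(reply.content)
```

The optional skip lets the same test pass whether or not the agent says something before calling the tool.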
Preemptive Generation
This feature speculatively starts LLM and TTS processing before the user's turn is confirmed as finished. That work overlaps with the end of the user's speech, which reduces response latency. It is disabled by default; enable it with:
session = AgentSession(..., preemptive_generation=True)
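The latency gain comes from starting work early. The trade-off is that a speculative answer is only worth keeping if the user's final words match what it was based on. The following is a minimal asyncio sketch of that idea. It uses a hypothetical fake_generate coroutine in place of real LLM/TTS calls. SpeculativeResponder is illustrative only and is not how AgentSession implements the option.

```python
import asyncio
from typing import Optional, Tuple


async def fake_generate(prompt: str) -> str:
    # Hypothetical stand-in for the LLM + TTS work needed to answer `prompt`
    await asyncio.sleep(0.05)
    return f"reply to: {prompt}"


class SpeculativeResponder:
    """Toy model of preemptive generation; not the LiveKit implementation."""

    def __init__(self) -> None:
        self._prompt: Optional[str] = None
        self._task: Optional[asyncio.Task] = None

    def on_transcript(self, text: str) -> None:
        # A transcript arrived before end of turn: start generating right away
        if self._task is not None and self._prompt == text:
            return  # already working on exactly this text
        if self._task is not None:
            self._task.cancel()  # the user kept talking, so the old guess is stale
        self._prompt = text
        self._task = asyncio.create_task(fake_generate(text))

    async def on_end_of_turn(self, final_text: str) -> Tuple[str, bool]:
        # Returns (reply, reused_speculative_result)
        task, prompt = self._task, self._prompt
        self._task, self._prompt = None, None
        if task is not None and prompt == final_text:
            return await task, True  # the head start pays off
        if task is not None:
            task.cancel()  # the final text differs, so discard the speculative work
        return await fake_generate(final_text), False


async def run_turn(interim: str, final: str) -> Tuple[str, bool]:
    responder = SpeculativeResponder()
    responder.on_transcript(interim)
    await asyncio.sleep(0.04)  # the user is still finishing their sentence
    return await responder.on_end_of_turn(final)


print(asyncio.run(run_turn("book a table", "book a table")))
print(asyncio.run(run_turn("book a table", "book a table for two")))
```

In the first turn most of the generation time is hidden behind the user's speech. In the second, the guess is thrown away, so the extra compute is the cost of speculating.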
Enhanced End-of-Turn (EOU) Detection
The end-of-turn model has been refined to reduce sensitivity to punctuation and better handle multilingual scenarios, notably improving Hindi language support.
Documentation: https://docs.livekit.io/agents/build/turns/turn-detector/#supported-languages
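The changelog below includes a "punctuation free turn detector" change. One common way to get that property is to strip punctuation before inference. Then STT providers that punctuate and those that do not produce identical model input. This is assumed here for illustration and is not taken from the model's actual preprocessing. normalize_for_turn_detection is a hypothetical helper.

```python
import unicodedata


def normalize_for_turn_detection(text: str) -> str:
    """Hypothetical preprocessing: make transcripts punctuation-insensitive."""
    # Drop every Unicode punctuation character (categories Pc, Pd, Ps, Pe, Pi, Pf, Po),
    # which also covers the Devanagari danda that ends Hindi sentences
    kept = "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
    # Lowercase and collapse runs of whitespace left behind by removed characters
    return " ".join(kept.lower().split())


# Two STT outputs for the same utterance now look identical to the model
print(normalize_for_turn_detection("Sure. I'll be there!"))
print(normalize_for_turn_detection("sure i'll be there"))
print(normalize_for_turn_detection("मैं वहाँ रहूँगा।"))
```

Filtering by Unicode category rather than by string.punctuation keeps the sketch language-agnostic. Devanagari vowel signs are marks, not punctuation, so they survive.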
OpenTelemetry Integration
Agents now support OpenTelemetry tracing for LLM and TTS requests and for user callbacks. See the Langfuse example for a detailed implementation.
Experimental Agent Tasks
AgentTask is a new experimental feature: a specialized kind of agent that terminates once it achieves a specific goal and returns the result to the caller. You can await AgentTasks directly in your workflows:
@function_tool
async def schedule_appointment(self, ctx: RunContext[Userdata], slot_id: str) -> str:
    # Attempts to retrieve the user's email, allowing multiple turns between the agent and the user
    email_result = await beta.workflows.GetEmailTask(chat_ctx=self.chat_ctx)
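An AgentTask is awaited like a coroutine and finishes only when its goal is met. Its control flow therefore resembles an awaitable object that loops over user turns until it has a result. The sketch below models that in plain asyncio. ToyGetEmailTask, its scripted user_turns, and the email regex are all hypothetical and stand in for the real LLM-driven conversation.

```python
import asyncio
import re
from typing import Any, Generator, List

# Deliberately loose pattern; real email validation is more involved
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


class ToyGetEmailTask:
    """Hypothetical goal-oriented task; not livekit's GetEmailTask."""

    def __init__(self, user_turns: List[str]) -> None:
        self._user_turns = user_turns  # scripted user replies, one per turn
        self.turns_used = 0

    async def _run(self) -> str:
        # Keep exchanging turns with the user until the goal (an email) is reached
        for turn in self._user_turns:
            self.turns_used += 1
            await asyncio.sleep(0)  # stand-in for waiting on the next user turn
            match = EMAIL_RE.search(turn)
            if match:
                return match.group(0)  # goal reached: the task ends with a result
        raise RuntimeError("the user never provided an email address")

    def __await__(self) -> Generator[Any, None, str]:
        # Lets callers write `await ToyGetEmailTask(...)`, like an awaited AgentTask
        return self._run().__await__()


async def schedule_appointment(slot_id: str) -> str:
    task = ToyGetEmailTask(["hmm, which email?", "use jane@example.com please"])
    email = await task
    return f"booked {slot_id} for {email} after {task.turns_used} turns"


print(asyncio.run(schedule_appointment("slot-42")))
```

The caller never sees the intermediate turns; it only receives the final value. That is what makes goal-oriented subtasks composable inside a function tool.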
Half-Duplex Pipeline
Combine a Gemini or OpenAI realtime model configured for text output (audio in, text out) with a separate TTS engine, so you can choose the voice and TTS provider independently:
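from livekit.agents import AgentSession
from livekit.plugins import google, openai
from google.genai.types import Modality  # only needed for the Gemini alternative

session = AgentSession(
    llm=openai.realtime.RealtimeModel(modalities=["text"]),
    # Alternatively: llm=google.beta.realtime.RealtimeModel(modalities=[Modality.TEXT]),
    tts=openai.TTS(voice="ash"),
)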
View the complete example.
Documentation: https://docs.livekit.io/agents/integrations/realtime/#separate-tts
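In this setup the realtime model only has to produce text, which is then spoken by the separate TTS engine. The rough sketch below shows why streaming that text sentence by sentence keeps latency low: TTS can start on the first sentence while the model is still writing the rest. It assumes hypothetical fake_realtime_text and fake_tts stand-ins and a naive sentence splitter; the framework ships its own tokenizers.

```python
import asyncio
import re
from typing import AsyncIterator, List

SENTENCE_END = re.compile(r"[.!?]\s")  # naive boundary: punctuation followed by whitespace


async def fake_realtime_text(deltas: List[str]) -> AsyncIterator[str]:
    # Hypothetical realtime model configured for text-only output
    for delta in deltas:
        await asyncio.sleep(0)
        yield delta


async def fake_tts(sentence: str) -> bytes:
    # Hypothetical separate TTS engine; returns "audio" for one sentence
    await asyncio.sleep(0)
    return sentence.encode()


async def half_duplex(deltas: List[str]) -> List[bytes]:
    audio: List[bytes] = []
    buffer = ""
    async for delta in fake_realtime_text(deltas):
        buffer += delta
        # Hand each complete sentence to TTS as soon as it is available
        while (m := SENTENCE_END.search(buffer)) is not None:
            sentence, buffer = buffer[: m.end()].strip(), buffer[m.end():]
            audio.append(await fake_tts(sentence))
    if buffer.strip():  # flush whatever remains when the model finishes
        audio.append(await fake_tts(buffer.strip()))
    return audio


print(asyncio.run(half_duplex(["Hi the", "re. How ", "can I help?"])))
```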
Improved Transcription Synchronization
Transcripts can now be synchronized with the spoken audio using word-level timing reported by TTS engines such as Cartesia and ElevenLabs:
session = AgentSession(..., use_tts_aligned_transcript=True)
Refer to the complete example.
Documentation: https://docs.livekit.io/agents/build/text/#tts-aligned-transcriptions
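With timing information, the displayed transcript can follow the playback position instead of running ahead of the audio. If the user interrupts, the agent's recorded reply can be cut at the last word actually spoken. The following is a minimal sketch of that lookup. It assumes a hypothetical TimedWord record sorted by start time; real engines report timings in their own formats.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List


@dataclass
class TimedWord:
    # Hypothetical word timing, in seconds from the start of the TTS audio
    text: str
    start: float
    end: float


def transcript_at(words: List[TimedWord], playback_pos: float) -> str:
    """Words whose audio has started playing by playback_pos (words sorted by start)."""
    starts = [w.start for w in words]
    count = bisect_right(starts, playback_pos)  # number of words with start <= playback_pos
    return " ".join(w.text for w in words[:count])


words = [
    TimedWord("Your", 0.00, 0.20),
    TimedWord("table", 0.25, 0.60),
    TimedWord("is", 0.65, 0.75),
    TimedWord("booked", 0.80, 1.20),
]
# If the user interrupts 0.7 s into playback, only what was actually spoken is kept
print(transcript_at(words, 0.7))
```

Binary search keeps each lookup at O(log n), which matters if the transcript is refreshed on every audio frame.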
Upgraded Tokenization Engine
Tokenization now uses the Blingfire engine by default instead of the previous naive implementation, improving accuracy across multiple languages.
Complete changelog
- introduce AgentTask by @theomonnom in #2483
- introduce workflows & GetEmailAgent by @theomonnom in #2498
- drive-thru example by @theomonnom in #2609
- reuse SpeechHandle for all generations inside a single turn by @theomonnom in #2623
- introduce test & eval primitives by @theomonnom in #2662
- evals: add maybe_* utils by @theomonnom in #2681
- evals: better error message for assertions by @theomonnom in #2682
- evals: RunResult final_output on Agent tasks by @theomonnom in #2696
- evals: AgentTask GetEmailAdress tests e.g by @theomonnom in #2697
- allow optional RunResult output_type by @theomonnom in #2698
- evals: add EventRangeAssert utils by @theomonnom in #2699
- add front-desk agent example by @theomonnom in #2724
- fix InlineAgent agent resume on error by @theomonnom in #2730
- add ChatContext.merge & merge inline tasks chat_ctx by @theomonnom in #2731
- better GetEmailAgent instructions by @theomonnom in #2732
- exclude function_call inside ChatContext.merge by @theomonnom in #2733
- add Blingfire tokenizer & use it by default by @theomonnom in #2771
- fix RealtimeModel generate_reply authorization by @theomonnom in #2773
- support timed transcripts from tts by @longcw in #2580
- ignore empty sentence in tts stream adapter by @longcw in #2777
- fix types for agents 1.2 by @longcw in #2778
- fix MockTools type by @longcw in #2781
- fix RunResult order of fnc_call & agent_handoff by @theomonnom in #2782
- fix types by @theomonnom in #2783
- fix tr_input by @theomonnom in #2784
- fix GetEmailAgent instructions by @theomonnom in #2786
- fix blingfire tokenizer test by @longcw in #2785
- support tts with realtime model (audio in, text out) by @longcw in #2628
- fix assistant message order on the RunResult by @theomonnom in #2787
- fix FrontDeskAgent list_available_slots by @theomonnom in #2788
- initial evals for the FrontDesk agent by @theomonnom in #2790
- ignore empty assistant messages by @theomonnom in #2792
- evals: add CI by @theomonnom in #2791
- evals ci: use python 3.12 by @theomonnom in #2793
- fix confirmation/validation ambiguity on GetEmailAgent instructions by @theomonnom in #2794
- punctuation free turn detector by @jeradf in #2717
- frontdesk: ToolError example by @theomonnom in #2808
- evals API improvements by @theomonnom in #2846
- make arguments optional for mock_tools by @theomonnom in #2847
- allow returning Exception inside function tools by @theomonnom in #2848
- add envvar to enable verbose evals logs by @theomonnom in #2849
- preemptive generation before end of user turn by @longcw in #2728
- fix next_event return type by @theomonnom in #2856
- evals: add docstrings to the public API by @theomonnom in #2857
- only print the judge result when verbose is enabled by @theomonnom in #2858
- Add contains_agent_handoff assertion by @bcherry in #2862
- allow editing SpeechHandle allow_interruptions & add RunContext.disallow_interruptions by @theomonnom in #2864
- fix evals test by @theomonnom in #2865
- fix ruff and types by @longcw in #2889
- add opentelemetry trace by @longcw in #2873
- fix unordered user messages by @theomonnom in #2891
- fix livekit-agents 1.2 tests by @theomonnom in #2866
- cleanup & prepare for release by @theomonnom in #2893
- add prometheus by @theomonnom in #2908
- add gen_ai attributes to llm_request by @longcw in #2905
- fix types and aws realtime model by @longcw in #2910
- fix TTS fallback adapter metrics_collected event by @longcw in #2890
- add model property for llm plugins by @longcw in #2914
- nit: improve drivethru by @theomonnom in #2918
- Removing ctx.connect() from examples by @sascotto in #2909
- expose tokenizer option for cartesia tts by @longcw in #2916
- remove openai prewarm by @theomonnom in #2919
- add tts_audio_duration to usage metrics collection by @Panmax in #2915
...
livekit-agents@1.1.7
What's Changed
- fix log extra field handling in log.py by @Panmax in #2875
- fix aws realtime model types by @longcw in #2877
- chore: export PlayHandle type by @davidzhao in #2903
- fix gemini realtime user transcription sent twice by @longcw in #2899
- append framework ID to User-Agent Header by @BumaldaOverTheWater94 in #2896
- add gemini tts (beta) by @longcw in #2834
- fix DatastreamIO cancellation race by @theomonnom in #2911
- DataStreamIO wait for start when capturing_frame by @theomonnom in #2912
Full Changelog: https://github.com/livekit/agents/compare/livekit-agents@1.1.6...livekit-agents@1.1.7
livekit-agents@1.1.6
What's Changed
- Fix LMNT plugin docs by @zachoverflow in #2762
- Update new plugin readmes for format and links by @bcherry in #2571
- fix update_chat_ctx bug by @BumaldaOverTheWater94 in #2763
- Include item id when converting to LG messages by @dkeller-sondermind in #2767
- fix schedule speech on windows when monotonic_ns resolution is rough by @longcw in #2770
- Install optional dependencies during docs gen by @bcherry in #2766
- Feat/mistralai plugins by @fabitokki in #2772
- fix docker-compose typo by @theomonnom in #2789
- suppress main_stream ended error in stt fallback adapter by @longcw in #2684
- [fix] Fixed Orus voice name definition by @Is44m in #2797
- fix aws sonic type checking by @longcw in #2804
- fix deepgram stt docs by @longcw in #2803
- Hotfix for Baseten STT by @htrivedi99 in #2801
- fix inactive user instructions by @theomonnom in #2809
- fix BackgroundAudio hanging on close error by @theomonnom in #2814
- reset closing_ws for openai stt by @longcw in #2813
- avoid sid error in console mode by @theomonnom in #2815
- ignore livekit api when using console mode by @theomonnom in #2816
- Feature : Add audio_mixer_kwargs to BackgroundAudioPlayer by @CyprienRicqueB2L in #2796
- fix FunctionToolsExecutedEvent import by @longcw in #2832
- feat: ability to use remote EOT inference when deployed in Cloud by @davidzhao in #2780
- Add support for CustomPronunciations in Google TTS plugin by @kechako in #2692
- Nova Sonic Example Agent by @BumaldaOverTheWater94 in #2817
- Prevent console mode from crashing by @donalffons in #2853
- Small fix to README by @kath0la in #2861
- Fix: Use synchronized transcript for interrupted session.say() responses by @eliotsamuelmiller in #2843
- fix aws sonic tools by @theomonnom in #2859
- log metrics in extra by @theomonnom in #2868
- accidentally omit a docstring by @BumaldaOverTheWater94 in #2869
New Contributors
- @zachoverflow made their first contribution in #2762
- @dkeller-sondermind made their first contribution in #2767
- @fabitokki made their first contribution in #2772
- @Is44m made their first contribution in #2797
- @donalffons made their first contribution in #2853
- @eliotsamuelmiller made their first contribution in #2843
Full Changelog: https://github.com/livekit/agents/compare/livekit-agents@1.1.5...livekit-agents@1.1.6
livekit-agents@1.1.5
What's Changed
- Preserve original path when connecting to web socket (fix for #2700) by @arpesenti in #2702
- disconnect room when session closed due to participant disconnected by @longcw in #2712
- make sure audio_output.flush called when capture frame failed by @longcw in #2718
- Update Inworld README by @ShayneP in #2723
- Updating whisper API by @htrivedi99 in #2726
- Lock google-genai package to stable v1.20.0 by @simplegr33n in #2725
- fix(google): pass in raw schema according to genai 1.20 spec by @davidzhao in #2727
- feat(google): expose seed parameter in LLM.chat by @mrkowalski in #2721
- upgrade google genai to 1.23 by @longcw in #2743
- support 11labs auto mode with sentence tokenizer by @longcw in #2744
- add livekit-blingfire by @theomonnom in #2734
- remove changesets by @theomonnom in #2749
- uv: ignore blingfire by @theomonnom in #2750
- fix aggregate-dumps when no file is present by @theomonnom in #2751
- run tts tests on top10 providers by @theomonnom in #2752
- delete changesets x2 by @theomonnom in #2753
- add build CI by @theomonnom in #2754
- fix blingfire build CI by @theomonnom in #2756
- BlingFire: use Release config on Windows by @theomonnom in #2757
- build blingfire for macos x86 & linux arm64 by @theomonnom in #2758
- Nova Sonic Realtime Plugin by @BumaldaOverTheWater94 in #2740
- keep aws nova sonic optional by @theomonnom in #2760
New Contributors
- @arpesenti made their first contribution in #2702
- @mrkowalski made their first contribution in #2721
Full Changelog: https://github.com/livekit/agents/compare/livekit-agents@1.1.4...livekit-agents@1.1.5
livekit-agents@1.1.4
What's Changed
- add --ignore-changesets to update_versions.py by @theomonnom in #2665
- remove frame_size_ms param when creating AudioStream by @longcw in #2667
- Gladia STT - add new parameters to gladia stt by @mfernandez-gladia in #2649
- expose automatic_function_calling config for google LLM by @longcw in #2675
- start user away timer after user join by @longcw in #2676
- preserve created_at timestamp when updating instructions by @Panmax in #2677
- use parameters_json_schema for raw function tool with google LLM by @longcw in #2686
- import TextInputEvent from room_io by @longcw in #2679
- reset agent and user state after session closed by @longcw in #2691
- Add hedra extra by @bcherry in #2705
- add markdown filter for tts and transcription nodes by @longcw in #2695
- Fix Example Typo by @toubatbrian in #2706
- Inworld TTS by @davidzhao in #2693
- add warning for deprecated speed and emotion control for cartesia tts by @longcw in #2708
- fix(plugins-inworld): change default voice to Ashley by @MichaelSolati in #2707
- deepgram: disable smart_format by default by @theomonnom in #2704
- livekit-agents 1.1.4 by @theomonnom in #2709
New Contributors
- @mfernandez-gladia made their first contribution in #2649
- @Panmax made their first contribution in #2677
- @MichaelSolati made their first contribution in #2707
Full Changelog: https://github.com/livekit/agents/compare/livekit-agents@1.1.2...livekit-agents@1.1.4
livekit-agents@1.1.2
What's Changed
- Add spitch optional dependency by @temibabs in #2559
- add Cartesia STT usage event by @ChenghaoMou in #2565
- use the cgroup cpu_count for the inference thread pool by @theomonnom in #2572
- avoid possible contention on concurrent inference executions by @theomonnom in #2575
- use onnx dynamic_block_base by @theomonnom in #2578
- add vad for stt FallbackAdapter by @longcw in #2582
- Don't require sarvam api key param for TTS by @bcherry in #2579
- Remove unnecessary model param from baseten tts by @bcherry in #2568
- Fix baseten STT api key lookup by @bcherry in #2576
- fix stt fallback adapter imports by @longcw in #2590
- Replace the office-ambience sound file by @bcherry in #2588
- chore(deepgram,cartesia): removed AudioEnergyFilter by @davidzhao in #2594
- unit tests for agent session by @longcw in #2518
- fix unknown energy filter parameter by @theomonnom in #2599
- fix type check by @longcw in #2596
- wait for final transcript in manual turn detection by @longcw in #2597
- add volume gain option by @jmugicagonz in #2603
- increase audio frame size by @theomonnom in #2610
- Add SSML support for Google TTS by @kechako in #2608
- fix OpenAI Realtime connect timeout by @theomonnom in #2612
- fix OpenAI Realtime tool_choice by @theomonnom in #2613
- add transcript_confidence to ChatMessage by @theomonnom in #2611
- fix(turn-detector): improve accuracy by combining adjacent turns by @davidzhao in #2595
- fix transcription delay when VAD false negative by @longcw in #2620
- Hume plugin fixes by @zgreathouse in #2591
- Updating metrics for cached tokens for Realtime model (OpenAI) by @tg-bomze in #2621
- Disable ensure_ascii by @tg-bomze in #2622
- add timeout for agent session tests by @longcw in #2624
- add error log when llm fallback adapter failed because chunk_sent by @longcw in #2626
- fix ChatContext.insert type check by @theomonnom in #2635
- Removes the split_utterances option from Hume TTS plugin by @zgreathouse in #2638
- wait for video track from avatar plugins by @longcw in #2627
- add http_options for gemini LLM and realtime model by @longcw in #2640
- correctly passing speaking_rate to StreamingAudioConfig by @david-rodriguez in #2631
- Fix : Increase audio mixer timeout by @CyprienRicqueB2L in #2646
- handling multiple audio chunk output by @raghavjaistra in #2641
- Fix Hume TTS by @bcherry in #2639
- Update sarvam defaults, add 2.5 by @bcherry in #2618
- fix tracing param in openai realtime by @longcw in #2652
- raise error from gladia stt for fallback adapter and retry by @longcw in #2653
- fix await tasks groups never return by @longcw in #2654
- chore: add note for job_context.api usage by @davidzhao in #2655
- fix(google): update dependency versions by @davidzhao in #2658
- feat(baseten): add LLM module by @davidzhao in #2657
- cleanup tee in agent activity by @longcw in #2660
- fix duplicated audio on flush by @theomonnom in #2663
- fix transcription sync warning when gemini no text output by @longcw in #2661
- livekit-agents v1.1.2 by @theomonnom in #2664
New Contributors
- @tg-bomze made their first contribution in #2621
- @david-rodriguez made their first contribution in #2631
- @CyprienRicqueB2L made their first contribution in #2646
- @raghavjaistra made their first contribution in #2641
Full Changelog: https://github.com/livekit/agents/compare/livekit-agents@1.1.0...livekit-agents@1.1.2
livekit-agents@1.1.0
What's Changed
- TTS improvements & tests by @theomonnom in #2152
- rewrite Azure TTS by @theomonnom in #2151
- fix google TTS by @theomonnom in #2410
- add to_provider_format for ChatContext by @longcw in #2295
- automatically close agent session when participant disconnected by @longcw in #2398
- fix type checks and tts fallback adapter by @longcw in #2419
- deprecate multi-segments SynthesizeStream by @theomonnom in #2421
- cartesia: fix break by @theomonnom in #2422
- avoid raising tts empty errors when pushed text is empty by @longcw in #2420
- don't error when pushing on closed stream by @theomonnom in #2424
- Add diarization support by @ShayneP in #2338
- fix gemini user transcription when tool calls by @longcw in #2439
- skip response if no llm set in user turn completed by @longcw in #2441
- fix type checks for plugins by @longcw in #2423
- Fix SpeechHandle Priority Schedule by @toubatbrian in #2433
- use time.monotonic_ns for speech scheduling by @theomonnom in #2446
- fix transcription sync when on_playback_finished missing after flush by @longcw in #2397
- LMNT agent plugin for TTS synthesis by @naiveen in #2413
- fix agent state for pipeline agent by @longcw in #2453
- add max_session_duration and auto reconnection for OAI realtime api by @longcw in #2360
- avatar publish video after waiting participant by @longcw in #2450
- PlayAI plugin: fix language tag by @bryananderson in #2458
- Update README.md by @theomonnom in #2466
- fixed identifying streamable http mcp servers containing api key in url by @Akshay-a in #2468
- fix(google): Live syncs context, supports manual turns by @davidzhao in #2401
- AssemblyAI Remove Hardcoded Default Configuration by @dan-ince-aai in #2456
- add duration_per_frame for datastream audio receiver by @longcw in #2474
- add logs after session closed by @longcw in #2479
- rename to frame_size_ms for data stream audio receiver by @longcw in #2481
- chore(assemblyai): renaming to format_turns and only emit formatted f… by @dan-ince-aai in #2485
- fix optional args in Annotated argument by @longcw in #2491
- fix text only example by @longcw in #2490
- add artificial delay between consecutive speech handles by @longcw in #2492
- Support for Spitch in LiveKit by @temibabs in #2430
- detect inactive user example by @theomonnom in #2499
- recover from incorrect LLM arguments in function_tool by @theomonnom in #2500
- add max_unrecoverable_errors and connection options for agent session by @longcw in #2494
- Collect prompt cached tokens count in llm usage in AWS LLM plugin by @alfredguiaugment in #2508
- fix tts fallback adapter test and stream adapter by @longcw in #2514
- add hedra plugin by @longcw in #2163
- Fix broken link in silero readme by @bcherry in #2521
- fix(google): proactivity and affective_dialog require v1alpha1 API by @davidzhao in #2523
- fix: LLM to honor custom timeouts by @davidzhao in #2526
- ignore empty assistant messages by @theomonnom in #2530
- feat(openai): strip thinking tokens by @davidzhao in #2524
- Fix typo in cerebras error msg by @bcherry in #2531
- feat: surface tavus conversation id by @mertgerdan in #2532
- feat: langgraph integration by @davidzhao in #2534
- cleanup bithuman when process shutdown by @longcw in #2536
- add eleven labs v3 model by @choso in #2540
- lmnt: Update default voice, add temperature, topp options by @naiveen in #2539
- Add Cartesia STT integration by @DineshTeja in #2538
- Baseten Livekit plugin integration by @htrivedi99 in #2520
- feat: Sarvam.ai plugin for STT and TTS by @AnshTanwar in #2241
- chore: tweaks to plugins CI by @davidzhao in #2543
- use not given for room io options by @longcw in #2542
- Add new speed and tracing options to OpenAI RealtimeModel and RealtimeSession by @mikevin920 in #2503
- add connect options and error retry for realtime model by @longcw in #2544
- initial prewarm by @theomonnom in #2527
- bithuman avatar refresh token after prewarm by @longcw in #2541
- ignore prewarm failures by @theomonnom in #2545
- run room_io.start and ctx.connect concurrently in session.start by @longcw in #2505
- convert TracingOptions for session updates by @theomonnom in #2546
- use pyht SDK by @theomonnom in #2459
- update x.ai models by @theomonnom in #2547
- cancel tasks on start failure by @theomonnom in #2548
- livekit-agents v1.1.0 by @theomonnom in #2549
New Contributors
- @dan-ince-aai made their first contribution in #2399
- @toubatbrian made their first contribution in #2433
- @naiveen made their first contribution in #2413
- @temibabs made their first contribution in #2430
- @alfredguiaugment made their first contribution in #2508
- @mertgerdan made their first contribution in #2532
- @choso made their first contribution in #2540
- @DineshTeja made their first contribution in #2538
- @htrivedi99 made their first contribution in #2520
- @AnshTanwar made their first contribution in #2241
- @mikevin920 made their first contribution in #2503
Full Changelog: https://github.com/livekit/agents/compare/livekit-agents@1.0.23...livekit-agents@1.1.0
livekit-agents@1.0.23
What's Changed
- Support empty array of parameters when using raw_schema by @koen-boost in #2328
- fix: avoid shadowing name in function_tool decorator by @davidzhao in #2331
- feat: Add with_letta OpenAI plugin by @mattzh72 in #2182
- fix tool choice by @jayeshp19 in #2332
- Expose all realtime model parameters by @Shubhrakanti in #2324
- add prewarm to bithuman avatar example by @longcw in #2337
- fix agent transcription truncate for console mode by @longcw in #2327
- fix chat context item order by @longcw in #2321
- google: add new models to LLM and live by @davidzhao in #2344
- handle missing token_count in realtime usage metrics by @fredvollmer in #2350
- [Rime] Increase timeout for arcana model to allow for synthesis of long audio by @MaCaki in #2343
- google: do not error when empty responses are returned by @davidzhao in #2345
- fix type check by @longcw in #2335
- add internal worker token by @real-danm in #2354
- Add input_audio_noise_reduction to OpenAI RealtimeModel by @RBT22 in #2362
- Setting openai temperature on LLM.chat by @free-soellingeraj in #2353
- on_end_of_turn is sync by @theomonnom in #2374
- multilingual model update by @jeradf in #2219
- ignore any_generics for mypy by @longcw in #2375
- rename insert_item to insert by @theomonnom in #2372
- support stt END_OF_SPEECH for stt turn detection by @longcw in #2363
- Implemented #2379 - Add support for more parameters for Google Live API by @F1nnM in #2380
- disable split characters for tts by @longcw in #2366
- google: fix proactive audio config, update genai by @davidzhao in #2390
- fix race condition in avatar runner when reset playback_position by @longcw in #2396
- set user state to away after a timeout by @longcw in #2408
- add MCP support for streamable HTTP client by @Akshay-a in #2394
- Upgrade AssemblyAI to Universal-Streaming by @dan-ince-aai in #2399
- fix AssemblyAI & follow docs by @theomonnom in #2445
New Contributors
- @koen-boost made their first contribution in #2328
- @mattzh72 made their first contribution in #2182
- @fredvollmer made their first contribution in #2350
- @real-danm made their first contribution in #2354
- @RBT22 made their first contribution in #2362
- @free-soellingeraj made their first contribution in #2353
- @F1nnM made their first contribution in #2380
- @Akshay-a made their first contribution in #2394
- @dan-ince-aai made their first contribution in #2399
Full Changelog: https://github.com/livekit/agents/compare/livekit-agents@1.0.22...livekit-agents@1.0.23
livekit-plugins-turn-detector@0.4.5
Patch Changes
- update to livekit python 1.0 - 32e129ff1a4c3d28f363f4f2b2a355e29c8fe64d (@davidzhao)