Releases · livekit/agents

Patch Changes

backporting fix to agents 0.x to ignore Gemini LLM responses with no candidates (#2898) - 73e5384c85ea9b29fa4c946f29c66bef80d5d160 (@davidzhao)
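
The fix above can be pictured with a small, self-contained sketch. It does not reproduce the plugin's code: the dictionary layout (`candidates`, `content`, `parts`, `text`) and the helper name `iter_candidate_text` are assumptions chosen for illustration. The idea is simply to skip a response chunk that carries no candidates instead of failing on it.

```python
from typing import Any, Iterable, Iterator


def iter_candidate_text(chunks: Iterable[dict[str, Any]]) -> Iterator[str]:
    """Yield text from response chunks, skipping chunks with no candidates."""
    for chunk in chunks:
        # Hypothetical layout: a chunk may lack "candidates" or carry an empty list.
        candidates = chunk.get("candidates") or []
        if not candidates:
            # Nothing to read from this chunk: ignore it rather than fail.
            continue
        for part in candidates[0].get("content", {}).get("parts", []):
            text = part.get("text")
            if text:
                yield text


chunks = [
    {"candidates": [{"content": {"parts": [{"text": "Hello"}]}}]},
    {"candidates": []},        # empty candidate list
    {"usage": {"tokens": 3}},  # no "candidates" key at all
    {"candidates": [{"content": {"parts": [{"text": " world"}]}}]},
]
print("".join(iter_candidate_text(chunks)))
```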

@theomonnom

New Features

Evals & Testing:

You can now perform turn-by-turn evaluations on your agent interactions. Here's an example of how to validate expected behaviors:

result = await sess.run(user_input="Can I book an appointment? What's your availability for the next two weeks?")
result.expect.skip_next_event_if(type="message", role="assistant")
result.expect.next_event().is_function_call(name="list_available_slots")
result.expect.next_event().is_function_call_output()
await result.expect.next_event().is_message(role="assistant").judge(llm, intent="must confirm no availability")

Check out these practical examples: drive-thru, frontdesk

Documentation: https://docs.livekit.io/agents/build/testing/
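
To make the turn-by-turn idea concrete, here is a toy cursor over recorded events. It is not LiveKit's implementation: `Event`, `EventCursor`, and the keyword-style matching are hypothetical names invented for this sketch. It only shows how an optional event can be skipped while the remaining events are checked strictly in order.

```python
from dataclasses import dataclass


@dataclass
class Event:
    type: str  # e.g. "message", "function_call", "function_call_output"
    role: str = ""
    name: str = ""


class EventCursor:
    """Toy cursor that checks recorded events one at a time, in order."""

    def __init__(self, events: list[Event]) -> None:
        self._events = events
        self._pos = 0

    @staticmethod
    def _matches(event: Event, expected: dict[str, str]) -> bool:
        return all(getattr(event, key) == value for key, value in expected.items())

    def skip_next_event_if(self, **expected: str) -> None:
        # Advance only when the next event matches every given field.
        if self._pos < len(self._events) and self._matches(
            self._events[self._pos], expected
        ):
            self._pos += 1

    def next_event(self, **expected: str) -> Event:
        # Consume the next event and fail loudly if it is not the expected one.
        assert self._pos < len(self._events), "no more events"
        event = self._events[self._pos]
        assert self._matches(event, expected), f"unexpected event: {event}"
        self._pos += 1
        return event


cursor = EventCursor([
    Event("message", role="assistant"),  # optional filler message
    Event("function_call", name="list_available_slots"),
    Event("function_call_output"),
    Event("message", role="assistant"),
])
cursor.skip_next_event_if(type="message", role="assistant")
cursor.next_event(type="function_call", name="list_available_slots")
cursor.next_event(type="function_call_output")
last = cursor.next_event(type="message", role="assistant")
```

The same sequence of checks also passes when the first filler message is absent, which is the point of a conditional skip.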

Preemptive Generation

Preemptive generation speculatively starts LLM and TTS processing before the user's turn ends, reducing response latency by overlapping that processing with the user's remaining audio. It is disabled by default; enable it with:

session = AgentSession(..., preemptive_generation=True)
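
The overlap can be sketched with plain asyncio. This is a conceptual model under stated assumptions, not the framework's logic: `generate_reply` and `respond` are hypothetical stand-ins, and treating "final transcript equals preliminary transcript" as the reuse condition is a simplification chosen for the example.

```python
import asyncio


async def generate_reply(transcript: str) -> str:
    """Stand-in for the LLM + TTS work (hypothetical; only simulates latency)."""
    await asyncio.sleep(0.05)
    return f"reply to: {transcript}"


async def respond(preliminary: str, final: str) -> tuple[str, bool]:
    """Return the reply and whether the speculative result was reused."""
    # Start generating as soon as a preliminary transcript is available.
    speculative = asyncio.create_task(generate_reply(preliminary))

    # Simulate the remaining wait until the end of the user's turn.
    await asyncio.sleep(0.02)

    if final == preliminary:
        # The guess held: part of the work is already done.
        return await speculative, True

    # The user kept talking: discard the guess and start over.
    speculative.cancel()
    try:
        await speculative
    except asyncio.CancelledError:
        pass
    return await generate_reply(final), False


print(asyncio.run(respond("book a table", "book a table")))
print(asyncio.run(respond("book a table", "book a table for two")))
```

When the guess holds, the reply arrives sooner because generation began during the wait; when it does not, the cost is one discarded task.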

Enhanced End-of-Turn (EOU) Detection

The end-of-turn model has been refined to reduce sensitivity to punctuation and better handle multilingual scenarios, notably improving Hindi language support.

Documentation: https://docs.livekit.io/agents/build/turns/turn-detector/#supported-languages
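
One general way to make text input less dependent on punctuation is to normalize it before scoring. Whether the turn detector does this internally is an assumption; the helper `strip_punctuation` below is hypothetical and only illustrates a Unicode-aware normalization that leaves Devanagari letters and vowel marks intact.

```python
import unicodedata


def strip_punctuation(text: str) -> str:
    """Lowercase text and drop Unicode punctuation, keeping letters and marks."""
    kept = "".join(
        ch for ch in text.lower() if not unicodedata.category(ch).startswith("P")
    )
    # Collapse any whitespace left behind by removed punctuation.
    return " ".join(kept.split())


print(strip_punctuation("Okay, so... what's next?"))
print(strip_punctuation("क्या आप ठीक हैं?"))
```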

OpenTelemetry Integration

Agents now support tracing of LLM/TTS requests and user callbacks using OpenTelemetry. See the LangFuse example for a detailed implementation.

Experimental Agent Tasks

AgentTask is a new experimental feature: a subset of an agent that terminates once it achieves a specific goal. You can await an AgentTask directly in your workflows:

@function_tool
async def schedule_appointment(self, ctx: RunContext[Userdata], slot_id: str) -> str:
    # Attempts to retrieve user email, allowing multiple agent-user interactions
    email_result = await beta.workflows.GetEmailTask(chat_ctx=self.chat_ctx)

Half-Duplex Pipeline

Combine a Gemini or OpenAI realtime model (audio in, text out) with a separate TTS engine for the agent's voice:

session = AgentSession(
    llm=openai.realtime.RealtimeModel(modalities=["text"]),
    # Alternatively: llm=google.beta.realtime.RealtimeModel(modalities=[Modality.TEXT]),
    tts=openai.TTS(voice="ash"),
)

View the complete example.

Documentation: https://docs.livekit.io/agents/integrations/realtime/#separate-tts

Improved Transcription Synchronization

Keep transcripts in sync with the speech actually played, using timed transcripts from TTS engines such as Cartesia and 11labs:

session = AgentSession(..., use_tts_aligned_transcript=True)

Refer to the complete example.

Documentation: https://docs.livekit.io/agents/build/text/#tts-aligned-transcriptions
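
As a rough illustration of why word timings help, the sketch below trims a transcript to what has been played so far. The `TimedWord` type, the `spoken_so_far` helper, and the timing values are invented for the example and do not mirror the framework's data structures.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    text: str
    start: float  # seconds from the start of the audio
    end: float


def spoken_so_far(words: list[TimedWord], playback_position: float) -> str:
    """Return the words whose audio has started by playback_position."""
    return " ".join(w.text for w in words if w.start <= playback_position)


words = [
    TimedWord("Your", 0.00, 0.20),
    TimedWord("appointment", 0.20, 0.80),
    TimedWord("is", 0.80, 0.95),
    TimedWord("confirmed.", 0.95, 1.50),
]
# If playback is interrupted at 0.5 s, only the first two words were heard.
print(spoken_so_far(words, 0.5))
```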

Upgraded Tokenization Engine

Tokenization now uses the Blingfire tokenizer by default instead of the previous naive implementation, improving handling and accuracy across multiple languages.
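
To see why naive splitting struggles, consider a deliberately simple splitter. It is not the implementation that was replaced (`naive_sentences` is a hypothetical helper) and it does not call Blingfire. It only shows two typical failure modes: abbreviations in English and sentence terminators such as the Hindi danda.

```python
import re


def naive_sentences(text: str) -> list[str]:
    """Split after '.', '!' or '?' followed by whitespace (deliberately naive)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


# Abbreviations are wrongly treated as sentence ends.
print(naive_sentences("Dr. Lee is in at 9 a.m. today. Please hold."))
# The danda is not recognized, so two sentences stay glued together.
print(naive_sentences("आप कैसे हैं। मैं ठीक हूँ।"))
```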

Complete changelog

introduce AgentTask by @theomonnom in #2483
introduce workflows & GetEmailAgent by @theomonnom in #2498
drive-thru example by @theomonnom in #2609
reuse SpeechHandle for all generations inside a single turn by @theomonnom in #2623
introduce test & eval primitives by @theomonnom in #2662
evals: add maybe_* utils by @theomonnom in #2681
evals: better error message for assertions by @theomonnom in #2682
evals: RunResult final_output on Agent tasks by @theomonnom in #2696
evals: AgentTask GetEmailAdress tests e.g by @theomonnom in #2697
allow optional RunResult output_type by @theomonnom in #2698
evals: add EventRangeAssert utils by @theomonnom in #2699
add front-desk agent example by @theomonnom in #2724
fix InlineAgent agent resume on error by @theomonnom in #2730
add ChatContext.merge & merge inline tasks chat_ctx by @theomonnom in #2731
better GetEmailAgent instructions by @theomonnom in #2732
exclude function_call inside ChatContext.merge by @theomonnom in #2733
add Blingfire tokenizer & use it by default by @theomonnom in #2771
fix RealtimeModel generate_reply authorization by @theomonnom in #2773
support timed transcripts from tts by @longcw in #2580
ignore empty sentence in tts stream adapter by @longcw in #2777
fix types for agents 1.2 by @longcw in #2778
fix MockTools type by @longcw in #2781
fix RunResult order of fnc_call & agent_handoff by @theomonnom in #2782
fix types by @theomonnom in #2783
fix tr_input by @theomonnom in #2784
fix GetEmailAgent instructions by @theomonnom in #2786
fix blingfire tokenizer test by @longcw in #2785
support tts with realtime model (audio in, text out) by @longcw in #2628
fix assistant message order on the RunResult by @theomonnom in #2787
fix FrontDeskAgent list_available_slots by @theomonnom in #2788
initial evals for the FrontDesk agent by @theomonnom in #2790
ignore empty assistant messages by @theomonnom in #2792
evals: add CI by @theomonnom in #2791
evals ci: use python 3.12 by @theomonnom in #2793
fix confirmation/validation ambiguity on GetEmailAgent instructions by @theomonnom in #2794
punctuation free turn detector by @jeradf in #2717
frontdesk: ToolError example by @theomonnom in #2808
evals API improvements by @theomonnom in #2846
make arguments optional for mock_tools by @theomonnom in #2847
allow returning Exception inside function tools by @theomonnom in #2848
add envvar to enable verbose evals logs by @theomonnom in #2849
preemptive generation before end of user turn by @longcw in #2728
fix next_event return type by @theomonnom in #2856
evals: add docstrings to the public API by @theomonnom in #2857
only print the judge result when verbose is enabled by @theomonnom in #2858
Add contains_agent_handoff assertion by @bcherry in #2862
allow editing SpeechHandle allow_interruptions & add RunContext.disallow_interruptions by @theomonnom in #2864
fix evals test by @theomonnom in #2865
fix ruff and types by @longcw in #2889
add opentelemetry trace by @longcw in #2873
fix unordered user messages by @theomonnom in #2891
fix livekit-agents 1.2 tests by @theomonnom in #2866
cleanup & prepare for release by @theomonnom in #2893
add prometheus by @theomonnom in #2908
add gen_ai attributes to llm_request by @longcw in #2905
fix types and aws realtime model by @longcw in #2910
fix TTS fallback adapter metrics_collected event by @longcw in #2890
add model property for llm plugins by @longcw in #2914
nit: mprove drivethru by @theomonnom in #2918
Removing ctx.connect() from examples by @sascotto in #2909
expose tokenizer option for cartesia tts by @longcw in #2916
remove openai prewarm by @theomonnom in #2919
add tts_audio_duration to usage metrics collection by @Panmax in #2915
...

@Panmax

What's Changed

fix log extra field handling in log.py by @Panmax in #2875
fix aws realtime model types by @longcw in #2877
chore: export PlayHandle type by @davidzhao in #2903
fix gemini realtime user transcription sent twice by @longcw in #2899
append framework ID to User-Agent Header by @BumaldaOverTheWater94 in #2896
add gemini tts (beta) by @longcw in #2834
fix DatastreamIO cancellation race by @theomonnom in #2911
DataStreamIO wait for start when capturing_frame by @theomonnom in #2912

Full Changelog: https://github.com/livekit/agents/compare/livekit-agents@1.1.6...livekit-agents@1.1.7

@zachoverflow

What's Changed

Fix LMNT plugin docs by @zachoverflow in #2762
Update new plugin readmes for format and links by @bcherry in #2571
fix update_chat_ctx bug by @BumaldaOverTheWater94 in #2763
Include item id when converting to LG messages by @dkeller-sondermind in #2767
fix schedule speech on windows when monotonic_ns resolution is rough by @longcw in #2770
Install optional dependencies during docs gen by @bcherry in #2766
Feat/mistralai plugins by @fabitokki in #2772
fix docker-compose typo by @theomonnom in #2789
suppress main_stream ended error in stt fallback adapter by @longcw in #2684
[fix] Fixed Orus voice name definition by @Is44m in #2797
fix aws sonic type checking by @longcw in #2804
fix deepgram stt docs by @longcw in #2803
Hotfix for Baseten STT by @htrivedi99 in #2801
fix inactive user instructions by @theomonnom in #2809
fix BackgroundAudio hanging on close error by @theomonnom in #2814
reset closing_ws for openai stt by @longcw in #2813
avoid sid error in console mode by @theomonnom in #2815
ignore livekit api when using console mode by @theomonnom in #2816
Feature : Add audio_mixer_kwargs to BackgroundAudioPlayer by @CyprienRicqueB2L in #2796
fix FunctionToolsExecutedEvent import by @longcw in #2832
feat: ability to use remote EOT inference when deployed in Cloud by @davidzhao in #2780
Add support for CustomPronunciations in Google TTS plugin by @kechako in #2692
Nova Sonic Example Agent by @BumaldaOverTheWater94 in #2817
Prevent console mode from crashing by @donalffons in #2853
Small fix to README by @kath0la in #2861
Fix: Use synchronized transcript for interrupted session.say() responses by @eliotsamuelmiller in #2843
fix aws sonic tools by @theomonnom in #2859
log metrics in extra by @theomonnom in #2868
accidentally omit a docstring by @BumaldaOverTheWater94 in #2869

New Contributors

@zachoverflow made their first contribution in #2762
@dkeller-sondermind made their first contribution in #2767
@fabitokki made their first contribution in #2772
@Is44m made their first contribution in #2797
@donalffons made their first contribution in #2853
@eliotsamuelmiller made their first contribution in #2843

Full Changelog: https://github.com/livekit/agents/compare/livekit-agents@1.1.5...livekit-agents@1.1.6

@arpesenti

What's Changed

Preserve original path when connecting to web socket (fix for #2700) by @arpesenti in #2702
disconnect room when session closed due to participant disconnected by @longcw in #2712
make sure audio_output.flush called when capture frame failed by @longcw in #2718
Update Inworld README by @ShayneP in #2723
Updating whisper API by @htrivedi99 in #2726
Lock google-genai package to stable v1.20.0 by @simplegr33n in #2725
fix(google): pass in raw schema according to genai 1.20 spec by @davidzhao in #2727
feat(google): expose seed parameter in LLM.chat by @mrkowalski in #2721
upgrade google genai to 1.23 by @longcw in #2743
support 11labs auto mode with sentence tokenizer by @longcw in #2744
add livekit-blingfire by @theomonnom in #2734
remove changesets by @theomonnom in #2749
uv: ignore blingfire by @theomonnom in #2750
fix aggregate-dumps when no file is present by @theomonnom in #2751
run tts tests on top10 providers by @theomonnom in #2752
delete changesets x2 by @theomonnom in #2753
add build CI by @theomonnom in #2754
fix blingfire build CI by @theomonnom in #2756
BlingFire: use Release config on Windows by @theomonnom in #2757
build blingfire for macos x86 & linux arm64 by @theomonnom in #2758
Nova Sonic Realtime Plugin by @BumaldaOverTheWater94 in #2740
keep aws nova sonic optional by @theomonnom in #2760

New Contributors

@arpesenti made their first contribution in #2702
@mrkowalski made their first contribution in #2721

Full Changelog: https://github.com/livekit/agents/compare/livekit-agents@1.1.4...livekit-agents@1.1.5

@theomonnom

What's Changed

add --ignore-changesets to update_versions.py by @theomonnom in #2665
remove frame_size_ms param when creating AudioStream by @longcw in #2667
Gladia STT - add new parameters to gladia stt by @mfernandez-gladia in #2649
expose automatic_function_calling config for google LLM by @longcw in #2675
start user away timer after user join by @longcw in #2676
preserve created_at timestamp when updating instructions by @Panmax in #2677
use parameters_json_schema for raw function tool with google LLM by @longcw in #2686
import TextInputEvent from room_io by @longcw in #2679
reset agent and user state after session closed by @longcw in #2691
Add hedra extra by @bcherry in #2705
add markdown filter for tts and transcription nodes by @longcw in #2695
Fix Example Typo by @toubatbrian in #2706
Inworld TTS by @davidzhao in #2693
add warning for deprecated speed and emotion control for cartesia tts by @longcw in #2708
fix(plugins-inworld): change default voice to Ashley by @MichaelSolati in #2707
deepgram: disable smart_format by default by @theomonnom in #2704
livekit-agents 1.1.4 by @theomonnom in #2709

New Contributors

@mfernandez-gladia made their first contribution in #2649
@Panmax made their first contribution in #2677
@MichaelSolati made their first contribution in #2707

Full Changelog: https://github.com/livekit/agents/compare/livekit-agents@1.1.2...livekit-agents@1.1.4

@temibabs

What's Changed

Add spitch optional dependency by @temibabs in #2559
add Cartesia STT usage event by @ChenghaoMou in #2565
use the cgroup cpu_count for the inference thread pool by @theomonnom in #2572
avoid possible contention on concurrent inference executions by @theomonnom in #2575
use onnx dynamic_block_base by @theomonnom in #2578
add vad for stt FallbackAdapter by @longcw in #2582
Don't require sarvam api key param for TTS by @bcherry in #2579
Remove unnecessary model param from baseten tts by @bcherry in #2568
Fix baseten STT api key lookup by @bcherry in #2576
fix stt fallback adapter imports by @longcw in #2590
Replace the office-ambience sound file by @bcherry in #2588
chore(deepgram,cartesia): removed AudioEnergyFilter by @davidzhao in #2594
unit tests for agent session by @longcw in #2518
fix unknown energy filter parameter by @theomonnom in #2599
fix type check by @longcw in #2596
wait for final transcript in manual turn detection by @longcw in #2597
add volume gain option by @jmugicagonz in #2603
increase audio frame size by @theomonnom in #2610
Add SSML support for Google TTS by @kechako in #2608
fix OpenAI Realtime connect timeout by @theomonnom in #2612
fix OpenAI Realtime tool_choice by @theomonnom in #2613
add transcript_confidence to ChatMessage by @theomonnom in #2611
fix(turn-detector): improve accuracy by combining adjacent turns by @davidzhao in #2595
fix transcription delay when VAD false negative by @longcw in #2620
Hume plugin fixes by @zgreathouse in #2591
Updating metrics for cached tokens for Realtime model (OpenAI) by @tg-bomze in #2621
Disable ensure_ascii by @tg-bomze in #2622
add timeout for agent session tests by @longcw in #2624
add error log when llm fallback adapter failed because chunk_sent by @longcw in #2626
fix ChatContext.insert type check by @theomonnom in #2635
Removes the split_utterances option from Hume TTS plugin by @zgreathouse in #2638
wait for video track from avatar plugins by @longcw in #2627
add http_options for gemini LLM and realtime model by @longcw in #2640
correctly passing speaking_rate to StreamingAudioConfig by @david-rodriguez in #2631
Fix : Increase audio mixer timeout by @CyprienRicqueB2L in #2646
handling multiple audio chunk output by @raghavjaistra in #2641
Fix Hume TTS by @bcherry in #2639
Update sarvam defaults, add 2.5 by @bcherry in #2618
fix tracing param in openai realtime by @longcw in #2652
raise error from gladia stt for fallback adapter and retry by @longcw in #2653
fix await tasks groups never return by @longcw in #2654
chore: add note for job_context.api usage by @davidzhao in #2655
fix(google): update dependency versions by @davidzhao in #2658
feat(baseten): add LLM module by @davidzhao in #2657
cleanup tee in agent activity by @longcw in #2660
fix duplicated audio on flush by @theomonnom in #2663
fix transcription sync warning when gemini no text output by @longcw in #2661
livekit-agents v1.1.2 by @theomonnom in #2664

New Contributors

@tg-bomze made their first contribution in #2621
@david-rodriguez made their first contribution in #2631
@CyprienRicqueB2L made their first contribution in #2646
@raghavjaistra made their first contribution in #2641

Full Changelog: https://github.com/livekit/agents/compare/livekit-agents@1.1.0...livekit-agents@1.1.2

@theomonnom

What's Changed

TTS improvements & tests by @theomonnom in #2152
rewrite Azure TTS by @theomonnom in #2151
fix google TTS by @theomonnom in #2410
add to_provider_format for ChatContext by @longcw in #2295
automatically close agent session when participant disconnected by @longcw in #2398
fix type checks and tts fallback adapter by @longcw in #2419
deprecate multi-segments SynthesizeStream by @theomonnom in #2421
cartesia: fix break by @theomonnom in #2422
avoid raising tts empty errors when pushed text is empty by @longcw in #2420
don't error when pushing on closed stream by @theomonnom in #2424
Add diarization support by @ShayneP in #2338
fix gemini user transcription when tool calls by @longcw in #2439
skip response if no llm set in user turn completed by @longcw in #2441
fix type checks for plugins by @longcw in #2423
Fix SpeechHandle Priority Schedule by @toubatbrian in #2433
use time.monotonic_ns for speech scheduling by @theomonnom in #2446
fix transcription sync when on_playback_finished missing after flush by @longcw in #2397
LMNT agent plugin for TTS synthesis by @naiveen in #2413
fix agent state for pipeline agent by @longcw in #2453
add max_session_duration and auto reconnection for OAI realtime api by @longcw in #2360
avatar publish video after waiting participant by @longcw in #2450
PlayAI plugin: fix language tag by @bryananderson in #2458
Update README.md by @theomonnom in #2466
fixed identifying streamable http mcp servers containing api key in url by @Akshay-a in #2468
fix(google): Live syncs context, supports manual turns by @davidzhao in #2401
AssemblyAI Remove Hardcoded Default Configuration by @dan-ince-aai in #2456
add duration_per_frame for datastream audio receiver by @longcw in #2474
add logs after session closed by @longcw in #2479
rename to frame_size_ms for data stream audio receiver by @longcw in #2481
chore(assemblyai): renaming to format_turns and only emit formatted f… by @dan-ince-aai in #2485
fix optional args in Annotated argument by @longcw in #2491
fix text only example by @longcw in #2490
add artificial delay between consecutive speech handles by @longcw in #2492
Support for Spitch in LiveKit by @temibabs in #2430
detect inactive user example by @theomonnom in #2499
recover from incorrect LLM arguments in function_tool by @theomonnom in #2500
add max_unrecoverable_errors and connection options for agent session by @longcw in #2494
Collect prompt cached tokens count in llm usage in AWS LLM plugin by @alfredguiaugment in #2508
fix tts fallback adapter test and stream adapter by @longcw in #2514
add hedra plugin by @longcw in #2163
Fix broken link in silero readme by @bcherry in #2521
fix(google): proactivity and affective_dialog require v1alpha1 API by @davidzhao in #2523
fix: LLM to honor custom timeouts by @davidzhao in #2526
ignore empty assistant messages by @theomonnom in #2530
feat(openai): strip thinking tokens by @davidzhao in #2524
Fix typo in cerebras error msg by @bcherry in #2531
feat: surface tavus conversation id by @mertgerdan in #2532
feat: langgraph integration by @davidzhao in #2534
cleanup bithuman when process shutdown by @longcw in #2536
add eleven labs v3 model by @choso in #2540
lmnt: Update default voice, add temperature, topp options by @naiveen in #2539
Add Cartesia STT integration by @DineshTeja in #2538
Baseten Livekit plugin integration by @htrivedi99 in #2520
feat: Sarvam.ai plugin for STT and TTS by @AnshTanwar in #2241
chore: tweaks to plugins CI by @davidzhao in #2543
use not given for room io options by @longcw in #2542
Add new speed and tracing options to OpenAI RealtimeModel and RealtimeSession by @mikevin920 in #2503
add connect options and error retry for realtime model by @longcw in #2544
initial prewarm by @theomonnom in #2527
bithuman avatar refresh token after prewarm by @longcw in #2541
ignore prewarm failures by @theomonnom in #2545
run room_io.start and ctx.connect concurrently in session.start by @longcw in #2505
convert TracingOptions for session updates by @theomonnom in #2546
use pyht SDK by @theomonnom in #2459
update x.ai models by @theomonnom in #2547
cancel tasks on start failure by @theomonnom in #2548
livekit-agents v1.1.0 by @theomonnom in #2549

New Contributors

@dan-ince-aai made their first contribution in #2399
@toubatbrian made their first contribution in #2433
@naiveen made their first contribution in #2413
@temibabs made their first contribution in #2430
@alfredguiaugment made their first contribution in #2508
@mertgerdan made their first contribution in #2532
@choso made their first contribution in #2540
@DineshTeja made their first contribution in #2538
@htrivedi99 made their first contribution in #2520
@AnshTanwar made their first contribution in #2241
@mikevin920 made their first contribution in #2503

Full Changelog: https://github.com/livekit/agents/compare/livekit-agents@1.0.23...livekit-agents@1.1.0

@koen-boost

What's Changed

Support empty array of parameters when using raw_schema by @koen-boost in #2328
fix: avoid shadowing name in function_tool decorator by @davidzhao in #2331
feat: Add with_letta OpenAI plugin by @mattzh72 in #2182
fix tool choice by @jayeshp19 in #2332
Expose all realtime model parameters by @Shubhrakanti in #2324
add prewarm to bithuman avatar example by @longcw in #2337
fix agent transcription truncate for console mode by @longcw in #2327
fix chat context item order by @longcw in #2321
google: add new models to LLM and live by @davidzhao in #2344
handle missing token_count in realtime usage metrics by @fredvollmer in #2350
[Rime] Increase timeout for arcana model to allow for synthesis of long audio by @MaCaki in #2343
google: do not error when empty responses are returned by @davidzhao in #2345
fix type check by @longcw in #2335
add internal worker token by @real-danm in #2354
Add input_audio_noise_reduction to OpenAI RealtimeModel by @RBT22 in #2362
Setting openai temperature on LLM.chat by @free-soellingeraj in #2353
on_end_of_turn is sync by @theomonnom in #2374
multilingual model update by @jeradf in #2219
ignore any_generics for mypy by @longcw in #2375
rename insert_item to insert by @theomonnom in #2372
support stt END_OF_SPEECH for stt turn detection by @longcw in #2363
Implemented #2379 - Add support for more paramaters for Google Live API by @F1nnM in #2380
disable split characters for tts by @longcw in #2366
google: fix proactive audio config, update genai by @davidzhao in #2390
fix race condition in avatar runner when reset playback_position by @longcw in #2396
set user state to away after a timeout by @longcw in #2408
add MCP support for streamable HTTP client by @Akshay-a in #2394
Upgrade AssemblyAI to Universal-Streaming by @dan-ince-aai in #2399
fix AssemblyAI & follow docs by @theomonnom in #2445

New Contributors

@koen-boost made their first contribution in #2328
@mattzh72 made their first contribution in #2182
@fredvollmer made their first contribution in #2350
@real-danm made their first contribution in #2354
@RBT22 made their first contribution in #2362
@free-soellingeraj made their first contribution in #2353
@F1nnM made their first contribution in #2380
@Akshay-a made their first contribution in #2394
@dan-ince-aai made their first contribution in #2399

Full Changelog: https://github.com/livekit/agents/compare/livekit-agents@1.0.22...livekit-agents@1.0.23

Patch Changes

update to livekit python 1.0 - 32e129ff1a4c3d28f363f4f2b2a355e29c8fe64d (@davidzhao)

Releases: livekit/agents

livekit-plugins-google@0.11.5
livekit-agents@1.2.0
livekit-agents@1.1.7
livekit-agents@1.1.6
livekit-agents@1.1.5
livekit-agents@1.1.4
livekit-agents@1.1.2
livekit-agents@1.1.0
livekit-agents@1.0.23
livekit-plugins-turn-detector@0.4.5