Open
78 commits
87068d5
add interface draft
chenghao-mou Feb 5, 2026
e0d5ec1
Merge branch 'main' into feat/AGT-2520-multimodal-EOU
chenghao-mou Mar 6, 2026
8eebccc
draft
chenghao-mou Mar 11, 2026
f92fbc0
fix type issues
chenghao-mou Mar 11, 2026
d1086ff
refactor stream to support turn detector protocol
chenghao-mou Mar 12, 2026
0a02bb1
minor fixes
chenghao-mou Mar 12, 2026
168d0d7
minor fixes
chenghao-mou Mar 12, 2026
277db6e
WIP: use only ws stream
chenghao-mou Mar 24, 2026
03c0e2e
Merge branch 'main' into feat/AGT-2520-multimodal-EOU
chenghao-mou Mar 24, 2026
56b4796
fix uv.lock bad merge
chenghao-mou Mar 24, 2026
be9a550
WIP: more refactoring
chenghao-mou Mar 25, 2026
601229c
fix mypy
chenghao-mou Mar 25, 2026
c4d92f8
remove temp url
chenghao-mou Mar 25, 2026
e963d85
disable turn detection when agent is still speaking
chenghao-mou Mar 25, 2026
c529d79
minor refactoring
chenghao-mou Mar 29, 2026
09baed8
fix type issues
chenghao-mou Mar 29, 2026
3830638
wip
chenghao-mou Apr 10, 2026
f214aa0
clean up encoder
chenghao-mou Apr 20, 2026
c922f44
wip
chenghao-mou Apr 20, 2026
f94a0dd
Merge branch 'main' into feat/AGT-2520-multimodal-EOU
chenghao-mou Apr 20, 2026
604bfdc
update protos
chenghao-mou Apr 21, 2026
f9ec64a
minor fixes
chenghao-mou Apr 21, 2026
ddbf594
address comments
chenghao-mou Apr 21, 2026
d465564
add text fallback
chenghao-mou Apr 22, 2026
6e7d6bf
add text fallback
chenghao-mou Apr 22, 2026
200d634
fix threshold
chenghao-mou Apr 22, 2026
dbd11b0
remove temp deps
chenghao-mou Apr 22, 2026
60004dd
support realtime model
chenghao-mou Apr 22, 2026
6de53f4
fix type issues
chenghao-mou Apr 22, 2026
4ed8a82
add id in logs
chenghao-mou Apr 23, 2026
0db57ea
use threaded audio encoder
chenghao-mou Apr 24, 2026
bbcfc3a
close encoder
chenghao-mou Apr 24, 2026
7e04332
update dep
chenghao-mou Apr 27, 2026
04db92f
address comments
chenghao-mou Apr 30, 2026
46fd3bf
add cloud agent worker token
chenghao-mou Apr 30, 2026
e4e8ef6
Merge branch 'main' into feat/AGT-2520-multimodal-EOU
chenghao-mou Apr 30, 2026
fc94068
fix type issues
chenghao-mou Apr 30, 2026
999edd5
add token in header instead
chenghao-mou Apr 30, 2026
cde90de
Merge branch 'main' into feat/AGT-2520-multimodal-EOU
chenghao-mou Apr 30, 2026
3603f04
wip
chenghao-mou May 13, 2026
6272402
Merge branch 'main' into feat/AGT-2520-multimodal-EOU
chenghao-mou May 13, 2026
3bc3ff3
refactor for the cloud model
chenghao-mou May 14, 2026
a08b624
add support for both v1 and v1-mini
chenghao-mou May 14, 2026
f435571
fix example
chenghao-mou May 15, 2026
8e75d60
address comments
chenghao-mou May 15, 2026
cf54cbe
Merge branch 'main' into feat/AGT-2520-multimodal-EOU
chenghao-mou May 15, 2026
4f10a69
address comments
chenghao-mou May 15, 2026
e96f1be
clean up session _on_error annotation
chenghao-mou May 15, 2026
97400d2
Merge branch 'main' into feat/AGT-2520-multimodal-EOU
chenghao-mou May 15, 2026
b1e9294
merge inference and local eot code
chenghao-mou May 15, 2026
49f0de0
update tests
chenghao-mou May 17, 2026
7fe2bfb
Merge branch 'main' into feat/AGT-2520-multimodal-EOU
chenghao-mou May 17, 2026
8b150aa
clean up
chenghao-mou May 17, 2026
28af3f5
minor refactor and clean up
chenghao-mou May 18, 2026
75ddae6
refactor
chenghao-mou May 19, 2026
76cec5d
Merge branch 'main' into feat/AGT-2520-multimodal-EOU
chenghao-mou May 19, 2026
2ccf54d
refactor
chenghao-mou May 19, 2026
7fbca08
clean up
chenghao-mou May 19, 2026
82c599a
refactor
chenghao-mou May 19, 2026
4b6fdb5
clean up
chenghao-mou May 19, 2026
7500160
more refactoring
chenghao-mou May 19, 2026
efe8d5c
fix makefile indentation
chenghao-mou May 19, 2026
3237f9d
update protocol
chenghao-mou May 20, 2026
09cdb0c
add direct commit for late stt transcripts
chenghao-mou May 20, 2026
f02d24e
Merge branch 'main' into feat/AGT-2520-multimodal-EOU
chenghao-mou May 21, 2026
0a6a66d
add local inference dependency
chenghao-mou May 21, 2026
80fbc29
update deps
chenghao-mou May 21, 2026
21310ee
use local inference package and deprecate the turn detector package
chenghao-mou May 23, 2026
5408ae1
feat(vad): bundle optimized silero vad and deprecate the plugin (#5800)
chenghao-mou May 25, 2026
442d857
Merge branch 'main' into feat/AGT-2520-multimodal-EOU
chenghao-mou May 25, 2026
9d9cb52
fix type issue
chenghao-mou May 25, 2026
2b4cc7e
Merge branch 'main' into feat/AGT-2520-multimodal-EOU
chenghao-mou May 25, 2026
a42cf7a
refactoring
chenghao-mou May 25, 2026
f592a16
drop duplicate calls and simplify triggers
chenghao-mou May 25, 2026
82ad113
fix vad restore bug
chenghao-mou May 25, 2026
f93f7ca
adjust thresholds for the cloud model
chenghao-mou May 25, 2026
7ff1eb1
update warning message
chenghao-mou May 25, 2026
e9b8a1e
Merge branch 'main' into feat/AGT-2520-multimodal-EOU
chenghao-mou May 25, 2026
2 changes: 1 addition & 1 deletion README.md
@@ -47,7 +47,7 @@ agents that can see, hear, and understand.
 To install the core Agents library, along with plugins for popular model providers:

 ```bash
-pip install "livekit-agents[openai,silero,deepgram,cartesia,turn-detector]"
+pip install "livekit-agents[openai,silero,deepgram,cartesia]"
 ```

 ## Docs and guides
2 changes: 1 addition & 1 deletion examples/README.md
@@ -54,7 +54,7 @@ To run the examples, you'll need:

 - A [LiveKit Cloud](https://cloud.livekit.io) account or a local [LiveKit server](https://github.com/livekit/livekit)
 - API keys for the model providers you want to use in a `.env` file
-- Python 3.9 or higher
+- Python 3.10 or higher
 - [uv](https://docs.astral.sh/uv/)

 ### Environment file
2 changes: 0 additions & 2 deletions examples/avatar_agents/audio_wave/agent_worker.py
@@ -18,7 +18,6 @@
 from livekit.agents.voice.avatar import DataStreamAudioOutput
 from livekit.agents.voice.io import PlaybackFinishedEvent, PlaybackStartedEvent
 from livekit.agents.voice.room_io import ATTRIBUTE_PUBLISH_ON_BEHALF
-from livekit.plugins import silero

 load_dotenv()

@@ -75,7 +74,6 @@ async def entrypoint(ctx: JobContext):
         stt=inference.STT("deepgram/nova-3"),
         llm=inference.LLM("google/gemini-2.5-flash"),
         tts=inference.TTS("cartesia/sonic-3"),
-        vad=silero.VAD.load(),
         resume_false_interruption=False,
     )

7 changes: 3 additions & 4 deletions examples/avatar_agents/keyframe/agent_worker.py
@@ -14,9 +14,9 @@
     function_tool,
     inference,
 )
-from livekit.plugins import keyframe, silero
+from livekit.agents.inference import AudioTurnDetector
+from livekit.plugins import keyframe
 from livekit.plugins.keyframe import Emotion
-from livekit.plugins.turn_detector.multilingual import MultilingualModel

 load_dotenv()

@@ -51,8 +51,7 @@ async def entrypoint(ctx: JobContext):
         llm=inference.LLM("google/gemini-2.5-flash"),
         tts=inference.TTS("cartesia/sonic-3"),
         resume_false_interruption=False,
-        vad=silero.VAD.load(),
-        turn_detection=MultilingualModel(),
+        turn_detection=AudioTurnDetector(),
     )

     avatar = keyframe.AvatarSession(persona_slug="public:cosmo_persona-1.5-live")
6 changes: 2 additions & 4 deletions examples/drive-thru/agent.py
@@ -34,8 +34,7 @@
     function_tool,
     inference,
 )
-from livekit.plugins import silero
-from livekit.plugins.turn_detector.multilingual import MultilingualModel
+from livekit.agents.inference import AudioTurnDetector

 load_dotenv()

@@ -487,8 +486,7 @@ async def drive_thru_agent(ctx: JobContext) -> None:
         ),
         llm=inference.LLM("openai/gpt-5-mini"),
         tts=inference.TTS("cartesia/sonic-3", voice="f786b574-daa5-4673-aa0c-cbe3e8534c02"),
-        turn_detection=MultilingualModel(),
-        vad=silero.VAD.load(),
+        turn_detection=AudioTurnDetector(),
         max_tool_steps=10,
     )

Expand Down
6 changes: 2 additions & 4 deletions examples/frontdesk/agent.py
@@ -38,8 +38,7 @@
     task_completion_judge,
     tool_use_judge,
 )
-from livekit.plugins import silero
-from livekit.plugins.turn_detector.multilingual import MultilingualModel
+from livekit.agents.inference import AudioTurnDetector

 load_dotenv()

@@ -266,8 +265,7 @@ async def frontdesk_agent(ctx: JobContext):
         stt=inference.STT("deepgram/nova-3"),
         llm=inference.LLM("google/gemini-2.5-flash"),
         tts=inference.TTS("cartesia/sonic-3", voice="39b376fc-488e-4d0c-8b37-e00b72059fdd"),
-        turn_detection=MultilingualModel(),
-        vad=silero.VAD.load(),
+        turn_detection=AudioTurnDetector(),
         max_tool_steps=1,
     )

Expand Down
2 changes: 0 additions & 2 deletions examples/healthcare/agent.py
@@ -32,7 +32,6 @@
     WarmTransferTask,
 )
 from livekit.agents.llm import ToolError, function_tool
-from livekit.plugins import silero

 logger = logging.getLogger("HealthcareAgent")

@@ -754,7 +753,6 @@ async def entrypoint(ctx: JobContext):
         stt=inference.STT("deepgram/nova-3", language="multi"),
         llm=inference.LLM("openai/gpt-4.1-mini"),
         tts=inference.TTS("inworld/inworld-tts-1"),
-        vad=silero.VAD.load(),
         preemptive_generation=True,
     )

Expand Down
2 changes: 0 additions & 2 deletions examples/inference/agent.py
@@ -12,7 +12,6 @@
     cli,
     inference,
 )
-from livekit.plugins import silero
 from livekit.rtc import RpcInvocationData

 logger = logging.getLogger("inference")
@@ -59,7 +58,6 @@ async def entrypoint(ctx: JobContext) -> None:
         stt=inference.STT(model=DEFAULT_STT),
         llm=inference.LLM(model=DEFAULT_LLM),
         tts=inference.TTS(model=DEFAULT_TTS),
-        vad=silero.VAD.load(),
     )

     def parse_value(payload: str, fallback: str) -> str:
Expand Down
12 changes: 2 additions & 10 deletions examples/other/elevenlab_scribe_v2.py
@@ -2,8 +2,8 @@

 from dotenv import load_dotenv

-from livekit.agents import Agent, AgentServer, AgentSession, JobContext, JobProcess, cli, inference
-from livekit.plugins import elevenlabs, silero
+from livekit.agents import Agent, AgentServer, AgentSession, JobContext, cli, inference
+from livekit.plugins import elevenlabs

 logger = logging.getLogger("realtime-scribe-v2")
 logger.setLevel(logging.INFO)
@@ -14,13 +14,6 @@
 server = AgentServer()


-def prewarm(proc: JobProcess) -> None:
-    proc.userdata["vad"] = silero.VAD.load()
-
-
-server.setup_fnc = prewarm
-
-
 @server.rtc_session()
 async def entrypoint(ctx: JobContext) -> None:
     # Using ElevenLabs STT plugin directly for realtime mode support
@@ -37,7 +30,6 @@ async def entrypoint(ctx: JobContext) -> None:

     session: AgentSession = AgentSession(
         allow_interruptions=True,
-        vad=ctx.proc.userdata["vad"],
         stt=stt,
         llm=inference.LLM("openai/gpt-4.1-mini"),
         tts=inference.TTS("cartesia/sonic-3"),
19 changes: 9 additions & 10 deletions examples/other/kokoro_tts.py
@@ -2,9 +2,16 @@

 from dotenv import load_dotenv

-from livekit.agents import Agent, AgentServer, AgentSession, JobContext, JobProcess, cli, metrics
+from livekit.agents import (
+    Agent,
+    AgentServer,
+    AgentSession,
+    JobContext,
+    cli,
+    metrics,
+)
 from livekit.agents.voice import MetricsCollectedEvent
-from livekit.plugins import deepgram, openai, silero
+from livekit.plugins import deepgram, openai

 logger = logging.getLogger("kokoro-tts-agent")

@@ -27,13 +34,6 @@ def __init__(self) -> None:
 server = AgentServer()


-def prewarm(proc: JobProcess):
-    proc.userdata["vad"] = silero.VAD.load()
-
-
-server.setup_fnc = prewarm
-
-
 @server.rtc_session()
 async def entrypoint(ctx: JobContext):
     # each log entry will include these fields
@@ -42,7 +42,6 @@ async def entrypoint(ctx: JobContext):
         "user_id": "your user_id",
     }
     session = AgentSession(
-        vad=ctx.proc.userdata["vad"],
         # any combination of STT, LLM, TTS, or realtime API can be used
         llm=openai.LLM(model="gpt-4.1-mini"),
         stt=deepgram.STT(model="nova-3", language="multi"),
12 changes: 1 addition & 11 deletions examples/other/transcription/multi-user-transcriber.py
@@ -10,15 +10,13 @@
     AgentSession,
     AutoSubscribe,
     JobContext,
-    JobProcess,
     StopResponse,
     cli,
     inference,
     llm,
     room_io,
     utils,
 )
-from livekit.plugins import silero

 load_dotenv()

@@ -91,9 +89,7 @@ async def _start_session(self, participant: rtc.RemoteParticipant) -> AgentSessi
         if participant.identity in self._sessions:
             return self._sessions[participant.identity]

-        session = AgentSession(
-            vad=self.ctx.proc.userdata["vad"],
-        )
+        session = AgentSession()
         await session.start(
             agent=Transcriber(
                 participant_identity=participant.identity,
@@ -136,11 +132,5 @@ async def cleanup():
     ctx.add_shutdown_callback(cleanup)


-def prewarm(proc: JobProcess):
-    proc.userdata["vad"] = silero.VAD.load()
-
-
-server.setup_fnc = prewarm
-
 if __name__ == "__main__":
     cli.run_app(server)
5 changes: 3 additions & 2 deletions examples/other/transcription/transcriber.py
@@ -11,11 +11,12 @@
     MetricsCollectedEvent,
     StopResponse,
     cli,
+    inference,
     llm,
     metrics,
     room_io,
 )
-from livekit.plugins import openai, silero
+from livekit.plugins import openai

 load_dotenv()

@@ -50,7 +51,7 @@ async def entrypoint(ctx: JobContext):

     session = AgentSession(
         # vad is needed for non-streaming STT implementations
-        vad=silero.VAD.load(min_silence_duration=0.3),
+        vad=inference.VAD(model="silero", min_silence_duration=0.3),
     )

     @session.on("metrics_collected")
3 changes: 1 addition & 2 deletions examples/other/transcription/translator.py
@@ -17,7 +17,7 @@
     room_io,
     utils,
 )
-from livekit.plugins import openai, silero
+from livekit.plugins import openai

 load_dotenv()

@@ -76,7 +76,6 @@ async def entrypoint(ctx: JobContext):

     session = AgentSession(
         # vad is only needed for non-streaming STT implementations
-        vad=silero.VAD.load(),
     )

     @session.on("metrics_collected")
5 changes: 3 additions & 2 deletions examples/primitives/echo-agent.py
@@ -9,9 +9,9 @@
     AutoSubscribe,
     JobContext,
     cli,
+    inference,
 )
 from livekit.agents.vad import VADEventType
-from livekit.plugins import silero

 load_dotenv()
 logger = logging.getLogger("echo-agent")
@@ -35,7 +35,8 @@ async def entrypoint(ctx: JobContext):
         participant=participant,
         track_source=rtc.TrackSource.SOURCE_MICROPHONE,
     )
-    vad = silero.VAD.load(
+    vad = inference.VAD(
+        model="silero",
         min_speech_duration=0.2,
         min_silence_duration=0.6,
     )
6 changes: 2 additions & 4 deletions examples/survey/agent.py
@@ -20,9 +20,8 @@
     room_io,
 )
 from livekit.agents.beta.workflows import GetEmailTask, TaskGroup
+from livekit.agents.inference import AudioTurnDetector
 from livekit.agents.llm import function_tool
-from livekit.plugins import silero
-from livekit.plugins.turn_detector.multilingual import MultilingualModel

 logger = logging.getLogger("SurveyAgent")

@@ -354,8 +353,7 @@ async def entrypoint(ctx: JobContext):
         llm=inference.LLM("google/gemini-2.5-flash"),
         stt=inference.STT("deepgram/nova-3", language="multi"),
         tts=inference.TTS("inworld/inworld-tts-1"),
-        vad=silero.VAD.load(),
-        turn_detection=MultilingualModel(),
+        turn_detection=AudioTurnDetector(),
         preemptive_generation=True,
     )

14 changes: 2 additions & 12 deletions examples/telephony/amd.py
@@ -11,12 +11,10 @@
     AgentServer,
     AgentSession,
     JobContext,
-    JobProcess,
     cli,
     inference,
 )
-from livekit.plugins import silero
-from livekit.plugins.turn_detector.multilingual import MultilingualModel
+from livekit.agents.inference import AudioTurnDetector

 logger = logging.getLogger("basic-agent")

@@ -38,13 +36,6 @@ def __init__(self) -> None:
 server = AgentServer()


-def prewarm(proc: JobProcess):
-    proc.userdata["vad"] = silero.VAD.load()
-
-
-server.setup_fnc = prewarm
-
-
 @server.rtc_session()
 async def entrypoint(ctx: JobContext):
     ctx.log_context_fields = {
@@ -54,8 +45,7 @@ async def entrypoint(ctx: JobContext):
         stt=inference.STT("deepgram/nova-3", language="multi"),
         llm=inference.LLM("openai/gpt-4.1-mini"),
         tts=inference.TTS("cartesia/sonic-3", voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc"),
-        turn_detection=MultilingualModel(),
-        vad=ctx.proc.userdata["vad"],
+        turn_detection=AudioTurnDetector(),
         preemptive_generation=True,
     )

10 changes: 0 additions & 10 deletions examples/telephony/bank-ivr/ivr_navigator_agent.py
@@ -9,15 +9,13 @@
     AgentServer,
     AgentSession,
     JobContext,
-    JobProcess,
     MetricsCollectedEvent,
     RunContext,
     cli,
     inference,
     metrics,
 )
 from livekit.agents.llm.tool_context import function_tool
-from livekit.plugins import silero

 logger = logging.getLogger("phone-tree-agent")

@@ -76,13 +74,6 @@ async def record_task_result_and_hang_up(self, context: RunContext, content: str
         context.session.shutdown(drain=True)


-def prewarm(proc: JobProcess) -> None:
-    proc.userdata["vad"] = silero.VAD.load()
-
-
-server.setup_fnc = prewarm
-
-
 @server.rtc_session(agent_name=PHONE_TREE_AGENT_DISPATCH_NAME)
 async def dtmf_session(ctx: JobContext) -> None:
     await ctx.connect()
@@ -91,7 +82,6 @@ async def dtmf_session(ctx: JobContext) -> None:
     }

     session: AgentSession = AgentSession(
-        vad=ctx.proc.userdata["vad"],
         llm=inference.LLM("openai/gpt-4.1"),
         stt=inference.STT("deepgram/nova-3"),
         tts=inference.TTS("rime/arcana"),
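The example files in this diff all drop imports from `livekit.plugins.silero` and `livekit.plugins.turn_detector`. As a rough aid for applying the same migration to other agent code, the sketch below scans Python source for those imports using only the standard-library `ast` module. The helper names `find_legacy_imports` and `_is_legacy` and the `LEGACY_MODULES` tuple are hypothetical and not part of the PR; the sketch reports imports only and does not rewrite call sites such as `silero.VAD.load()`.

```python
import ast

# Module prefixes the example files in this diff stop importing from.
# Assumption: these two packages are the only ones worth flagging.
LEGACY_MODULES = ("livekit.plugins.silero", "livekit.plugins.turn_detector")


def _is_legacy(name: str) -> bool:
    """True if `name` is a legacy module or a submodule of one."""
    return any(name == m or name.startswith(m + ".") for m in LEGACY_MODULES)


def find_legacy_imports(source: str) -> list[tuple[int, str]]:
    """Return sorted (line number, dotted name) pairs for legacy plugin imports."""
    hits: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module:
            if _is_legacy(node.module):
                # e.g. from livekit.plugins.turn_detector.multilingual import X
                hits.append((node.lineno, node.module))
            else:
                # e.g. from livekit.plugins import openai, silero
                for alias in node.names:
                    full = f"{node.module}.{alias.name}"
                    if _is_legacy(full):
                        hits.append((node.lineno, full))
        elif isinstance(node, ast.Import):
            # e.g. import livekit.plugins.silero
            for alias in node.names:
                if _is_legacy(alias.name):
                    hits.append((node.lineno, alias.name))
    return sorted(hits)
```

A possible use is to call `find_legacy_imports(path.read_text())` on each file in a project and review every hit by hand, since the right replacement (removing `vad=` entirely, `inference.VAD(model="silero", ...)`, or `AudioTurnDetector()`) differs from file to file in the changes above.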