[inworld] use audio/pcm to leverage the fast AudioByteStream path #4803
ianbbqzy wants to merge 1 commit into livekit:main
ianbbqzy commented on Feb 12, 2026
- Strip the WAV header manually before pushing to the emitter.
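A minimal sketch of what stripping the header could look like, assuming the stream starts with a complete RIFF/WAVE header in its first chunk and that later chunks are raw PCM. The PR's actual `_strip_wav_header` is not shown in this thread; `strip_wav_header` below is a hypothetical stand-in that walks the RIFF chunk list until it reaches the `data` chunk.

```python
import struct


def strip_wav_header(data: bytes) -> bytes:
    """Hypothetical helper: return the PCM payload of a RIFF/WAVE buffer,
    or `data` unchanged if it does not start with a RIFF/WAVE header."""
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        return data  # no header: assume the bytes are already raw PCM
    offset = 12
    # Walk the chunks ("fmt ", optional "LIST", ...) until the "data" chunk.
    while offset + 8 <= len(data):
        chunk_id = data[offset:offset + 4]
        (chunk_size,) = struct.unpack_from("<I", data, offset + 4)
        offset += 8
        if chunk_id == b"data":
            # Streaming servers may write a placeholder size (0 or 0xFFFFFFFF);
            # in that case take everything after the chunk header.
            if chunk_size == 0 or offset + chunk_size > len(data):
                return data[offset:]
            return data[offset:offset + chunk_size]
        offset += chunk_size + (chunk_size & 1)  # RIFF chunks are word-aligned
    raise ValueError("RIFF/WAVE buffer has no 'data' chunk")
```

Parsing the chunk list, rather than always dropping a fixed 44 bytes, keeps the sketch correct when the server emits extra chunks (such as `LIST`) before `data`.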
Force-pushed from f279099 to 43325a9.
cc @theomonnom @longcw. Thanks!
```python
# AudioByteStream path instead of the async AudioStreamDecoder.
# WAV headers from the server are stripped before pushing to the
# emitter (see _strip_wav_header).
return "audio/pcm"
```
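To illustrate why the raw path is cheap: once the header is gone, framing 16-bit PCM is plain byte slicing that can run synchronously on the event loop. The sketch below is not LiveKit's `AudioByteStream`; `PcmChunker` and its `push`/`flush` methods are hypothetical, and it assumes 16-bit interleaved samples and 20 ms frames.

```python
class PcmChunker:
    """Hypothetical framer: buffers raw 16-bit PCM and returns complete
    fixed-size frames synchronously, with no threads or decoder."""

    def __init__(self, sample_rate: int, num_channels: int, frame_ms: int = 20) -> None:
        samples_per_frame = sample_rate * frame_ms // 1000
        # 2 bytes per int16 sample, interleaved across channels
        self._frame_bytes = samples_per_frame * num_channels * 2
        self._buf = bytearray()

    def push(self, data: bytes) -> list[bytes]:
        self._buf.extend(data)
        cut = (len(self._buf) // self._frame_bytes) * self._frame_bytes
        frames = [
            bytes(self._buf[i:i + self._frame_bytes])
            for i in range(0, cut, self._frame_bytes)
        ]
        del self._buf[:cut]  # keep only the incomplete tail for the next push
        return frames

    def flush(self) -> list[bytes]:
        # Emit any partial frame left at the end of the stream.
        if not self._buf:
            return []
        tail = bytes(self._buf)
        self._buf.clear()
        return [tail]
```

At 24 kHz mono, a 20 ms frame is 480 samples (960 bytes), so pushing 1000 bytes yields one frame and carries 40 bytes over to the next push.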
If it's a WAV file being sent back, you should just pass back `audio/wav`; our decoder has a fast path for decoding WAV files.
This would be preferable to having multiple WAV-handling paths in the codebase.
I believe that's the existing behavior: when the encoding is LINEAR16, it falls through to the `else` branch, which returns `audio/wav`. But I measured that path as adding ~30-40 ms of latency.
@tinalenguyen would you be able to advise here?
I tested this with my own WebSocket benchmark script at https://github.com/inworld-ai/inworld-api-examples/tree/ian/livekit-integrations/integrations/livekit/python/benchmarks over 100+ iterations.
Summarized by AI, the reasons are:
| Aspect | `audio/pcm` (raw) | `audio/wav` (decoder) |
|---|---|---|
| Threading | None; everything runs in the main loop | `ThreadPoolExecutor` + `StreamBuffer` with locks |
| Cross-thread hops | 0 | 2 (main → thread via `StreamBuffer`, thread → main via `call_soon_threadsafe`) |
| Event loop cycles to first frame | 1 | 3+ (push → thread wake → thread decode → `call_soon_threadsafe` → `decode_task` scheduled → `decode_task` runs) |
| `AudioByteStream` instances | 1 | 2 (one in `_decode_wav_loop`, one in `_decode_task`) |
| Buffer copies per read | 0 | `StreamBuffer.read()` creates a new `BytesIO` and copies the remaining bytes on every read |
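The "Threading" and "Cross-thread hops" rows can be modeled with plain asyncio. This is a simplified sketch, not LiveKit's `AudioStreamDecoder`: `decode`, `inline_path`, and `threaded_path` are hypothetical names, a `queue.Queue` stands in for `StreamBuffer`, and the per-chunk work is a placeholder. Both paths return the same bytes; the threaded one only adds the two hops before each result reaches the loop.

```python
import asyncio
import queue


def decode(chunk: bytes) -> bytes:
    # Placeholder for per-chunk work (header stripping, framing, ...).
    return chunk


async def inline_path(chunks: list[bytes]) -> list[bytes]:
    # audio/pcm-style: everything runs on the event loop, zero thread hops.
    return [decode(c) for c in chunks]


async def threaded_path(chunks: list[bytes]) -> list[bytes]:
    # audio/wav-style: hop 1 sends input to a worker thread, hop 2 hands
    # each result back to the loop via call_soon_threadsafe.
    loop = asyncio.get_running_loop()
    to_worker: queue.Queue = queue.Queue()    # hop 1: loop -> worker thread
    to_loop: asyncio.Queue = asyncio.Queue()  # hop 2: worker -> loop

    def worker() -> None:
        while (c := to_worker.get()) is not None:
            loop.call_soon_threadsafe(to_loop.put_nowait, decode(c))
        loop.call_soon_threadsafe(to_loop.put_nowait, None)  # end of stream

    fut = loop.run_in_executor(None, worker)
    for c in chunks:
        to_worker.put(c)
    to_worker.put(None)

    results = []
    while (item := await to_loop.get()) is not None:
        results.append(item)
    await fut
    return results
```

Wrapping each path with `time.perf_counter()` around the first item received would let you compare time-to-first-frame locally; the sketch deliberately leaves out `StreamBuffer`'s per-read copies.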
> When encoding is LINEAR16, it falls to the else branch of audio/wav. But I measured that to have ~30-40ms additional latency.
Could you share the benchmark scripts that you ran?
If `AudioStreamDecoder` is slow, we should optimize that instead. I still maintain that we should not be duplicating decoding logic within plugin code.
Hi David, can you access the script at https://github.com/inworld-ai/inworld-api-examples/tree/ian/livekit-integrations/integrations/livekit/python/benchmarks? The README has instructions for running it, and you should be able to check out this feature branch in the submodule to see the difference.
We might also consider adding a new `PCM` audio encoding format on the server side for headerless audio bytes, to differentiate it from the current encoding format.
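One possible reading of that proposal, sketched under assumptions: the `Encoding` enum, its `MP3` member, and `emitter_mime_type` are hypothetical, and `PCM` does not exist yet. If the server could emit headerless bytes, the plugin would map `PCM` straight to `audio/pcm` with nothing to strip, while WAV-wrapped `LINEAR16` could keep going through `audio/wav` and the shared decoder, so the plugin would not need its own WAV handling.

```python
from enum import Enum


class Encoding(str, Enum):
    # Hypothetical subset of server-side encodings.
    LINEAR16 = "LINEAR16"  # 16-bit PCM wrapped in a WAV header
    MP3 = "MP3"            # assumed example of another existing branch
    PCM = "PCM"            # proposed: raw 16-bit PCM with no header


def emitter_mime_type(encoding: Encoding) -> str:
    """Hypothetical mapping from server encoding to emitter MIME type."""
    if encoding is Encoding.MP3:
        return "audio/mpeg"
    if encoding is Encoding.PCM:
        # Already headerless: the raw path needs no stripping in the plugin.
        return "audio/pcm"
    # LINEAR16 (WAV-wrapped): leave header handling to the shared decoder.
    return "audio/wav"
```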
Force-pushed from 43325a9 to 74a4f88.