
[inworld] use audio/pcm to leverage the fast AudioByteStream path #4803

Open
ianbbqzy wants to merge 1 commit into livekit:main from ianbbqzy:ian/inworld-pcm

@ianbbqzy
Contributor

  • Strip the WAV header manually before pushing audio to the emitter.
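A minimal sketch of what header stripping involves, assuming the server sends standard RIFF/WAVE headers and that each header arrives whole within a single message. `strip_wav_header` is a hypothetical name for illustration, not the PR's `_strip_wav_header`. It walks the RIFF chunk list instead of assuming a fixed 44-byte header, because encoders may insert extra chunks (such as `LIST`) before `data`.

```python
import struct


def strip_wav_header(data: bytes) -> bytes:
    """Return the PCM payload of a RIFF/WAVE message (hypothetical helper).

    Messages without a RIFF/WAVE header are returned unchanged, so
    already headerless PCM passes straight through.
    """
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        return data
    offset = 12  # skip "RIFF" + 4-byte size + "WAVE"
    while offset + 8 <= len(data):
        chunk_id = data[offset:offset + 4]
        (chunk_size,) = struct.unpack_from("<I", data, offset + 4)
        body = offset + 8
        if chunk_id == b"data":
            end = body + chunk_size
            # Streaming encoders often write a placeholder size (0 or 0xFFFFFFFF),
            # and the first message may carry only part of the payload.
            if chunk_size in (0, 0xFFFFFFFF) or end > len(data):
                return data[body:]
            return data[body:end]
        # Skip non-audio chunks (fmt, LIST, ...); chunk bodies are padded to even length.
        offset = body + chunk_size + (chunk_size & 1)
    raise ValueError("RIFF/WAVE header without a complete data chunk header")
```

Passing unheadered bytes through unchanged keeps the helper safe to call on every message, whether or not the server repeats the header.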

@CLAassistant

CLAassistant commented Feb 12, 2026

CLA assistant check
All committers have signed the CLA.

Contributor

@devin-ai-integration bot left a comment

Devin Review found 1 potential issue.

@ianbbqzy
Contributor Author

cc @theomonnom @longcw. Thanks!

```python
# AudioByteStream path instead of the async AudioStreamDecoder.
# WAV headers from the server are stripped before pushing to the
# emitter (see _strip_wav_header).
return "audio/pcm"
```
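The appeal of the byte-stream path is that raw PCM is sliced into fixed-size frames synchronously, in the same call that receives the bytes, with no thread hand-off. The class below is a hypothetical, self-contained stand-in written to illustrate that idea; it is not LiveKit's `AudioByteStream`. It assumes 16-bit interleaved PCM and 10 ms frames, and it returns raw byte slices rather than audio frame objects.

```python
from typing import List


class PcmFrameChunker:
    """Hypothetical sketch: slice raw int16 PCM into fixed-size frames in the caller's thread."""

    def __init__(self, sample_rate: int, num_channels: int, frame_ms: int = 10) -> None:
        samples_per_channel = sample_rate * frame_ms // 1000
        self.frame_bytes = samples_per_channel * num_channels * 2  # 2 bytes per int16 sample
        self._buf = bytearray()

    def write(self, data: bytes) -> List[bytes]:
        """Append bytes and return every complete frame now available."""
        self._buf += data
        cut = (len(self._buf) // self.frame_bytes) * self.frame_bytes
        frames = [bytes(self._buf[i:i + self.frame_bytes]) for i in range(0, cut, self.frame_bytes)]
        del self._buf[:cut]  # keep only the partial trailing frame
        return frames

    def flush(self) -> List[bytes]:
        """Emit any leftover partial frame (a real implementation might pad it instead)."""
        if not self._buf:
            return []
        tail = bytes(self._buf)
        self._buf.clear()
        return [tail]
```

For example, at 24 kHz mono a 10 ms frame is 480 bytes, so writing 1,000 bytes yields two frames immediately and holds 40 bytes until the next write.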
Member

If it's a WAV file being sent back, you should just pass back audio/wav; our decoder has a fast path for decoding WAV files.

This would be preferable to having WAV handling in multiple places in the codebase.

Contributor Author

I believe that's the existing behavior: when the encoding is LINEAR16, it falls into the else branch, which uses audio/wav. But I measured that path to add ~30-40 ms of latency.
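One common way to quantify an added-latency figure like this is time to the first non-empty audio chunk, repeated over many runs. The sketch below is an illustration under stated assumptions, not the benchmark used in this PR: `make_stream` is a hypothetical factory returning any async iterator of audio bytes (for example, a wrapper around a plugin's output), and the clock starts before the stream is created so that setup cost is included.

```python
import asyncio
import statistics
import time
from typing import AsyncIterator, Callable, Dict, List


async def time_to_first_chunk(make_stream: Callable[[], AsyncIterator[bytes]]) -> float:
    """Seconds from creating the stream until it yields its first non-empty chunk."""
    start = time.perf_counter()  # start before creation so setup cost is counted
    stream = make_stream()
    try:
        async for chunk in stream:
            if chunk:
                return time.perf_counter() - start
    finally:
        # Close async generators early so they do not linger after the first chunk.
        aclose = getattr(stream, "aclose", None)
        if aclose is not None:
            await aclose()
    raise RuntimeError("stream ended without producing audio")


async def benchmark(make_stream: Callable[[], AsyncIterator[bytes]], iterations: int = 100) -> Dict[str, float]:
    """Run the measurement repeatedly and report the median and an approximate p90 in ms."""
    samples: List[float] = []
    for _ in range(iterations):
        samples.append(await time_to_first_chunk(make_stream))
    samples.sort()
    return {
        "median_ms": statistics.median(samples) * 1000,
        "p90_ms": samples[int(0.9 * (len(samples) - 1))] * 1000,
    }
```

Comparing the two paths would then mean running `benchmark` once with the stream configured for audio/pcm and once for audio/wav, with everything else held fixed.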

Contributor Author

@ianbbqzy Feb 13, 2026

@tinalenguyen would you be able to advise here?

I tested this with my own WebSocket benchmark script at https://github.com/inworld-ai/inworld-api-examples/tree/ian/livekit-integrations/integrations/livekit/python/benchmarks over 100+ iterations.

Summarized by AI, the reasons are:

| | audio/pcm (raw) | audio/wav (decoder) |
|---|---|---|
| Threading | None; everything runs in the main loop | ThreadPoolExecutor + StreamBuffer with locks |
| Cross-thread hops | 0 | 2 (main→thread via StreamBuffer, thread→main via call_soon_threadsafe) |
| Event-loop cycles to first frame | 1 | 3+ (push → thread wake → thread decode → call_soon_threadsafe → decode_task scheduled → decode_task runs) |
| AudioByteStream instances | 1 | 2 (one in _decode_wav_loop, one in _decode_task) |
| Buffer copies per read | 0 | StreamBuffer.read() creates a new BytesIO and copies the remaining bytes on every read |
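The last row points at one improvement that is independent of the PCM-versus-WAV question. Below is a hypothetical sketch (not LiveKit's StreamBuffer) of a thread-safe buffer whose reads advance an offset and only compact storage once consumed bytes make up at least half of it, so each read copies just the bytes it returns plus amortized compaction, instead of the whole remainder.

```python
import threading


class CompactingStreamBuffer:
    """Hypothetical thread-safe byte buffer: writers append, one reader consumes."""

    def __init__(self) -> None:
        self._buf = bytearray()
        self._pos = 0  # read offset into _buf
        self._closed = False
        self._cond = threading.Condition()

    def write(self, data: bytes) -> None:
        with self._cond:
            if self._closed:
                raise ValueError("write to closed buffer")
            self._buf += data
            self._cond.notify()

    def close(self) -> None:
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def read(self, size: int = -1) -> bytes:
        """Block until data is available; return up to `size` bytes (all if size < 0).

        Returns b"" only once the buffer is closed and drained.
        """
        with self._cond:
            while self._pos == len(self._buf) and not self._closed:
                self._cond.wait()
            end = len(self._buf) if size < 0 else min(len(self._buf), self._pos + size)
            out = bytes(self._buf[self._pos:end])  # copies only the returned bytes
            self._pos = end
            if self._pos * 2 >= len(self._buf):
                # Drop consumed bytes once they are at least half the storage.
                del self._buf[:self._pos]
                self._pos = 0
            return out
```

Halving-based compaction keeps the total copying linear in the number of bytes written, whereas rebuilding the remainder on every read is quadratic when many small reads drain a large backlog.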

Member

> When encoding is LINEAR16, it falls to the else branch of audio/wav. But I measured that to have ~30-40ms additional latency.

Could you share the benchmark scripts that you ran?

If AudioStreamDecoder is slow, we should optimize it instead. I still maintain that we should not duplicate decoding logic in plugin code.
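If the fix belongs in the shared decoder, one option is an incremental WAV parser that runs inline on the event loop: buffer until the header is complete, then pass PCM straight through with no thread hand-off. This is a hypothetical sketch, not LiveKit's AudioStreamDecoder; it assumes a single RIFF/WAVE header at the start of the stream and that `data` is the last chunk.

```python
import struct
from typing import Optional


class IncrementalWavParser:
    """Hypothetical streaming WAV parser: push bytes as they arrive, get PCM back."""

    def __init__(self) -> None:
        self._pending = bytearray()  # header bytes buffered until "data" is reached
        self._in_data = False
        self.sample_rate: Optional[int] = None
        self.num_channels: Optional[int] = None
        self.sample_width: Optional[int] = None  # bytes per sample

    def push(self, data: bytes) -> bytes:
        """Feed the next network chunk; return whatever PCM it completes (possibly b"")."""
        if self._in_data:
            return data  # fast path: payload bytes pass straight through
        self._pending += data
        buf = self._pending
        if len(buf) < 12:
            return b""
        if buf[0:4] != b"RIFF" or buf[8:12] != b"WAVE":
            raise ValueError("not a RIFF/WAVE stream")
        offset = 12
        while offset + 8 <= len(buf):
            chunk_id = bytes(buf[offset:offset + 4])
            (size,) = struct.unpack_from("<I", buf, offset + 4)
            body = offset + 8
            if chunk_id == b"data":
                # Ignore the declared size; streaming servers often use a placeholder.
                self._in_data = True
                pcm = bytes(buf[body:])
                self._pending = bytearray()  # header no longer needed
                return pcm
            end = body + size + (size & 1)  # chunk bodies are padded to even length
            if end > len(buf):
                return b""  # wait for the rest of this chunk
            if chunk_id == b"fmt ":
                _tag, channels, rate, _byte_rate, _align, bits = struct.unpack_from("<HHIIHH", buf, body)
                self.num_channels, self.sample_rate, self.sample_width = channels, rate, bits // 8
            offset = end
        return b""
```

Because the header is tiny, re-scanning it on each push before the `data` chunk arrives costs little, and every byte after that is returned unchanged, which is the same work the raw PCM path does.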

Contributor Author

@ianbbqzy Feb 17, 2026

Hi David, can you access this script? https://github.com/inworld-ai/inworld-api-examples/tree/ian/livekit-integrations/integrations/livekit/python/benchmarks The README has instructions for running it, and you should be able to check out this feature branch in the submodule to see the difference.

We might also consider adding a new audio encoding format, PCM, on the server side for headerless audio bytes, to distinguish it from the current encoding format.

3 participants
