Streaming: delay in GenerateAsync, or incorrect use? #475
Hi, I use the code below to perform a streamed generation. The model is OpenAI, with the streaming setting enabled on it as well. The code works, but I notice a 3-10 second delay before the first partial response is processed, and the delay is proportional to the size of the LLM output. My project is in Blazor WASM, so I was able to confirm this: in the browser debug tools I can see the response from OpenAI arriving right away, and its size increases progressively, so OpenAI is streaming the response correctly. But it's only once the response is completely done that I can start iterating over the partial responses. Could anyone confirm whether I made a mistake in my code, or whether this is an issue with the GenerateAsync code? The readme shows how to do simple generation queries, but I haven't seen an example with streaming.
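For reference, the call looks roughly like this. This is a sketch only: the provider, model, settings, and event names below are approximations of the LangChain .NET API rather than verbatim code, and `UseStreaming` and `PartialResponseGenerated` in particular are assumed names.

```csharp
using LangChain.Providers;
using LangChain.Providers.OpenAI;

// Sketch of the streaming setup; type and member names are approximate.
var provider = new OpenAiProvider(apiKey: "sk-...");
var model = new OpenAiChatModel(provider, id: "gpt-4o-mini");

// Streaming is enabled on the settings passed to GenerateAsync
// (UseStreaming is an assumed name for the relevant flag).
var settings = new OpenAiChatSettings
{
    UseStreaming = true,
};

// Partial responses are observed as they arrive (event name assumed);
// in the app they are appended to the UI rather than the console.
model.PartialResponseGenerated += (_, delta) => Console.Write(delta);

var response = await model.GenerateAsync("Tell me a short story.", settings);
Console.WriteLine();
Console.WriteLine(response);
```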
Replies: 3 comments · 3 replies
The code looks correct.
Everything seems correct on the latest version in main. Also, please use the latest prerelease version to make sure we are testing the same thing.
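For anyone following along, the prerelease build can be pulled with `dotnet add package LangChain --prerelease` (the `LangChain` meta-package id on NuGet is assumed here).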
I looked into this again, but I kept hitting a wall.
OpenAI streaming requests do work without delay from a console app and from an ASP.NET server, so it's not a bug in LangChain.
I tried making changes to my HttpClient setup to make sure Blazor's SetBrowserResponseStreamingEnabled is called (see the sketch below), but that didn't improve the situation.
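The change looked roughly like this; the request headers and body are omitted, but SetBrowserResponseStreamingEnabled and ResponseHeadersRead are the documented pieces:

```csharp
using Microsoft.AspNetCore.Components.WebAssembly.Http;

// httpClient is the HttpClient injected into the Blazor component.
var request = new HttpRequestMessage(
    HttpMethod.Post, "https://api.openai.com/v1/chat/completions");

// Opt in to fetch response streaming; without this, the Blazor WASM
// HttpClient buffers the whole response body before returning it.
request.SetBrowserResponseStreamingEnabled(true);

// ResponseHeadersRead makes SendAsync return as soon as the headers
// arrive instead of waiting for the full body.
var response = await httpClient.SendAsync(
    request, HttpCompletionOption.ResponseHeadersRead);
await using var stream = await response.Content.ReadAsStreamAsync();
```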
In the end, Grok 3 told me that OpenAI doesn't really support direct browser requests with streaming; if that's true, then I have been wasting my time trying to make it work! I just wanted to be able to start my Blazor WASM app and make direct LLM requests without going through a server, purely for development efficiency.
If that's incorrect or someone's been able to make it work, please let me know, I am very interested.