
Chat completions stream returns immediately #413

@samuel100

Description
It is embarrassing to realise I was not using the WinML package the entire time, even though I intended to. After changing my package reference, it picks up the NPU models much like the CLI tool does, and it can download and load a model per the example. However, I am having issues with chat completions through code: the stream returns immediately without yielding any chunks.

Updated project file (.csproj):

<Project Sdk="Microsoft.NET.Sdk">

    <PropertyGroup>
        <OutputType>Exe</OutputType>
        <TargetFramework>net9.0-windows10.0.26100</TargetFramework>
        <Nullable>enable</Nullable>
        <Configurations>Release;Debug</Configurations>
        <RootNamespace>app-name</RootNamespace>
        <Platforms>x64</Platforms>
        <ImplicitUsings>enable</ImplicitUsings>
        <WindowsAppSDKSelfContained>false</WindowsAppSDKSelfContained>
        <WindowsPackageType>None</WindowsPackageType>
        <EnableCoreMrtTooling>false</EnableCoreMrtTooling>
    </PropertyGroup>

    <ItemGroup>
        <PackageReference Include="Intel.ML.OnnxRuntime.OpenVino" Version="1.23.0" />
        <PackageReference Include="Microsoft.AI.Foundry.Local.WinML" Version="0.8.2.1" />
        <PackageReference Include="Microsoft.Extensions.Logging" Version="9.0.10" />
        <PackageReference Include="OpenAI" Version="2.5.0" />
    </ItemGroup>
</Project>

This is how I am getting the chat client after loading a model (`qwen2.5-coder-0.5b:npu`):

// Get a chat client
var chatClient = await model.GetChatClientAsync();

// Create a chat message
List<ChatMessage> messages = new()
{
    new ChatMessage { Role = "user", Content = "What coding languages are you proficient in?" }
};

// ct is a CancellationToken defined earlier in the app
var streamingResponse = chatClient.CompleteChatStreamingAsync(messages, ct);
await foreach (var chunk in streamingResponse)
{
    Console.Write(chunk.Choices[0].Message.Content);
    Console.Out.Flush();
}
Console.WriteLine();
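For comparison, if `GetChatClientAsync` returns an `OpenAI.Chat.ChatClient` from the OpenAI 2.x SDK (an assumption on my part; I have not confirmed the WinML package's return type), that SDK shapes messages and streaming chunks differently from the code above: user messages are built with `UserChatMessage` (the base `ChatMessage` is abstract), and each streaming update exposes text through `ContentUpdate` rather than `Choices[0].Message.Content`. A sketch under that assumption:

```csharp
// Sketch assuming chatClient is an OpenAI.Chat.ChatClient (OpenAI 2.x SDK)
// and ct is a CancellationToken defined earlier.
using OpenAI.Chat;

List<ChatMessage> messages = new()
{
    new UserChatMessage("What coding languages are you proficient in?")
};

await foreach (StreamingChatCompletionUpdate update
    in chatClient.CompleteChatStreamingAsync(messages, cancellationToken: ct))
{
    // Each update carries zero or more content parts; role-only or empty
    // updates print nothing, so only text deltas reach the console.
    foreach (ChatMessageContentPart part in update.ContentUpdate)
    {
        Console.Write(part.Text);
    }
}
Console.WriteLine();
```

If the loop above still completes without yielding any updates, the SDK shape is probably not the cause and the stream is genuinely empty at the source.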

I have tried the config both with my local Foundry server running and not running, and with and without the WebService section:

var config = new Configuration
{
    AppName = "app-name",
    LogLevel = Microsoft.AI.Foundry.Local.LogLevel.Information,
    Web = new Configuration.WebService
    {
        Urls = "http://127.0.0.1:55588"
    }
};
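One way to narrow down whether the problem sits in the SDK binding or in the service itself might be to bypass the SDK and hit the server's OpenAI-compatible REST endpoint directly, dumping the raw server-sent-event lines. The URL and port below come from my config; the `/v1/chat/completions` path and the model id are assumptions on my part:

```csharp
// Hypothetical diagnostic: POST a streaming chat request straight to the
// local server and print the raw SSE lines. URL, path, and model id are
// assumptions; adjust to whatever the service actually exposes.
using System.Net.Http;
using System.Text;

using var http = new HttpClient();
var body = """
{ "model": "qwen2.5-coder-0.5b", "stream": true,
  "messages": [ { "role": "user", "content": "Say hello." } ] }
""";

using var request = new HttpRequestMessage(
    HttpMethod.Post, "http://127.0.0.1:55588/v1/chat/completions")
{
    Content = new StringContent(body, Encoding.UTF8, "application/json")
};

using var response = await http.SendAsync(
    request, HttpCompletionOption.ResponseHeadersRead);
using var reader = new StreamReader(await response.Content.ReadAsStreamAsync());

// If chunks appear here but not via the SDK, the issue is in the client
// binding; if the stream ends immediately here too, it is server-side.
string? line;
while ((line = await reader.ReadLineAsync()) != null)
{
    Console.WriteLine(line);
}
```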

I have loaded the same model in the CLI tool and it loads and runs the chats fine. I am unsure what I am missing or doing wrong.
I appreciate that it has been some time since the previous response, and it is the holiday season, so we will all be taking breaks.

Originally posted by @hovrawl in #347
