
Chat completions stream returns immediately #413

@samuel100

Description
It is embarrassing to realise I was not using the WinML package the entire time, even though I intended to. After changing my package reference, it picks up the NPU models much like the CLI tool does, and it can download and load a model per the example. However, I am having issues with chat completions through code: the stream returns immediately without yielding any chunks.

Updated project file (.csproj):

<Project Sdk="Microsoft.NET.Sdk">

    <PropertyGroup>
        <OutputType>Exe</OutputType>
        <TargetFramework>net9.0-windows10.0.26100</TargetFramework>
        <Nullable>enable</Nullable>
        <Configurations>Release;Debug</Configurations>
        <RootNamespace>app-name</RootNamespace>
        <Platforms>x64</Platforms>
        <ImplicitUsings>enable</ImplicitUsings>
        <WindowsAppSDKSelfContained>false</WindowsAppSDKSelfContained>
        <WindowsPackageType>None</WindowsPackageType>
        <EnableCoreMrtTooling>false</EnableCoreMrtTooling>
    </PropertyGroup>

    <ItemGroup>
        <PackageReference Include="Intel.ML.OnnxRuntime.OpenVino" Version="1.23.0" />
        <PackageReference Include="Microsoft.AI.Foundry.Local.WinML" Version="0.8.2.1" />
        <PackageReference Include="Microsoft.Extensions.Logging" Version="9.0.10" />
        <PackageReference Include="OpenAI" Version="2.5.0" />
    </ItemGroup>
</Project>

This is how I am getting the chat client after loading a model (`qwen2.5-coder-0.5b:npu`):

// Get a chat client
var chatClient = await model.GetChatClientAsync();

// Create a chat message
List<ChatMessage> messages = new()
{
    new ChatMessage { Role = "user", Content = "What coding languages are you proficient in?" }
};

// ct is a CancellationToken defined earlier in the app
var streamingResponse = chatClient.CompleteChatStreamingAsync(messages, ct);
await foreach (var chunk in streamingResponse)
{
    Console.Write(chunk.Choices[0].Message.Content);
    Console.Out.Flush();
}
Console.WriteLine();
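For comparison, if `GetChatClientAsync` returns an `OpenAI.Chat.ChatClient` from the OpenAI 2.x SDK (an assumption on my part; I have not confirmed the WinML package's return type), that SDK shapes messages and streaming chunks differently from the code above: user messages are built with `UserChatMessage` (the base `ChatMessage` is abstract), and each streaming update exposes text through `ContentUpdate` rather than `Choices[0].Message.Content`. A sketch under that assumption:

```csharp
// Sketch assuming chatClient is an OpenAI.Chat.ChatClient (OpenAI 2.x SDK)
// and ct is a CancellationToken defined earlier.
using OpenAI.Chat;

List<ChatMessage> messages = new()
{
    new UserChatMessage("What coding languages are you proficient in?")
};

await foreach (StreamingChatCompletionUpdate update
    in chatClient.CompleteChatStreamingAsync(messages, cancellationToken: ct))
{
    // Each update carries zero or more content parts; role-only or empty
    // updates print nothing, so only text deltas reach the console.
    foreach (ChatMessageContentPart part in update.ContentUpdate)
    {
        Console.Write(part.Text);
    }
}
Console.WriteLine();
```

If the loop above still completes without yielding any updates, the SDK shape is probably not the cause and the stream is genuinely empty at the source.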

I have tried the config both with my local Foundry server running and not running, and with and without the WebService section:

var config = new Configuration
{
    AppName = "app-name",
    LogLevel = Microsoft.AI.Foundry.Local.LogLevel.Information,
    Web = new Configuration.WebService
    {
        Urls = "http://127.0.0.1:55588"
    }
};
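One way to narrow down whether the problem sits in the SDK binding or in the service itself might be to bypass the SDK and hit the server's OpenAI-compatible REST endpoint directly, dumping the raw server-sent-event lines. The URL and port below come from my config; the `/v1/chat/completions` path and the model id are assumptions on my part:

```csharp
// Hypothetical diagnostic: POST a streaming chat request straight to the
// local server and print the raw SSE lines. URL, path, and model id are
// assumptions; adjust to whatever the service actually exposes.
using System.Net.Http;
using System.Text;

using var http = new HttpClient();
var body = """
{ "model": "qwen2.5-coder-0.5b", "stream": true,
  "messages": [ { "role": "user", "content": "Say hello." } ] }
""";

using var request = new HttpRequestMessage(
    HttpMethod.Post, "http://127.0.0.1:55588/v1/chat/completions")
{
    Content = new StringContent(body, Encoding.UTF8, "application/json")
};

using var response = await http.SendAsync(
    request, HttpCompletionOption.ResponseHeadersRead);
using var reader = new StreamReader(await response.Content.ReadAsStreamAsync());

// If chunks appear here but not via the SDK, the issue is in the client
// binding; if the stream ends immediately here too, it is server-side.
string? line;
while ((line = await reader.ReadLineAsync()) != null)
{
    Console.WriteLine(line);
}
```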

I have loaded the same model in the CLI tool and it loads and runs the chats fine. I am unsure what I am missing or doing wrong.
I appreciate that it has been some time since the previous response, and it is the holiday season, so we will all be taking breaks.

Originally posted by @hovrawl in #347
