1 change: 1 addition & 0 deletions .gitignore
@@ -31,3 +31,4 @@ Cargo.lock
bin/
obj/
/src/cs/samples/ConsoleClient/test.http
logs/
45 changes: 45 additions & 0 deletions sdk_v2/cs/GENERATE-DOCS.md
@@ -0,0 +1,45 @@
# Generating API Reference Docs

The `docs/api/` folder contains auto-generated markdown from the C# XML documentation comments. This guide explains how to regenerate them.
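
For reference, the generator consumes standard triple-slash XML doc comments. A sketch of the kind of member it renders, modeled loosely on the catalog's `GetModelAsync` (the exact signature in the source may differ):

```csharp
/// <summary>
/// Gets a model from the catalog by its alias.
/// </summary>
/// <param name="alias">The model alias, e.g. "phi-3.5-mini".</param>
/// <returns>The matching model, or <c>null</c> if no model has that alias.</returns>
public Task<Model?> GetModelAsync(string alias, CancellationToken ct = default);
```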

## Prerequisites

Install xmldoc2md as a global dotnet tool:

```bash
dotnet tool install -g XMLDoc2Markdown
```

## Steps

### 1. Publish the SDK

xmldoc2md needs the XML documentation file and all dependency DLLs in one folder. The project only generates the XML documentation file in **Release** mode (`-c Release`), so always publish with that configuration:

```bash
dotnet publish src/Microsoft.AI.Foundry.Local.csproj -c Release -o src/bin/publish
```
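
The Release-only behavior usually comes from a conditional `GenerateDocumentationFile` property in the csproj. A sketch of what that wiring typically looks like (the actual project file may differ):

```xml
<PropertyGroup Condition="'$(Configuration)' == 'Release'">
  <!-- Emit Microsoft.AI.Foundry.Local.xml next to the DLL in Release builds only -->
  <GenerateDocumentationFile>true</GenerateDocumentationFile>
</PropertyGroup>
```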

### 2. Generate the docs

```bash
dotnet xmldoc2md src/bin/publish/Microsoft.AI.Foundry.Local.dll --output docs/api --member-accessibility-level public
```

### All-in-one

```powershell
dotnet publish src/Microsoft.AI.Foundry.Local.csproj -c Release -o src/bin/publish
dotnet xmldoc2md src/bin/publish/Microsoft.AI.Foundry.Local.dll --output docs/api --member-accessibility-level public
```

## Known Limitations

xmldoc2md uses reflection metadata, which loses some C# language-level details:

- **Nullable annotations stripped** — `Task<Model?>` renders as `Task<Model>`. The `<returns>` text documents nullability, but the generated signature does not show `?`.
- **Record/init semantics lost** — Record types with `init`-only properties (e.g., `Runtime`, `ModelInfo`) are rendered with `{ get; set; }` instead of `{ get; init; }`.
- **Default parameter values omitted** — Optional parameters like `CancellationToken? ct = null` appear without their defaults.
- **Compiler-generated members surfaced** — Record types emit synthetic methods like `<Clone>$()`, `Equals(T)`, `GetHashCode()`, and `ToString()` that appear in the generated docs. These are not part of the intended public API and should be ignored.

These are cosmetic issues in the generated docs. Always refer to the source code or IntelliSense for the authoritative API surface.
309 changes: 279 additions & 30 deletions sdk_v2/cs/README.md
@@ -1,59 +1,308 @@
# Foundry Local C# SDK

The Foundry Local C# SDK provides a .NET interface for running AI models locally via the Foundry Local Core. Discover, download, load, and run inference entirely on your own machine — no cloud required.

## Features

- **Model catalog** — browse and search all available models; filter by cached or loaded state
- **Lifecycle management** — download, load, unload, and remove models programmatically
- **Chat completions** — synchronous and `IAsyncEnumerable` streaming via OpenAI-compatible types
- **Audio transcription** — transcribe audio files with streaming support
- **Download progress** — wire up an `Action<float>` callback for real-time download percentage
- **Model variants** — select specific hardware/quantization variants per model alias
- **Optional web service** — start an OpenAI-compatible REST endpoint (`/v1/chat_completions`, `/v1/models`)
- **WinML acceleration** — opt-in Windows hardware acceleration with automatic EP download
- **Full async/await** — every operation supports `CancellationToken` and async patterns
- **IDisposable** — deterministic cleanup of native resources

## Installation

```bash
dotnet add package Microsoft.AI.Foundry.Local
```

### Building from source
To build the SDK, run the following commands in your terminal:

```bash
cd sdk_v2/cs
dotnet build src/Microsoft.AI.Foundry.Local.csproj
```

Or open [Microsoft.AI.Foundry.Local.SDK.sln](./Microsoft.AI.Foundry.Local.SDK.sln) in Visual Studio / VS Code.

## WinML: Automatic Hardware Acceleration (Windows)

On Windows, Foundry Local can leverage WinML for GPU/NPU hardware acceleration via ONNX Runtime execution providers (EPs). EPs are large binaries downloaded on first use and cached for subsequent runs.

Install the WinML package variant instead:

```bash
dotnet add package Microsoft.AI.Foundry.Local.WinML
```

Or build from source with:

```bash
dotnet build src/Microsoft.AI.Foundry.Local.csproj /p:UseWinML=true
```

### Triggering EP download

EP download can be time-consuming. Call `EnsureEpsDownloadedAsync` early (after initialization) to separate the download step from catalog access:

```csharp
// Initialize the manager first (see Quick Start)
await FoundryLocalManager.CreateAsync(
    new Configuration { AppName = "my-app" },
    NullLogger.Instance);

await FoundryLocalManager.Instance.EnsureEpsDownloadedAsync();

// Now catalog access won't trigger an EP download
var catalog = await FoundryLocalManager.Instance.GetCatalogAsync();
```

If you skip this step, EPs are downloaded automatically the first time you access the catalog. Once cached, subsequent calls are fast.

## Quick Start

```csharp
using Microsoft.AI.Foundry.Local;
using Microsoft.Extensions.Logging.Abstractions;
using Betalgo.Ranul.OpenAI.ObjectModels.RequestModels;

// 1. Initialize the singleton manager
await FoundryLocalManager.CreateAsync(
    new Configuration { AppName = "my-app" },
    NullLogger.Instance);

// 2. Get the model catalog and look up a model
var catalog = await FoundryLocalManager.Instance.GetCatalogAsync();
var model = await catalog.GetModelAsync("phi-3.5-mini")
    ?? throw new Exception("Model 'phi-3.5-mini' not found in catalog.");

// 3. Download (if needed) and load the model
await model.DownloadAsync();
await model.LoadAsync();

// 4. Get a chat client and run inference
var chatClient = await model.GetChatClientAsync();
var response = await chatClient.CompleteChatAsync(new[]
{
    ChatMessage.FromUser("Why is the sky blue?")
});

Console.WriteLine(response.Choices![0].Message.Content);

// 5. Clean up
FoundryLocalManager.Instance.Dispose();
```

## Usage

### Initialization

`FoundryLocalManager` is an async singleton. Call `CreateAsync` once at startup:

```csharp
await FoundryLocalManager.CreateAsync(
    new Configuration { AppName = "my-app" },
    loggerFactory.CreateLogger("FoundryLocal"));
```

Access it anywhere afterward via `FoundryLocalManager.Instance`. Check `FoundryLocalManager.IsInitialized` to verify creation.
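
If initialization can happen from more than one code path, you can guard on `IsInitialized` before creating. A minimal sketch (assumes the same `Configuration` as above):

```csharp
if (!FoundryLocalManager.IsInitialized)
{
    await FoundryLocalManager.CreateAsync(
        new Configuration { AppName = "my-app" },
        NullLogger.Instance);
}

var manager = FoundryLocalManager.Instance;
```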

### Catalog

The catalog lists all models known to the Foundry Local Core:

```csharp
var catalog = await FoundryLocalManager.Instance.GetCatalogAsync();

// List all available models
var models = await catalog.ListModelsAsync();
foreach (var m in models)
    Console.WriteLine($"{m.Alias} — {m.SelectedVariant.Info.DisplayName}");

// Get a specific model by alias
var model = await catalog.GetModelAsync("phi-3.5-mini")
    ?? throw new Exception("Model 'phi-3.5-mini' not found in catalog.");

// Get a specific variant by its unique model ID
var variant = await catalog.GetModelVariantAsync("phi-3.5-mini-generic-gpu-4")
    ?? throw new Exception("Variant 'phi-3.5-mini-generic-gpu-4' not found in catalog.");

// List models already downloaded to the local cache
var cached = await catalog.GetCachedModelsAsync();

// List models currently loaded in memory
var loaded = await catalog.GetLoadedModelsAsync();
```

### Model Lifecycle

Each `Model` wraps one or more `ModelVariant` entries (different quantizations, hardware targets). The SDK auto-selects the best variant, or you can pick one:

```csharp
// Check and select variants
Console.WriteLine($"Selected: {model.SelectedVariant.Id}");
foreach (var v in model.Variants)
    Console.WriteLine($"  {v.Id} (cached: {await v.IsCachedAsync()})");

// Switch to a different variant
model.SelectVariant(model.Variants[1]);
```

Download, load, and unload:

```csharp
// Download with progress reporting
await model.DownloadAsync(progress =>
    Console.WriteLine($"Download: {progress:F1}%"));

// Load into memory
await model.LoadAsync();

// Unload when done
await model.UnloadAsync();

// Remove from local cache entirely
await model.RemoveFromCacheAsync();
```

### Chat Completions

```csharp
var chatClient = await model.GetChatClientAsync();

var response = await chatClient.CompleteChatAsync(new[]
{
    ChatMessage.FromSystem("You are a helpful assistant."),
    ChatMessage.FromUser("Explain async/await in C#.")
});

Console.WriteLine(response.Choices![0].Message.Content);
```

#### Streaming

Use `IAsyncEnumerable` for token-by-token output:

```csharp
using var cts = new CancellationTokenSource();

await foreach (var chunk in chatClient.CompleteChatStreamingAsync(
    new[] { ChatMessage.FromUser("Write a haiku about .NET") }, cts.Token))
{
    Console.Write(chunk.Choices?[0]?.Delta?.Content);
}
```

#### Chat Settings

Tune generation parameters per client:

```csharp
chatClient.Settings.Temperature = 0.7f;
chatClient.Settings.MaxTokens = 256;
chatClient.Settings.TopP = 0.9f;
chatClient.Settings.FrequencyPenalty = 0.5f;
```

### Audio Transcription

```csharp
var audioClient = await model.GetAudioClientAsync();

// One-shot transcription
var result = await audioClient.TranscribeAudioAsync("recording.mp3");
Console.WriteLine(result.Text);

// Streaming transcription
await foreach (var chunk in audioClient.TranscribeAudioStreamingAsync("recording.mp3", CancellationToken.None))
{
    Console.Write(chunk.Text);
}
```

#### Audio Settings

```csharp
audioClient.Settings.Language = "en";
audioClient.Settings.Temperature = 0.0f;
```

### Web Service

Start an OpenAI-compatible REST endpoint for use by external tools or processes:

```csharp
// Configure the web service URL in your Configuration
await FoundryLocalManager.CreateAsync(
    new Configuration
    {
        AppName = "my-app",
        Web = new Configuration.WebService { Urls = "http://127.0.0.1:5000" }
    },
    NullLogger.Instance);

await FoundryLocalManager.Instance.StartWebServiceAsync();
Console.WriteLine($"Listening on: {string.Join(", ", FoundryLocalManager.Instance.Urls!)}");

// ... use the service ...

await FoundryLocalManager.Instance.StopWebServiceAsync();
```
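
Once the service is listening, any HTTP client in another process can reach it. A sketch with `curl`, assuming the bind address configured above and a loaded model with ID `phi-3.5-mini-generic-gpu-4` (substitute a real model ID from the `/v1/models` response):

```shell
# List the models the service exposes
curl http://127.0.0.1:5000/v1/models

# Chat completion against the OpenAI-compatible endpoint
curl http://127.0.0.1:5000/v1/chat_completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "phi-3.5-mini-generic-gpu-4",
        "messages": [{ "role": "user", "content": "Why is the sky blue?" }]
      }'
```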

### Configuration

| Property | Type | Default | Description |
|---|---|---|---|
| `AppName` | `string` | **(required)** | Your application name |
| `AppDataDir` | `string?` | `~/.{AppName}` | Application data directory |
| `ModelCacheDir` | `string?` | `{AppDataDir}/cache/models` | Where models are stored locally |
| `LogsDir` | `string?` | `{AppDataDir}/logs` | Log output directory |
| `LogLevel` | `LogLevel` | `Warning` | `Verbose`, `Debug`, `Information`, `Warning`, `Error`, `Fatal` |
| `Web` | `WebService?` | `null` | Web service configuration (see below) |
| `AdditionalSettings` | `IDictionary<string, string>?` | `null` | Extra key-value settings passed to Core |

**`Configuration.WebService`**

| Property | Type | Default | Description |
|---|---|---|---|
| `Urls` | `string?` | `127.0.0.1:0` | Bind address; semicolon-separated for multiple addresses |
| `ExternalUrl` | `Uri?` | `null` | URI for accessing the web service in a separate process |
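
Putting the tables together, a fully specified configuration might look like the following. Values are illustrative, and the `AdditionalSettings` key is a hypothetical placeholder, not a documented Core setting:

```csharp
var config = new Configuration
{
    AppName = "my-app",                          // required
    AppDataDir = "/opt/my-app/data",             // default: ~/.{AppName}
    ModelCacheDir = "/opt/my-app/data/models",   // default: {AppDataDir}/cache/models
    LogsDir = "/opt/my-app/data/logs",           // default: {AppDataDir}/logs
    LogLevel = LogLevel.Information,             // default: Warning
    Web = new Configuration.WebService { Urls = "http://127.0.0.1:5000" },
    AdditionalSettings = new Dictionary<string, string>
    {
        ["someCoreSetting"] = "value"            // hypothetical key, for illustration only
    }
};

await FoundryLocalManager.CreateAsync(config, NullLogger.Instance);
```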

### Disposal

`FoundryLocalManager` implements `IDisposable`. Dispose stops the web service (if running) and releases native resources:

```csharp
FoundryLocalManager.Instance.Dispose();
```

## API Reference

Auto-generated API docs live in [`docs/api/`](./docs/api/). See [`GENERATE-DOCS.md`](./GENERATE-DOCS.md) to regenerate.

Key types:

| Type | Description |
|---|---|
| [`FoundryLocalManager`](./docs/api/microsoft.ai.foundry.local.foundrylocalmanager.md) | Singleton entry point — create, catalog, web service |
| [`Configuration`](./docs/api/microsoft.ai.foundry.local.configuration.md) | Initialization settings |
| [`ICatalog`](./docs/api/microsoft.ai.foundry.local.icatalog.md) | Model catalog interface |
| [`Model`](./docs/api/microsoft.ai.foundry.local.model.md) | Model with variant selection |
| [`ModelVariant`](./docs/api/microsoft.ai.foundry.local.modelvariant.md) | Specific model variant (hardware/quantization) |
| [`OpenAIChatClient`](./docs/api/microsoft.ai.foundry.local.openaichatclient.md) | Chat completions (sync + streaming) |
| [`OpenAIAudioClient`](./docs/api/microsoft.ai.foundry.local.openaiaudioclient.md) | Audio transcription (sync + streaming) |
| [`ModelInfo`](./docs/api/microsoft.ai.foundry.local.modelinfo.md) | Full model metadata record |

## Tests

```bash
dotnet test
```

See [`test/FoundryLocal.Tests/LOCAL_MODEL_TESTING.md`](./test/FoundryLocal.Tests/LOCAL_MODEL_TESTING.md) for prerequisites and local model setup.