1 change: 1 addition & 0 deletions .gitignore
@@ -31,3 +31,4 @@ Cargo.lock
bin/
obj/
/src/cs/samples/ConsoleClient/test.http
logs/
45 changes: 45 additions & 0 deletions sdk_v2/cs/GENERATE-DOCS.md
@@ -0,0 +1,45 @@
# Generating API Reference Docs

The `docs/api/` folder contains auto-generated markdown from the C# XML documentation comments. This guide explains how to regenerate them.
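
For reference, the generator consumes standard triple-slash XML doc comments. A sketch of the kind of member it renders, modeled loosely on the catalog's `GetModelAsync` (the exact signature in the source may differ):

```csharp
/// <summary>
/// Gets a model from the catalog by its alias.
/// </summary>
/// <param name="alias">The model alias, e.g. "phi-3.5-mini".</param>
/// <returns>The matching model, or <c>null</c> if no model has that alias.</returns>
public Task<Model?> GetModelAsync(string alias, CancellationToken ct = default);
```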

## Prerequisites

Install xmldoc2md as a global dotnet tool:

```bash
dotnet tool install -g XMLDoc2Markdown
```

## Steps

### 1. Publish the SDK

xmldoc2md needs the XML documentation file and all dependency DLLs in one folder. The project only generates the XML documentation file in **Release** mode (`-c Release`), so always publish with that configuration:

```bash
dotnet publish src/Microsoft.AI.Foundry.Local.csproj -c Release -o src/bin/publish
```
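
The Release-only behavior usually comes from a conditional `GenerateDocumentationFile` property in the csproj. A sketch of what that wiring typically looks like (the actual project file may differ):

```xml
<PropertyGroup Condition="'$(Configuration)' == 'Release'">
  <!-- Emit Microsoft.AI.Foundry.Local.xml next to the DLL in Release builds only -->
  <GenerateDocumentationFile>true</GenerateDocumentationFile>
</PropertyGroup>
```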

### 2. Generate the docs

```bash
dotnet xmldoc2md src/bin/publish/Microsoft.AI.Foundry.Local.dll --output docs/api --member-accessibility-level public
```

### All-in-one

```powershell
dotnet publish src/Microsoft.AI.Foundry.Local.csproj -c Release -o src/bin/publish
dotnet xmldoc2md src/bin/publish/Microsoft.AI.Foundry.Local.dll --output docs/api --member-accessibility-level public
```

## Known Limitations

xmldoc2md uses reflection metadata, which loses some C# language-level details:

- **Nullable annotations stripped** — `Task<Model?>` renders as `Task<Model>`. The `<returns>` text documents nullability, but the generated signature does not show `?`.
- **Record/init semantics lost** — Record types with `init`-only properties (e.g., `Runtime`, `ModelInfo`) are rendered with `{ get; set; }` instead of `{ get; init; }`.
- **Default parameter values omitted** — Optional parameters like `CancellationToken? ct = null` appear without their defaults.
- **Compiler-generated members surfaced** — Record types emit synthetic methods like `<Clone>$()`, `Equals(T)`, `GetHashCode()`, and `ToString()` that appear in the generated docs. These are not part of the intended public API and should be ignored.

These are cosmetic issues in the generated docs. Always refer to the source code or IntelliSense for the authoritative API surface.
309 changes: 279 additions & 30 deletions sdk_v2/cs/README.md
@@ -1,59 +1,308 @@
# Foundry Local C# SDK

The Foundry Local C# SDK provides a .NET interface for running AI models locally via the Foundry Local Core. Discover, download, load, and run inference entirely on your own machine — no cloud required.

## Features

- **Model catalog** — browse and search all available models; filter by cached or loaded state
- **Lifecycle management** — download, load, unload, and remove models programmatically
- **Chat completions** — synchronous and `IAsyncEnumerable` streaming via OpenAI-compatible types
- **Audio transcription** — transcribe audio files with streaming support
- **Download progress** — wire up an `Action<float>` callback for real-time download percentage
- **Model variants** — select specific hardware/quantization variants per model alias
- **Optional web service** — start an OpenAI-compatible REST endpoint (`/v1/chat_completions`, `/v1/models`)
- **WinML acceleration** — opt-in Windows hardware acceleration with automatic EP download
- **Full async/await** — every operation supports `CancellationToken` and async patterns
- **IDisposable** — deterministic cleanup of native resources

## Installation

```bash
dotnet add package Microsoft.AI.Foundry.Local
```

### Building from source
To build the SDK, run the following commands in your terminal:

```bash
cd sdk_v2/cs
dotnet build src/Microsoft.AI.Foundry.Local.csproj
```

Or open [Microsoft.AI.Foundry.Local.SDK.sln](./Microsoft.AI.Foundry.Local.SDK.sln) in Visual Studio / VS Code.

## WinML: Automatic Hardware Acceleration (Windows)

On Windows, Foundry Local can leverage WinML for GPU/NPU hardware acceleration via ONNX Runtime execution providers (EPs). EPs are large binaries downloaded on first use and cached for subsequent runs.

Install the WinML package variant instead:

```bash
dotnet add package Microsoft.AI.Foundry.Local.WinML
```

Or build from source with:

```bash
dotnet build src/Microsoft.AI.Foundry.Local.csproj /p:UseWinML=true
```

### Triggering EP download

EP download can be time-consuming. Call `EnsureEpsDownloadedAsync` early (after initialization) to separate the download step from catalog access:

```csharp
// Initialize the manager first (see Quick Start)
await FoundryLocalManager.CreateAsync(
    new Configuration { AppName = "my-app" },
    NullLogger.Instance);

await FoundryLocalManager.Instance.EnsureEpsDownloadedAsync();

// Now catalog access won't trigger an EP download
var catalog = await FoundryLocalManager.Instance.GetCatalogAsync();
```

If you skip this step, EPs are downloaded automatically the first time you access the catalog. Once cached, subsequent calls are fast.

## Quick Start

```csharp
using Microsoft.AI.Foundry.Local;
using Microsoft.Extensions.Logging.Abstractions;
using Betalgo.Ranul.OpenAI.ObjectModels.RequestModels;

// 1. Initialize the singleton manager
await FoundryLocalManager.CreateAsync(
    new Configuration { AppName = "my-app" },
    NullLogger.Instance);

// 2. Get the model catalog and look up a model
var catalog = await FoundryLocalManager.Instance.GetCatalogAsync();
var model = await catalog.GetModelAsync("phi-3.5-mini")
    ?? throw new Exception("Model 'phi-3.5-mini' not found in catalog.");

// 3. Download (if needed) and load the model
await model.DownloadAsync();
await model.LoadAsync();

// 4. Get a chat client and run inference
var chatClient = await model.GetChatClientAsync();
var response = await chatClient.CompleteChatAsync(new[]
{
    ChatMessage.FromUser("Why is the sky blue?")
});

Console.WriteLine(response.Choices![0].Message.Content);

// 5. Clean up
FoundryLocalManager.Instance.Dispose();
```

## Usage

### Initialization

`FoundryLocalManager` is an async singleton. Call `CreateAsync` once at startup:

```csharp
await FoundryLocalManager.CreateAsync(
    new Configuration { AppName = "my-app" },
    loggerFactory.CreateLogger("FoundryLocal"));
```

Access it anywhere afterward via `FoundryLocalManager.Instance`. Check `FoundryLocalManager.IsInitialized` to verify creation.
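
If initialization can happen from more than one code path, you can guard on `IsInitialized` before creating. A minimal sketch (assumes the same `Configuration` as above):

```csharp
if (!FoundryLocalManager.IsInitialized)
{
    await FoundryLocalManager.CreateAsync(
        new Configuration { AppName = "my-app" },
        NullLogger.Instance);
}

var manager = FoundryLocalManager.Instance;
```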

### Catalog

The catalog lists all models known to the Foundry Local Core:

```csharp
var catalog = await FoundryLocalManager.Instance.GetCatalogAsync();

// List all available models
var models = await catalog.ListModelsAsync();
foreach (var m in models)
    Console.WriteLine($"{m.Alias} — {m.SelectedVariant.Info.DisplayName}");

// Get a specific model by alias
var model = await catalog.GetModelAsync("phi-3.5-mini")
    ?? throw new Exception("Model 'phi-3.5-mini' not found in catalog.");

// Get a specific variant by its unique model ID
var variant = await catalog.GetModelVariantAsync("phi-3.5-mini-generic-gpu-4")
    ?? throw new Exception("Variant 'phi-3.5-mini-generic-gpu-4' not found in catalog.");

// List models already downloaded to the local cache
var cached = await catalog.GetCachedModelsAsync();

// List models currently loaded in memory
var loaded = await catalog.GetLoadedModelsAsync();
```

### Model Lifecycle

Each `Model` wraps one or more `ModelVariant` entries (different quantizations, hardware targets). The SDK auto-selects the best variant, or you can pick one:

```csharp
// Check and select variants
Console.WriteLine($"Selected: {model.SelectedVariant.Id}");
foreach (var v in model.Variants)
    Console.WriteLine($"  {v.Id} (cached: {await v.IsCachedAsync()})");

// Switch to a different variant
model.SelectVariant(model.Variants[1]);
```

Download, load, and unload:

```csharp
// Download with progress reporting
await model.DownloadAsync(progress =>
    Console.WriteLine($"Download: {progress:F1}%"));

// Load into memory
await model.LoadAsync();

// Unload when done
await model.UnloadAsync();

// Remove from local cache entirely
await model.RemoveFromCacheAsync();
```

### Chat Completions

```csharp
var chatClient = await model.GetChatClientAsync();

var response = await chatClient.CompleteChatAsync(new[]
{
    ChatMessage.FromSystem("You are a helpful assistant."),
    ChatMessage.FromUser("Explain async/await in C#.")
});

Console.WriteLine(response.Choices![0].Message.Content);
```

#### Streaming

Use `IAsyncEnumerable` for token-by-token output:

```csharp
using var cts = new CancellationTokenSource();

await foreach (var chunk in chatClient.CompleteChatStreamingAsync(
    new[] { ChatMessage.FromUser("Write a haiku about .NET") }, cts.Token))
{
    Console.Write(chunk.Choices?[0]?.Delta?.Content);
}
```

#### Chat Settings

Tune generation parameters per client:

```csharp
chatClient.Settings.Temperature = 0.7f;
chatClient.Settings.MaxTokens = 256;
chatClient.Settings.TopP = 0.9f;
chatClient.Settings.FrequencyPenalty = 0.5f;
```

### Audio Transcription

```csharp
var audioClient = await model.GetAudioClientAsync();

// One-shot transcription
var result = await audioClient.TranscribeAudioAsync("recording.mp3");
Console.WriteLine(result.Text);

// Streaming transcription
await foreach (var chunk in audioClient.TranscribeAudioStreamingAsync("recording.mp3", CancellationToken.None))
{
    Console.Write(chunk.Text);
}
```

#### Audio Settings

```csharp
audioClient.Settings.Language = "en";
audioClient.Settings.Temperature = 0.0f;
```

### Web Service

Start an OpenAI-compatible REST endpoint for use by external tools or processes:

```csharp
// Configure the web service URL in your Configuration
await FoundryLocalManager.CreateAsync(
    new Configuration
    {
        AppName = "my-app",
        Web = new Configuration.WebService { Urls = "http://127.0.0.1:5000" }
    },
    NullLogger.Instance);

await FoundryLocalManager.Instance.StartWebServiceAsync();
Console.WriteLine($"Listening on: {string.Join(", ", FoundryLocalManager.Instance.Urls!)}");

// ... use the service ...

await FoundryLocalManager.Instance.StopWebServiceAsync();
```
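
Once the service is listening, any HTTP client in another process can reach it. A sketch with `curl`, assuming the bind address configured above and a loaded model with ID `phi-3.5-mini-generic-gpu-4` (substitute a real model ID from the `/v1/models` response):

```shell
# List the models the service exposes
curl http://127.0.0.1:5000/v1/models

# Chat completion against the OpenAI-compatible endpoint
curl http://127.0.0.1:5000/v1/chat_completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "phi-3.5-mini-generic-gpu-4",
        "messages": [{ "role": "user", "content": "Why is the sky blue?" }]
      }'
```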

### Configuration

| Property | Type | Default | Description |
|---|---|---|---|
| `AppName` | `string` | **(required)** | Your application name |
| `AppDataDir` | `string?` | `~/.{AppName}` | Application data directory |
| `ModelCacheDir` | `string?` | `{AppDataDir}/cache/models` | Where models are stored locally |
| `LogsDir` | `string?` | `{AppDataDir}/logs` | Log output directory |
| `LogLevel` | `LogLevel` | `Warning` | `Verbose`, `Debug`, `Information`, `Warning`, `Error`, `Fatal` |
| `Web` | `WebService?` | `null` | Web service configuration (see below) |
| `AdditionalSettings` | `IDictionary<string, string>?` | `null` | Extra key-value settings passed to Core |

**`Configuration.WebService`**

| Property | Type | Default | Description |
|---|---|---|---|
| `Urls` | `string?` | `127.0.0.1:0` | Bind address; semicolon-separated for multiple addresses |
| `ExternalUrl` | `Uri?` | `null` | URI for accessing the web service in a separate process |
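
Putting the tables together, a fully specified configuration might look like the following. Values are illustrative, and the `AdditionalSettings` key is a hypothetical placeholder, not a documented Core setting:

```csharp
var config = new Configuration
{
    AppName = "my-app",                          // required
    AppDataDir = "/opt/my-app/data",             // default: ~/.{AppName}
    ModelCacheDir = "/opt/my-app/data/models",   // default: {AppDataDir}/cache/models
    LogsDir = "/opt/my-app/data/logs",           // default: {AppDataDir}/logs
    LogLevel = LogLevel.Information,             // default: Warning
    Web = new Configuration.WebService { Urls = "http://127.0.0.1:5000" },
    AdditionalSettings = new Dictionary<string, string>
    {
        ["someCoreSetting"] = "value"            // hypothetical key, for illustration only
    }
};

await FoundryLocalManager.CreateAsync(config, NullLogger.Instance);
```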

### Disposal

`FoundryLocalManager` implements `IDisposable`. Dispose stops the web service (if running) and releases native resources:

```csharp
FoundryLocalManager.Instance.Dispose();
```

## API Reference

Auto-generated API docs live in [`docs/api/`](./docs/api/). See [`GENERATE-DOCS.md`](./GENERATE-DOCS.md) to regenerate.

Key types:

| Type | Description |
|---|---|
| [`FoundryLocalManager`](./docs/api/microsoft.ai.foundry.local.foundrylocalmanager.md) | Singleton entry point — create, catalog, web service |
| [`Configuration`](./docs/api/microsoft.ai.foundry.local.configuration.md) | Initialization settings |
| [`ICatalog`](./docs/api/microsoft.ai.foundry.local.icatalog.md) | Model catalog interface |
| [`Model`](./docs/api/microsoft.ai.foundry.local.model.md) | Model with variant selection |
| [`ModelVariant`](./docs/api/microsoft.ai.foundry.local.modelvariant.md) | Specific model variant (hardware/quantization) |
| [`OpenAIChatClient`](./docs/api/microsoft.ai.foundry.local.openaichatclient.md) | Chat completions (sync + streaming) |
| [`OpenAIAudioClient`](./docs/api/microsoft.ai.foundry.local.openaiaudioclient.md) | Audio transcription (sync + streaming) |
| [`ModelInfo`](./docs/api/microsoft.ai.foundry.local.modelinfo.md) | Full model metadata record |

## Tests

```bash
dotnet test
```

See [`test/FoundryLocal.Tests/LOCAL_MODEL_TESTING.md`](./test/FoundryLocal.Tests/LOCAL_MODEL_TESTING.md) for prerequisites and local model setup.