1 change: 1 addition & 0 deletions python/samples/02-agents/providers/README.md
@@ -11,6 +11,7 @@ This directory groups provider-specific samples for Agent Framework.
| [`custom/`](custom/) | Framework extensibility samples for building custom `BaseAgent` and `BaseChatClient` implementations, including layer-composition guidance. |
| [`foundry/`](foundry/) | Microsoft Foundry and Foundry Local samples using `FoundryChatClient`, `FoundryAgent`, `RawFoundryAgentChatClient`, and `FoundryLocalClient` for hosted agents, Responses API, local inference, tools, MCP, and sessions. |
| [`github_copilot/`](github_copilot/) | `GitHubCopilotAgent` samples showing basic usage, session handling, permission-scoped shell/file/url access, and MCP integration. |
| [`mlflow_gateway/`](mlflow_gateway/) | MLflow AI Gateway samples using `OpenAIChatClient` configured to route through the gateway's OpenAI-compatible endpoint for unified multi-provider access. |
| [`ollama/`](ollama/) | Local Ollama samples using `OllamaChatClient` (recommended) plus OpenAI-compatible Ollama setup, including reasoning and multimodal examples. |
| [`openai/`](openai/) | OpenAI provider samples for Chat and Chat Completion clients, including tools, structured output, sessions, MCP, web search, and multimodal tasks. |

69 changes: 69 additions & 0 deletions python/samples/02-agents/providers/mlflow_gateway/README.md
@@ -0,0 +1,69 @@
# MLflow AI Gateway Examples

This folder contains examples demonstrating how to use the [MLflow AI Gateway](https://mlflow.org/docs/latest/genai/governance/ai-gateway/) with the Agent Framework.

## What is MLflow AI Gateway?

MLflow AI Gateway (MLflow ≥ 3.0) is a database-backed LLM proxy built into the MLflow tracking server. It provides a unified API across multiple LLM providers — OpenAI, Anthropic, Gemini, Mistral, Bedrock, Ollama, and more — with built-in:

- **Secrets management** — provider API keys stored encrypted on the server
- **Fallback & retry** — automatic failover to backup models on failure
- **Traffic splitting** — A/B test by routing percentages of requests to different models
- **Budget tracking** — per-endpoint or per-user token budgets
- **Usage tracing** — every call logged as an MLflow trace automatically

All gateway features are configured through the MLflow UI. Your application code stays the same regardless of which underlying LLM provider the gateway routes to.

## Prerequisites

1. **Install MLflow** (using [`uv`](https://docs.astral.sh/uv/), which Agent Framework uses):

```bash
uv pip install 'mlflow[genai]'
```

Or run it directly with `uvx` (no install needed):

```bash
uvx --from 'mlflow[genai]' mlflow server --host 127.0.0.1 --port 5000
```

2. **Start the MLflow server** (if you didn't use `uvx` above):

```bash
mlflow server --host 127.0.0.1 --port 5000
```

3. **Create a gateway endpoint** in the MLflow UI at [http://localhost:5000](http://localhost:5000). Navigate to **AI Gateway → Create Endpoint**, select a provider (e.g., OpenAI) and model (e.g., `gpt-4o-mini`), and enter your provider API key. The key is stored encrypted on the server.

See the [MLflow AI Gateway documentation](https://mlflow.org/docs/latest/genai/governance/ai-gateway/endpoints/) for details on endpoint configuration.
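
Once an endpoint exists, you can sanity-check it with the plain `openai` Python SDK before involving Agent Framework. This is a minimal sketch; `my-chat-endpoint` is a placeholder for whatever endpoint name you created in the UI:

```python
from openai import OpenAI

# Point the standard OpenAI SDK at the gateway's OpenAI-compatible route.
client = OpenAI(
    base_url="http://localhost:5000/gateway/openai/v1/",
    api_key="unused",  # provider API keys are managed by the MLflow server
)

response = client.chat.completions.create(
    model="my-chat-endpoint",  # gateway endpoint name (placeholder)
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response.choices[0].message.content)
```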

## Recommended Approach

Since MLflow AI Gateway exposes an OpenAI-compatible endpoint at `/gateway/openai/v1`, you can connect Agent Framework to it using the existing `OpenAIChatClient` with a custom `base_url` — no extra packages required beyond the OpenAI integration.
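
In code, that looks like the following sketch (the URL and endpoint name are illustrative; the runnable sample below reads them from environment variables instead):

```python
from agent_framework.openai import OpenAIChatClient

chat_client = OpenAIChatClient(
    api_key="unused",  # provider API keys are managed by the MLflow server
    base_url="http://localhost:5000/gateway/openai/v1/",
    model="my-chat-endpoint",  # gateway endpoint name (placeholder)
)
```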

## Examples

| File | Description |
|------|-------------|
| [`mlflow_gateway_with_openai_chat_client.py`](mlflow_gateway_with_openai_chat_client.py) | Connect an Agent Framework agent to MLflow AI Gateway via the OpenAI-compatible endpoint. Shows both streaming and non-streaming responses with tool calling. |

**Comment on lines +47 to +50** (Copilot AI, Apr 27, 2026): The examples table is malformed markdown (each row starts with `||`), so it won't render as a 2-column table. Use standard table syntax with a single leading `|` per row (matching other provider READMEs).

## Configuration

Set the following environment variables before running the example:

- `MLFLOW_GATEWAY_ENDPOINT`: The base URL for the gateway's OpenAI-compatible endpoint (must include the `/gateway/openai/v1/` suffix)
- Example: `export MLFLOW_GATEWAY_ENDPOINT="http://localhost:5000/gateway/openai/v1/"`

- `MLFLOW_GATEWAY_MODEL`: The gateway endpoint name you created in the MLflow UI
- Example: `export MLFLOW_GATEWAY_MODEL="my-chat-endpoint"`
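
Because the sample calls `load_dotenv()`, you can equivalently put both variables in a `.env` file next to the script (values illustrative):

```
MLFLOW_GATEWAY_ENDPOINT=http://localhost:5000/gateway/openai/v1/
MLFLOW_GATEWAY_MODEL=my-chat-endpoint
```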

## Switching Providers Without Code Changes

A key benefit of using MLflow AI Gateway is that you can change the underlying LLM provider by reconfiguring the gateway endpoint in the MLflow UI — your Agent Framework code stays the same. For example, the same agent can route to:

- An OpenAI-backed endpoint for production
- An Anthropic-backed endpoint for fallback
- A local Ollama-backed endpoint for development

All of this is controlled by the gateway's endpoint configuration.
134 changes: 134 additions & 0 deletions python/samples/02-agents/providers/mlflow_gateway/mlflow_gateway_with_openai_chat_client.py
@@ -0,0 +1,134 @@
# Copyright (c) Microsoft. All rights reserved.

import asyncio
import os
import sys
from random import randint
from typing import Annotated

from agent_framework import Agent, tool
from agent_framework.openai import OpenAIChatClient
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

"""
MLflow AI Gateway with OpenAI Chat Client Example

This sample demonstrates routing Agent Framework requests through the
MLflow AI Gateway using the OpenAI-compatible passthrough endpoint.

MLflow AI Gateway (MLflow >= 3.0) is a database-backed LLM proxy that
provides a unified API across multiple providers (OpenAI, Anthropic,
Gemini, Mistral, Bedrock, Ollama, and more) with built-in secrets
management, fallback/retry, traffic splitting, and budget tracking.
Provider API keys are stored encrypted on the server.

Setup:
    pip install "mlflow[genai]"
mlflow server --host 127.0.0.1 --port 5000

Then create a gateway endpoint in the MLflow UI at http://localhost:5000
under AI Gateway -> Create Endpoint, select a provider and model, and
enter your provider API key.

Environment Variables:
- MLFLOW_GATEWAY_ENDPOINT: Base URL for the gateway's OpenAI-compatible
endpoint (e.g., "http://localhost:5000/gateway/openai/v1/")
- MLFLOW_GATEWAY_MODEL: The gateway endpoint name you created in the
MLflow UI (e.g., "my-chat-endpoint")

See: https://mlflow.org/docs/latest/genai/governance/ai-gateway/
"""


def _require_env(name: str) -> str:
"""Read a required env var; exit with a clear error if missing or empty.

Without this check, an empty MLFLOW_GATEWAY_ENDPOINT would cause
OpenAIChatClient to silently fall back to OpenAI's public endpoint and
forward prompts there.
"""
value = os.getenv(name)
if not value:
sys.exit(
f"Error: {name} is not set. See the README in this folder for setup "
"instructions: https://mlflow.org/docs/latest/genai/governance/ai-gateway/"
)
return value


# NOTE: approval_mode="never_require" is for sample brevity. Use "always_require" in production;
# see samples/02-agents/tools/function_tool_with_approval.py
# and samples/02-agents/tools/function_tool_with_approval_and_sessions.py.
@tool(approval_mode="never_require")
def get_weather(
location: Annotated[str, "The location to get the weather for."],
) -> str:
"""Get the weather for a given location."""
conditions = ["sunny", "cloudy", "rainy", "stormy"]
return f"The weather in {location} is {conditions[randint(0, 3)]} with a high of {randint(10, 30)}°C."


async def non_streaming_example(base_url: str, model: str) -> None:
"""Example of non-streaming response (get the complete result at once)."""
print("=== Non-streaming Response Example ===")

    chat_client = OpenAIChatClient(
api_key="unused", # Provider keys are managed by the MLflow server
base_url=base_url,
model=model,
)

**Comment on lines +78 to +82** (Copilot AI, Apr 27, 2026): `base_url` comes from `os.getenv("MLFLOW_GATEWAY_ENDPOINT")`; if it's unset (or empty), `OpenAIChatClient` will fall back to the default OpenAI base URL and may send prompts to OpenAI unexpectedly. Consider validating `MLFLOW_GATEWAY_ENDPOINT` (and `MLFLOW_GATEWAY_MODEL`) up front and failing fast with a clear message before constructing the client.

agent = Agent(
        client=chat_client,
name="WeatherAgent",
instructions="You are a helpful weather agent.",
tools=[get_weather],
)

query = "What's the weather like in Seattle?"
print(f"User: {query}")
result = await agent.run(query)
print(f"Agent: {result}\n")


async def streaming_example(base_url: str, model: str) -> None:
"""Example of streaming response (get results as they are generated)."""
print("=== Streaming Response Example ===")

    chat_client = OpenAIChatClient(
api_key="unused", # Provider keys are managed by the MLflow server
base_url=base_url,
model=model,
)

**Comment on lines +100 to +104** (Copilot AI, Apr 27, 2026): Same as above: if `MLFLOW_GATEWAY_ENDPOINT` isn't set, the client may default to OpenAI's public endpoint. Validate required env vars once (e.g., in `main()`) and pass the resolved values into both examples to avoid accidental misrouting.

agent = Agent(
        client=chat_client,
name="WeatherAgent",
instructions="You are a helpful weather agent.",
tools=[get_weather],
)

query = "What's the weather like in Portland?"
print(f"User: {query}")
print("Agent: ", end="", flush=True)
async for chunk in agent.run(query, stream=True):
if chunk.text:
print(chunk.text, end="", flush=True)
print("\n")


async def main() -> None:
print("=== MLflow AI Gateway with OpenAI Chat Client Agent Example ===")

# Validate required env vars upfront so we never silently route to OpenAI's
# public endpoint if MLFLOW_GATEWAY_ENDPOINT is missing or empty.
base_url = _require_env("MLFLOW_GATEWAY_ENDPOINT")
model = _require_env("MLFLOW_GATEWAY_MODEL")

await non_streaming_example(base_url, model)
await streaming_example(base_url, model)


if __name__ == "__main__":
asyncio.run(main())