diff --git a/python/samples/02-agents/providers/README.md b/python/samples/02-agents/providers/README.md
index 6ab5fa9d76..eae5ec6a4e 100644
--- a/python/samples/02-agents/providers/README.md
+++ b/python/samples/02-agents/providers/README.md
@@ -11,6 +11,7 @@ This directory groups provider-specific samples for Agent Framework.
 | [`custom/`](custom/) | Framework extensibility samples for building custom `BaseAgent` and `BaseChatClient` implementations, including layer-composition guidance. |
 | [`foundry/`](foundry/) | Microsoft Foundry and Foundry Local samples using `FoundryChatClient`, `FoundryAgent`, `RawFoundryAgentChatClient`, and `FoundryLocalClient` for hosted agents, Responses API, local inference, tools, MCP, and sessions. |
 | [`github_copilot/`](github_copilot/) | `GitHubCopilotAgent` samples showing basic usage, session handling, permission-scoped shell/file/url access, and MCP integration. |
+| [`mlflow_gateway/`](mlflow_gateway/) | MLflow AI Gateway samples using `OpenAIChatClient` configured to route through the gateway's OpenAI-compatible endpoint for unified multi-provider access. |
 | [`ollama/`](ollama/) | Local Ollama samples using `OllamaChatClient` (recommended) plus OpenAI-compatible Ollama setup, including reasoning and multimodal examples. |
 | [`openai/`](openai/) | OpenAI provider samples for Chat and Chat Completion clients, including tools, structured output, sessions, MCP, web search, and multimodal tasks. |
 
diff --git a/python/samples/02-agents/providers/mlflow_gateway/README.md b/python/samples/02-agents/providers/mlflow_gateway/README.md
new file mode 100644
index 0000000000..74b086d29d
--- /dev/null
+++ b/python/samples/02-agents/providers/mlflow_gateway/README.md
@@ -0,0 +1,91 @@
+# MLflow AI Gateway Examples
+
+This folder contains examples demonstrating how to use the [MLflow AI Gateway](https://mlflow.org/docs/latest/genai/governance/ai-gateway/) with the Agent Framework.
+
+## What is MLflow AI Gateway?
+
+MLflow AI Gateway (MLflow ≥ 3.0) is a database-backed LLM proxy built into the MLflow tracking server. It provides a unified API across multiple LLM providers — OpenAI, Anthropic, Gemini, Mistral, Bedrock, Ollama, and more — with built-in:
+
+- **Secrets management** — provider API keys stored encrypted on the server
+- **Fallback & retry** — automatic failover to backup models on failure
+- **Traffic splitting** — A/B test by routing percentages of requests to different models
+- **Budget tracking** — per-endpoint or per-user token budgets
+- **Usage tracing** — every call logged as an MLflow trace automatically
+
+All gateway features are configured through the MLflow UI. Your application code stays the same regardless of which underlying LLM provider the gateway routes to.
+
+## Prerequisites
+
+1. **Install MLflow** (using [`uv`](https://docs.astral.sh/uv/), which Agent Framework uses):
+
+   ```bash
+   uv pip install 'mlflow[genai]'
+   ```
+
+   Or run it directly with `uvx` (no install needed):
+
+   ```bash
+   uvx --from 'mlflow[genai]' mlflow server --host 127.0.0.1 --port 5000
+   ```
+
+2. **Start the MLflow server** (if you didn't use `uvx` above):
+
+   ```bash
+   mlflow server --host 127.0.0.1 --port 5000
+   ```
+
+3. **Create a gateway endpoint** in the MLflow UI at [http://localhost:5000](http://localhost:5000). Navigate to **AI Gateway → Create Endpoint**, select a provider (e.g., OpenAI) and model (e.g., `gpt-4o-mini`), and enter your provider API key. The key is stored encrypted on the server.
+
+   See the [MLflow AI Gateway documentation](https://mlflow.org/docs/latest/genai/governance/ai-gateway/endpoints/) for details on endpoint configuration.
+
+## Recommended Approach
+
+Since MLflow AI Gateway exposes an OpenAI-compatible endpoint at `/gateway/openai/v1`, you can connect Agent Framework to it using the existing `OpenAIChatClient` with a custom `base_url` — no extra packages required beyond the OpenAI integration.
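+
+For example, a minimal sketch of the connection (the endpoint name `my-chat-endpoint` and the local gateway URL are placeholders for your own setup):
+
+```python
+from agent_framework import Agent
+from agent_framework.openai import OpenAIChatClient
+
+client = OpenAIChatClient(
+    api_key="unused",  # provider API keys are managed by the MLflow server
+    base_url="http://localhost:5000/gateway/openai/v1/",
+    model="my-chat-endpoint",  # the gateway endpoint name, not the provider's model name
+)
+agent = Agent(client=client, name="Assistant", instructions="You are a helpful assistant.")
+```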
+
+## Examples
+
+| File | Description |
+|------|-------------|
+| [`mlflow_gateway_with_openai_chat_client.py`](mlflow_gateway_with_openai_chat_client.py) | Connect an Agent Framework agent to MLflow AI Gateway via the OpenAI-compatible endpoint. Shows both streaming and non-streaming responses with tool calling. |
+
+## Configuration
+
+Set the following environment variables before running the example:
+
+- `MLFLOW_GATEWAY_ENDPOINT`: The base URL for the gateway's OpenAI-compatible endpoint (must include the `/gateway/openai/v1/` suffix)
+  - Example: `export MLFLOW_GATEWAY_ENDPOINT="http://localhost:5000/gateway/openai/v1/"`
+
+- `MLFLOW_GATEWAY_MODEL`: The gateway endpoint name you created in the MLflow UI
+  - Example: `export MLFLOW_GATEWAY_MODEL="my-chat-endpoint"`
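+
+Because the sample loads a `.env` file via `load_dotenv()`, you can also put these values in a `.env` file next to the sample instead of exporting them (placeholder values shown):
+
+```bash
+# .env (placeholder values; point these at your own gateway)
+MLFLOW_GATEWAY_ENDPOINT="http://localhost:5000/gateway/openai/v1/"
+MLFLOW_GATEWAY_MODEL="my-chat-endpoint"
+```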
+
+## Switching Providers Without Code Changes
+
+A key benefit of using MLflow AI Gateway is that you can change the underlying LLM provider by reconfiguring the gateway endpoint in the MLflow UI — your Agent Framework code stays the same. For example, the same agent can route to:
+
+- An OpenAI-backed endpoint for production
+- An Anthropic-backed endpoint for fallback
+- A local Ollama-backed endpoint for development
+
+All of this is controlled by the gateway's endpoint configuration.
diff --git a/python/samples/02-agents/providers/mlflow_gateway/mlflow_gateway_with_openai_chat_client.py b/python/samples/02-agents/providers/mlflow_gateway/mlflow_gateway_with_openai_chat_client.py
new file mode 100644
index 0000000000..18ab355b91
--- /dev/null
+++ b/python/samples/02-agents/providers/mlflow_gateway/mlflow_gateway_with_openai_chat_client.py
@@ -0,0 +1,134 @@
+# Copyright (c) Microsoft. All rights reserved.
+
+import asyncio
+import os
+import sys
+from random import randint
+from typing import Annotated
+
+from agent_framework import Agent, tool
+from agent_framework.openai import OpenAIChatClient
+from dotenv import load_dotenv
+
+# Load environment variables from .env file
+load_dotenv()
+
+"""
+MLflow AI Gateway with OpenAI Chat Client Example
+
+This sample demonstrates routing Agent Framework requests through the
+MLflow AI Gateway using the OpenAI-compatible passthrough endpoint.
+
+MLflow AI Gateway (MLflow >= 3.0) is a database-backed LLM proxy that
+provides a unified API across multiple providers (OpenAI, Anthropic,
+Gemini, Mistral, Bedrock, Ollama, and more) with built-in secrets
+management, fallback/retry, traffic splitting, and budget tracking.
+Provider API keys are stored encrypted on the server.
+
+Setup:
+    pip install mlflow[genai]
+    mlflow server --host 127.0.0.1 --port 5000
+
+Then create a gateway endpoint in the MLflow UI at http://localhost:5000
+under AI Gateway -> Create Endpoint, select a provider and model, and
+enter your provider API key.
+
+Environment Variables:
+- MLFLOW_GATEWAY_ENDPOINT: Base URL for the gateway's OpenAI-compatible
+  endpoint (e.g., "http://localhost:5000/gateway/openai/v1/")
+- MLFLOW_GATEWAY_MODEL: The gateway endpoint name you created in the
+  MLflow UI (e.g., "my-chat-endpoint")
+
+See: https://mlflow.org/docs/latest/genai/governance/ai-gateway/
+"""
+
+
+def _require_env(name: str) -> str:
+    """Read a required env var; exit with a clear error if missing or empty.
+
+    Without this check, an empty MLFLOW_GATEWAY_ENDPOINT would cause
+    OpenAIChatClient to silently fall back to OpenAI's public endpoint and
+    forward prompts there.
+    """
+    value = os.getenv(name)
+    if not value:
+        sys.exit(
+            f"Error: {name} is not set. See the README in this folder for setup "
+            "instructions: https://mlflow.org/docs/latest/genai/governance/ai-gateway/"
+        )
+    return value
+
+
+# NOTE: approval_mode="never_require" is for sample brevity. Use "always_require" in production;
+# see samples/02-agents/tools/function_tool_with_approval.py
+# and samples/02-agents/tools/function_tool_with_approval_and_sessions.py.
+@tool(approval_mode="never_require")
+def get_weather(
+    location: Annotated[str, "The location to get the weather for."],
+) -> str:
+    """Get the weather for a given location."""
+    conditions = ["sunny", "cloudy", "rainy", "stormy"]
+    return f"The weather in {location} is {conditions[randint(0, 3)]} with a high of {randint(10, 30)}°C."
+
+
+async def non_streaming_example(base_url: str, model: str) -> None:
+    """Example of non-streaming response (get the complete result at once)."""
+    print("=== Non-streaming Response Example ===")
+
+    _client = OpenAIChatClient(
+        api_key="unused",  # Provider keys are managed by the MLflow server
+        base_url=base_url,
+        model=model,
+    )
+    agent = Agent(
+        client=_client,
+        name="WeatherAgent",
+        instructions="You are a helpful weather agent.",
+        tools=[get_weather],
+    )
+
+    query = "What's the weather like in Seattle?"
+    print(f"User: {query}")
+    result = await agent.run(query)
+    print(f"Agent: {result}\n")
+
+
+async def streaming_example(base_url: str, model: str) -> None:
+    """Example of streaming response (get results as they are generated)."""
+    print("=== Streaming Response Example ===")
+
+    _client = OpenAIChatClient(
+        api_key="unused",  # Provider keys are managed by the MLflow server
+        base_url=base_url,
+        model=model,
+    )
+    agent = Agent(
+        client=_client,
+        name="WeatherAgent",
+        instructions="You are a helpful weather agent.",
+        tools=[get_weather],
+    )
+
+    query = "What's the weather like in Portland?"
+    print(f"User: {query}")
+    print("Agent: ", end="", flush=True)
+    async for chunk in agent.run(query, stream=True):
+        if chunk.text:
+            print(chunk.text, end="", flush=True)
+    print("\n")
+
+
+async def main() -> None:
+    print("=== MLflow AI Gateway with OpenAI Chat Client Agent Example ===")
+
+    # Validate required env vars upfront so we never silently route to OpenAI's
+    # public endpoint if MLFLOW_GATEWAY_ENDPOINT is missing or empty.
+    base_url = _require_env("MLFLOW_GATEWAY_ENDPOINT")
+    model = _require_env("MLFLOW_GATEWAY_MODEL")
+
+    await non_streaming_example(base_url, model)
+    await streaming_example(base_url, model)
+
+
+if __name__ == "__main__":
+    asyncio.run(main())