# Python: Add MLflow AI Gateway provider samples #5507

**`README.md`**

# MLflow AI Gateway Examples

This folder contains examples demonstrating how to use the [MLflow AI Gateway](https://mlflow.org/docs/latest/genai/governance/ai-gateway/) with the Agent Framework.

## What is MLflow AI Gateway?

MLflow AI Gateway (MLflow ≥ 3.0) is a database-backed LLM proxy built into the MLflow tracking server. It provides a unified API across multiple LLM providers — OpenAI, Anthropic, Gemini, Mistral, Bedrock, Ollama, and more — with built-in:

- **Secrets management** — provider API keys stored encrypted on the server
- **Fallback & retry** — automatic failover to backup models on failure
- **Traffic splitting** — A/B test by routing percentages of requests to different models
- **Budget tracking** — per-endpoint or per-user token budgets
- **Usage tracing** — every call logged as an MLflow trace automatically

All gateway features are configured through the MLflow UI. Your application code stays the same regardless of which underlying LLM provider the gateway routes to.

## Prerequisites

1. **Install MLflow** (using [`uv`](https://docs.astral.sh/uv/), which Agent Framework uses):

   ```bash
   uv pip install 'mlflow[genai]'
   ```

   Or run it directly with `uvx` (no install needed):

   ```bash
   uvx --from 'mlflow[genai]' mlflow server --host 127.0.0.1 --port 5000
   ```

2. **Start the MLflow server** (if you didn't use `uvx` above):

   ```bash
   mlflow server --host 127.0.0.1 --port 5000
   ```

3. **Create a gateway endpoint** in the MLflow UI at [http://localhost:5000](http://localhost:5000). Navigate to **AI Gateway → Create Endpoint**, select a provider (e.g., OpenAI) and model (e.g., `gpt-4o-mini`), and enter your provider API key. The key is stored encrypted on the server.

   See the [MLflow AI Gateway documentation](https://mlflow.org/docs/latest/genai/governance/ai-gateway/endpoints/) for details on endpoint configuration.

## Recommended Approach

Since MLflow AI Gateway exposes an OpenAI-compatible endpoint at `/gateway/openai/v1`, you can connect Agent Framework to it using the existing `OpenAIChatClient` with a custom `base_url` — no extra packages required beyond the OpenAI integration.

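A minimal sketch of the connection, assuming the environment variables described under Configuration below are set (the agent name and instructions are illustrative):

```python
import os

from agent_framework import Agent
from agent_framework.openai import OpenAIChatClient

# Point the OpenAI client at the gateway's OpenAI-compatible endpoint.
# The client-side API key is unused: provider keys are managed by the MLflow server.
client = OpenAIChatClient(
    api_key="unused",
    base_url=os.environ["MLFLOW_GATEWAY_ENDPOINT"],  # e.g. "http://localhost:5000/gateway/openai/v1/"
    model=os.environ["MLFLOW_GATEWAY_MODEL"],  # the gateway endpoint name, e.g. "my-chat-endpoint"
)

agent = Agent(client=client, name="Assistant", instructions="You are a helpful assistant.")
```

The full sample below validates both variables before creating the client, since an empty `base_url` would make `OpenAIChatClient` fall back to OpenAI's public endpoint.
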
## Examples

| File | Description |
|------|-------------|
| [`mlflow_gateway_with_openai_chat_client.py`](mlflow_gateway_with_openai_chat_client.py) | Connect an Agent Framework agent to MLflow AI Gateway via the OpenAI-compatible endpoint. Shows both streaming and non-streaming responses with tool calling. |

## Configuration

Set the following environment variables before running the example:

- `MLFLOW_GATEWAY_ENDPOINT`: The base URL for the gateway's OpenAI-compatible endpoint (must include the `/gateway/openai/v1/` suffix)
  - Example: `export MLFLOW_GATEWAY_ENDPOINT="http://localhost:5000/gateway/openai/v1/"`
- `MLFLOW_GATEWAY_MODEL`: The gateway endpoint name you created in the MLflow UI
  - Example: `export MLFLOW_GATEWAY_MODEL="my-chat-endpoint"`

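The sample fails fast if either variable is missing, because an empty `MLFLOW_GATEWAY_ENDPOINT` would cause `OpenAIChatClient` to silently fall back to OpenAI's public endpoint and forward prompts there. A minimal sketch of that check, mirroring the sample's `_require_env` helper:

```python
import os
import sys


def require_env(name: str) -> str:
    """Exit with a clear error if a required variable is missing or empty."""
    value = os.getenv(name)
    if not value:
        sys.exit(f"Error: {name} is not set. See the README in this folder for setup instructions.")
    return value


base_url = require_env("MLFLOW_GATEWAY_ENDPOINT")
model = require_env("MLFLOW_GATEWAY_MODEL")
```
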
## Switching Providers Without Code Changes

A key benefit of using MLflow AI Gateway is that you can change the underlying LLM provider by reconfiguring the gateway endpoint in the MLflow UI — your Agent Framework code stays the same. For example, the same agent can route to:

- An OpenAI-backed endpoint for production
- An Anthropic-backed endpoint for fallback
- A local Ollama-backed endpoint for development

All of this is controlled by the gateway's endpoint configuration.

---

**`mlflow_gateway_with_openai_chat_client.py`**

```python
# Copyright (c) Microsoft. All rights reserved.

import asyncio
import os
import sys
from random import randint
from typing import Annotated

from agent_framework import Agent, tool
from agent_framework.openai import OpenAIChatClient
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

"""
MLflow AI Gateway with OpenAI Chat Client Example

This sample demonstrates routing Agent Framework requests through the
MLflow AI Gateway using the OpenAI-compatible passthrough endpoint.

MLflow AI Gateway (MLflow >= 3.0) is a database-backed LLM proxy that
provides a unified API across multiple providers (OpenAI, Anthropic,
Gemini, Mistral, Bedrock, Ollama, and more) with built-in secrets
management, fallback/retry, traffic splitting, and budget tracking.
Provider API keys are stored encrypted on the server.

Setup:
    pip install mlflow[genai]
    mlflow server --host 127.0.0.1 --port 5000

Then create a gateway endpoint in the MLflow UI at http://localhost:5000
under AI Gateway -> Create Endpoint, select a provider and model, and
enter your provider API key.

Environment Variables:
    - MLFLOW_GATEWAY_ENDPOINT: Base URL for the gateway's OpenAI-compatible
      endpoint (e.g., "http://localhost:5000/gateway/openai/v1/")
    - MLFLOW_GATEWAY_MODEL: The gateway endpoint name you created in the
      MLflow UI (e.g., "my-chat-endpoint")

See: https://mlflow.org/docs/latest/genai/governance/ai-gateway/
"""


def _require_env(name: str) -> str:
    """Read a required env var; exit with a clear error if missing or empty.

    Without this check, an empty MLFLOW_GATEWAY_ENDPOINT would cause
    OpenAIChatClient to silently fall back to OpenAI's public endpoint and
    forward prompts there.
    """
    value = os.getenv(name)
    if not value:
        sys.exit(
            f"Error: {name} is not set. See the README in this folder for setup "
            "instructions: https://mlflow.org/docs/latest/genai/governance/ai-gateway/"
        )
    return value


# NOTE: approval_mode="never_require" is for sample brevity. Use "always_require" in production;
# see samples/02-agents/tools/function_tool_with_approval.py
# and samples/02-agents/tools/function_tool_with_approval_and_sessions.py.
@tool(approval_mode="never_require")
def get_weather(
    location: Annotated[str, "The location to get the weather for."],
) -> str:
    """Get the weather for a given location."""
    conditions = ["sunny", "cloudy", "rainy", "stormy"]
    return f"The weather in {location} is {conditions[randint(0, 3)]} with a high of {randint(10, 30)}°C."


async def non_streaming_example(base_url: str, model: str) -> None:
    """Example of non-streaming response (get the complete result at once)."""
    print("=== Non-streaming Response Example ===")

    client = OpenAIChatClient(
        api_key="unused",  # Provider keys are managed by the MLflow server
        base_url=base_url,
        model=model,
    )

    agent = Agent(
        client=client,
        name="WeatherAgent",
        instructions="You are a helpful weather agent.",
        tools=[get_weather],
    )

    query = "What's the weather like in Seattle?"
    print(f"User: {query}")
    result = await agent.run(query)
    print(f"Agent: {result}\n")


async def streaming_example(base_url: str, model: str) -> None:
    """Example of streaming response (get results as they are generated)."""
    print("=== Streaming Response Example ===")

    client = OpenAIChatClient(
        api_key="unused",  # Provider keys are managed by the MLflow server
        base_url=base_url,
        model=model,
    )

    agent = Agent(
        client=client,
        name="WeatherAgent",
        instructions="You are a helpful weather agent.",
        tools=[get_weather],
    )

    query = "What's the weather like in Portland?"
    print(f"User: {query}")
    print("Agent: ", end="", flush=True)
    async for chunk in agent.run(query, stream=True):
        if chunk.text:
            print(chunk.text, end="", flush=True)
    print("\n")


async def main() -> None:
    print("=== MLflow AI Gateway with OpenAI Chat Client Agent Example ===")

    # Validate required env vars upfront so we never silently route to OpenAI's
    # public endpoint if MLFLOW_GATEWAY_ENDPOINT is missing or empty.
    base_url = _require_env("MLFLOW_GATEWAY_ENDPOINT")
    model = _require_env("MLFLOW_GATEWAY_MODEL")

    await non_streaming_example(base_url, model)
    await streaming_example(base_url, model)


if __name__ == "__main__":
    asyncio.run(main())
```

---

**Review comment:**

> The examples table is malformed markdown (each row starts with `||`), so it won't render as a 2-column table. Use standard table syntax with a single leading `|` per row (matching other provider READMEs).
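
For reference, the corrected rows would look like this (a sketch of the suggested syntax, with an abbreviated description):

```markdown
| File | Description |
|------|-------------|
| [`mlflow_gateway_with_openai_chat_client.py`](mlflow_gateway_with_openai_chat_client.py) | Connect an agent via the gateway's OpenAI-compatible endpoint. |
```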