From 4f813cbe73b68a2ad664cebff46efbf16282f127 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Wed, 4 Mar 2026 22:41:31 +0000 Subject: [PATCH 1/4] Initial plan From 7fb9ae4560654e8b5c11a138d2d2a6be9d5e6fe1 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Wed, 4 Mar 2026 23:11:25 +0000 Subject: [PATCH 2/4] AI freshness pass: update ms.date and content for 13 remaining articles Co-authored-by: gewarren <24882762+gewarren@users.noreply.github.com> --- docs/ai/azure-ai-services-authentication.md | 25 ++++++++++--------- .../conceptual/chain-of-thought-prompting.md | 11 ++++---- docs/ai/conceptual/embeddings.md | 13 +++++----- docs/ai/conceptual/how-genai-and-llms-work.md | 21 ++++++++-------- .../conceptual/prompt-engineering-dotnet.md | 9 ++++--- docs/ai/conceptual/understanding-tokens.md | 23 +++++++++-------- docs/ai/conceptual/vector-databases.md | 15 +++++------ docs/ai/conceptual/zero-shot-learning.md | 17 +++++++------ ...-chat-scaling-with-azure-container-apps.md | 5 ++-- docs/ai/get-started-app-chat-template.md | 25 ++++++++++--------- docs/ai/how-to/app-service-aoai-auth.md | 19 +++++++------- docs/ai/how-to/content-filtering.md | 9 ++++--- .../ai/tutorials/tutorial-ai-vector-search.md | 17 +++++++------ 13 files changed, 110 insertions(+), 99 deletions(-) diff --git a/docs/ai/azure-ai-services-authentication.md b/docs/ai/azure-ai-services-authentication.md index a28f62c45722b..b1eae359480bb 100644 --- a/docs/ai/azure-ai-services-authentication.md +++ b/docs/ai/azure-ai-services-authentication.md @@ -3,12 +3,13 @@ title: Authenticate to Azure OpenAI using .NET description: Learn about the different options to authenticate to Azure OpenAI and other services using .NET. 
author: alexwolfmsft ms.topic: concept-article -ms.date: 04/09/2025 +ms.date: 03/04/2026 +ai-usage: ai-assisted --- # Foundry Tools authentication and authorization using .NET -Application requests to Foundry Tools must be authenticated. In this article, you explore the options available to authenticate to Azure OpenAI and other Foundry Tools using .NET. Most Foundry Tools offer two primary ways to authenticate apps and users: +Foundry Tools require authentication for all application requests. This article covers the options available to authenticate to Azure OpenAI and other Foundry Tools using .NET. Most Foundry Tools offer two primary ways to authenticate apps and users: - **Key-based authentication** provides access to an Azure service using secret key values. These secret values are sometimes known as API keys or access keys depending on the service. - **Microsoft Entra ID** provides a comprehensive identity and access management solution to ensure that the correct identities have the correct level of access to different Azure resources. @@ -34,14 +35,14 @@ builder.Services.AddAzureOpenAIChatCompletion( var kernel = builder.Build(); ``` -Using keys is a straightforward option, but this approach should be used with caution. Keys aren't the recommended authentication option because they: +Keys are straightforward to use, but treat them with caution. Keys aren't the recommended authentication option because they: - Don't follow [the principle of least privilege](/entra/identity-platform/secure-least-privileged-access). They provide elevated permissions regardless of who uses them or for what task. -- Can accidentally be checked into source control or stored in unsafe locations. +- Can accidentally end up in source control or unsafe storage locations. - Can easily be shared with or sent to parties who shouldn't have access. - Often require manual administration and rotation. 
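If you do use key-based authentication, keep the key out of your source code. The following sketch loads the endpoint and key from environment variables at startup; the variable names `AZURE_OPENAI_ENDPOINT` and `AZURE_OPENAI_KEY` are illustrative choices, not names required by the SDK:

```csharp
using Microsoft.SemanticKernel;

// Read the endpoint and key from the environment instead of hardcoding them.
// The variable names below are illustrative, not required by the SDK.
string endpoint = Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")
    ?? throw new InvalidOperationException("AZURE_OPENAI_ENDPOINT is not set.");
string key = Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY")
    ?? throw new InvalidOperationException("AZURE_OPENAI_KEY is not set.");

var builder = Kernel.CreateBuilder();
builder.Services.AddAzureOpenAIChatCompletion("deployment-name", endpoint, key);
var kernel = builder.Build();
```

Even then, environment variables are only a stopgap; a secret store such as Azure Key Vault is a safer home for keys in production.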
-Instead, consider using [Microsoft Entra ID](/#explore-microsoft-entra-id) for authentication, which is the recommended solution for most scenarios. +Instead, consider using [Microsoft Entra ID](#authentication-using-microsoft-entra-id) for authentication, which is the recommended solution for most scenarios. ## Authentication using Microsoft Entra ID @@ -49,16 +50,16 @@ Microsoft Entra ID is a cloud-based identity and access management service that - Keyless authentication using [identities](/entra/fundamentals/identity-fundamental-concepts). - Role-based access control (RBAC) to assign identities the minimum required permissions. -- Can use the [`Azure.Identity`](/dotnet/api/overview/azure/identity-readme) client library to detect [different credentials across environments](/dotnet/api/azure.identity.defaultazurecredential) without requiring code changes. +- Lets you use the [`Azure.Identity`](/dotnet/api/overview/azure/identity-readme) client library to detect [different credentials across environments](/dotnet/api/azure.identity.defaultazurecredential) without requiring code changes. - Automatically handles administrative maintenance tasks such as rotating underlying keys. The workflow to implement Microsoft Entra authentication in your app generally includes the following steps: - Local development: - 1. Sign-in to Azure using a local dev tool such as the Azure CLI or Visual Studio. + 1. Sign in to Azure using a local dev tool such as the Azure CLI or Visual Studio. 1. Configure your code to use the [`Azure.Identity`](/dotnet/api/overview/azure/identity-readme) client library and `DefaultAzureCredential` class. - 1. Assign Azure roles to the account you signed-in with to enable access to the Foundry Tool. + 1. Assign Azure roles to the account you signed in with to enable access to the Foundry Tool. - Azure-hosted app: @@ -70,7 +71,7 @@ The key concepts of this workflow are explored in the following sections. 
### Authenticate to Azure locally -When developing apps locally that connect to Foundry Tools, authenticate to Azure using a tool such as Visual Studio or the Azure CLI. Your local credentials can be discovered by the `Azure.Identity` client library and used to authenticate your app to Azure services, as described in the [Configure the app code](/#configure-your-app-code) section. +When developing apps locally that connect to Foundry Tools, authenticate to Azure using a tool such as Visual Studio or the Azure CLI. Your local credentials can be discovered by the `Azure.Identity` client library and used to authenticate your app to Azure services, as described in the [Configure the app code](#configure-the-app-code) section. For example, to authenticate to Azure locally using the Azure CLI, run the following command: @@ -80,7 +81,7 @@ az login ### Configure the app code -Use the [`Azure.Identity`](/dotnet/api/overview/azure/identity-readme) client library from the Azure SDK to implement Microsoft Entra authentication in your code. The `Azure.Identity` libraries include the `DefaultAzureCredential` class, which automatically discovers available Azure credentials based on the current environment and tooling available. For the full set of supported environment credentials and the order in which they are searched, see the [Azure SDK for .NET](/dotnet/api/azure.identity.defaultazurecredential) documentation. +Use the [`Azure.Identity`](/dotnet/api/overview/azure/identity-readme) client library from the Azure SDK to implement Microsoft Entra authentication in your code. The `Azure.Identity` libraries include the `DefaultAzureCredential` class, which automatically discovers available Azure credentials based on the current environment and tooling available. For the full set of supported environment credentials and the order in which `DefaultAzureCredential` searches them, see the [Azure SDK for .NET](/dotnet/api/azure.identity.defaultazurecredential) documentation. 
For example, configure Azure OpenAI to authenticate using `DefaultAzureCredential` using the following code: @@ -94,7 +95,7 @@ AzureOpenAIClient azureClient = ); ``` -`DefaultAzureCredential` enables apps to be promoted from local development to production without code changes. For example, during development `DefaultAzureCredential` uses your local user credentials from Visual Studio or the Azure CLI to authenticate to the Foundry Tool. When the app is deployed to Azure, `DefaultAzureCredential` uses the managed identity that is assigned to your app. +`DefaultAzureCredential` enables apps to be promoted from local development to production without code changes. For example, during development `DefaultAzureCredential` uses your local user credentials from Visual Studio or the Azure CLI to authenticate to the Foundry Tool. When you deploy the app to Azure, `DefaultAzureCredential` uses the managed identity assigned to your app. ### Assign roles to your identity @@ -125,7 +126,7 @@ There are two types of managed identities you can assign to your app: - A **system-assigned identity** is tied to your application and is deleted if your app is deleted. An app can only have one system-assigned identity. - A **user-assigned identity** is a standalone Azure resource that can be assigned to your app. An app can have multiple user-assigned identities. -Assign roles to a managed identity just like you would an individual user account, such as the **Cognitive Services OpenAI User** role. learn more about working with managed identities using the following resources: +Assign roles to a managed identity just like you would an individual user account, such as the **Cognitive Services OpenAI User** role. 
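For example, you can assign the role to a managed identity with the Azure CLI. The principal ID and scope values in this sketch are placeholders for your own resources:

```azurecli
# Assign the Cognitive Services OpenAI User role to a managed identity.
# <principal-id>, <subscription-id>, <resource-group>, and <aoai-resource>
# are placeholders for your own values.
az role assignment create \
    --assignee "<principal-id>" \
    --role "Cognitive Services OpenAI User" \
    --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.CognitiveServices/accounts/<aoai-resource>"
```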
Learn more about working with managed identities using the following resources: - [Managed identities overview](/entra/identity/managed-identities-azure-resources/overview) - [Authenticate App Service to Azure OpenAI using Microsoft Entra ID](/dotnet/ai/how-to/app-service-aoai-auth?pivots=azure-portal) diff --git a/docs/ai/conceptual/chain-of-thought-prompting.md b/docs/ai/conceptual/chain-of-thought-prompting.md index 962675228c0f7..a87e028d551c0 100644 --- a/docs/ai/conceptual/chain-of-thought-prompting.md +++ b/docs/ai/conceptual/chain-of-thought-prompting.md @@ -1,8 +1,9 @@ --- -title: "Chain of Thought Prompting - .NET" +title: "Chain of thought prompting - .NET" description: "Learn how chain of thought prompting can simplify prompt engineering." ms.topic: concept-article #Don't change. -ms.date: 05/29/2025 +ms.date: 03/04/2026 +ai-usage: ai-assisted #customer intent: As a .NET developer, I want to understand what chain-of-thought prompting is and how it can help me save time and get better completions out of prompt engineering. @@ -10,11 +11,11 @@ ms.date: 05/29/2025 # Chain of thought prompting -GPT model performance and response quality benefits from *prompt engineering*, which is the practice of providing instructions and examples to a model to prime or refine its output. As they process instructions, models make more reasoning errors when they try to answer right away rather than taking time to work out an answer. You can help the model reason its way toward correct answers more reliably by asking for the model to include its chain of thought—that is, the steps it took to follow an instruction, along with the results of each step. +GPT model performance and response quality benefit from *prompt engineering*, which is the practice of providing instructions and examples to a model to prime or refine its output. As they process instructions, models make more reasoning errors when they try to answer right away rather than taking time to work out an answer. 
Help the model reason its way toward correct answers more reliably by asking the model to include its chain of thought—that is, the steps it took to follow an instruction, along with the results of each step. *Chain of thought prompting* is the practice of prompting a model to perform a task step-by-step and to present each step and its result in order in the output. This simplifies prompt engineering by offloading some execution planning to the model, and makes it easier to connect any problem to a specific step so you know where to focus further efforts. -It's generally simpler to just instruct the model to include its chain of thought, but you can use examples to show the model how to break down tasks. The following sections show both ways. +It's generally simpler to instruct the model to include its chain of thought, but you can also use examples to show the model how to break down tasks. The following sections show both ways. ## Use chain of thought prompting in instructions @@ -27,7 +28,7 @@ Break the task into steps, and output the result of each step as you perform it. ## Use chain of thought prompting in examples -You can use examples to indicate the steps for chain of thought prompting, which the model will interpret to mean it should also output step results. Steps can include formatting cues. +Use examples to indicate the steps for chain of thought prompting, which the model interprets to mean it should also output step results. Steps can include formatting cues. ```csharp prompt= """ diff --git a/docs/ai/conceptual/embeddings.md b/docs/ai/conceptual/embeddings.md index 4f1f0f206b0be..19fd89971d1e8 100644 --- a/docs/ai/conceptual/embeddings.md +++ b/docs/ai/conceptual/embeddings.md @@ -2,19 +2,18 @@ title: "How Embeddings Extend Your AI Model's Reach" description: "Learn how embeddings extend the limits and capabilities of AI models in .NET." ms.topic: concept-article #Don't change. 
-ms.date: 05/29/2025 +ms.date: 03/04/2026 +ai-usage: ai-assisted #customer intent: As a .NET developer, I want to understand how embeddings extend LLM limits and capabilities in .NET so that I have more semantic context and better outcomes for my AI apps. --- # Embeddings in .NET -Embeddings are the way LLMs capture semantic meaning. They are numeric representations of non-numeric data that an LLM can use to determine relationships between concepts. You can use embeddings to help an AI model understand the meaning of inputs so that it can perform comparisons and transformations, such as summarizing text or creating images from text descriptions. LLMs can use embeddings immediately, and you can store embeddings in vector databases to provide semantic memory for LLMs as-needed. +Embeddings are the way LLMs capture semantic meaning. They're numeric representations of non-numeric data that an LLM can use to determine relationships between concepts. Use embeddings to help an AI model understand the meaning of inputs so that it can perform comparisons and transformations, such as summarizing text or creating images from text descriptions. LLMs can use embeddings immediately, and you can store embeddings in vector databases to provide semantic memory for LLMs as needed. ## Use cases for embeddings -This section lists the main use cases for embeddings. - ### Use your own data to improve completion relevance Use your own databases to generate embeddings for your data and integrate it with an LLM to make it available for completions. This use of embeddings is an important component of [retrieval-augmented generation](rag.md). @@ -23,7 +22,7 @@ Use your own databases to generate embeddings for your data and integrate it wit Use embeddings to increase the amount of context you can fit in a prompt without increasing the number of tokens required. -For example, suppose you want to include 500 pages of text in a prompt. 
The number of tokens for that much raw text will exceed the input token limit, making it impossible to directly include in a prompt. You can use embeddings to summarize and break down large amounts of that text into pieces that are small enough to fit in one input, and then assess the similarity of each piece to the entire raw text. Then you can choose a piece that best preserves the semantic meaning of the raw text and use it in your prompt without hitting the token limit. +For example, suppose you want to include 500 pages of text in a prompt. The number of tokens for that much raw text exceeds the input token limit, making it impossible to directly include in a prompt. You can use embeddings to summarize and break down large amounts of that text into pieces that are small enough to fit in one input, and then assess the similarity of each piece to the entire raw text. Then you can choose a piece that best preserves the semantic meaning of the raw text and use it in your prompt without hitting the token limit. ### Perform text classification, summarization, or translation @@ -45,11 +44,11 @@ Use embeddings to help a model create code from text or vice versa, by convertin ## Choose an embedding model -You generate embeddings for your raw data by using an AI embedding model, which can encode non-numeric data into a vector (a long array of numbers). The model can also decode an embedding into non-numeric data that has the same or similar meaning as the original, raw data. There are many embedding models available for you to use, with OpenAI's `text-embedding-ada-002` model being one of the common models that's used. For more examples, see the list of [Embedding models available on Azure OpenAI](/azure/ai-services/openai/concepts/models#embeddings). +You generate embeddings for your raw data by using an AI embedding model, which can encode non-numeric data into a vector (a long array of numbers). 
The model can also decode an embedding into non-numeric data that has the same or similar meaning as the original, raw data. OpenAI's `text-embedding-3-small` and `text-embedding-3-large` are the currently recommended embedding models, replacing the older `text-embedding-ada-002`. For more examples, see the list of [Embedding models available on Azure OpenAI](/azure/ai-services/openai/concepts/models#embeddings). ### Store and process embeddings in a vector database -After you generate embeddings, you'll need a way to store them so you can later retrieve them with calls to an LLM. Vector databases are designed to store and process vectors, so they're a natural home for embeddings. Different vector databases offer different processing capabilities, so you should choose one based on your raw data and your goals. For information about your options, see [Vector databases for .NET + AI](vector-databases.md). +After you generate embeddings, you need a way to store them so you can later retrieve them with calls to an LLM. Vector databases are designed to store and process vectors, so they're a natural home for embeddings. Different vector databases offer different processing capabilities. Choose one based on your raw data and your goals. For information about your options, see [Vector databases for .NET + AI](vector-databases.md). ### Using embeddings in your LLM solution diff --git a/docs/ai/conceptual/how-genai-and-llms-work.md b/docs/ai/conceptual/how-genai-and-llms-work.md index 07836925dd593..09e080a5fe71b 100644 --- a/docs/ai/conceptual/how-genai-and-llms-work.md +++ b/docs/ai/conceptual/how-genai-and-llms-work.md @@ -2,7 +2,8 @@ title: "How Generative AI and LLMs work" description: "Understand how Generative AI and large language models (LLMs) work and how they might be useful in your .NET projects." 
ms.topic: concept-article -ms.date: 05/29/2025 +ms.date: 03/04/2026 +ai-usage: ai-assisted #customer intent: As a .NET developer, I want to understand how Generative AI and large language models (LLMs) work and how they may be useful in my .NET projects. @@ -10,19 +11,19 @@ ms.date: 05/29/2025 # How generative AI and LLMs work -Generative AI is a type of artificial intelligence capable of creating original content, such as natural language, images, audio, and code. The output of a generative AI is based on the inputs provided by the user. One common way for users to interact with generative AI is through chat applications that use natural language as their input. ChatGPT, developed by OpenAI, is a popular example of this. Generative AI applications that use natural language as an input are powered by large language models (LLMs) to perform natural language processing (NLP). +Generative AI is a type of artificial intelligence that can create original content, such as natural language, images, audio, and code. The output depends on the inputs you provide. Users commonly interact with generative AI through chat applications that use natural language as input. ChatGPT, developed by OpenAI, is a popular example. Generative AI applications that use natural language as input are powered by large language models (LLMs) to perform natural language processing (NLP). ## How generative AI works -All generative AI is built on top of models. These models are trained with large sets of data in the form of content, such as natural language, images, audio, and code. Generative AI models use the patterns identified in the training data to produce new, statistically similar content. +All generative AI is built on models. These models are trained with large sets of data in the form of content, such as natural language, images, audio, and code. Generative AI models use the patterns identified in the training data to produce new, statistically similar content. 
-The input provided by the user is used by the AI model to build an output. The input is first parsed into a form of data that the model can understand. The model then uses that data to identify matching patterns from its training that it combines to build the final output. Generative AI models are designed to produce unique content, so they won't generate the same output for identical inputs. +The AI model uses your input to build an output. The model first parses the input into a form it can understand. The model then uses that data to identify matching patterns from its training that it combines to build the final output. Generative AI models are designed to produce unique content, so they won't generate the same output for identical inputs. -Generative AI applications that support natural language as an input or output utilize LLMs to do so. The LLM is used to perform NLP, which classifies the input text and determines its sentiment. That classification and sentiment analysis is used by the generative AI model to identify patterns and build the output. If the output is text, the LLM alone can be used to generate it. If the output is audio or images, additional models are used to provide the data and patterns for generating outputs in that format. +Generative AI applications that support natural language as input or output use LLMs to do so. The LLM performs NLP, which classifies the input text and determines its sentiment. The generative AI model uses that classification and sentiment analysis to identify patterns and build the output. If the output is text, the LLM alone generates it. If the output is audio or images, additional models provide the data and patterns for generating outputs in that format. ## Common uses of generative AI -Generative AI applications support a variety of potential use cases and potential outputs, which are explored in the following sections. 
+Generative AI applications support a variety of use cases and outputs, described in the following sections. ### Natural language generation @@ -50,7 +51,7 @@ Some generative AI applications produce image outputs from natural language inpu - The artistic style to create the image in - References for generating similar images -Image generation can create virtual avatars for online accounts, design logos for a business, or provide artistic inspiration for creators. For example, a user may input the request, *Create an image of an elephant eating a burger*. A generative AI application might produce the following output: +Image generation can create virtual avatars for online accounts, design logos for a business, or provide artistic inspiration for creators. For example, a user might input the request, *Create an image of an elephant eating a burger*. A generative AI application might produce the following output: :::image type="content" source="../media/how-genai-and-llms-work/generated-image.png" lightbox="../media/how-genai-and-llms-work/generated-image.png" alt-text="Example AI generated image of an elephant eating a hamburger."::: @@ -60,7 +61,7 @@ Some generative AI applications produce audio outputs from natural language inpu - Synthesize natural sounding voices from input text - Create music in a specific style or featuring certain instruments -- Modify input audio files based on a set criteria provided in natural language +- Modify input audio files based on set criteria provided in natural language Audio generation can provide spoken responses from digital voice assistants, add backing instruments to songs for music production, or reproduce a user's original voice from reference recordings. @@ -115,11 +116,11 @@ When training an LLM, the training text is first broken down into [tokens](under After the text has been broken down into tokens, a contextual vector, known as an [embedding](embeddings.md), is assigned to each token. 
These embedding vectors are multi-valued numeric data where each element of a token's vector represents a semantic attribute of the token. The elements of a token's vector are determined based on how commonly tokens are used together or in similar contexts. -The goal is to predict the next token in the sequence based on the preceding tokens. A weight is assigned to each token in the existing sequence that represents its relative influence on the next token. A calculation is then performed that uses the preceding tokens' weights and embeddings to predict the next vector value. The model then selects the most probable token to continue the sequence based on the predicted vector. +The goal is to predict the next token in the sequence based on the preceding tokens. The model assigns a weight to each token in the existing sequence, representing its relative influence on the next token. The model then uses the preceding tokens' weights and embeddings to calculate and predict the next vector value. The model then selects the most probable token to continue the sequence based on the predicted vector. This process continues iteratively for each token in the sequence, with the output sequence being used regressively as the input for the next iteration. The output is built one token at a time. This strategy is analogous to how auto-complete works, where suggestions are based on what's been typed so far and updated with each new input. -During training, the complete sequence of tokens is known, but all tokens that come after the one currently being considered are ignored. The predicted value for the next token's vector is compared to the actual value and the loss is calculated. The weights are then incrementally adjusted to reduce the loss and improve the model. +During training, the model knows the complete token sequence but ignores all tokens after the one currently being considered. The model compares the predicted vector value to the actual value and calculates the loss. 
Training then incrementally adjusts the weights to reduce the loss and improve the model. ## Related content diff --git a/docs/ai/conceptual/prompt-engineering-dotnet.md b/docs/ai/conceptual/prompt-engineering-dotnet.md index d56e361853572..0c68e6a490c17 100644 --- a/docs/ai/conceptual/prompt-engineering-dotnet.md +++ b/docs/ai/conceptual/prompt-engineering-dotnet.md @@ -2,7 +2,8 @@ title: Prompt engineering concepts description: Learn basic prompt engineering concepts and how to implement them using .NET tools such as Microsoft Agent Framework. ms.topic: concept-article -ms.date: 02/05/2026 +ms.date: 03/04/2026 +ai-usage: ai-assisted --- # Prompt engineering in .NET @@ -30,18 +31,18 @@ An example is text that shows the model how to respond by providing sample user An example starts with a prompt and can optionally include a completion. A completion in an example doesn't have to include the verbatim response—it might just contain a formatted word, the first bullet in an unordered list, or something similar to indicate how each completion should start. -Examples are classified as [zero-shot learning](zero-shot-learning.md#zero-shot-learning) or [few-shot learning](zero-shot-learning.md#few-shot-learning) based on whether they contain verbatim completions. +Classify examples as [zero-shot learning](zero-shot-learning.md#zero-shot-learning) or [few-shot learning](zero-shot-learning.md#few-shot-learning) based on whether they contain verbatim completions. - **Zero-shot learning** examples include a prompt with no verbatim completion. This approach tests a model's responses without giving it example data output. Zero-shot prompts can have completions that include cues, such as indicating the model should output an ordered list by including **"1."** as the completion. - **Few-shot learning** examples include several pairs of prompts with verbatim completions. Few-shot learning can change the model's behavior by adding to its existing knowledge. 
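For example, a few-shot prompt for sentiment classification might look like the following sketch. The reviews and labels are illustrative only; the two completed pairs show the model the format and behavior to follow, and the final incomplete pair is the input to classify:

```csharp
// A few-shot prompt: two example pairs with verbatim completions,
// followed by the input the model should classify.
string prompt = """
    Classify the sentiment of each review as Positive or Negative.

    Review: The battery lasts all day and the screen is gorgeous.
    Sentiment: Positive

    Review: It stopped working after a week and support never replied.
    Sentiment: Negative

    Review: Setup took five minutes and everything just worked.
    Sentiment:
    """;
```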
## Cues -A *cue* is text that conveys the desired structure or format of output. Like an instruction, a cue isn't processed by the model as if it were user input. Like an example, a cue shows the model what you want instead of telling it what to do. You can add as many cues as you want, so you can iterate to get the result you want. Cues are used with an instruction or an example and should be at the end of the prompt. +A *cue* is text that conveys the desired structure or format of output. Like an instruction, a cue isn't processed by the model as if it were user input. Like an example, a cue shows the model what you want instead of telling it what to do. Add as many cues as you want to iterate toward the result you want. Use cues with an instruction or an example, and place them at the end of the prompt. ## Example prompt using .NET -.NET provides various tools to prompt and chat with different AI models. You can use [Agent Framework](/agent-framework/) to connect to a wide variety of AI models and services. Agent Framework includes tools to create agents with system instructions and maintain conversation state across multiple turns. +.NET provides various tools to prompt and chat with different AI models. Use [Agent Framework](/agent-framework/) to connect to a wide variety of AI models and services. Agent Framework includes tools to create agents with system instructions and maintain conversation state across multiple turns. 
Consider the following code example: diff --git a/docs/ai/conceptual/understanding-tokens.md b/docs/ai/conceptual/understanding-tokens.md index 9d9e76d55af4b..a9c80932f11dd 100644 --- a/docs/ai/conceptual/understanding-tokens.md +++ b/docs/ai/conceptual/understanding-tokens.md @@ -2,12 +2,13 @@ title: "Understanding tokens" description: "Understand how large language models (LLMs) use tokens to analyze semantic relationships and generate natural language outputs" ms.topic: concept-article -ms.date: 05/29/2025 +ms.date: 03/04/2026 +ai-usage: ai-assisted #customer intent: As a .NET developer, I want understand how large language models (LLMs) use tokens so I can add semantic analysis and text generation capabilities to my .NET projects. --- # Understand tokens -Tokens are words, character sets, or combinations of words and punctuation that are generated by large language models (LLMs) when they decompose text. Tokenization is the first step in training. The LLM analyzes the semantic relationships between tokens, such as how commonly they're used together or whether they're used in similar contexts. After training, the LLM uses those patterns and relationships to generate a sequence of output tokens based on the input sequence. +Large language models (LLMs) generate tokens—words, character sets, or combinations of words and punctuation—by decomposing text. Tokenization is the first step in training. The LLM analyzes the semantic relationships between tokens, such as how commonly they're used together or whether they're used in similar contexts. After training, the LLM uses those patterns and relationships to generate a sequence of output tokens based on the input sequence. ## Turn text into tokens @@ -41,18 +42,18 @@ The specific tokenization method varies by LLM. Common tokenization methods incl For example, the GPT models, developed by OpenAI, use a type of subword tokenization that's known as _Byte-Pair Encoding_ (BPE). 
OpenAI provides [a tool to visualize how text will be tokenized](https://platform.openai.com/tokenizer). -There are benefits and disadvantages to each tokenization method: +Each tokenization method has benefits and disadvantages: | Token size | Pros | Cons | |----------------------------------------------------|------|------| | Smaller tokens (character or subword tokenization) | - Enables the model to handle a wider range of inputs, such as unknown words, typos, or complex syntax.
- Might allow the vocabulary size to be reduced, requiring fewer memory resources. | - A given text is broken into more tokens, requiring additional computational resources while processing.
- Given a fixed token limit, the maximum size of the model's input and output is smaller. | -| Larger tokens (word tokenization) | - A given text is broken into fewer tokens, requiring fewer computational resources while processing.
- Given the same token limit, the maximum size of the model's input and output is larger. | - Might cause an increased vocabulary size, requiring more memory resources.
- Can limit the models ability to handle unknown words, typos, or complex syntax. | +| Larger tokens (word tokenization) | - A given text is broken into fewer tokens, requiring fewer computational resources while processing.
- Given the same token limit, the maximum size of the model's input and output is larger. | - Might cause an increased vocabulary size, requiring more memory resources.
- Can limit the model's ability to handle unknown words, typos, or complex syntax. | ## How LLMs use tokens After the LLM completes tokenization, it assigns an ID to each unique token. -Consider our example sentence: +Consider this example sentence: > `I heard a dog bark loudly at a cat` @@ -70,14 +71,14 @@ After the model uses a word tokenization method, it could assign token IDs as fo By assigning IDs, text can be represented as a sequence of token IDs. The example sentence would be represented as [1, 2, 3, 4, 5, 6, 7, 3, 8]. The sentence "`I heard a cat`" would be represented as [1, 2, 3, 8]. -As training continues, the model adds any new tokens in the training text to its vocabulary and assigns it an ID. For example: +As training continues, the model adds any new tokens in the training text to its vocabulary and assigns each one an ID. For example: - `meow` (9) - `run` (10) -The semantic relationships between the tokens can be analyzed by using these token ID sequences. Multi-valued numeric vectors, known as [embeddings](embeddings.md), are used to represent these relationships. An embedding is assigned to each token based on how commonly it's used together with, or in similar contexts to, the other tokens. +These token ID sequences reveal the semantic relationships between tokens. Multi-valued numeric vectors, known as [embeddings](embeddings.md), represent these relationships. The model assigns an embedding to each token based on how commonly it's used together with, or in similar contexts to, the other tokens. -After it's trained, a model can calculate an embedding for text that contains multiple tokens. The model tokenizes the text, then calculates an overall embeddings value based on the learned embeddings of the individual tokens. This technique can be used for semantic document searches or adding vector stores to an AI. +After it's trained, a model can calculate an embedding for text that contains multiple tokens. 
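The word-level ID assignment and sequence encoding described above can be sketched in a few lines. This is an illustration only (in Python; production tokenizers such as BPE operate on subwords rather than whole words):

```python
def build_vocabulary(text):
    """Assign an incrementing ID to each unique token, in order of first appearance."""
    vocabulary = {}
    for token in text.split():
        if token not in vocabulary:
            vocabulary[token] = len(vocabulary) + 1
    return vocabulary

def encode(text, vocabulary):
    """Represent text as a sequence of token IDs."""
    return [vocabulary[token] for token in text.split()]

vocabulary = build_vocabulary("I heard a dog bark loudly at a cat")
print(encode("I heard a dog bark loudly at a cat", vocabulary))  # [1, 2, 3, 4, 5, 6, 7, 3, 8]
print(encode("I heard a cat", vocabulary))                       # [1, 2, 3, 8]
```

Repeated tokens (`a`) reuse the same ID, which is why 3 appears twice in the sequence; new training text (`meow`, `run`) simply extends the vocabulary with IDs 9 and 10.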
The model tokenizes the text, then calculates an overall embedding value based on the learned embeddings of the individual tokens. Use this technique for semantic document searches or to add vector stores to an AI app. During output generation, the model predicts a vector value for the next token in the sequence. The model then selects the next token from its vocabulary based on this vector value. In practice, the model calculates multiple vectors by using various elements of the previous tokens' embeddings. The model then evaluates all potential tokens from these vectors and selects the most probable one to continue the sequence. @@ -85,7 +86,7 @@ Output generation is an iterative operation. The model appends the predicted tok ### Token limits -LLMs have limitations regarding the maximum number of tokens that can be used as input or generated as output. This limitation often causes the input and output tokens to be combined into a maximum context window. Taken together, a model's token limit and tokenization method determine the maximum length of text that can be provided as input or generated as output. +LLMs have a maximum number of tokens for input and output. This limit is often expressed as a combined maximum _context window_ that covers both input and output tokens. Taken together, a model's token limit and tokenization method determine the maximum length of text that can be provided as input or generated as output. For example, consider a model that has a maximum context window of 100 tokens. The model processes the example sentences as input text: @@ -97,9 +98,9 @@ By using a character-based tokenization method, the input is 34 tokens (includin ### Token-based pricing and rate limiting -Generative AI services often use token-based pricing. The cost of each request depends on the number of input and output tokens. The pricing might differ between input and output. 
For example, see [Azure OpenAI Service pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/). +Generative AI services often use token-based pricing. The cost of each request depends on the number of input and output tokens. Pricing might differ between input and output. For example, see [Azure OpenAI Service pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/). -Generative AI services might also be limited regarding the maximum number of tokens per minute (TPM). These rate limits can vary depending on the service region and LLM. For more information about specific regions, see [Azure OpenAI Service quotas and limits](/azure/ai-services/openai/quotas-limits#regional-quota-limits). +Generative AI services also enforce a maximum number of tokens per minute (TPM). These rate limits can vary depending on the service region and LLM. For more information about specific regions, see [Azure OpenAI Service quotas and limits](/azure/ai-services/openai/quotas-limits#regional-quota-limits). ## Related content diff --git a/docs/ai/conceptual/vector-databases.md b/docs/ai/conceptual/vector-databases.md index 2373efce66156..832945838decd 100644 --- a/docs/ai/conceptual/vector-databases.md +++ b/docs/ai/conceptual/vector-databases.md @@ -2,12 +2,13 @@ title: "Using Vector Databases to Extend LLM Capabilities" description: "Learn how vector databases extend LLM capabilities by storing and processing embeddings in .NET." ms.topic: concept-article -ms.date: 05/29/2025 +ms.date: 03/04/2026 +ai-usage: ai-assisted --- # Vector databases for .NET + AI -Vector databases are designed to store and manage vector [embeddings](embeddings.md). Embeddings are numeric representations of non-numeric data that preserve semantic meaning. Words, documents, images, audio, and other types of data can all be vectorized. 
You can use embeddings to help an AI model understand the meaning of inputs so that it can perform comparisons and transformations, such as summarizing text, finding contextually related data, or creating images from text descriptions. +Vector databases are designed to store and manage vector [embeddings](embeddings.md). Embeddings are numeric representations of non-numeric data that preserve semantic meaning. You can vectorize words, documents, images, audio, and other data types. Use embeddings to help an AI model understand the meaning of inputs so that it can perform comparisons and transformations, such as summarizing text, finding contextually related data, or creating images from text descriptions. For example, you can use a vector database to: @@ -19,26 +20,26 @@ For example, you can use a vector database to: ## Understand vector search -Vector databases provide vector search capabilities to find similar items based on their data characteristics rather than by exact matches on a property field. Vector search works by analyzing the vector representations of your data that you created using an AI embedding model such the [Azure OpenAI embedding models](/azure/ai-services/openai/concepts/models#embeddings-models). The search process measures the distance between the data vectors and your query vector. The data vectors that are closest to your query vector are the ones that are found to be most similar semantically. +Vector databases provide vector search capabilities to find similar items based on their data characteristics rather than by exact matches on a property field. Vector search works by analyzing the vector representations of your data that you created using an AI embedding model such as the [Azure OpenAI embedding models](/azure/ai-services/openai/concepts/models#embeddings-models). The search process measures the distance between the data vectors and your query vector. 
The data vectors closest to your query vector are the most semantically similar. Some services such as [Azure Cosmos DB for MongoDB vCore](/azure/cosmos-db/mongodb/vcore/vector-search) provide native vector search capabilities for your data. Other databases can be enhanced with vector search by indexing the stored data using a service such as Azure AI Search, which can scan and index your data to provide vector search capabilities. ## Vector search workflows with .NET and OpenAI -Vector databases and their search features are especially useful in [RAG pattern](rag.md) workflows with Azure OpenAI. This pattern allows you to augment or enhance your AI model with additional semantically rich knowledge of your data. A common AI workflow using vector databases might include the following steps: +Vector databases and their search features are especially useful in [RAG pattern](rag.md) workflows with Azure OpenAI. This pattern lets you augment your AI model with additional semantically rich knowledge of your data. A common AI workflow using vector databases includes these steps: 1. Create embeddings for your data using an OpenAI embedding model. 1. Store and index the embeddings in a vector database or search service. 1. Convert user prompts from your application to embeddings. -1. Run a vector search across your data, comparing the user prompt embedding to the embeddings your database. -1. Use a language model such as GPT-35 or GPT-4 to assemble a user friendly completion from the vector search results. +1. Run a vector search across your data, comparing the user prompt embedding to the embeddings in your database. +1. Use a language model such as GPT-4o to assemble a user-friendly completion from the vector search results. Visit the [Implement Azure OpenAI with RAG using vector search in a .NET app](../tutorials/tutorial-ai-vector-search.md) tutorial for a hands-on example of this flow. 
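The workflow steps above can be sketched end to end. This is a minimal illustration in Python with toy three-dimensional vectors; the document texts and embedding values are hypothetical stand-ins for a real embedding model and vector database:

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors; closer to 1.0 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Steps 1-2: embeddings created for your data and stored alongside it.
# In a real app, an embedding model produces these and a vector database stores them.
documents = {
    "How to reset your password": [0.9, 0.1, 0.0],
    "Quarterly sales figures":    [0.0, 0.2, 0.9],
    "Office snack inventory":     [0.5, 0.5, 0.3],
}

# Step 3: the user prompt converted to an embedding (a made-up value here).
query_embedding = [0.85, 0.2, 0.05]

# Step 4: vector search -- rank stored documents by similarity to the query embedding.
ranked = sorted(documents,
                key=lambda doc: cosine_similarity(documents[doc], query_embedding),
                reverse=True)
print(ranked[0])  # the most semantically similar document

# Step 5 would pass the top-ranked documents plus the user prompt to a chat model
# so it can assemble a grounded, user-friendly completion.
```

Real vector databases replace the `sorted` call with an approximate nearest-neighbor index so the search stays fast at scale, but the ranking idea is the same.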
Other benefits of the RAG pattern include: - Generate contextually relevant and accurate responses to user prompts from AI models. -- Overcome LLM tokens limits - the heavy lifting is done through the database vector search. +- Overcome LLM token limits—the database vector search does the heavy lifting. - Reduce the costs from frequent fine-tuning on updated data. ## Related content diff --git a/docs/ai/conceptual/zero-shot-learning.md b/docs/ai/conceptual/zero-shot-learning.md index 774dad0b84a50..55c10fd1d3d28 100644 --- a/docs/ai/conceptual/zero-shot-learning.md +++ b/docs/ai/conceptual/zero-shot-learning.md @@ -2,7 +2,8 @@ title: "Zero-shot and few-shot learning" description: "Learn the use cases for zero-shot and few-shot learning in prompt engineering." ms.topic: concept-article #Don't change. -ms.date: 05/29/2025 +ms.date: 03/04/2026 +ai-usage: ai-assisted #customer intent: As a .NET developer, I want to understand how zero-shot and few-shot learning techniques can help me improve my prompt engineering. @@ -30,14 +31,14 @@ Intent: """; ``` -There are two primary use cases for zero-shot learning: +Zero-shot learning has two primary use cases: -- **Work with fined-tuned LLMs** - Because it relies on the model's existing knowledge, zero-shot learning is not as resource-intensive as few-shot learning, and it works well with LLMs that have already been fine-tuned on instruction datasets. You might be able to rely solely on zero-shot learning and keep costs relatively low. -- **Establish performance baselines** - Zero-shot learning can help you simulate how your app would perform for actual users. This lets you evaluate various aspects of your model's current performance, such as accuracy or precision. In this case, you typically use zero-shot learning to establish a performance baseline and then experiment with few-shot learning to improve performance. 
+- **Work with fine-tuned LLMs** - Because it relies on the model's existing knowledge, zero-shot learning isn't as resource-intensive as few-shot learning, and it works well with LLMs that have already been fine-tuned on instruction datasets. You might be able to rely solely on zero-shot learning and keep costs relatively low. +- **Establish performance baselines** - Zero-shot learning can help you simulate how your app performs for actual users. This lets you evaluate various aspects of your model's current performance, such as accuracy or precision. In this case, you typically use zero-shot learning to establish a performance baseline and then experiment with few-shot learning to improve performance. ## Few-shot learning -Few-shot learning is the practice of passing prompts paired with verbatim completions (few-shot prompts) to show your model how to respond. Compared to zero-shot learning, this means few-shot learning produces more tokens and causes the model to update its knowledge, which can make few-shot learning more resource-intensive. However, few-shot learning also helps the model produce more relevant responses. +Few-shot learning is the practice of passing prompts paired with verbatim completions (few-shot prompts) to show your model how to respond. Compared to zero-shot learning, this means few-shot learning produces more tokens and causes the model to update its knowledge, which can make few-shot learning more resource-intensive. However, few-shot learning also helps the model produce more relevant responses. ```csharp prompt = $""" @@ -58,14 +59,14 @@ Intent: Few-shot learning has two primary use cases: -- **Tuning an LLM** - Because it can add to the model's knowledge, few-shot learning can improve a model's performance. It also causes the model to create more tokens than zero-shot learning does, which can eventually become prohibitively expensive or even infeasible. 
However, if your LLM isn't fined-tuned yet, you won't always get good performance with zero-shot prompts, and few-shot learning is warranted. +- **Tuning an LLM** - Because it can add to the model's knowledge, few-shot learning can improve a model's performance. It also causes the model to create more tokens than zero-shot learning does, which can eventually become prohibitively expensive or even infeasible. However, if your LLM isn't fine-tuned yet, you won't always get good performance with zero-shot prompts, and few-shot learning is warranted. - **Fixing performance issues** - You can use few-shot learning as a follow-up to zero-shot learning. In this case, you use zero-shot learning to establish a performance baseline, and then experiment with few-shot learning based on the zero-shot prompts you used. This lets you add to the model's knowledge after seeing how it currently responds, so you can iterate and improve performance while minimizing the number of tokens you introduce. ### Caveats - Example-based learning doesn't work well for complex reasoning tasks. However, adding instructions can help address this. -- Few-shot learning requires creating lengthy prompts. Prompts with large number of tokens can increase computation and latency. This typically means increased costs. There's also a limit to the length of the prompts. -- When you use several examples the model can learn false patterns, such as "Sentiments are twice as likely to be positive than negative." +- Few-shot learning requires creating lengthy prompts. Prompts with a large number of tokens can increase computation and latency. This typically means increased costs. There's also a limit to the length of the prompts. +- When you use several examples, the model can learn false patterns, such as "Sentiments are twice as likely to be positive as negative." 
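One practical guard against the false-pattern caveat above is to check the label balance of a few-shot example set before building the prompt. This is an illustrative Python sketch; the labels, threshold, and helper name are hypothetical:

```python
from collections import Counter

def is_label_balanced(examples, tolerance=0.25):
    """Return True if no label dominates the few-shot examples.

    examples: list of (text, label) pairs.
    tolerance: maximum allowed gap between the most and least common
    labels, as a fraction of the total example count.
    """
    counts = Counter(label for _, label in examples)
    if len(counts) < 2:
        return False  # a single label is maximally skewed
    ordered = counts.most_common()
    gap = ordered[0][1] - ordered[-1][1]
    # A large gap teaches the model a spurious prior, such as
    # "sentiments are twice as likely to be positive as negative."
    return gap / len(examples) <= tolerance

few_shot = [
    ("I love this product", "positive"),
    ("Terrible experience", "negative"),
    ("Works exactly as described", "positive"),
    ("Would not recommend", "negative"),
]
print(is_label_balanced(few_shot))  # True: two positive, two negative
```

A skewed set (for example, three positive examples and one negative) fails the check, prompting you to rebalance before the skew leaks into the model's responses.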
## Related content diff --git a/docs/ai/get-started-app-chat-scaling-with-azure-container-apps.md b/docs/ai/get-started-app-chat-scaling-with-azure-container-apps.md index 586277b7518c1..6e614353b1995 100644 --- a/docs/ai/get-started-app-chat-scaling-with-azure-container-apps.md +++ b/docs/ai/get-started-app-chat-scaling-with-azure-container-apps.md @@ -1,7 +1,8 @@ --- title: Scale Azure OpenAI for .NET chat sample using RAG description: Learn how to add load balancing to your application to extend the chat app beyond the Azure OpenAI token and model quota limits. -ms.date: 05/29/2025 +ms.date: 03/04/2026 +ai-usage: ai-assisted ms.topic: get-started # CustomerIntent: As a .NET developer new to Azure OpenAI, I want to scale my Azure OpenAI capacity to avoid rate limit errors with Azure Container Apps. --- @@ -18,7 +19,7 @@ ms.topic: get-started #### [Codespaces (recommended)](#tab/github-codespaces) -* Only a [GitHub account](https://www.github.com/login) is required to use CodeSpaces +* You only need a [GitHub account](https://www.github.com/login) to use Codespaces. #### [Visual Studio Code](#tab/visual-studio-code) diff --git a/docs/ai/get-started-app-chat-template.md b/docs/ai/get-started-app-chat-template.md index 4c2996d856587..f5b9cf533f782 100644 --- a/docs/ai/get-started-app-chat-template.md +++ b/docs/ai/get-started-app-chat-template.md @@ -1,8 +1,9 @@ --- title: "Get started with the 'chat using your own data sample' for .NET" description: Get started with .NET and search across your own data using a chat app sample implemented using Azure OpenAI Service and Retrieval Augmented Generation (RAG) in Azure AI Search. Easily deploy with Azure Developer CLI. This article uses the Azure AI Reference Template sample. 
-ms.date: 05/28/2025 +ms.date: 03/04/2026 ms.topic: get-started +ai-usage: ai-assisted # CustomerIntent: As a .NET developer new to Azure OpenAI, I want deploy and use sample code to interact with app infused with my own business data so that learn from the sample code. --- @@ -12,13 +13,13 @@ This article shows you how to deploy and run the [Chat with your own data sample * [Demo video](https://aka.ms/azai/net/video) -By following the instructions in this article, you will: +In this article, you: - Deploy a chat app to Azure. - Get answers about employee benefits. - Change settings to change behavior of responses. -Once you complete this procedure, you can start modifying the new project with your custom code. +Once you complete this procedure, start modifying the new project with your custom code. This article is part of a collection of articles that show you how to build a chat app using Azure OpenAI service and Azure AI Search. @@ -39,11 +40,11 @@ The architecture of the chat app is shown in the following diagram: - **User interface** - The application's chat interface is a [Blazor WebAssembly](/aspnet/core/blazor/) application. This interface is what accepts user queries, routes request to the application backend, and displays generated responses. - **Backend** - The application backend is an [ASP.NET Core Minimal API](/aspnet/core/fundamentals/minimal-apis/overview). The backend hosts the Blazor static web application and is what orchestrates the interactions among the different services. Services used in this application include: - [**Azure AI Search**](/azure/search/search-what-is-azure-search) – Indexes documents from the data stored in an Azure Storage Account. This makes the documents searchable using [vector search](/azure/search/search-get-started-vector) capabilities. - - [**Azure OpenAI Service**](/azure/ai-services/openai/overview) – Provides the Large Language Models (LLM) to generate responses. 
[Microsoft Agent Framework](/agent-framework/overview/agent-framework-overview) is used in conjunction with the Azure OpenAI Service to orchestrate the more complex AI workflows. + - [**Azure OpenAI Service**](/azure/ai-services/openai/overview) – Provides the Large Language Models (LLM) to generate responses. [Semantic Kernel](/semantic-kernel/overview/) is used in conjunction with the Azure OpenAI Service to orchestrate the more complex AI workflows. ## Cost -Most resources in this architecture use a basic or consumption pricing tier. Consumption pricing is based on usage, which means you only pay for what you use. To complete this article, there will be a charge, but it will be minimal. When you are done with the article, you can delete the resources to stop incurring charges. +Most resources in this architecture use a basic or consumption pricing tier. Consumption pricing is based on usage, which means you only pay for what you use. To complete this article, there's a charge, but it's minimal. When you're done with the article, delete the resources to stop incurring charges. For more information, see [Azure Samples: Cost in the sample repo](https://github.com/Azure-Samples/azure-search-openai-demo-csharp#cost-estimation). @@ -145,7 +146,7 @@ The sample repository contains all the code and configuration files you need to ### Deploy chat app to Azure > [!IMPORTANT] -> Azure resources created in this section incur immediate costs, primarily from the Azure AI Search resource. These resources may accrue costs even if you interrupt the command before it is fully executed. +> Azure resources created in this section incur immediate costs, primarily from the Azure AI Search resource. These resources may accrue costs even if you interrupt the command before it's fully executed. 1. 
Run the following Azure Developer CLI command to provision the Azure resources and deploy the source code: @@ -153,12 +154,12 @@ The sample repository contains all the code and configuration files you need to azd up ``` -1. When you're prompted to enter an environment name, keep it short and lowercase. For example, `myenv`. Its used as part of the resource group name. +1. When you're prompted to enter an environment name, keep it short and lowercase. For example, `myenv`. It's used as part of the resource group name. 1. When prompted, select a subscription to create the resources in. 1. When you're prompted to select a location the first time, select a location near you. This location is used for most the resources including hosting. 1. If you're prompted for a location for the OpenAI model, select a location that is near you. If the same location is available as your first location, select that. -1. Wait until app is deployed. It may take up to 20 minutes for the deployment to complete. -1. After the application has been successfully deployed, you see a URL displayed in the terminal. +1. Wait until the app is deployed. The deployment may take up to 20 minutes to complete. +1. After the application deploys successfully, a URL appears in the terminal. 1. Select that URL labeled `Deploying service web` to open the chat application in a browser. :::image type="content" source="./media/get-started-app-chat-template/browser-chat-with-your-data.png" alt-text="Screenshot of chat app in browser showing several suggestions for chat input and the chat text box to enter a question."::: @@ -172,7 +173,7 @@ The chat app is preloaded with employee benefits information from [PDF files](ht :::image type="content" source="./media/get-started-app-chat-template/browser-chat-initial-answer.png" lightbox="./media/get-started-app-chat-template/browser-chat-initial-answer.png" alt-text="Screenshot of chat app's first answer."::: -1. From the answer, select a citation. 
A pop-up window will open displaying the source of the information. +1. From the answer, select a citation. A pop-up window opens displaying the source of the information. :::image type="content" source="./media/get-started-app-chat-template/browser-chat-initial-answer-citation-highlighted.png" lightbox="./media/get-started-app-chat-template/browser-chat-initial-answer-citation-highlighted.png" alt-text="Screenshot of chat app's first answer with its citation highlighted in a red box."::: @@ -259,14 +260,14 @@ Deleting the GitHub Codespaces environment ensures that you can maximize the amo #### [Visual Studio Code](#tab/visual-studio-code) -You aren't necessarily required to clean up your local environment, but you can stop the running development container and return to running Visual Studio Code in the context of a local workspace. +You don't need to clean up your local environment, but you can stop the running development container and return to running Visual Studio Code locally. 1. Open the **Command Palette**, search for the **Dev Containers** commands, and then select **Dev Containers: Reopen Folder Locally**. :::image type="content" source="./media/get-started-app-chat-template/reopen-local-command-palette.png" alt-text="Screenshot of the Command Palette option to reopen the current folder within your local environment."::: > [!TIP] -> Visual Studio Code will stop the running development container, but the container still exists in Docker in a stopped state. You always have the option to deleting the container instance, container image, and volumes from Docker to free up more space on your local machine. +> Visual Studio Code stops the running development container, but the container still exists in Docker in a stopped state. You can also delete the container instance, container image, and volumes from Docker to free up more space on your local machine. 
--- diff --git a/docs/ai/how-to/app-service-aoai-auth.md b/docs/ai/how-to/app-service-aoai-auth.md index 04003ef59d00f..f194220b6bc91 100644 --- a/docs/ai/how-to/app-service-aoai-auth.md +++ b/docs/ai/how-to/app-service-aoai-auth.md @@ -4,7 +4,8 @@ description: "Learn how to authenticate your Azure hosted .NET app to an Azure O author: alexwolfmsft ms.author: alexwolf ms.topic: how-to -ms.date: 05/29/2025 +ms.date: 03/04/2026 +ai-usage: ai-assisted zone_pivot_groups: azure-interface #customer intent: As a .NET developer, I want authenticate and authorize my App Service to Azure OpenAI by using Microsoft Entra so that I can securely use AI in my .NET application. --- @@ -13,7 +14,7 @@ zone_pivot_groups: azure-interface This article demonstrates how to use [Microsoft Entra ID managed identities](/azure/app-service/overview-managed-identity) and the [Microsoft.Extensions.AI library](../microsoft-extensions-ai.md) to authenticate an Azure hosted app to an Azure OpenAI resource. -A managed identity from Microsoft Entra ID allows your app to easily access other Microsoft Entra protected resources such as Azure OpenAI. The identity is managed by the Azure platform and doesn't require you to provision, manage, or rotate any secrets. +A managed identity from Microsoft Entra ID lets your app access other Microsoft Entra protected resources such as Azure OpenAI. Azure manages the identity and doesn't require you to provision, manage, or rotate any secrets. ## Prerequisites @@ -24,7 +25,7 @@ A managed identity from Microsoft Entra ID allows your app to easily access othe ## Add a managed identity to App Service -Managed identities provide an automatically managed identity in Microsoft Entra ID for applications to use when connecting to resources that support Microsoft Entra authentication. Applications can use managed identities to obtain Microsoft Entra tokens without having to manage any credentials. 
Your application can be assigned two types of identities: +Managed identities provide an automatically managed identity in Microsoft Entra ID for applications to use when connecting to resources that support Microsoft Entra authentication. Applications can use managed identities to obtain Microsoft Entra tokens without having to manage any credentials. You can assign two types of identities to your application: * A **system-assigned identity** is tied to your application and is deleted if your app is deleted. An app can have only one system-assigned identity. * A **user-assigned identity** is a standalone Azure resource that can be assigned to your app. An app can have multiple user-assigned identities. @@ -33,7 +34,7 @@ Managed identities provide an automatically managed identity in Microsoft Entra # [System-assigned](#tab/system-assigned) -1. Navigate to your app's page in the [Azure portal](https://aka.ms/azureportal), and then scroll down to the **Settings** group. +1. Go to your app's page in the [Azure portal](https://aka.ms/azureportal), and then scroll down to the **Settings** group. 1. Select **Identity**. 1. On the **System assigned** tab, toggle *Status* to **On**, and then select **Save**. @@ -96,7 +97,7 @@ az webapp identity assign --name --resource-group :::zone target="docs" pivot="azure-portal" -1. In the [Azure portal](https://aka.ms/azureportal), navigate to the scope that you want to grant **Azure OpenAI** access to. The scope can be a **Management group**, **Subscription**, **Resource group**, or a specific **Azure OpenAI** resource. +1. In the [Azure portal](https://aka.ms/azureportal), go to the scope that you want to grant **Azure OpenAI** access to. The scope can be a **Management group**, **Subscription**, **Resource group**, or a specific **Azure OpenAI** resource. 1. In the left navigation pane, select **Access control (IAM)**. 1. Select **Add**, then select **Add role assignment**. 
@@ -110,7 +111,7 @@ az webapp identity assign --name --resource-group :::zone target="docs" pivot="azure-cli" -You can use the Azure CLI to assign the Cognitive Services OpenAI User role to your managed identity at varying scopes. +Use the Azure CLI to assign the Cognitive Services OpenAI User role to your managed identity at different scopes. # [Resource](#tab/resource) @@ -163,10 +164,10 @@ az role assignment create --assignee "" \ The preceding packages each handle the following concerns for this scenario: - **[Azure.Identity](https://www.nuget.org/packages/Azure.Identity)**: Provides core functionality to work with Microsoft Entra ID - - **[Azure.AI.OpenAI](https://www.nuget.org/packages/Azure.AI.OpenAI)**: Enables your app to interface with the Azure OpenAI service + - **[Azure.AI.OpenAI](https://www.nuget.org/packages/Azure.AI.OpenAI)**: Lets your app interface with the Azure OpenAI service - **[Microsoft.Extensions.Azure](https://www.nuget.org/packages/Microsoft.Extensions.Azure)**: Provides helper extensions to register services for dependency injection - **[Microsoft.Extensions.AI](https://www.nuget.org/packages/Microsoft.Extensions.AI)**: Provides AI abstractions for common AI tasks - - **[Microsoft.Extensions.AI.OpenAI](https://www.nuget.org/packages/Microsoft.Extensions.AI.OpenAI)**: Enables you to use OpenAI service types as AI abstractions provided by **Microsoft.Extensions.AI** + - **[Microsoft.Extensions.AI.OpenAI](https://www.nuget.org/packages/Microsoft.Extensions.AI.OpenAI)**: Lets you use OpenAI service types as AI abstractions provided by **Microsoft.Extensions.AI** 1. 
In the `Program.cs` file of your app, create a `DefaultAzureCredential` object to discover and configure available credentials: @@ -181,7 +182,7 @@ az role assignment create --assignee "" \ :::code language="csharp" source="./snippets/hosted-app-auth/program.cs" range="41-46"::: > [!TIP] - > Learn more about ASP.NET Core dependency injection and how to register other AI services types in the Azure SDK for .NET [dependency injection](../../azure/sdk/dependency-injection.md) documentation. + > For more information about ASP.NET Core dependency injection and registering other AI service types, see the Azure SDK for .NET [dependency injection](../../azure/sdk/dependency-injection.md) documentation. ## Related content diff --git a/docs/ai/how-to/content-filtering.md b/docs/ai/how-to/content-filtering.md index 6eb65ca48f9a3..e8db5e48887d3 100644 --- a/docs/ai/how-to/content-filtering.md +++ b/docs/ai/how-to/content-filtering.md @@ -1,8 +1,9 @@ --- -title: "Manage OpenAI Content Filtering in a .NET app" +title: "Manage OpenAI content filtering in a .NET app" description: "Learn how to manage OpenAI content filtering programmatically in a .NET app using the OpenAI client library." ms.topic: how-to -ms.date: 05/29/2025 +ms.date: 03/04/2026 +ai-usage: ai-assisted #customer intent: As a .NET developer, I want to manage OpenAI Content Filtering in a .NET app @@ -10,9 +11,9 @@ ms.date: 05/29/2025 # Work with Azure OpenAI content filtering in a .NET app -This article demonstrates how to handle content filtering concerns in a .NET app. Azure OpenAI Service includes a content filtering system that works alongside core models. This system works by running both the prompt and completion through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions. 
Variations in API configurations and application design might affect completions and thus filtering behavior. +This article shows how to handle content filtering in a .NET app. Azure OpenAI Service includes a content filtering system that works alongside core models. It runs both the prompt and completion through an ensemble of classification models to detect and take action on specific categories of potentially harmful content in both input prompts and output completions. Variations in API configurations and application design might affect completions and thus filtering behavior. -The [Content Filtering](/azure/ai-services/openai/concepts/content-filter) documentation provides a deeper exploration of content filtering concepts and concerns. This article provides examples of how to work with content filtering features programmatically in a .NET app. +For a deeper exploration of content filtering concepts and concerns, see the [Content Filtering](/azure/ai-services/openai/concepts/content-filter) documentation. ## Prerequisites diff --git a/docs/ai/tutorials/tutorial-ai-vector-search.md b/docs/ai/tutorials/tutorial-ai-vector-search.md index 3ec82474108d1..11ef1d78f14ba 100644 --- a/docs/ai/tutorials/tutorial-ai-vector-search.md +++ b/docs/ai/tutorials/tutorial-ai-vector-search.md @@ -1,7 +1,8 @@ --- title: Tutorial - Integrate OpenAI with the RAG pattern and vector search using Azure Cosmos DB for MongoDB description: Create a simple recipe app using the RAG pattern and vector search using Azure Cosmos DB for MongoDB. -ms.date: 08/26/2025 +ms.date: 03/04/2026 +ai-usage: ai-assisted ms.topic: tutorial author: alexwolfmsft ms.author: alexwolf @@ -9,7 +10,7 @@ ms.author: alexwolf # Implement Azure OpenAI with RAG using vector search in a .NET app -This tutorial explores integration of the RAG pattern using Open AI models and vector search capabilities in a .NET app. 
The sample application performs vector searches on custom data stored in Azure Cosmos DB for MongoDB and further refines the responses using generative AI models, such as GPT-35 and GPT-4. In the sections that follow, you'll set up a sample application and explore key code examples that demonstrate these concepts.
+This tutorial explores integration of the RAG pattern using OpenAI models and vector search capabilities in a .NET app. The sample application performs vector searches on custom data stored in Azure Cosmos DB for MongoDB and further refines the responses using generative AI models, such as GPT-3.5 and GPT-4. In the sections that follow, you'll set up a sample application and explore key code examples that demonstrate these concepts.

## Prerequisites

@@ -22,7 +23,7 @@ This tutorial explores integration of the RAG pattern using Open AI models and v

## App overview

-The Cosmos Recipe Guide app allows you to perform vector and AI driven searches against a set of recipe data. You can search directly for available recipes or prompt the app with ingredient names to find related recipes. The app and the sections ahead guide you through the following workflow to demonstrate this type of functionality:
+The Cosmos Recipe Guide app lets you perform vector and AI-driven searches against a set of recipe data. Search directly for available recipes or prompt the app with ingredient names to find related recipes. The app and the sections ahead guide you through the following workflow to demonstrate this type of functionality:

1. Upload sample data to an Azure Cosmos DB for MongoDB database.
1. Create embeddings and a vector index for the uploaded sample data using the Azure OpenAI `text-embedding-ada-002` model.
@@ -41,7 +42,7 @@ The Cosmos Recipe Guide app allows you to perform vector and AI driven searches

1. In the _C#/CosmosDB-MongoDBvCore_ folder, open the **CosmosRecipeGuide.sln** file.

-1.
In the _appsettings.json_ file, replace the following config values with your Azure OpenAI and Azure Cosmos DB for MongoDB values:

```json
"OpenAIEndpoint": "https://.openai.azure.com/",

@@ -112,7 +113,7 @@ When you run the app for the first time, it connects to Azure Cosmos DB and repo

1. Select **Vectorize the recipe(s) and store them in Cosmos DB**.

-   The JSON items uploaded to Cosmos DB do not contain embeddings and therefore are not optimized for RAG via vector search. An embedding is an information-dense, numerical representation of the semantic meaning of a piece of text. Vector searches are able to find items with contextually similar embeddings.
+   The JSON items uploaded to Cosmos DB don't contain embeddings and therefore aren't optimized for RAG via vector search. An embedding is an information-dense, numerical representation of the semantic meaning of a piece of text. Vector searches can find items with contextually similar embeddings.

   The `GetEmbeddingsAsync` method in the _OpenAIService.cs_ file creates an embedding for each item in the database.

@@ -141,7 +142,7 @@ When you run the app for the first time, it connects to Azure Cosmos DB and repo
   }
   ```

-   The `CreateVectorIndexIfNotExists` in the _VCoreMongoService.cs_ file creates a vector index, which enables you to perform vector similarity searches.
+   The `CreateVectorIndexIfNotExists` method in the _VCoreMongoService.cs_ file creates a vector index, which lets you perform vector similarity searches.

   ```C#
   public void CreateVectorIndexIfNotExists(string vectorIndexName)
@@ -184,9 +185,9 @@ When you run the app for the first time, it connects to Azure Cosmos DB and repo
   }
   ```

-1. Select the **Ask AI Assistant (search for a recipe by name or description, or ask a question)** option in the application to run a user query.
+1.
Select the **Ask AI Assistant (search for a recipe by name or description, or ask a question)** option in the app to run a user query.

-   The user query is converted to an embedding using the OpenAI service and the embedding model. The embedding is then sent to Azure Cosmos DB for MongoDB and is used to perform a vector search. The `VectorSearchAsync` method in the _VCoreMongoService.cs_ file performs a vector search to find vectors that are close to the supplied vector and returns a list of documents from Azure Cosmos DB for MongoDB vCore.
+   The app converts the user query to an embedding using the OpenAI service and the embedding model, then sends the embedding to Azure Cosmos DB for MongoDB to perform a vector search. The `VectorSearchAsync` method in the _VCoreMongoService.cs_ file performs a vector search to find vectors that are close to the supplied vector and returns a list of documents from Azure Cosmos DB for MongoDB vCore.

   ```C#
   public async Task<List<Recipe>> VectorSearchAsync(float[] queryVector)

From bd4bf4bad410d8f551c77b06e9aa8a1ab61c4e33 Mon Sep 17 00:00:00 2001
From: Genevieve Warren <24882762+gewarren@users.noreply.github.com>
Date: Wed, 4 Mar 2026 15:25:32 -0800
Subject: [PATCH 3/4] Apply suggestions from code review

---
 docs/ai/get-started-app-chat-template.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/ai/get-started-app-chat-template.md b/docs/ai/get-started-app-chat-template.md
index f5b9cf533f782..5b3f3842763a3 100644
--- a/docs/ai/get-started-app-chat-template.md
+++ b/docs/ai/get-started-app-chat-template.md
@@ -40,7 +40,7 @@ The architecture of the chat app is shown in the following diagram:

- **User interface** - The application's chat interface is a [Blazor WebAssembly](/aspnet/core/blazor/) application. This interface is what accepts user queries, routes requests to the application backend, and displays generated responses.
- **Backend** - The application backend is an [ASP.NET Core Minimal API](/aspnet/core/fundamentals/minimal-apis/overview). The backend hosts the Blazor static web application and is what orchestrates the interactions among the different services. Services used in this application include:
  - [**Azure AI Search**](/azure/search/search-what-is-azure-search) – Indexes documents from the data stored in an Azure Storage Account. This makes the documents searchable using [vector search](/azure/search/search-get-started-vector) capabilities.
-  - [**Azure OpenAI Service**](/azure/ai-services/openai/overview) – Provides the Large Language Models (LLM) to generate responses. [Semantic Kernel](/semantic-kernel/overview/) is used in conjunction with the Azure OpenAI Service to orchestrate the more complex AI workflows.
+  - [**Azure OpenAI Service**](/azure/ai-services/openai/overview) – Provides the Large Language Models (LLM) to generate responses. [Microsoft Agent Framework](/agent-framework/overview/agent-framework-overview) is used in conjunction with the Azure OpenAI Service to orchestrate the more complex AI workflows.

## Cost

@@ -158,7 +158,7 @@ The sample repository contains all the code and configuration files you need to

1. When prompted, select a subscription to create the resources in.
1. When you're prompted to select a location the first time, select a location near you. This location is used for most of the resources including hosting.
1. If you're prompted for a location for the OpenAI model, select a location that is near you. If the same location is available as your first location, select that.
-1. Wait until the app is deployed. The deployment may take up to 20 minutes to complete.
+1. Wait until the app is deployed. The deployment might take up to 20 minutes to complete.
1. After the application deploys successfully, a URL appears in the terminal.
1. Select that URL labeled `Deploying service web` to open the chat application in a browser.
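The keyless backend wiring these articles describe can be sketched with the packages listed earlier (Azure.Identity, Azure.AI.OpenAI, Microsoft.Extensions.AI, and Microsoft.Extensions.AI.OpenAI). This is a minimal sketch, not the template's actual code: the endpoint and deployment name are placeholder assumptions, and extension-method names such as `AsIChatClient` vary across preview versions of these packages.

```csharp
using System;
using Azure.AI.OpenAI;
using Azure.Identity;
using Microsoft.Extensions.AI;

// Keyless (Microsoft Entra ID) authentication via DefaultAzureCredential.
// "<your-resource>" and "gpt-4" are placeholders, not values from the template.
IChatClient chatClient =
    new AzureOpenAIClient(
            new Uri("https://<your-resource>.openai.azure.com/"),
            new DefaultAzureCredential())
        .GetChatClient("gpt-4")   // deployment name in your Azure OpenAI resource
        .AsIChatClient();         // adapt the OpenAI client to the IChatClient abstraction

// Send a user query and print the generated response.
ChatResponse response = await chatClient.GetResponseAsync(
    "Summarize the indexed documents in one sentence.");
Console.WriteLine(response.Text);
```

Because the app authenticates with `DefaultAzureCredential`, the same code can work locally (using your developer sign-in) and in Azure (using a managed identity) without storing keys.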
From c555d4d1500ed33f27ea2698ffb8604202d934b5 Mon Sep 17 00:00:00 2001 From: Genevieve Warren <24882762+gewarren@users.noreply.github.com> Date: Wed, 4 Mar 2026 15:26:32 -0800 Subject: [PATCH 4/4] Update docs/ai/conceptual/understanding-tokens.md --- docs/ai/conceptual/understanding-tokens.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/ai/conceptual/understanding-tokens.md b/docs/ai/conceptual/understanding-tokens.md index a9c80932f11dd..944db391a4607 100644 --- a/docs/ai/conceptual/understanding-tokens.md +++ b/docs/ai/conceptual/understanding-tokens.md @@ -8,7 +8,7 @@ ai-usage: ai-assisted --- # Understand tokens -Large language models (LLMs) generate tokens—words, character sets, or combinations of words and punctuation—by decomposing text. Tokenization is the first step in training. The LLM analyzes the semantic relationships between tokens, such as how commonly they're used together or whether they're used in similar contexts. After training, the LLM uses those patterns and relationships to generate a sequence of output tokens based on the input sequence. +Large language models (LLMs) generate *tokens*, which are words, character sets, or combinations of words and punctuation, by decomposing text. Tokenization is the first step in training. The LLM analyzes the semantic relationships between tokens, such as how commonly they're used together or whether they're used in similar contexts. After training, the LLM uses those patterns and relationships to generate a sequence of output tokens based on the input sequence. ## Turn text into tokens
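To make the decomposition step concrete, the following sketch applies greedy longest-match tokenization over a tiny, invented vocabulary. Real tokenizers learn vocabularies of tens of thousands of subwords from training data, so the splits shown here are illustrative only.

```csharp
using System;
using System.Collections.Generic;

class ToyTokenizer
{
    // A tiny, invented subword vocabulary; production vocabularies are learned
    // from training text and hold tens of thousands of entries.
    static readonly HashSet<string> Vocab =
        new() { "token", "ization", "under", "stand", "ing", " " };

    // Greedy longest-prefix matching: at each position, consume the longest
    // vocabulary entry; fall back to a single character if nothing matches.
    static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        int i = 0;
        while (i < text.Length)
        {
            string best = text.Substring(i, 1);
            for (int len = Math.Min(text.Length - i, 16); len > 1; len--)
            {
                string candidate = text.Substring(i, len);
                if (Vocab.Contains(candidate))
                {
                    best = candidate;
                    break;
                }
            }
            tokens.Add(best);
            i += best.Length;
        }
        return tokens;
    }

    static void Main()
    {
        // "understand tokenization" decomposes into the subword tokens
        // "under", "stand", " ", "token", "ization".
        Console.WriteLine(string.Join(" | ", Tokenize("understand tokenization")));
    }
}
```

A model trained on such a vocabulary then learns relationships between these token IDs rather than between raw characters or whole words.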