Merged
35 commits
0e86b33
notes
Apr 24, 2026
42d3ab5
draft PR
Apr 27, 2026
f5f1563
tests
Apr 27, 2026
d36ced0
Merge branch 'main' into benchmark
ValbuenaVC Apr 27, 2026
1c38950
Merge branch 'main' into benchmark
ValbuenaVC Apr 28, 2026
53e97d1
Merge branch 'main' into benchmark
ValbuenaVC Apr 29, 2026
155dcf0
.
Apr 29, 2026
900ab78
Merge branch 'main' into benchmark
ValbuenaVC Apr 30, 2026
066c514
Merge branch 'main' into benchmark
ValbuenaVC May 1, 2026
c06fb05
refactored from 1664
May 1, 2026
5972cb0
Merge branch 'main' into benchmark
ValbuenaVC May 1, 2026
5c21b06
Merge branch 'main' into benchmark
ValbuenaVC May 2, 2026
af9eea7
Merge branch 'main' into benchmark
ValbuenaVC May 4, 2026
5661751
PR comments
May 4, 2026
7c3315f
Merge branch 'benchmark' of https://github.com/ValbuenaVC/PyRIT into …
May 4, 2026
5268c07
Merge branch 'main' into benchmark
ValbuenaVC May 5, 2026
aeb8561
Merge branch 'benchmark' of https://github.com/ValbuenaVC/PyRIT into …
May 5, 2026
60a10c4
notebook
May 5, 2026
a52f8e4
Merge branch 'main' into benchmark
ValbuenaVC May 5, 2026
505b47a
PR comments
May 5, 2026
4ba7a83
notebook
May 5, 2026
520a4f3
notebook improvements
May 5, 2026
3403503
tests
May 5, 2026
15599b8
pr comments
May 6, 2026
f13c338
.
May 6, 2026
38ce1a2
rename
May 6, 2026
8930999
precommit
May 6, 2026
2d0e294
renames
May 6, 2026
c90f48d
precommit
May 6, 2026
61326a1
Merge branch 'main' into benchmark
ValbuenaVC May 6, 2026
420f29d
Merge branch 'main' into benchmark
ValbuenaVC May 7, 2026
6dc06c5
pr comments
May 7, 2026
1ac06c6
pr comments
May 7, 2026
626db62
notebook
May 7, 2026
ebf63e5
benchmark notebook
May 7, 2026
1 change: 1 addition & 0 deletions doc/myst.yml
@@ -60,6 +60,7 @@ project:
  - file: scanner/1_pyrit_scan.ipynb
  - file: scanner/2_pyrit_shell.md
  - file: scanner/airt.ipynb
+ - file: scanner/benchmark.ipynb
  - file: scanner/foundry.ipynb
  - file: scanner/garak.ipynb
  - file: code/framework.md
3 changes: 2 additions & 1 deletion doc/scanner/0_scanner.md
@@ -28,11 +28,12 @@ pyrit_scan foundry.red_team_agent --target openai_chat --initializers target loa

## Built-in Scenarios

-PyRIT ships with scenarios organized into three families:
+PyRIT ships with scenarios organized into the following families:

| Family | Scenarios | Documentation |
|--------|-----------|---------------|
| **AIRT** | ContentHarms, Psychosocial, Cyber, Jailbreak, Leakage, Scam | [AIRT Scenarios](airt.ipynb) |
+| **Benchmark** | AdversarialBenchmark | [Benchmark Scenarios](benchmark.ipynb) |
| **Foundry** | RedTeamAgent | [Foundry Scenarios](foundry.ipynb) |
| **Garak** | Encoding | [Garak Scenarios](garak.ipynb) |

160 changes: 160 additions & 0 deletions doc/scanner/benchmark.ipynb
@@ -0,0 +1,160 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "0",
"metadata": {},
"source": [
"# Benchmark Scenarios\n",
"\n",
"Benchmark scenarios are a subset of scenarios that compare the effectiveness of attacks along an axis that varies within the scenario itself. That axis can be many things; currently, the only benchmark variant is the adversarial benchmark, whose axis is the adversarial model used in the attacks."
]
},
{
"cell_type": "markdown",
"id": "1",
"metadata": {},
"source": [
"## Adversarial Benchmark\n",
"The adversarial benchmarking scenario (`AdversarialBenchmark`) compares the effectiveness of different adversarial models in successfully executing attacks against a target model."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Found default environment files: ['./.pyrit/.env', './.pyrit/.env.local']\n",
"Loaded environment file: ./.pyrit/.env\n",
"Loaded environment file: ./.pyrit/.env.local\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "8316db039ba1408499df0a2de6c8d6f6",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Executing AdversarialBenchmark: 0%| | 0/3 [00:00<?, ?attack/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Scenario result id: e01b35d2-c7f8-49bd-aafd-5d44ef7235f4\n",
"\n",
"\u001b[36m====================================================================================================\u001b[0m\n",
"\u001b[1m\u001b[36m 📊 SCENARIO RESULTS: AdversarialBenchmark \u001b[0m\n",
"\u001b[36m====================================================================================================\u001b[0m\n",
"\n",
"\u001b[1m\u001b[36m▼ Scenario Information\u001b[0m\n",
"\u001b[36m────────────────────────────────────────────────────────────────────────────────────────────────────\u001b[0m\n",
"\u001b[1m 📋 Scenario Details\u001b[0m\n",
"\u001b[36m • Name: AdversarialBenchmark\u001b[0m\n",
"\u001b[36m • Scenario Version: 1\u001b[0m\n",
"\u001b[36m • PyRIT Version: 0.14.0.dev0\u001b[0m\n",
"\u001b[36m • Description:\u001b[0m\n",
"\u001b[36m Benchmarking scenario that compares the attack success rate (ASR) of several different adversarial models.\u001b[0m\n",
"\n",
"\u001b[1m 🎯 Target Information\u001b[0m\n",
"\u001b[36m • Target Type: OpenAIChatTarget\u001b[0m\n",
"\u001b[36m • Target Model: gpt-4o-japan-nilfilter\u001b[0m\n",
"\u001b[36m • Target Endpoint: https://pyrit-japan-test.openai.azure.com/openai/v1\u001b[0m\n",
"\n",
"\u001b[1m 📊 Scorer Information\u001b[0m\n",
"\u001b[37m ▸ Scorer Identifier\u001b[0m\n",
"\u001b[36m • Scorer Type: TrueFalseInverterScorer\u001b[0m\n",
"\u001b[36m • scorer_type: true_false\u001b[0m\n",
"\u001b[36m • score_aggregator: OR_\u001b[0m\n",
"\u001b[36m └─ Composite of 1 scorer(s):\u001b[0m\n",
"\u001b[36m • Scorer Type: SelfAskRefusalScorer\u001b[0m\n",
"\u001b[36m • scorer_type: true_false\u001b[0m\n",
"\u001b[36m • score_aggregator: OR_\u001b[0m\n",
"\u001b[36m • model_name: gpt-4o-japan-nilfilter\u001b[0m\n",
"\n",
"\u001b[37m ▸ Performance Metrics\u001b[0m\n",
"\u001b[33m Official evaluation has not been run yet for this specific configuration\u001b[0m\n",
"\n",
"\u001b[1m\u001b[36m▼ Overall Statistics\u001b[0m\n",
"\u001b[36m────────────────────────────────────────────────────────────────────────────────────────────────────\u001b[0m\n",
"\u001b[1m 📈 Summary\u001b[0m\n",
"\u001b[32m • Total Strategies: 3\u001b[0m\n",
"\u001b[32m • Total Attack Results: 24\u001b[0m\n",
"\u001b[36m • Overall Success Rate: 25%\u001b[0m\n",
"\u001b[32m • Unique Objectives: 8\u001b[0m\n",
"\n",
"\u001b[1m\u001b[36m▼ Per-Group Breakdown\u001b[0m\n",
"\u001b[36m────────────────────────────────────────────────────────────────────────────────────────────────────\u001b[0m\n",
"\n",
"\u001b[1m 🔸 Group: gpt-4o-japan-nilfilter\u001b[0m\n",
"\u001b[33m • Number of Results: 24\u001b[0m\n",
"\u001b[36m • Success Rate: 25%\u001b[0m\n",
"\n",
"\u001b[36m====================================================================================================\u001b[0m\n",
"\n"
]
}
],
"source": [
"from pyrit.prompt_target import OpenAIChatTarget\n",
"from pyrit.scenario.printer.console_printer import ConsoleScenarioResultPrinter\n",
"from pyrit.scenario.scenarios.benchmark import AdversarialBenchmark\n",
"from pyrit.setup import IN_MEMORY, initialize_pyrit_async\n",
"from pyrit.setup.initializers import LoadDefaultDatasets\n",
"\n",
"await initialize_pyrit_async(memory_db_type=IN_MEMORY, initializers=[LoadDefaultDatasets()]) # type: ignore\n",
"\n",
"# Pass any number of adversarial PromptChatTargets as a list; AdversarialBenchmark\n",
"# infers a label for each from its identifier and runs every benchmark-friendly\n",
"# attack technique against the objective target with each adversarial model.\n",
"adversarial_model = OpenAIChatTarget()\n",
"\n",
"benchmark_scenario = AdversarialBenchmark(adversarial_models=[adversarial_model])\n",
"\n",
"await benchmark_scenario.initialize_async( # type: ignore\n",
" objective_target=OpenAIChatTarget(), max_concurrency=2\n",
")\n",
"\n",
"baseline_result = await benchmark_scenario.run_async() # type: ignore\n",
"\n",
"# Resume handle: re-run with `AdversarialBenchmark(..., scenario_result_id=<this id>)` to pick\n",
"# up where this run left off (constructor args must match the original run).\n",
"print(f\"Scenario result id: {baseline_result.id}\")\n",
"\n",
"printer = ConsoleScenarioResultPrinter()\n",
"\n",
"await printer.print_summary_async(baseline_result) # type: ignore"
]
}
],
"metadata": {
"jupytext": {
"main_language": "python"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
48 changes: 48 additions & 0 deletions doc/scanner/benchmark.py
@@ -0,0 +1,48 @@
# ---
# jupyter:
#   jupytext:
#     text_representation:
#       extension: .py
#       format_name: percent
#       format_version: '1.3'
#     jupytext_version: 1.18.1
# ---

# %% [markdown]
# # Benchmark Scenarios
#
# Benchmark scenarios are a subset of scenarios that compare the effectiveness of attacks along an axis that varies within the scenario itself. That axis can be many things; currently, the only benchmark variant is the adversarial benchmark, whose axis is the adversarial model used in the attacks.

# %% [markdown]
# ## Adversarial Benchmark
# The adversarial benchmarking scenario (`AdversarialBenchmark`) compares the effectiveness of different adversarial models in successfully executing attacks against a target model.

# %%
from pyrit.prompt_target import OpenAIChatTarget
from pyrit.scenario.printer.console_printer import ConsoleScenarioResultPrinter
from pyrit.scenario.scenarios.benchmark import AdversarialBenchmark
from pyrit.setup import IN_MEMORY, initialize_pyrit_async
from pyrit.setup.initializers import LoadDefaultDatasets

await initialize_pyrit_async(memory_db_type=IN_MEMORY, initializers=[LoadDefaultDatasets()]) # type: ignore

# Pass any number of adversarial PromptChatTargets as a list; AdversarialBenchmark
# infers a label for each from its identifier and runs every benchmark-friendly
# attack technique against the objective target with each adversarial model.
adversarial_model = OpenAIChatTarget()

benchmark_scenario = AdversarialBenchmark(adversarial_models=[adversarial_model])

await benchmark_scenario.initialize_async( # type: ignore
    objective_target=OpenAIChatTarget(), max_concurrency=2
)

baseline_result = await benchmark_scenario.run_async() # type: ignore

# Resume handle: re-run with `AdversarialBenchmark(..., scenario_result_id=<this id>)` to pick
# up where this run left off (constructor args must match the original run).
print(f"Scenario result id: {baseline_result.id}")

printer = ConsoleScenarioResultPrinter()

await printer.print_summary_async(baseline_result) # type: ignore
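The "Per-Group Breakdown" in the scenario output above boils down to grouping attack results by adversarial model and computing an attack success rate (ASR) per group. A minimal, self-contained sketch of that aggregation in plain Python — the `AttackResult` shape and field names here are hypothetical stand-ins, not PyRIT's actual classes:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AttackResult:
    # Hypothetical stand-in for a PyRIT attack result: which adversarial
    # model drove the attack, and whether the objective was achieved.
    adversarial_model: str
    success: bool


def success_rate_by_group(results):
    """Group results by adversarial model and compute the success rate per group."""
    grouped = defaultdict(list)
    for r in results:
        grouped[r.adversarial_model].append(r.success)
    return {model: sum(flags) / len(flags) for model, flags in grouped.items()}


results = [AttackResult("gpt-4o", s) for s in [True, False, False, False]] + [
    AttackResult("gpt-4o-mini", s) for s in [True, True, False, False]
]
print(success_rate_by_group(results))  # {'gpt-4o': 0.25, 'gpt-4o-mini': 0.5}
```

A 25% rate for a group, as in the sample run above, simply means one in four attack results against that target succeeded.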
4 changes: 4 additions & 0 deletions pyrit/scenario/__init__.py
@@ -31,15 +31,18 @@
# This allows: from pyrit.scenario.airt import ContentHarms
# without needing separate pyrit/scenario/airt/ directories
from pyrit.scenario.scenarios import airt as _airt_module
+from pyrit.scenario.scenarios import benchmark as _benchmark_module
from pyrit.scenario.scenarios import foundry as _foundry_module
from pyrit.scenario.scenarios import garak as _garak_module

sys.modules["pyrit.scenario.airt"] = _airt_module
+sys.modules["pyrit.scenario.benchmark"] = _benchmark_module
sys.modules["pyrit.scenario.garak"] = _garak_module
sys.modules["pyrit.scenario.foundry"] = _foundry_module

# Also expose as attributes for IDE support
airt = _airt_module
+benchmark = _benchmark_module
garak = _garak_module
foundry = _foundry_module

@@ -55,6 +58,7 @@
    "ScenarioIdentifier",
    "ScenarioResult",
    "airt",
+   "benchmark",
    "garak",
    "foundry",
]
14 changes: 10 additions & 4 deletions pyrit/scenario/core/scenario_techniques.py
@@ -25,6 +25,7 @@

from pyrit.common.path import EXECUTOR_SEED_PROMPT_PATH
from pyrit.executor.attack import (
+   ContextComplianceAttack,
    ManyShotJailbreakAttack,
    PromptSendingAttack,
    RedTeamingAttack,
@@ -56,18 +57,18 @@
    AttackTechniqueSpec(
        name="prompt_sending",
        attack_class=PromptSendingAttack,
-       strategy_tags=["core", "single_turn", "default"],
+       strategy_tags=["core", "single_turn", "default", "light"],
    ),
    AttackTechniqueSpec(
        name="role_play",
        attack_class=RolePlayAttack,
-       strategy_tags=["core", "single_turn"],
+       strategy_tags=["core", "single_turn", "light"],
        extra_kwargs={"role_play_definition_path": RolePlayPaths.MOVIE_SCRIPT.value},
    ),
    AttackTechniqueSpec(
        name="many_shot",
        attack_class=ManyShotJailbreakAttack,
-       strategy_tags=["core", "multi_turn", "default"],
+       strategy_tags=["core", "multi_turn", "default", "light"],
    ),
    AttackTechniqueSpec(
        name="tap",
@@ -93,7 +94,12 @@
    AttackTechniqueSpec(
        name="red_teaming",
        attack_class=RedTeamingAttack,
-       strategy_tags=["core", "multi_turn"],
+       strategy_tags=["core", "multi_turn", "light"],
    ),
+   AttackTechniqueSpec(
+       name="context_compliance",
+       attack_class=ContextComplianceAttack,
+       strategy_tags=["core", "single_turn", "light"],
+   ),
]
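The `light` tag added to these specs lets callers select a cheaper subset of attack techniques by tag. A hedged sketch of how tag-based selection over such specs could work — `AttackTechniqueSpec` here is a simplified stand-in for illustration, not PyRIT's actual class:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackTechniqueSpec:
    # Simplified stand-in: just a name and its strategy tags.
    name: str
    strategy_tags: tuple = ()


SPECS = [
    AttackTechniqueSpec("prompt_sending", ("core", "single_turn", "default", "light")),
    AttackTechniqueSpec("tap", ("core", "multi_turn")),
    AttackTechniqueSpec("red_teaming", ("core", "multi_turn", "light")),
    AttackTechniqueSpec("context_compliance", ("core", "single_turn", "light")),
]


def select_by_tag(specs, tag):
    """Return the specs whose strategy_tags include the requested tag."""
    return [s for s in specs if tag in s.strategy_tags]


print([s.name for s in select_by_tag(SPECS, "light")])
# ['prompt_sending', 'red_teaming', 'context_compliance']
```

Tagging keeps the spec list declarative: a scenario can ask for every `light` technique without hard-coding class names.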

26 changes: 26 additions & 0 deletions pyrit/scenario/scenarios/benchmark/__init__.py
@@ -0,0 +1,26 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

"""Benchmark scenario classes."""

from typing import Any

from pyrit.scenario.scenarios.benchmark.adversarial import AdversarialBenchmark


def __getattr__(name: str) -> Any:
    """
    Lazily resolve the dynamic BenchmarkStrategy class.

    Returns:
        Any: The resolved strategy class.

    Raises:
        AttributeError: If the attribute name is not recognized.
    """
    if name == "AdversarialBenchmarkStrategy":
        return AdversarialBenchmark.get_strategy_class()
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")


__all__ = ["AdversarialBenchmark", "AdversarialBenchmarkStrategy"]
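This module-level `__getattr__` relies on PEP 562 lazy attribute resolution: `AdversarialBenchmarkStrategy` appears in `__all__` but is only built on first access. A minimal, self-contained demonstration of the pattern — the module and attribute names below are illustrative, not PyRIT's:

```python
import types

# Build a throwaway module so PEP 562 module __getattr__ can be shown in isolation.
source = """
def _build_strategy():
    # Stand-in for an expensive, dynamically built class.
    return "LazilyBuiltStrategy"

def __getattr__(name):
    if name == "AdversarialBenchmarkStrategy":
        return _build_strategy()
    raise AttributeError(f"module has no attribute {name!r}")
"""
demo = types.ModuleType("benchmark_demo")
exec(source, demo.__dict__)

# Normal attribute lookup misses, so the module-level __getattr__ runs.
print(demo.AdversarialBenchmarkStrategy)  # LazilyBuiltStrategy
```

Any other missing attribute still raises `AttributeError`, matching normal module semantics.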