Merged
35 commits
0e86b33
notes
Apr 24, 2026
42d3ab5
draft PR
Apr 27, 2026
f5f1563
tests
Apr 27, 2026
d36ced0
Merge branch 'main' into benchmark
ValbuenaVC Apr 27, 2026
1c38950
Merge branch 'main' into benchmark
ValbuenaVC Apr 28, 2026
53e97d1
Merge branch 'main' into benchmark
ValbuenaVC Apr 29, 2026
155dcf0
.
Apr 29, 2026
900ab78
Merge branch 'main' into benchmark
ValbuenaVC Apr 30, 2026
066c514
Merge branch 'main' into benchmark
ValbuenaVC May 1, 2026
c06fb05
refactored from 1664
May 1, 2026
5972cb0
Merge branch 'main' into benchmark
ValbuenaVC May 1, 2026
5c21b06
Merge branch 'main' into benchmark
ValbuenaVC May 2, 2026
af9eea7
Merge branch 'main' into benchmark
ValbuenaVC May 4, 2026
5661751
PR comments
May 4, 2026
7c3315f
Merge branch 'benchmark' of https://github.com/ValbuenaVC/PyRIT into …
May 4, 2026
5268c07
Merge branch 'main' into benchmark
ValbuenaVC May 5, 2026
aeb8561
Merge branch 'benchmark' of https://github.com/ValbuenaVC/PyRIT into …
May 5, 2026
60a10c4
notebook
May 5, 2026
a52f8e4
Merge branch 'main' into benchmark
ValbuenaVC May 5, 2026
505b47a
PR comments
May 5, 2026
4ba7a83
notebook
May 5, 2026
520a4f3
notebook improvements
May 5, 2026
3403503
tests
May 5, 2026
15599b8
pr comments
May 6, 2026
f13c338
.
May 6, 2026
38ce1a2
rename
May 6, 2026
8930999
precommit
May 6, 2026
2d0e294
renames
May 6, 2026
c90f48d
precommit
May 6, 2026
61326a1
Merge branch 'main' into benchmark
ValbuenaVC May 6, 2026
420f29d
Merge branch 'main' into benchmark
ValbuenaVC May 7, 2026
6dc06c5
pr comments
May 7, 2026
1ac06c6
pr comments
May 7, 2026
626db62
notebook
May 7, 2026
ebf63e5
benchmark notebook
May 7, 2026
1 change: 1 addition & 0 deletions doc/myst.yml
@@ -60,6 +60,7 @@ project:
  - file: scanner/1_pyrit_scan.ipynb
  - file: scanner/2_pyrit_shell.md
  - file: scanner/airt.ipynb
+ - file: scanner/benchmark.ipynb
  - file: scanner/foundry.ipynb
  - file: scanner/garak.ipynb
  - file: code/framework.md
3 changes: 2 additions & 1 deletion doc/scanner/0_scanner.md
@@ -28,11 +28,12 @@ pyrit_scan foundry.red_team_agent --target openai_chat --initializers target loa

## Built-in Scenarios

-PyRIT ships with scenarios organized into three families:
+PyRIT ships with scenarios organized into the following families:

| Family | Scenarios | Documentation |
|--------|-----------|---------------|
| **AIRT** | ContentHarms, Psychosocial, Cyber, Jailbreak, Leakage, Scam | [AIRT Scenarios](airt.ipynb) |
+| **Benchmark** | AdversarialBenchmark | [Benchmark Scenarios](benchmark.ipynb) |
| **Foundry** | RedTeamAgent | [Foundry Scenarios](foundry.ipynb) |
| **Garak** | Encoding | [Garak Scenarios](garak.ipynb) |

160 changes: 160 additions & 0 deletions doc/scanner/benchmark.ipynb
@@ -0,0 +1,160 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "0",
"metadata": {},
"source": [
"# Benchmark Scenarios\n",
"\n",
"Benchmark scenarios are a subset of scenarios that compare the effectiveness of attacks along an axis that varies within the scenario itself. That axis can be many things; currently, the only benchmark variant is the adversarial benchmark, whose axis is the adversarial model used in the attacks."
]
},
{
"cell_type": "markdown",
"id": "1",
"metadata": {},
"source": [
"## Adversarial Benchmark\n",
"The adversarial benchmarking scenario (`AdversarialBenchmark`) compares the effectiveness of different adversarial models in successfully executing attacks against a target model."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Found default environment files: ['./.pyrit/.env', './.pyrit/.env.local']\n",
"Loaded environment file: ./.pyrit/.env\n",
"Loaded environment file: ./.pyrit/.env.local\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "8316db039ba1408499df0a2de6c8d6f6",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Executing AdversarialBenchmark: 0%| | 0/3 [00:00<?, ?attack/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Scenario result id: e01b35d2-c7f8-49bd-aafd-5d44ef7235f4\n",
"\n",
"\u001b[36m====================================================================================================\u001b[0m\n",
"\u001b[1m\u001b[36m 📊 SCENARIO RESULTS: AdversarialBenchmark \u001b[0m\n",
"\u001b[36m====================================================================================================\u001b[0m\n",
"\n",
"\u001b[1m\u001b[36m▼ Scenario Information\u001b[0m\n",
"\u001b[36m────────────────────────────────────────────────────────────────────────────────────────────────────\u001b[0m\n",
"\u001b[1m 📋 Scenario Details\u001b[0m\n",
"\u001b[36m • Name: AdversarialBenchmark\u001b[0m\n",
"\u001b[36m • Scenario Version: 1\u001b[0m\n",
"\u001b[36m • PyRIT Version: 0.14.0.dev0\u001b[0m\n",
"\u001b[36m • Description:\u001b[0m\n",
"\u001b[36m Benchmarking scenario that compares the attack success rate (ASR) of several different adversarial models.\u001b[0m\n",
"\n",
"\u001b[1m 🎯 Target Information\u001b[0m\n",
"\u001b[36m • Target Type: OpenAIChatTarget\u001b[0m\n",
"\u001b[36m • Target Model: gpt-4o-japan-nilfilter\u001b[0m\n",
"\u001b[36m • Target Endpoint: https://pyrit-japan-test.openai.azure.com/openai/v1\u001b[0m\n",
"\n",
"\u001b[1m 📊 Scorer Information\u001b[0m\n",
"\u001b[37m ▸ Scorer Identifier\u001b[0m\n",
"\u001b[36m • Scorer Type: TrueFalseInverterScorer\u001b[0m\n",
"\u001b[36m • scorer_type: true_false\u001b[0m\n",
"\u001b[36m • score_aggregator: OR_\u001b[0m\n",
"\u001b[36m └─ Composite of 1 scorer(s):\u001b[0m\n",
"\u001b[36m • Scorer Type: SelfAskRefusalScorer\u001b[0m\n",
"\u001b[36m • scorer_type: true_false\u001b[0m\n",
"\u001b[36m • score_aggregator: OR_\u001b[0m\n",
"\u001b[36m • model_name: gpt-4o-japan-nilfilter\u001b[0m\n",
"\n",
"\u001b[37m ▸ Performance Metrics\u001b[0m\n",
"\u001b[33m Official evaluation has not been run yet for this specific configuration\u001b[0m\n",
"\n",
"\u001b[1m\u001b[36m▼ Overall Statistics\u001b[0m\n",
"\u001b[36m────────────────────────────────────────────────────────────────────────────────────────────────────\u001b[0m\n",
"\u001b[1m 📈 Summary\u001b[0m\n",
"\u001b[32m • Total Strategies: 3\u001b[0m\n",
"\u001b[32m • Total Attack Results: 24\u001b[0m\n",
"\u001b[36m • Overall Success Rate: 25%\u001b[0m\n",
"\u001b[32m • Unique Objectives: 8\u001b[0m\n",
"\n",
"\u001b[1m\u001b[36m▼ Per-Group Breakdown\u001b[0m\n",
"\u001b[36m────────────────────────────────────────────────────────────────────────────────────────────────────\u001b[0m\n",
"\n",
"\u001b[1m 🔸 Group: gpt-4o-japan-nilfilter\u001b[0m\n",
"\u001b[33m • Number of Results: 24\u001b[0m\n",
"\u001b[36m • Success Rate: 25%\u001b[0m\n",
"\n",
"\u001b[36m====================================================================================================\u001b[0m\n",
"\n"
]
}
],
"source": [
"from pyrit.prompt_target import OpenAIChatTarget\n",
"from pyrit.scenario.printer.console_printer import ConsoleScenarioResultPrinter\n",
"from pyrit.scenario.scenarios.benchmark import AdversarialBenchmark\n",
"from pyrit.setup import IN_MEMORY, initialize_pyrit_async\n",
"from pyrit.setup.initializers import LoadDefaultDatasets\n",
"\n",
"await initialize_pyrit_async(memory_db_type=IN_MEMORY, initializers=[LoadDefaultDatasets()]) # type: ignore\n",
"\n",
"# Pass any number of adversarial PromptChatTargets as a list; AdversarialBenchmark\n",
"# infers a label for each from its identifier and runs every benchmark-friendly\n",
"# attack technique against the objective target with each adversarial model.\n",
"adversarial_model = OpenAIChatTarget()\n",
"\n",
"benchmark_scenario = AdversarialBenchmark(adversarial_models=[adversarial_model])\n",
"\n",
"await benchmark_scenario.initialize_async( # type: ignore\n",
" objective_target=OpenAIChatTarget(), max_concurrency=2\n",
")\n",
"\n",
"baseline_result = await benchmark_scenario.run_async() # type: ignore\n",
"\n",
"# Resume handle: re-run with `AdversarialBenchmark(..., scenario_result_id=<this id>)` to pick\n",
"# up where this run left off (constructor args must match the original run).\n",
"print(f\"Scenario result id: {baseline_result.id}\")\n",
"\n",
"printer = ConsoleScenarioResultPrinter()\n",
"\n",
"await printer.print_summary_async(baseline_result) # type: ignore"
]
}
],
"metadata": {
"jupytext": {
"main_language": "python"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
48 changes: 48 additions & 0 deletions doc/scanner/benchmark.py
@@ -0,0 +1,48 @@
# ---
# jupyter:
#   jupytext:
#     text_representation:
#       extension: .py
#       format_name: percent
#       format_version: '1.3'
#     jupytext_version: 1.18.1
# ---

# %% [markdown]
# # Benchmark Scenarios
#
# Benchmark scenarios are a subset of scenarios that compare the effectiveness of attacks along an axis that varies within the scenario itself. That axis can be many things; currently, the only benchmark variant is the adversarial benchmark, whose axis is the adversarial model used in the attacks.

# %% [markdown]
# ## Adversarial Benchmark
# The adversarial benchmarking scenario (`AdversarialBenchmark`) compares the effectiveness of different adversarial models in successfully executing attacks against a target model.

# %%
from pyrit.prompt_target import OpenAIChatTarget
from pyrit.scenario.printer.console_printer import ConsoleScenarioResultPrinter
from pyrit.scenario.scenarios.benchmark import AdversarialBenchmark
from pyrit.setup import IN_MEMORY, initialize_pyrit_async
from pyrit.setup.initializers import LoadDefaultDatasets

await initialize_pyrit_async(memory_db_type=IN_MEMORY, initializers=[LoadDefaultDatasets()]) # type: ignore

# Pass any number of adversarial PromptChatTargets as a list; AdversarialBenchmark
# infers a label for each from its identifier and runs every benchmark-friendly
# attack technique against the objective target with each adversarial model.
adversarial_model = OpenAIChatTarget()

benchmark_scenario = AdversarialBenchmark(adversarial_models=[adversarial_model])

await benchmark_scenario.initialize_async( # type: ignore
    objective_target=OpenAIChatTarget(), max_concurrency=2
)

baseline_result = await benchmark_scenario.run_async() # type: ignore

# Resume handle: re-run with `AdversarialBenchmark(..., scenario_result_id=<this id>)` to pick
# up where this run left off (constructor args must match the original run).
print(f"Scenario result id: {baseline_result.id}")

printer = ConsoleScenarioResultPrinter()

await printer.print_summary_async(baseline_result) # type: ignore
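The "Per-Group Breakdown" in the scenario output above boils down to grouping attack results by adversarial model and computing an attack success rate (ASR) per group. A minimal, self-contained sketch of that aggregation in plain Python — the `AttackResult` shape and field names here are hypothetical stand-ins, not PyRIT's actual classes:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AttackResult:
    # Hypothetical stand-in for a PyRIT attack result: which adversarial
    # model drove the attack, and whether the objective was achieved.
    adversarial_model: str
    success: bool


def success_rate_by_group(results):
    """Group results by adversarial model and compute the success rate per group."""
    grouped = defaultdict(list)
    for r in results:
        grouped[r.adversarial_model].append(r.success)
    return {model: sum(flags) / len(flags) for model, flags in grouped.items()}


results = [AttackResult("gpt-4o", s) for s in [True, False, False, False]] + [
    AttackResult("gpt-4o-mini", s) for s in [True, True, False, False]
]
print(success_rate_by_group(results))  # {'gpt-4o': 0.25, 'gpt-4o-mini': 0.5}
```

A 25% rate for a group, as in the sample run above, simply means one in four attack results against that target succeeded.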
4 changes: 4 additions & 0 deletions pyrit/scenario/__init__.py
@@ -31,15 +31,18 @@
# This allows: from pyrit.scenario.airt import ContentHarms
# without needing separate pyrit/scenario/airt/ directories
from pyrit.scenario.scenarios import airt as _airt_module
+from pyrit.scenario.scenarios import benchmark as _benchmark_module
from pyrit.scenario.scenarios import foundry as _foundry_module
from pyrit.scenario.scenarios import garak as _garak_module

sys.modules["pyrit.scenario.airt"] = _airt_module
+sys.modules["pyrit.scenario.benchmark"] = _benchmark_module
sys.modules["pyrit.scenario.garak"] = _garak_module
sys.modules["pyrit.scenario.foundry"] = _foundry_module

# Also expose as attributes for IDE support
airt = _airt_module
+benchmark = _benchmark_module
garak = _garak_module
foundry = _foundry_module

@@ -55,6 +58,7 @@
    "ScenarioIdentifier",
    "ScenarioResult",
    "airt",
+   "benchmark",
    "garak",
    "foundry",
]
14 changes: 10 additions & 4 deletions pyrit/scenario/core/scenario_techniques.py
@@ -25,6 +25,7 @@

from pyrit.common.path import EXECUTOR_SEED_PROMPT_PATH
from pyrit.executor.attack import (
+   ContextComplianceAttack,
    ManyShotJailbreakAttack,
    PromptSendingAttack,
    RedTeamingAttack,
@@ -56,18 +57,18 @@
    AttackTechniqueSpec(
        name="prompt_sending",
        attack_class=PromptSendingAttack,
-       strategy_tags=["core", "single_turn", "default"],
+       strategy_tags=["core", "single_turn", "default", "light"],
    ),
    AttackTechniqueSpec(
        name="role_play",
        attack_class=RolePlayAttack,
-       strategy_tags=["core", "single_turn"],
+       strategy_tags=["core", "single_turn", "light"],
        extra_kwargs={"role_play_definition_path": RolePlayPaths.MOVIE_SCRIPT.value},
    ),
    AttackTechniqueSpec(
        name="many_shot",
        attack_class=ManyShotJailbreakAttack,
-       strategy_tags=["core", "multi_turn", "default"],
+       strategy_tags=["core", "multi_turn", "default", "light"],
    ),
    AttackTechniqueSpec(
        name="tap",
@@ -93,7 +94,12 @@
    AttackTechniqueSpec(
        name="red_teaming",
        attack_class=RedTeamingAttack,
-       strategy_tags=["core", "multi_turn"],
+       strategy_tags=["core", "multi_turn", "light"],
    ),
+   AttackTechniqueSpec(
+       name="context_compliance",
+       attack_class=ContextComplianceAttack,
+       strategy_tags=["core", "single_turn", "light"],
+   ),
]
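The `light` tag added to these specs lets callers select a cheaper subset of attack techniques by tag. A hedged sketch of how tag-based selection over such specs could work — `AttackTechniqueSpec` here is a simplified stand-in for illustration, not PyRIT's actual class:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackTechniqueSpec:
    # Simplified stand-in: just a name and its strategy tags.
    name: str
    strategy_tags: tuple = ()


SPECS = [
    AttackTechniqueSpec("prompt_sending", ("core", "single_turn", "default", "light")),
    AttackTechniqueSpec("tap", ("core", "multi_turn")),
    AttackTechniqueSpec("red_teaming", ("core", "multi_turn", "light")),
    AttackTechniqueSpec("context_compliance", ("core", "single_turn", "light")),
]


def select_by_tag(specs, tag):
    """Return the specs whose strategy_tags include the requested tag."""
    return [s for s in specs if tag in s.strategy_tags]


print([s.name for s in select_by_tag(SPECS, "light")])
# ['prompt_sending', 'red_teaming', 'context_compliance']
```

Tagging keeps the spec list declarative: a scenario can ask for every `light` technique without hard-coding class names.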

26 changes: 26 additions & 0 deletions pyrit/scenario/scenarios/benchmark/__init__.py
@@ -0,0 +1,26 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

"""Benchmark scenario classes."""

from typing import Any

from pyrit.scenario.scenarios.benchmark.adversarial import AdversarialBenchmark


def __getattr__(name: str) -> Any:
    """
    Lazily resolve the dynamic BenchmarkStrategy class.

    Returns:
        Any: The resolved strategy class.

    Raises:
        AttributeError: If the attribute name is not recognized.
    """
    if name == "AdversarialBenchmarkStrategy":
        return AdversarialBenchmark.get_strategy_class()
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")


__all__ = ["AdversarialBenchmark", "AdversarialBenchmarkStrategy"]
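This module-level `__getattr__` relies on PEP 562 lazy attribute resolution: `AdversarialBenchmarkStrategy` appears in `__all__` but is only built on first access. A minimal, self-contained demonstration of the pattern — the module and attribute names below are illustrative, not PyRIT's:

```python
import types

# Build a throwaway module so PEP 562 module __getattr__ can be shown in isolation.
source = """
def _build_strategy():
    # Stand-in for an expensive, dynamically built class.
    return "LazilyBuiltStrategy"

def __getattr__(name):
    if name == "AdversarialBenchmarkStrategy":
        return _build_strategy()
    raise AttributeError(f"module has no attribute {name!r}")
"""
demo = types.ModuleType("benchmark_demo")
exec(source, demo.__dict__)

# Normal attribute lookup misses, so the module-level __getattr__ runs.
print(demo.AdversarialBenchmarkStrategy)  # LazilyBuiltStrategy
```

Any other missing attribute still raises `AttributeError`, matching normal module semantics.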