FEAT: Benchmark Scenario #1662
Conversation
…benchmark Resolving PR comments.
…benchmark PR comments.
```
adversarial_models: Either a ``dict`` mapping user-chosen labels to
    ``PromptChatTarget`` instances, or a ``list`` of targets (labels
    inferred from each target's identifier). When a list is given,
    identical targets are silently deduped and distinct targets
    whose inferred names collide are suffixed (``_2``, ``_3``, …)
    with a warning. Each target is wrapped in a default
    ``AttackAdversarialConfig`` before being injected into each
    technique.
```
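The dedup-and-suffix behavior described in the docstring above could be sketched roughly as follows. This is a hypothetical illustration, not the actual PyRIT implementation; the function name `normalize_adversarial_models` and the use of `_model_name` for label inference are assumptions.

```python
# Hypothetical sketch of the list-to-dict normalization described in the
# docstring; names here are illustrative, not actual PyRIT APIs.
import warnings


def normalize_adversarial_models(models):
    """Normalize a dict or list of targets into a label -> target mapping."""
    if isinstance(models, dict):
        return dict(models)

    normalized = {}
    seen_ids = set()
    for target in models:
        if id(target) in seen_ids:
            # Identical targets are silently deduped.
            continue
        seen_ids.add(id(target))
        # Infer a label from the target's identifier (assumed attribute).
        label = getattr(target, "_model_name", None) or type(target).__name__
        if label in normalized:
            # Distinct targets with colliding names get _2, _3, ... suffixes.
            suffix = 2
            while f"{label}_{suffix}" in normalized:
                suffix += 1
            label = f"{label}_{suffix}"
            warnings.warn(f"Label collision; using '{label}' instead.")
        normalized[label] = target
    return normalized
```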
Do we need to support two types of inputs? Why not just enforce list OR dict? This adds complexity to the constructor.
I'm not sure I understand. Right now the constructor takes `adversarial_models: dict[str, PromptChatTarget] | list[PromptChatTarget]`, which is the "list OR dict" point you made. The reason for accepting either is that assigning a name to a target is non-trivial, so user-provided labels are convenient.
Could we normalize on the underlying model name? E.g., the way we use it in evaluation identifiers is `target._underlying_model or target._model_name`.
That's the current implementation for non-user-provided labels (line 265 of adversarial.py). But if we want to, we can just remove user labeling as a feature and fall back on that normalization by default. What do you think?
+1 to not really understanding why we have two parameter types (though if we keep both, I think we need to display the list version in the notebook). I'd prefer a list just because it's simpler.
```python
# %%
# %load_ext autoreload
# %autoreload 2
```
| Family | Scenarios | Documentation |
|--------|-----------|---------------|
| **AIRT** | ContentHarms, Psychosocial, Cyber, Jailbreak, Leakage, Scam, AdversarialBenchmark | [AIRT Scenarios](airt.ipynb) |
nit: Shouldn't we have a separate benchmark family? Just thinking that we're putting it in a separate folder, and I can see us having more than one benchmarking scenario.
| """ | ||
| return [ | ||
| Parameter( | ||
| name="include_default_baseline", |
Could this be something that's exposed in the scenario base class, since it's not specific to this scenario?
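The hoisting suggested here could look roughly like the following. This is a hypothetical sketch, not the actual PyRIT class layout; the `Parameter` stub and the method names `common_parameters`/`parameters` are assumptions.

```python
# Hypothetical sketch of exposing a shared parameter in a scenario base
# class; class and method names are illustrative, not actual PyRIT APIs.
from dataclasses import dataclass


@dataclass
class Parameter:
    name: str
    description: str = ""


class Scenario:
    @classmethod
    def common_parameters(cls) -> list:
        # Parameters shared by every scenario, not specific to any one.
        return [Parameter(name="include_default_baseline")]


class AdversarialBenchmark(Scenario):
    @classmethod
    def parameters(cls) -> list:
        # Scenario-specific parameters appended to the shared ones.
        return cls.common_parameters() + [Parameter(name="adversarial_models")]
```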
| if "" in adversarial_models: | ||
| raise ValueError(f"Empty user-chosen label passed to adversarial_models! Got `{adversarial_models}`.") | ||
|
|
||
| # Stage B: wrap each bare target in a default AttackAdversarialConfig. |
super nit: I think numbers for the stages are more intuitive (i.e., Stage 1 instead of Stage A).
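A hypothetical sketch of the validation-then-wrap flow discussed here, using numbered stages per the suggestion; the `AttackAdversarialConfig` constructor shown is stubbed and its real signature is an assumption.

```python
# Hypothetical sketch; AttackAdversarialConfig is stubbed for illustration.
class AttackAdversarialConfig:
    def __init__(self, target):
        self.target = target


def wrap_targets(adversarial_models: dict) -> dict:
    # Stage 1: reject empty user-chosen labels.
    if "" in adversarial_models:
        raise ValueError(
            f"Empty user-chosen label passed to adversarial_models! Got `{adversarial_models}`."
        )
    # Stage 2: wrap each bare target in a default AttackAdversarialConfig.
    return {
        label: AttackAdversarialConfig(target)
        for label, target in adversarial_models.items()
    }
```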
```python
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

"""
```
Description
Adds a benchmarking scenario (`AdversarialBenchmark`) to PyRIT to compare the performance of adversarial targets. Success is measured by attack success rate (ASR). The benchmarking scenario is unique in that it takes models as a runtime argument, so we override `_get_atomic_attacks_async` to patch in live adversarial targets while using the `AttackTechniqueRegistry` to build `AdversarialBenchmarkStrategy`. The scenario takes `list[PromptTarget]` in its constructor.
Includes minor changes to `SCENARIO_TECHNIQUES` (including adding another attack technique) and a small change to `test_rapid_response.py` because of said change. Also adds a `light` aggregate for faster, benchmark-friendly attacks.
Also includes limited parameter support from #1680.
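The attack success rate used as the success measure above is just successful attacks over total attempts; a minimal illustration (the function name is hypothetical, not a PyRIT API):

```python
def attack_success_rate(outcomes: list) -> float:
    """ASR = number of successful attacks / total attack attempts."""
    if not outcomes:
        return 0.0
    return sum(1 for o in outcomes if o) / len(outcomes)
```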
Tests and Documentation
Added `tests/unit/scenario/test_benchmark.py`.
Updated `tests/unit/scenario/test_rapid_response.py`.