FEAT: Benchmark Scenario #1662
Conversation
…benchmark Resolving PR comments.
…benchmark PR comments.
```
adversarial_models: Either a ``dict`` mapping user-chosen labels to
    ``PromptChatTarget`` instances, or a ``list`` of targets (labels
    inferred from each target's identifier). When a list is given,
    identical targets are silently deduped and distinct targets
    whose inferred names collide are suffixed (``_2``, ``_3``, …)
    with a warning. Each target is wrapped in a default
    ``AttackAdversarialConfig`` before being injected into each
    technique.
```
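The dedup-and-suffix behavior described in the docstring above could be sketched roughly as follows. This is a hypothetical illustration, not the actual PyRIT implementation; the function name `normalize_adversarial_models` and the use of `_model_name` for label inference are assumptions.

```python
# Hypothetical sketch of the list-to-dict normalization described in the
# docstring; names here are illustrative, not actual PyRIT APIs.
import warnings


def normalize_adversarial_models(models):
    """Normalize a dict or list of targets into a label -> target mapping."""
    if isinstance(models, dict):
        return dict(models)

    normalized = {}
    seen_ids = set()
    for target in models:
        if id(target) in seen_ids:
            # Identical targets are silently deduped.
            continue
        seen_ids.add(id(target))
        # Infer a label from the target's identifier (assumed attribute).
        label = getattr(target, "_model_name", None) or type(target).__name__
        if label in normalized:
            # Distinct targets with colliding names get _2, _3, ... suffixes.
            suffix = 2
            while f"{label}_{suffix}" in normalized:
                suffix += 1
            label = f"{label}_{suffix}"
            warnings.warn(f"Label collision; using '{label}' instead.")
        normalized[label] = target
    return normalized
```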
Do we need to support two types of inputs? Why not just enforce list OR dict? This adds complexity to the constructor.
I'm not sure I understand. Right now the constructor takes `adversarial_models: dict[str, PromptChatTarget] | list[PromptChatTarget]`, which is the "list OR dict" point you made. The reason for accepting either is that assigning a name to a target is non-trivial, so user-provided labels are convenient.
Could we normalize on the underlying model name? E.g., the way we use it in evaluation identifiers is `target._underlying_model or target._model_name`.
That's the current implementation for non-user-provided labels (line 265 of adversarial.py). But if we want to, we can just remove user labeling as a feature and fall back on that normalization by default. What do you think?
+1 to not really understanding why we have two parameter types (though if we keep both, I think we need to display the list version in the notebook). I'd prefer a list just because it's simpler.
```python
# %%
# %load_ext autoreload
# %autoreload 2
```
| Family | Scenarios | Documentation |
|--------|-----------|---------------|
| **AIRT** | ContentHarms, Psychosocial, Cyber, Jailbreak, Leakage, Scam, AdversarialBenchmark | [AIRT Scenarios](airt.ipynb) |
nit: Shouldn't we have a separate benchmark family? Just thinking that we're putting it in a separate folder, and I can see us having more than one benchmarking scenario.
| """ | ||
| return [ | ||
| Parameter( | ||
| name="include_default_baseline", |
Could this be something that's exposed in the scenario base class, since it's not specific to this scenario?
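The hoisting suggested here could look roughly like the following. This is a hypothetical sketch, not the actual PyRIT class layout; the `Parameter` stub and the method names `common_parameters`/`parameters` are assumptions.

```python
# Hypothetical sketch of exposing a shared parameter in a scenario base
# class; class and method names are illustrative, not actual PyRIT APIs.
from dataclasses import dataclass


@dataclass
class Parameter:
    name: str
    description: str = ""


class Scenario:
    @classmethod
    def common_parameters(cls) -> list:
        # Parameters shared by every scenario, not specific to any one.
        return [Parameter(name="include_default_baseline")]


class AdversarialBenchmark(Scenario):
    @classmethod
    def parameters(cls) -> list:
        # Scenario-specific parameters appended to the shared ones.
        return cls.common_parameters() + [Parameter(name="adversarial_models")]
```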
| if "" in adversarial_models: | ||
| raise ValueError(f"Empty user-chosen label passed to adversarial_models! Got `{adversarial_models}`.") | ||
|
|
||
| # Stage B: wrap each bare target in a default AttackAdversarialConfig. |
super nit: I think numbers for the stages are more intuitive (i.e., Stage 1 instead of Stage A).
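A hypothetical sketch of the validation-then-wrap flow discussed here, using numbered stages per the suggestion; the `AttackAdversarialConfig` constructor shown is stubbed and its real signature is an assumption.

```python
# Hypothetical sketch; AttackAdversarialConfig is stubbed for illustration.
class AttackAdversarialConfig:
    def __init__(self, target):
        self.target = target


def wrap_targets(adversarial_models: dict) -> dict:
    # Stage 1: reject empty user-chosen labels.
    if "" in adversarial_models:
        raise ValueError(
            f"Empty user-chosen label passed to adversarial_models! Got `{adversarial_models}`."
        )
    # Stage 2: wrap each bare target in a default AttackAdversarialConfig.
    return {
        label: AttackAdversarialConfig(target)
        for label, target in adversarial_models.items()
    }
```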
```python
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

"""
```
Description
Adds a benchmarking scenario (`AdversarialBenchmark`) to PyRIT to compare the performance of adversarial targets. Success is measured by attack success rate (ASR). The benchmarking scenario is unique in that it takes models as a runtime argument, so we override `_get_atomic_attacks_async` to patch in live adversarial targets while using the `AttackTechniqueRegistry` to build `AdversarialBenchmarkStrategy`. The scenario takes `list[PromptTarget]` in its constructor.
Includes minor changes to `SCENARIO_TECHNIQUES` (including adding another attack technique) and a small change to `test_rapid_response.py` because of said change. Also adds a `light` aggregate for faster, benchmark-friendly attacks.
Also includes limited parameter support from #1680.
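The attack success rate used as the success measure above is just successful attacks over total attempts; a minimal illustration (the function name is hypothetical, not a PyRIT API):

```python
def attack_success_rate(outcomes: list) -> float:
    """ASR = number of successful attacks / total attack attempts."""
    if not outcomes:
        return 0.0
    return sum(1 for o in outcomes if o) / len(outcomes)
```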
Tests and Documentation
Added `tests/unit/scenario/test_benchmark.py`.
Updated `tests/unit/scenario/test_rapid_response.py`.