Add Promptfoo (LLM eval & red-teaming) parser #15081

Draft
Dashtid wants to merge 1 commit into DefectDojo:dev from Dashtid:promptfoo-parser

Conversation

Dashtid commented Jun 24, 2026


Description

Adds a parser for promptfoo, an open-source LLM evaluation and red-teaming tool, aligned with the AI-testing direction in discussion #13242.

The parser ingests the JSON results file written by promptfoo eval -o results.json (and promptfoo redteam run -o results.json). promptfoo's pass/fail semantics are inverted relative to most scanners, which is the central design point:

  • A result with success: true means every assertion passed — for a red-team probe that means the target model defended the attack — so it is not a finding.
  • A result with success: false is a failed assertion (for a red-team probe, the attack succeeded) and becomes a Finding.
  • Results with failureReason == 2 (a provider/eval error rather than an assertion failure) are skipped — the test could not run, so it is not a vulnerability.
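The three rules above can be sketched as a single predicate. This is only an illustration of the described behaviour, not the parser's actual code: the function name `is_finding` and the constant name are hypothetical, and treating a row with no `success` field as "not a finding" is an assumption.

```python
# Hypothetical sketch of the inverted pass/fail filter described above.
FAILURE_REASON_ERROR = 2  # value quoted in the description: provider/eval error


def is_finding(result: dict) -> bool:
    """Return True when a promptfoo result row should become a Finding."""
    if result.get("success") is not False:
        # success: true means every assertion passed (the model defended).
        # Rows without an explicit success flag are skipped too (assumption).
        return False
    if result.get("failureReason") == FAILURE_REASON_ERROR:
        # The test could not run, so it says nothing about a vulnerability.
        return False
    return True


rows = [
    {"success": True, "failureReason": 0},   # defended: skipped
    {"success": False, "failureReason": 1},  # failed assertion: kept
    {"success": False, "failureReason": 2},  # provider error: skipped
]
kept = [row for row in rows if is_finding(row)]
```

Only the second row survives the filter, which matches the "failed assertion becomes a Finding" rule.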

Other decisions:

  • Severity comes from the red-team metadata.severity (critical/high/medium/low). A plain promptfoo eval failure carries no severity metadata and defaults to Medium.
  • CWE is mapped from the plugin / harm category as a deliberately coarse starter, verified against MITRE: SQL-injection -> CWE-89, shell/command-injection -> CWE-78, prompt-injection / prompt-extraction -> CWE-1427, PII / privacy -> CWE-200, default -> CWE-1426 (Improper Validation of Generative AI Output).
  • Failures for the same plugin against the same target are aggregated into one Finding: nb_occurences counts the failures, and the Finding keeps the highest severity among them.
  • Registered for hash_code deduplication on title + component_name. severity and description are intentionally excluded: description holds the per-run attack input/output, and severity is an aggregate that shifts as the set of failed attempts changes — neither is stable enough for the dedup hash.
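As a rough illustration of the severity, CWE, and aggregation decisions above, the following sketch groups failures by plugin and target. All names here (`map_severity`, `map_cwe`, `aggregate`, the substring rules, and the sample plugin/target strings) are assumptions made for the example; the parser in this PR may match plugins differently.

```python
# Hypothetical sketch; helper names and matching rules are illustrative only.
SEVERITY_MAP = {"critical": "Critical", "high": "High", "medium": "Medium", "low": "Low"}
SEVERITY_RANK = {"Low": 1, "Medium": 2, "High": 3, "Critical": 4}

# Ordered rules: the first substring that matches the plugin id wins.
CWE_RULES = [
    ("sql-injection", 89),
    ("shell-injection", 78),
    ("command-injection", 78),
    ("prompt-injection", 1427),
    ("prompt-extraction", 1427),
    ("pii", 200),
    ("privacy", 200),
]
DEFAULT_CWE = 1426  # Improper Validation of Generative AI Output


def map_severity(raw) -> str:
    """Map metadata.severity to a DefectDojo severity; Medium when absent."""
    return SEVERITY_MAP.get(str(raw or "").lower(), "Medium")


def map_cwe(plugin_id: str) -> int:
    name = plugin_id.lower()
    for needle, cwe in CWE_RULES:
        if needle in name:
            return cwe
    return DEFAULT_CWE


def aggregate(failures):
    """Group (plugin_id, target, raw_severity) tuples into one entry per pair."""
    grouped = {}
    for plugin_id, target, raw_severity in failures:
        severity = map_severity(raw_severity)
        entry = grouped.setdefault(
            (plugin_id, target),
            {"cwe": map_cwe(plugin_id), "severity": severity, "nb_occurences": 0},
        )
        entry["nb_occurences"] += 1
        # Keep the most severe value seen for this plugin/target pair.
        if SEVERITY_RANK[severity] > SEVERITY_RANK[entry["severity"]]:
            entry["severity"] = severity
    return grouped


grouped = aggregate([
    ("sql-injection", "target-a", "high"),
    ("sql-injection", "target-a", "critical"),
    ("pii:direct", "target-a", None),  # no severity metadata
])
```

In this example the two `sql-injection` failures collapse into one entry with `nb_occurences` of 2 and severity Critical, while the failure without severity metadata falls back to Medium.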

Test results

Adds unittests/tools/test_promptfoo_parser.py (14 tests) covering: zero/one/many findings, the severity matrix, the CWE mapping (including the specific *-injection rules taking precedence over the broad rule), aggregation, the plain-eval fallback (severity/title/identity from the failed assertion, metric-over-type), skipping passed and errored results, the lenient input shapes (bare list, top-level results list), shareableUrl -> references, string-form providers, bytes + UTF-8 BOM + non-ASCII input, and rejection of non-JSON input.
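For readers unfamiliar with the lenient input shapes listed above, here is a minimal sketch of how such loading could work. The function name `load_results` is hypothetical, and the nested `results.results` layout is an assumption inferred from the `results.version` field mentioned in this PR; the real parser may be structured differently.

```python
import json


def load_results(raw):
    """Return the list of result rows from promptfoo output (bytes or str)."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8-sig")  # utf-8-sig drops a leading BOM if present
    else:
        raw = raw.lstrip("\ufeff")
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("input is not valid JSON") from exc

    if isinstance(data, list):
        return data  # bare list of results
    results = data.get("results") if isinstance(data, dict) else None
    if isinstance(results, list):
        return results  # top-level "results" list
    if isinstance(results, dict) and isinstance(results.get("results"), list):
        return results["results"]  # assumed nested layout: results.results
    return []


# Bytes input with a UTF-8 BOM and a non-ASCII character.
sample = b"\xef\xbb\xbf" + '[{"prompt": "h\u00e9llo", "success": false}]'.encode("utf-8")
rows = load_results(sample)
```

Raising `ValueError` for non-JSON input is a choice made for this sketch; the tests in the PR only state that such input is rejected.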

The sample scan files under unittests/scans/promptfoo/ are real promptfoo v3 output (results.version == 3).

Documentation

Adds docs/content/supported_tools/parsers/file/promptfoo.md.

Checklist

  • Submitted against dev
  • Ruff-compliant (ruff.toml, ruff 0.15.16)
  • Python 3.13 compliant
  • Documentation included
  • No model changes (no migration needed)
  • Unit tests added
  • Labels (for maintainers): suggest Import Scans and settings_changes (touches settings.dist.py for deduplication)
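The settings change mentioned in the last checklist item would look roughly like the fragment below. It is a sketch under assumptions: the scan-type label `"Promptfoo Scan"` is a guess, and the three stand-in definitions at the top exist only so the fragment runs by itself (in DefectDojo, the corresponding names are already defined in settings.dist.py).

```python
# Stand-ins so this fragment is self-contained; not part of the real change.
DEDUPE_ALGO_HASH_CODE = "hash_code"
HASHCODE_FIELDS_PER_SCANNER = {}
DEDUPLICATION_ALGORITHM_PER_PARSER = {}

SCAN_TYPE = "Promptfoo Scan"  # assumed label; the name registered by the PR may differ

# Hash only on stable fields; severity and description are left out on purpose.
HASHCODE_FIELDS_PER_SCANNER[SCAN_TYPE] = ["title", "component_name"]
DEDUPLICATION_ALGORITHM_PER_PARSER[SCAN_TYPE] = DEDUPE_ALGO_HASH_CODE
```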

github-actions bot added the settings_changes, docs, unittests, parser, and conflicts-detected labels Jun 24, 2026
@github-actions

Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Adds a file-based parser for promptfoo (https://promptfoo.dev) results
JSON, produced by `promptfoo eval -o results.json` or
`promptfoo redteam run -o results.json`.

- Inverted semantics: a result with success:false (a failed assertion /
  successful red-team attack) becomes a Finding; success:true (the model
  defended) is skipped, as are failureReason==ERROR (provider) results.
- Severity from red-team metadata.severity, with a Medium fallback for
  plain-eval failures; CWE mapped from the plugin/category as a coarse
  starter (89/78/1427/200, default 1426).
- Failures for the same plugin against the same target aggregate into one
  Finding (nb_occurences), keeping the most severe rung.
- Deduplicated via hash_code on title + component_name; severity and
  description are excluded as unstable across runs.

Verified against the promptfoo v3 results schema (results.version == 3);
the sample scan files are real promptfoo output. Adds unit tests and
parser documentation.

Signed-off-by: David Dashti <dashti.dat@gmail.com>
@github-actions

Contributor

Conflicts have been resolved. A maintainer will review the pull request shortly.

Labels: docs, parser, settings_changes, unittests