
feat(evaluators): add ATR threat detection evaluator #170

Open

eeee2345 wants to merge 3 commits into agentcontrol:main from eeee2345:feat/atr-evaluator

Conversation


@eeee2345 eeee2345 commented Apr 9, 2026

Summary

Add a contrib evaluator using ATR (Agent Threat Rules) for regex-based AI agent threat detection. No API keys required.

Resolves #169

Evaluator details

  • Name: atr.threat_rules
  • Auto-discovery: via agent_control.evaluators entry point
  • Config: ATRConfig with min_severity, categories, block_on_match, on_error
  • Rules: 20 ATR rules, 306 compiled patterns across 8 categories
  • Performance: <5ms evaluation, pure regex

Config options

ATRConfig(
    min_severity="medium",       # Filter rules below this severity
    categories=["prompt-injection"],  # Only load specific categories (empty = all)
    block_on_match=True,         # Set matched=True on detection
    on_error="allow",            # "allow" (fail-open) or "deny" (fail-closed)
)
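For illustration, the `min_severity` option can be modeled as a rank comparison. This is a hedged sketch: `SEVERITY_ORDER` and `passes_threshold` are hypothetical names, not the evaluator's actual code.

```python
# Hypothetical sketch of min_severity filtering; names are illustrative.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]

def passes_threshold(rule_severity: str, min_severity: str) -> bool:
    # A rule is loaded only if its severity ranks at or above the floor.
    return SEVERITY_ORDER.index(rule_severity) >= SEVERITY_ORDER.index(min_severity)

# With min_severity="medium", low-severity rules are filtered out:
assert passes_threshold("high", "medium")
assert not passes_threshold("low", "medium")
```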

Key design decisions

  • Multi-match: Returns ALL matching rules, not just first match. Metadata includes findings array + backward-compatible single-match fields.
  • _coerce_to_string: Scans all priority dict fields (content, input, output, text, message), not just the first.
  • Error handling: Respects on_error policy — fail-open returns error field, fail-closed returns matched=True.
  • Follows Cisco evaluator pattern: Same directory structure, pyproject.toml, Makefile, entry points.

Files

evaluators/contrib/atr/
  pyproject.toml              # hatchling build, entry point
  Makefile                    # test/lint/typecheck/build
  README.md                   # Usage docs
  src/.../threat_rules/
    config.py                 # ATRConfig
    evaluator.py              # ATREvaluator (222 lines)
    rules.json                # 20 rules, 306 patterns
  tests/
    test_evaluator.py         # 22 tests

Test plan

  • Rule loading and compilation
  • Known-bad inputs: prompt injection, jailbreak, reverse shell, credential exposure, system prompt override
  • Known-good inputs: normal text, code, URLs
  • Config: min_severity filtering, category filtering, block_on_match=False
  • Error handling: None/empty input, on_error deny/allow
  • Multi-match: content triggering multiple categories returns all findings
  • Dict input: scans all priority fields
  • make test / make lint / make typecheck

eeee2345 added 2 commits April 9, 2026 22:57
Add contrib evaluator using ATR (Agent Threat Rules) community rules:
- 20 regex rules covering OWASP Agentic Top 10
- Configurable severity threshold and category filtering
- Pure regex, no API keys, <5ms evaluation time
- Auto-discovered via entry points

Source: https://agentthreatrule.org (MIT)

- _match_rules now returns all matching rules, not just first match
- Add super().__init__(config) call
- _coerce_to_string scans all priority dict fields, not just first
- Add multi-match test and dict field scanning test
- Backward-compatible: single-match fields still in metadata
Contributor

@lan17 lan17 left a comment


Thanks for contributing this. This is a great start, and I really like the idea of having a local/no-key ATR evaluator available as a contrib package.

I left a few comments below. The main things I think we need to tighten before merging are preserving ATR's field/context semantics, avoiding secret leakage through metadata, and putting a harder bound around regex evaluation latency.

for rule in self._compiled_rules:
    for pattern_entry in rule["patterns"]:
        regex: re.Pattern[str] = pattern_entry["regex"]
        match = regex.search(text)
Contributor


I think this loses an important part of the ATR model. The upstream rules are field/context scoped, but here every regex is run over one flattened text blob. For example, ATR-2026-00061 is intended for MCP tool args/tool responses, but this implementation will match something benign like Please rotate the database credentials tomorrow. because the word credentials appears anywhere in the selected data. We should preserve the ATR fields/scan target semantics, or at least make the broad free-text behavior explicit and opt-in.
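The failure mode can be shown in a few lines. This is a sketch: the pattern is illustrative, not the actual ATR-2026-00061 regex.

```python
import re

# Illustrative broad pattern, NOT the real ATR-2026-00061 regex.
pattern = re.compile(r"credentials", re.IGNORECASE)

benign = "Please rotate the database credentials tomorrow."
# Flattened free-text scan: matches even though nothing is wrong.
assert pattern.search(benign) is not None

# Field-scoped evaluation would only scan the intended fields
# (tool args / tool responses), where the benign prose never appears.
event = {"tool_args": "", "tool_response": "", "content": benign}
hits = [f for f in ("tool_args", "tool_response") if pattern.search(event[f])]
assert hits == []
```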

"title": rule["title"],
"severity": rule["severity"],
"category": rule["category"],
"matched_text": match.group()[:200],
Contributor


This should not include the literal matched secret in result metadata. For credential rules, matched_text can be the AWS key/token/etc. that we just detected, and the SDK observability path copies result metadata into events by default. I'd redact, hash, or omit the matched text by default and only expose a safe snippet/offset if we really need debugging detail.
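One way to do that, sketched here with hypothetical helper names rather than the SDK's API: store a digest and a length instead of the raw match.

```python
import hashlib

def safe_excerpt(matched: str) -> dict:
    # Hypothetical redaction helper: never store the raw match, only a
    # short hash digest and the length, enough for debugging and dedup.
    return {
        "match_sha256": hashlib.sha256(matched.encode()).hexdigest()[:12],
        "match_len": len(matched),
    }

# An AWS-style key never appears in the metadata in raw form:
info = safe_excerpt("AKIAIOSFODNN7EXAMPLE")
assert "AKIA" not in str(info)
```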

for rule in self._compiled_rules:
    for pattern_entry in rule["patterns"]:
        regex: re.Pattern[str] = pattern_entry["regex"]
        match = regex.search(text)
Contributor


This can also block the event loop for longer than the evaluator timeout. evaluate() is async, but the work here is CPU-bound synchronous regex scanning, so asyncio.wait_for in the engine cannot interrupt it until this function yields. On my machine, benign no-match input took about 66ms for 10KB, 664ms for 100KB, and 6.6s for 1MB. We need a hard input-length cap and/or a different execution strategy before this is safe on latency-sensitive paths.
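A sketch of one mitigation under those constraints: cap the scanned input length and push the synchronous scan onto a worker thread so the event loop stays responsive. `MAX_SCAN_BYTES` and `scan` are hypothetical names, not part of the PR.

```python
import asyncio
import re

MAX_SCAN_BYTES = 16_384  # hypothetical hard cap; name and value are assumptions

async def scan(text: str, patterns: list[re.Pattern]) -> bool:
    """Sketch: bound input size and move regex work off the event loop."""
    text = text[:MAX_SCAN_BYTES]

    def _scan() -> bool:
        # CPU-bound scanning runs in a worker thread, so the event loop
        # is never blocked and the engine's asyncio.wait_for can fire.
        return any(p.search(text) for p in patterns)

    return await asyncio.to_thread(_scan)
```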

metadata={
    "findings": all_findings,
    "count": len(all_findings),
    "max_severity": all_findings[0]["severity"] if all_findings else None,
Contributor


max_severity is currently just the first finding in rules-file order, not the maximum severity. I hit a case where a high-severity rule appeared before a critical credential rule, and the metadata reported max_severity='high'. This should compute the max using the severity ordering, and the legacy single-match fields probably should point at the highest-severity finding too.
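For example, computing the maximum with an explicit severity ranking (names are illustrative, not the evaluator's code):

```python
# Illustrative ranking; the real evaluator would define its own ordering.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}

findings = [
    {"rule": "r1", "severity": "high"},      # appears first in rules-file order
    {"rule": "r2", "severity": "critical"},  # but ranks higher
]

# Pick the highest-ranked finding, regardless of file order:
top = max(findings, key=lambda f: SEVERITY_RANK[f["severity"]])
assert top["severity"] == "critical"
```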

@echo " make lint-fix - run ruff check --fix"
@echo " make typecheck - run mypy"
@echo " make build - build package"

Contributor


Since this is adding a new supported contrib package, I think we also need to wire it into the repo-level checks and release path. Right now the root test-extras target only runs Galileo, and the release/build scripts only publish the Galileo contrib evaluator. If ATR should be maintained here, it should have root atr-* targets, be included in contrib CI coverage, and be added to build/release/version metadata.

Contributor

lan17 commented Apr 26, 2026

Stepping back, I think the package/entry-point/test skeleton is a good start, but I’d rework the core evaluator before iterating on smaller fixes.

The main architectural issue is that this currently behaves like a flattened regex scanner, while ATR is really an event/field-aware rule format. I’d preserve ATR fields, scan targets, and condition logic in typed rule models, adapt Agent Control selector output into an explicit ATR event, and only run each condition against its intended field. That should also make it easier to keep metadata safe, cap runtime, and position this as complementary to yelp.detect_secrets rather than overlapping with it.

So my recommendation is: keep the contrib evaluator idea, but redesign the evaluator core around ATR semantics instead of patching the current data -> string -> all regexes pipeline.

@codecov

codecov Bot commented Apr 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


Addresses @lan17's 2026-04-26 architectural review on PR agentcontrol#170.

The previous v0.1 evaluator flattened every input into a single string
and scanned every rule's regex against it. @lan17's critique was that
ATR is an event/field-aware rule format, not a flat regex scanner, and
that the v0.1 design also leaked the raw matched text (including any
secret that triggered the rule) through `metadata.matched_text`.

## What v0.2 changes

* `models.py` (new) — typed `ATREvent`, `ATRRule`, `ATRCondition`,
  `RuleMatch` dataclasses. `ATR_FIELDS` is the canonical field
  vocabulary (`content`, `user_input`, `agent_output`, `tool_name`,
  `tool_args`, `tool_description`, `tool_response`, `agent_message`,
  `skill_manifest`). `ATR_CATEGORY_DEFAULT_FIELD` maps each ATR
  category to its default field for the legacy flat-pattern auto-upgrade
  path.

* `ATREvent.from_agent_control_data()` adapts an Agent Control selector
  input into a typed event with explicit field semantics. Bare strings
  land in `content`. Dicts map known field names directly. Aliases
  (`input`/`output`/`text`/`message`) map to canonical fields. Unknown
  keys are JSON-serialised into `content` so broad content-targeted
  rules still have something to scan defensively.
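The adapter described above might look roughly like this. A simplified sketch: the field list and aliases come from the bullets above, but the function body is illustrative, not the actual `models.py` implementation.

```python
import json

ATR_FIELDS = ("content", "user_input", "agent_output", "tool_name",
              "tool_args", "tool_description", "tool_response",
              "agent_message", "skill_manifest")
ALIASES = {"input": "user_input", "output": "agent_output",
           "text": "content", "message": "agent_message"}

def from_agent_control_data(data) -> dict:
    if isinstance(data, str):
        return {"content": data}           # bare strings land in content
    event, unknown = {}, {}
    for key, value in data.items():
        canonical = ALIASES.get(key, key)  # map aliases to canonical names
        if canonical in ATR_FIELDS:
            event[canonical] = value
        else:
            unknown[key] = value           # unknown keys fall back to content
    if unknown:
        event["content"] = event.get("content", "") + json.dumps(unknown)
    return event
```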

* `evaluator.py` (rewritten) — runs each rule condition only against the
  condition's intended field via `event.get_field(condition.field)`.
  Conditions with no field value short-circuit without compiling or
  searching. Rule-level `condition: any|all` is honoured.

* `redact.py` (new) — Python port of the upstream ATR
  `redactMatchedValue()` helper (`agent-threat-rules@2.1.2`
  `src/redact.ts`). Recognises AWS access keys, GitHub PATs / OAuth
  tokens, Slack tokens, OpenAI / Anthropic API keys, Bearer
  credentials, JWTs, and PEM private keys. Every match is run through
  this before reaching `EvaluatorResult.metadata` — the v0.1
  `matched_text` field is intentionally REMOVED and replaced with
  `redacted_excerpt`.

* `_wall_clock_search` (new in `evaluator.py`) — bounds per-condition
  regex evaluation time with `signal.SIGALRM` (default 50 ms) so a
  pathological pattern cannot block the evaluator pipeline. Configurable
  via `ATRConfig.condition_budget_ms`. Falls back to a soft wall-clock
  check on Windows / worker threads.
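A condensed sketch of how such a SIGALRM budget can work; this is illustrative, not the shipped `_wall_clock_search`, and it only applies on POSIX in the main thread.

```python
import re
import signal

class ConditionBudgetExceeded(Exception):
    """Raised when a single regex search exceeds its time budget."""

def wall_clock_search(pattern: re.Pattern, text: str, budget_ms: int = 50):
    # SIGALRM fires after the budget elapses and interrupts the search.
    # Main-thread / POSIX only; a soft wall-clock check is the assumed
    # fallback elsewhere.
    def _on_alarm(signum, frame):
        raise ConditionBudgetExceeded

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, budget_ms / 1000)
    try:
        return pattern.search(text)
    except ConditionBudgetExceeded:
        return None  # treat a blown budget as "no match" (fail-open)
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # always disarm the timer
        signal.signal(signal.SIGALRM, previous)
```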

* `evaluator._normalise_rule` accepts both the modern field-aware
  `conditions: [{field, operator, value, ...}]` shape AND the legacy
  flat `patterns: [{pattern, description}]` shape. Legacy patterns are
  auto-upgraded to conditions targeting the category default field.
  Existing `rules.json` keeps working without regeneration.

* `pyproject.toml` — version bump to `0.2.0` (minor: API change on
  the metadata field, but no plugin discovery shape change).

## Why this addresses each of @lan17's concrete asks

| @lan17's review point | Addressed by |
|---|---|
| "preserve ATR fields, scan targets, and condition logic in typed rule models" | `models.py` ATRRule + ATRCondition + ATR_FIELDS |
| "adapt Agent Control selector output into an explicit ATR event" | `ATREvent.from_agent_control_data` |
| "only run each condition against its intended field" | `_evaluate_rule` dispatches via `event.get_field(condition.field)` |
| "avoid metadata leakage" / "keep metadata safe" | `redact.py`; `metadata.matched_text` removed |
| "put a harder bound around regex evaluation latency" | `_wall_clock_search` + `condition_budget_ms` |
| "position complementary to yelp.detect_secrets" | The redacted-excerpt output structure makes the evaluator deliberately non-overlapping with secret scanners; rules only match attack patterns, the redactor only renders the match safely |

## Tests

`tests/test_evaluator.py` rewritten on 2026-05-11 to match the new
architecture:

  * Field-aware dispatch tests (field isolation, alias mapping, unknown
    key fallback).
  * Redacted-excerpt assertions on every match path — verifies no raw
    AWS key, GitHub PAT, or matched substring ever appears in
    `EvaluatorResult.metadata`.
  * Condition budget config validation.
  * Backwards-compat tests retained for severity / category /
    block_on_match / on_error policies, with input shapes updated to
    dicts where field semantics matter.

```
$ pytest evaluators/contrib/atr/tests/test_evaluator.py -v
============================== 31 passed in 0.21s ==============================
```
@eeee2345
Author

@lan17 — apologies for the slow turnaround on your 2026-04-26 review. Pushed a full rewrite in e272a0a that maps point-for-point to your architectural feedback.

| Your review point | Where it is addressed in the rewrite |
|---|---|
| "preserve ATR fields, scan targets, and condition logic in typed rule models" | New `models.py`: `ATRRule`, `ATRCondition`, `ATR_FIELDS`, `ATR_CATEGORY_DEFAULT_FIELD` |
| "adapt Agent Control selector output into an explicit ATR event" | New `ATREvent.from_agent_control_data()` |
| "only run each condition against its intended field" | `_evaluate_rule` dispatches per-condition via `event.get_field(condition.field)`; conditions whose field is empty short-circuit before regex compile/search |
| "avoid metadata leakage" / "keep metadata safe" | New `redact.py` (Python port of the helper that ships in `agent-threat-rules@2.1.2` upstream); v0.1's `metadata.matched_text` is removed and replaced with `metadata.redacted_excerpt` |
| "put a harder bound around regex evaluation latency" | New `_wall_clock_search` with a `signal.SIGALRM` budget (default 50 ms, configurable via `ATRConfig.condition_budget_ms`); worker-thread / Windows fallback included |
| "position complementary to yelp.detect_secrets rather than overlapping" | The redacted-excerpt output is deliberately non-overlapping with secret scanners: ATR matches attack patterns, and if a matched substring contains a secret prefix the redactor classifies and elides it, so this evaluator never takes on the secret-scanning role |

A few notes on choices I made and would value your input on:

  1. Two on-disk rule shapes accepted. The new evaluator's _normalise_rule honours both the modern conditions: [{field, operator, value, ...}] shape (matching the upstream ATR YAML semantics) AND the legacy patterns: [{pattern, description}] shape that the current rules.json uses. Legacy patterns are auto-upgraded to conditions targeting the category's default field at load time. This let me ship the architectural rewrite without regenerating the rule corpus in this PR. Happy to follow up with a regeneration PR that emits proper per-condition field values from the upstream ATR rule YAMLs if you'd prefer.
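A rough sketch of that dual-shape normalisation (hypothetical helper names; the default-field map here is a one-entry illustrative excerpt, not the full `ATR_CATEGORY_DEFAULT_FIELD`):

```python
# One-entry excerpt for illustration; the real map covers all categories.
ATR_CATEGORY_DEFAULT_FIELD = {"prompt-injection": "user_input"}

def normalise_rule(rule: dict) -> dict:
    if "conditions" in rule:
        return rule  # modern field-aware shape passes through unchanged
    # Legacy flat patterns are upgraded to conditions targeting the
    # category's default field.
    default_field = ATR_CATEGORY_DEFAULT_FIELD.get(rule["category"], "content")
    rule["conditions"] = [
        {"field": default_field, "operator": "regex", "value": p["pattern"]}
        for p in rule.pop("patterns")
    ]
    rule.setdefault("condition", "any")
    return rule

legacy = {"category": "prompt-injection",
          "patterns": [{"pattern": "ignore previous", "description": "d"}]}
out = normalise_rule(legacy)
assert out["conditions"][0]["field"] == "user_input"
```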

  2. Field vocabulary. ATR_FIELDS mirrors the upstream ATR agent_source.type vocabulary plus the per-condition field values seen in real rules. The ATREvent.from_agent_control_data() mapping handles input/output/text/message aliases for Agent Control selectors that haven't standardised on ATR's field names. Open to renaming or dropping any of the nine fields if the Agent Control convention diverges.

  3. Pydantic-validated condition budget. Added condition_budget_ms: int = Field(default=50, ge=1, le=10_000) to ATRConfig. The 50 ms default is intentionally generous — pathological backtracking is the only thing it should fire on.

Tests

```
$ pytest evaluators/contrib/atr/tests/test_evaluator.py -v
============================== 31 passed in 0.21s ==============================
```

The test file is rewritten with explicit field-aware inputs (dicts with named fields) so every test exercises field isolation. The credential-leak regression test (test_credential_payload_redacted_in_metadata) asserts that an AWS access key matched against a rule never appears in raw form in EvaluatorResult.metadata — the v0.1 foot-gun is closed at the type level.

Version bump

pyproject.toml v0.1.0 → v0.2.0 (minor — additive new fields on metadata, but the matched_text key is removed, so consumers of v0.1 metadata need to migrate).

Upstream pointers

For reviewers who want to compare the rewrite against the upstream ATR semantics:

ATR v2.1.2 shipped today with the same redaction helper as part of the standard library, so downstream consumers that import agent-threat-rules directly can use the same function as this evaluator's redact.py. Adjacent context that may be relevant: OWASP Agent-Security-Regression-Harness#74 merged today by @mertsatilmaz with four ATR-derived regression scenarios — happy to layer one of those into a contrib integration test if you'd like an end-to-end Agent Control → ATR → OWASP regression path on this evaluator.

Ready for re-review on this branch when you have time.
