EU AI Act compliance scan — Semantic Kernel has the strongest human oversight of any framework we tested #13657

@shotwellj

Description

Hey team — I build AIR Blackbox, an open-source EU AI Act compliance scanner for Python AI frameworks. I ran it against 6 major agent frameworks and wanted to share the results for Semantic Kernel.

Semantic Kernel scored second overall and has the strongest human-in-the-loop coverage of any framework we tested — 97 files with HITL patterns, nearly double the next closest.

I'm opening this issue to validate our findings. We recently did the same for Haystack (deepset) and their engineering team's feedback directly improved our scanner. Hoping for the same here.

What the scanner found (highlights)

  • 97 files with human-in-the-loop patterns — highest of all 6 frameworks
  • 85 files with rate limiting or budget controls — highest of all 6 frameworks
  • 41 files with prompt injection defense — highest of all 6 frameworks
  • 356/1,242 files have Pydantic or dataclass validation (29%)
  • 250/1,242 files use structured logging (20%)
  • All 5 OAuth delegation checks pass
  • Action-level audit logging in 6 files — strongest of all frameworks
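For context on the validation numbers above, this is the kind of dataclass validation pattern the scanner counts — an illustrative sketch only, not Semantic Kernel code; the class and field names are hypothetical:

```python
from dataclasses import dataclass


# Hypothetical example of a validated dataclass that a static scan
# would count toward the "Pydantic or dataclass validation" metric.
@dataclass
class ToolCallRequest:
    tool_name: str
    max_iterations: int = 5

    def __post_init__(self):
        # Reject obviously invalid inputs at construction time.
        if not self.tool_name:
            raise ValueError("tool_name must be non-empty")
        if self.max_iterations < 1:
            raise ValueError("max_iterations must be >= 1")
```

The scanner only detects that such a pattern exists in a file; it does not verify the validation logic itself, which is part of why we are asking for review.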

Where I need your help validating

The scanner reported matches for all 5 OAuth delegation checks. As we learned from the Haystack review, static pattern matching can produce false positives. Specifically:

  1. user_id binding — detected in 3 files. Is this tracking which user authorized agent actions, or is it used for something else?

  2. Scope/permission validation — detected in 5 files. Are these controlling what agents can access, or are some incidental matches?

  3. Execution bounding — detected in 6 files. We look for max_agent_steps, max_iterations, execution_timeout. Are these real execution boundaries for agents?

  4. Action audit trail — detected in 6 files. Is Semantic Kernel logging tool invocations and agent actions in production, or only in tests?

  5. Action boundaries — detected in 2 files. Are these defining what tools/actions agents are allowed to use?
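To make the false-positive risk concrete, here is a minimal sketch of the kind of static pattern matching behind check 3 — the regex and function name are illustrative, not the scanner's actual rule set:

```python
import re
from pathlib import Path

# Illustrative pattern for execution-bound keywords (check 3). The real
# scanner's rules may differ; this is a sketch of the approach only.
EXECUTION_BOUND_PATTERN = re.compile(
    r"\b(max_agent_steps|max_iterations|execution_timeout)\b"
)


def files_with_execution_bounds(root: str) -> list[str]:
    """Return paths of Python files containing an execution-bound keyword."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # skip unreadable files rather than failing the scan
        if EXECUTION_BOUND_PATTERN.search(text):
            hits.append(str(path))
    return hits
```

A match like this cannot distinguish a real agent execution boundary from, say, a test fixture or an unrelated variable — hence the questions above.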

Full results

24 passing · 11 warnings · 4 failing · 39 total checks
95% automated detection · EU AI Act Articles 9, 10, 11, 12, 14, 15

The 4 failures are missing docs (RISK_ASSESSMENT.md, DATA_GOVERNANCE.md) and no vault configured.

How to reproduce

pip install air-blackbox==1.2.2
air-blackbox comply --scan ./semantic-kernel -v

Why I'm reaching out

Microsoft's own enterprise AI guidance recommends propagating user identity to agents on every request. Semantic Kernel appears to follow this — 97 HITL files plus identity binding and scope validation puts it significantly ahead of most frameworks we tested.

Before publishing the Semantic Kernel report more broadly, I want to validate the findings with the team that built it. If any pattern matches are false positives, I'd rather fix the scanner.

Appreciate any feedback. The scanner is open source (Apache 2.0) and runs entirely locally.
