Hey team — I build AIR Blackbox, an open-source EU AI Act compliance scanner for Python AI frameworks. I ran it against 6 major agent frameworks and wanted to share the results for Semantic Kernel.
Semantic Kernel scored second overall and has the strongest human-in-the-loop coverage of any framework we tested — 97 files with HITL patterns, nearly double the next closest.
I'm opening this issue to validate our findings. We recently did the same for Haystack (deepset) and their engineering team's feedback directly improved our scanner. Hoping for the same here.
What the scanner found (highlights)
- 97 files with human-in-the-loop patterns — highest of all 6 frameworks
- 85 files with rate limiting or budget controls — highest of all 6 frameworks
- 41 files with prompt injection defense — highest of all 6 frameworks
- 356/1,242 files have Pydantic or dataclass validation (29%)
- 250/1,242 files use structured logging (20%)
- All 5 OAuth delegation checks pass
- Action-level audit logging in 6 files — strongest of all frameworks
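For context on what a "Pydantic or dataclass validation" hit means, here's a minimal illustrative sketch of the kind of pattern the scanner counts (a hypothetical config class, not code from the Semantic Kernel repo):

```python
from dataclasses import dataclass

# Hypothetical example of a dataclass-validation pattern a static
# scanner would count as a "validation" hit: typed fields plus
# explicit checks in __post_init__.
@dataclass
class AgentRunConfig:
    max_iterations: int = 10          # execution bound
    execution_timeout: float = 30.0   # seconds

    def __post_init__(self) -> None:
        if self.max_iterations <= 0:
            raise ValueError("max_iterations must be positive")
        if self.execution_timeout <= 0:
            raise ValueError("execution_timeout must be positive")
```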
Where I need your help validating
All 5 OAuth delegation checks found matching patterns. Based on what we learned from the Haystack review, static pattern matching can produce false positives. Specifically:
- `user_id` binding — detected in 3 files. Is this tracking which user authorized agent actions, or is it used for something else?
- Scope/permission validation — detected in 5 files. Are these controlling what agents can access, or are some incidental matches?
- Execution bounding — detected in 6 files. We look for `max_agent_steps`, `max_iterations`, and `execution_timeout`. Are these real execution boundaries for agents?
- Action audit trail — detected in 6 files. Is Semantic Kernel logging tool invocations and agent actions in production, or only in tests?
- Action boundaries — detected in 2 files. Are these defining what tools/actions agents are allowed to use?
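To make the false-positive risk concrete: detection of this kind can be sketched as a regex scan over source files, roughly as below. This is a simplified illustration (not the actual air-blackbox implementation); the identifier list comes from the execution-bounding check described above.

```python
import re
from pathlib import Path

# Identifiers the execution-bounding check looks for (from the issue text).
BOUNDING_PATTERN = re.compile(
    r"\b(max_agent_steps|max_iterations|execution_timeout)\b"
)

def files_with_bounding_patterns(root: str) -> list[Path]:
    """Return Python files under `root` containing any bounding identifier.

    A match only proves the name appears somewhere in the file, not that
    it actually bounds agent execution -- which is exactly why maintainer
    validation of each hit matters.
    """
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip rather than fail the scan
        if BOUNDING_PATTERN.search(text):
            hits.append(path)
    return hits
```

A file that merely mentions `max_iterations` in a docstring or test fixture would still count, so file-level hits are an upper bound on real coverage.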
Full results
24 passing · 11 warnings · 4 failing · 39 total checks
95% automated detection · EU AI Act Articles 9, 10, 11, 12, 14, 15
The 4 failures are missing docs (RISK_ASSESSMENT.md, DATA_GOVERNANCE.md) and no vault configured.
How to reproduce
```shell
pip install air-blackbox==1.2.2
air-blackbox comply --scan ./semantic-kernel -v
```
Full reports
- Semantic Kernel report (PDF)
- 6-framework comparison (PDF)
- Haystack validated report (PDF)
- Haystack maintainer feedback: deepset-ai/haystack#10810
Why I'm reaching out
Microsoft's own enterprise AI guidance recommends propagating user identity to agents for every request. Semantic Kernel appears to follow this — 97 HITL files plus identity binding plus scope validation puts it significantly ahead of most frameworks we tested.
Before publishing the Semantic Kernel report more broadly, I want to validate the findings with the team that built it. If any pattern matches are false positives, I'd rather fix the scanner.
Appreciate any feedback. The scanner is open source (Apache 2.0) and runs entirely locally.