Skip to content

Python: Enforce approval_mode in Claude and GitHub Copilot agents#5562

Open
eavanvalkenburg wants to merge 2 commits intomicrosoft:mainfrom
eavanvalkenburg:func_app_agents
Open

Python: Enforce approval_mode in Claude and GitHub Copilot agents#5562
eavanvalkenburg wants to merge 2 commits intomicrosoft:mainfrom
eavanvalkenburg:func_app_agents

Conversation

@eavanvalkenburg
Copy link
Copy Markdown
Member

Motivation and Context

Tools declared with approval_mode="always_require" are silently executed by ClaudeAgent and GitHubCopilotAgent because their SDK-managed tool-calling loops invoke FunctionTool.invoke() directly inside per-package handlers, bypassing the _try_execute_function_calls approval gate that protects the standard chat-client pipeline.

PR #5494 proposed a fix inside FunctionTool.invoke() itself (an _approved=True parameter), but as discussed on that PR any flag on the tool can be spoofed by any code with the same level of access. The security boundary must be the agent that owns the tool-calling loop, not the tool.

Description

Enforce approval_mode="always_require" at the agent boundary for the two affected packages:

  • New per-agent option on_function_approval on ClaudeAgentOptions and GitHubCopilotOptions. The callback receives a FunctionCallContent describing the pending call (name + arguments) and returns bool (sync or async).
  • The existing tool-handler closure each agent supplies to its SDK now gates on func_tool.approval_mode == "always_require" before calling FunctionTool.invoke(). Approval flow:
    • No callback → deny by default with an explanatory tool-error.
    • Callback returns truthy → execute normally.
    • Callback returns falsy → deny with explanatory tool-error.
    • Callback raises → deny safely (logged via logger.exception) so user code crashes don't accidentally allow execution.
  • Deny path returns a result the model can interpret: Claude returns text content describing the denial; Copilot returns ToolResult(result_type="failure", error="approval_denied"). This lets the LLM react gracefully (apologise, try a different path) instead of silently failing.
  • FunctionTool and core APIs are untouched, so the rejected _approved/ToolApprovalRequiredException surface is not introduced.

Coverage

  • packages/claude/tests/test_claude_agent.py — 6 tests in TestClaudeAgentFunctionApproval.
  • packages/github_copilot/tests/test_github_copilot_agent.py — 6 tests in TestGitHubCopilotAgentFunctionApproval.

Each suite exercises: deny-by-default, sync False, sync True, async True, callback-raises → deny, and no-op for never_require tools (with payload-shape assertions on the FunctionCallContent passed to the callback).

Samples

  • samples/02-agents/providers/anthropic/anthropic_claude_with_function_approval.py
  • samples/02-agents/providers/github_copilot/github_copilot_with_function_approval.py

Both demonstrate sync callback, async callback, and the deny-by-default fallback.

Audit of other packages

grep for FunctionTool.invoke(...) call sites across packages/: only core/_agents.py (Agent-as-tool, uses the standard pipeline transitively) and hyperlight/_execute_code_tool.py (internal sandbox tool, not a user-supplied always-require target) remain. CopilotStudio doesn't accept FunctionTool. No other production fixes required.

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Tools declared with approval_mode="always_require" were bypassed by the
ClaudeAgent and GitHubCopilotAgent because their SDK-managed tool-calling
loops invoke FunctionTool.invoke() directly via package-supplied handlers,
skipping the standard _try_execute_function_calls approval gate.

Per discussion on microsoft#5494, the fix lives in the agents (not in FunctionTool):
any flag added to the tool itself can be spoofed by code with the same
level of access, so the security boundary is the agent that owns the
tool-calling loop.

- Add on_function_approval option to ClaudeAgentOptions and
  GitHubCopilotOptions. Callback receives a FunctionCallContent describing
  the pending call and returns bool (sync or async).
- Gate FunctionTool.invoke() inside each agent's existing tool-handler
  closure when approval_mode == "always_require". Default policy is deny;
  callbacks that raise also deny safely.
- Deny path returns a tool-error to the model (Claude: text content;
  Copilot: ToolResult(result_type="failure", error="approval_denied"))
  so the LLM can react gracefully instead of silently failing.
- Tests for both agents covering: deny by default, sync False, sync True,
  async True, callback-raises -> deny, no-op for never_require tools.
- Samples demonstrating sync, async, and deny-by-default flows for both
  agents.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 29, 2026 10:37
@moonbox3
Copy link
Copy Markdown
Contributor

moonbox3 commented Apr 29, 2026

Python Test Coverage

Python Test Coverage Report •
FileStmtsMissCoverMissing
packages/claude/agent_framework_claude
   _agent.py3052890%385–386, 390, 402, 410–412, 414–415, 445–447, 466, 470, 472, 476, 485, 531, 534, 574, 579–580, 654, 741, 767–770
packages/github_copilot/agent_framework_github_copilot
   _agent.py3111395%45–46, 52, 124, 449, 464–465, 752–753, 791, 794, 861, 886
TOTAL30638355988% 

Python Unit Test Overview

Tests Skipped Failures Errors Time
6167 30 💤 0 ❌ 0 🔥 1m 39s ⏱️

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR closes a security gap in SDK-managed tool-calling loops by enforcing approval_mode="always_require" at the agent boundary for the Python ClaudeAgent and GitHubCopilotAgent, introducing an explicit approval callback and default-deny behavior when approval is required.

Changes:

  • Add on_function_approval callback option to ClaudeAgentOptions and GitHubCopilotOptions, and gate FunctionTool.invoke() behind it for approval_mode="always_require".
  • Add unit tests covering deny-by-default, sync/async approval, callback failure, and no-op behavior for non-approval tools.
  • Add provider samples demonstrating sync/async approval callbacks and default-deny behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
python/packages/claude/agent_framework_claude/_agent.py Adds on_function_approval option + approval gating in the Claude SDK MCP tool handler.
python/packages/claude/tests/test_claude_agent.py Adds tests validating Claude-side approval enforcement behavior.
python/packages/github_copilot/agent_framework_github_copilot/_agent.py Adds on_function_approval option + approval gating in the Copilot SDK tool handler.
python/packages/github_copilot/tests/test_github_copilot_agent.py Adds tests validating Copilot-side approval enforcement behavior.
python/samples/02-agents/providers/anthropic/anthropic_claude_with_function_approval.py New Claude sample showing function-approval callback usage.
python/samples/02-agents/providers/github_copilot/github_copilot_with_function_approval.py New Copilot sample showing function-approval callback usage.

Comment thread python/packages/github_copilot/agent_framework_github_copilot/_agent.py Outdated
Comment thread python/packages/claude/agent_framework_claude/_agent.py Outdated
Comment thread python/packages/claude/agent_framework_claude/_agent.py
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 4 | Confidence: 86% | Result: All clear

Reviewed: Correctness, Security Reliability, Test Coverage, Design Approach


Automated review by eavanvalkenburg's agents

…override

- _resolve_function_approval no longer collapses {} into None when building
  the FunctionCallContent passed to the callback (Claude + Copilot).
- Claude _apply_runtime_options and Copilot _run_impl/_stream_updates now
  raise ValueError if on_function_approval is supplied via per-run options,
  instead of silently ignoring it. Approval policy must be set at agent
  construction time.
- Drop unnecessary # type: ignore[attr-defined] on Content.name/.arguments
  in samples (Content is a unified class with both attributes defined).
- Add regression tests for the new runtime-options validation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants