Skip to content

Python: Fix function_approval_response extraction in AG-UI workflow path#4550

Open
moonbox3 wants to merge 6 commits intomicrosoft:mainfrom
moonbox3:agent/fix-4546-1
Open

Python: Fix function_approval_response extraction in AG-UI workflow path#4550
moonbox3 wants to merge 6 commits intomicrosoft:mainfrom
moonbox3:agent/fix-4546-1

Conversation

@moonbox3
Copy link
Contributor

@moonbox3 moonbox3 commented Mar 9, 2026

Motivation and Context

The _extract_responses_from_messages helper only handled function_result content, silently dropping function_approval_response entries. This meant approval/denial decisions from the user were never forwarded to the agent runtime, causing human-in-the-loop tool-approval workflows to hang indefinitely.

Fixes #4546

Description

The root cause was a guard clause in _extract_responses_from_messages that skipped any content whose type was not function_result. The fix restructures the conditional to also handle function_approval_response content, extracting its approved flag, id, and associated function_call into the responses dict keyed by the approval id. Tests covering approved, denied, mixed, and edge-case scenarios are added to prevent regression.

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Note: PR autogenerated by moonbox3's agent

moonbox3 and others added 2 commits March 9, 2026 12:31
…4546)

_extract_responses_from_messages now handles function_approval_response
content in addition to function_result content. Previously, approval
responses sent via the messages field were silently dropped because the
function only checked for content.type == "function_result".

The approval response is keyed by content.id and includes the approved
status, id, and serialized function_call — consistent with how
_coerce_content identifies approval response payloads.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 9, 2026 03:35
@moonbox3 moonbox3 self-assigned this Mar 9, 2026
Copy link
Contributor Author

@moonbox3 moonbox3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 4 | Confidence: 97% | Result: All clear

Reviewed: Correctness, Security Reliability, Test Coverage, Design Approach


Automated review by moonbox3's agents

@markwallace-microsoft
Copy link
Member

markwallace-microsoft commented Mar 9, 2026

Python Test Coverage

Python Test Coverage Report •
FileStmtsMissCoverMissing
packages/ag-ui/agent_framework_ag_ui
   _workflow_run.py5072794%180, 216–219, 247, 252, 280, 290, 301, 306, 309, 322, 362–363, 384, 433, 453, 461, 464, 469, 484, 557, 599–600, 671, 714
TOTAL22686258288% 

Python Unit Test Overview

Tests Skipped Failures Errors Time
4948 20 💤 0 ❌ 0 🔥 1m 24s ⏱️

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes AG-UI workflow runs hanging in human-in-the-loop scenarios by ensuring user-sent tool approval decisions are extracted from incoming messages and forwarded into the workflow responses map.

Changes:

  • Extend _extract_responses_from_messages to also extract function_approval_response content (in addition to function_result).
  • Add unit tests covering approval extraction (approved/denied/mixed/ignored content).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
python/packages/ag-ui/agent_framework_ag_ui/_workflow_run.py Extracts function_approval_response payloads from incoming messages so workflow runs can resume from message-based approvals.
python/packages/ag-ui/tests/ag_ui/test_workflow_run.py Adds unit tests for the updated extraction helper, covering approval response handling.

You can also share your feedback on Copilot code review. Take the survey.

…ssage-based approvals

- Update _extract_responses_from_messages docstring to reflect that it
  now handles function_approval_response content in addition to
  function_result content.
- Add integration tests for run_workflow_stream across two turns with
  approval responses provided via messages (function_approvals) rather
  than resume.interrupts, covering both approved and denied scenarios.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Contributor Author

@moonbox3 moonbox3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 4 | Confidence: 82%

✓ Correctness

This diff adds a docstring clarification and two new integration tests for approval-via-messages functionality. The tests are well-structured and symmetric (approved vs denied). No blocking correctness issues found. The tests correctly match request IDs, use the right async patterns (no @pytest.mark.asyncio needed per project convention), and verify both the happy path and denial path. One minor concern: the "interrupt" not in resumed_finished assertion may be fragile depending on how the Pydantic model serializes optional fields.

✓ Security Reliability

This diff contains only a docstring clarification in production code and two new integration tests. There are no production logic changes, no new trust boundaries, no deserialization of untrusted input, no secrets, and no resource-leak risks. The tests are well-structured and exercise both the approved and denied approval paths. No security or reliability issues found.

✓ Test Coverage

Two solid integration tests are added covering both approved and denied approval-via-messages paths. The assertions are meaningful: they verify event types, text output content, and absence of leftover interrupts. However, the docstring update to _extract_responses_from_messages describes new function_approval_response handling, yet no unit test is added to the existing TestExtractResponsesFromMessages class to exercise that extraction logic directly. The integration tests cover the end-to-end flow but a targeted unit test would catch regressions in the extraction function more precisely.

✓ Design Approach

The diff adds a docstring clarification and two new integration tests covering the approval-via-messages path (approved and denied). The production code change is documentation-only, and the tests exercise a real workflow end-to-end. There are no fundamental design approach concerns: the tests verify a legitimate second message-delivery channel for approval responses and cover both positive and negative outcomes. The minor duplication between the two test functions is acceptable given they assert different behavioral branches.

Suggestions

  • The assertion assert "interrupt" not in resumed_finished (lines 1328 and 1399) could be fragile: if the Pydantic model always includes interrupt in model_dump() (e.g., as None or []), the key would still be present and the assertion would fail. Consider assert not resumed_finished.get("interrupt") which handles both absent keys and falsy values (None, []).
  • Add a unit test to TestExtractResponsesFromMessages that constructs a Message containing function_approval_response content and asserts _extract_responses_from_messages returns the expected {id: content} mapping. This directly validates the extraction logic described in the updated docstring, independent of the full workflow machinery.
  • Consider adding an edge-case integration test (or unit test) where both a function_result and a function_approval_response are present in the same message, verifying both are extracted into the responses dict without interference.

Automated review by moonbox3's agents

- Use safer 'not .get("interrupt")' assertion instead of 'not in'
  to handle Pydantic v2 model_dump() including keys with None values
- Add unit test for mixed function_result and function_approval_response
  in the same message to TestExtractResponsesFromMessages

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Contributor Author

@moonbox3 moonbox3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 4 | Confidence: 86%

✓ Correctness

The diff adds a straightforward new test for mixed result/approval extraction and weakens two assertions from checking key absence ("interrupt" not in dict) to checking falsiness (not dict.get("interrupt")). The new test looks correct. The assertion changes are likely motivated by Pydantic's model_dump() including "interrupt": None for optional fields, but using not is overly permissive—it would also pass for unexpected falsy values like {}, [], or False, potentially masking real bugs.

✓ Security Reliability

This diff modifies only test code: it adds a new test for mixed result/approval extraction and changes two assertions from assert "interrupt" not in dict to assert not dict.get("interrupt"). No production code is touched. The assertion change subtly weakens the check—dict.get("interrupt") returns None if the key is absent but would also pass if the key is present with a falsy value (e.g., {}, 0, False, None, ""), whereas the original "interrupt" not in dict strictly checks key absence. This is unlikely to mask a real bug in practice but is worth noting as a minor reliability concern in tests. No security issues found.

✓ Test Coverage

The diff adds a useful test for mixed result and approval content in the same message, and tweaks two assertions from assert "interrupt" not in resumed_finished to assert not resumed_finished.get("interrupt"). The new test is well-structured with meaningful assertions. The assertion change subtly weakens the check: the original verified the key was absent, while .get() would also pass if the key is present but holds a falsy value (e.g., None, False, {}, [], 0). If the intent is truly to verify the field is absent or null, consider an explicit check. No missing test scenarios were identified.

✓ Design Approach

This diff adds a well-structured test for mixed result/approval extraction and relaxes two assertions from key-absence checks to falsy-value checks. The changes are minor and appropriate. The assertion change from "interrupt" not in dict to not dict.get("interrupt") makes tests less brittle against Pydantic serialization behavior (whether optional None fields are included or excluded in model_dump()), which is a reasonable improvement rather than a design concern.

Suggestions

  • Consider using assert resumed_finished.get("interrupt") is None instead of assert not resumed_finished.get("interrupt"). The is None check precisely allows both key-absent and key-is-None cases while still catching unexpected falsy values like {} or False.
  • The assertion change from assert "interrupt" not in resumed_finished to assert not resumed_finished.get("interrupt") is slightly weaker: it now also passes when interrupt is present but falsy (e.g., {}, None, 0). If the intent is to verify the key is truly absent, the original form is more precise. If the intent is to accept both absence and falsy values, the new form is fine—just be aware of the semantic difference.
  • The assertion change from assert "interrupt" not in resumed_finished to assert not resumed_finished.get("interrupt") is semantically weaker — it now passes when interrupt is present but falsy (e.g., None, {}, 0). If the intent is to accept both absent and None, this is fine; if the field should truly be absent, the original assertion was stricter. Consider documenting the intent or using assert resumed_finished.get("interrupt") is None for clarity.

Automated review by moonbox3's agents

@moonbox3 moonbox3 requested a review from eavanvalkenburg March 9, 2026 11:15
@moonbox3 moonbox3 enabled auto-merge March 9, 2026 11:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python: AG-UI workflow path silently drops function_approval_response from messages

4 participants