Python: Prevent pickle deserialization of untrusted HITL HTTP input#4566
Merged
ahmedmuhsin merged 10 commits into main on Mar 10, 2026
Add strip_pickle_markers() to sanitize HTTP input before it reaches
pickle.loads() via the checkpoint decoding path. Applied as a 3-layer
defence-in-depth:
1. _app.py: sanitize req.get_json() at the HTTP boundary
2. _workflow.py: sanitize in _deserialize_hitl_response() before decode
3. _serialization.py: sanitize in reconstruct_to_type() as final guard
Any dict containing __pickled__ or __type__ markers from untrusted
sources is replaced with None, blocking arbitrary code execution via
crafted payloads to POST /workflow/respond/{instanceId}/{requestId}.
Includes 12 new unit tests covering the sanitizer and end-to-end
attack prevention.
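The sanitizer described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the merged implementation: the marker strings come from the text above, while the constant names and the exact handling of non-dict, non-list values are assumptions.

```python
from typing import Any

# Marker keys named in the commit message. The constant names are
# illustrative; the real code may import them from the core package.
PICKLE_MARKER = "__pickled__"
TYPE_MARKER = "__type__"


def strip_pickle_markers(data: Any) -> Any:
    """Return a copy of `data` in which every marker-bearing dict is None."""
    if isinstance(data, dict):
        # A dict carrying either marker is dropped as a whole.
        if PICKLE_MARKER in data or TYPE_MARKER in data:
            return None
        # Otherwise recurse into the values.
        return {key: strip_pickle_markers(value) for key, value in data.items()}
    if isinstance(data, list):
        return [strip_pickle_markers(item) for item in data]
    # Scalars (str, int, bool, None, ...) pass through unchanged.
    return data


payload = {
    "approved": True,
    "comment": {"__pickled__": "not-a-real-pickle", "__type__": "builtins:dict"},
    "items": [1, {"__type__": "x"}, {"ok": "yes"}],
}
print(strip_pickle_markers(payload))
# {'approved': True, 'comment': None, 'items': [1, None, {'ok': 'yes'}]}
```

Replacing the whole dict with None, rather than deleting only the marker keys, means no attacker-supplied sibling fields survive inside a dict that was shaped like a checkpoint record.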
1. Remove the deserialize_value() fallback in _deserialize_hitl_response: untrusted HITL data is now returned as-is when no type hint is available and never flows into pickle.loads().
2. Move strip_pickle_markers() out of reconstruct_to_type(): the function is general-purpose again, and untrusted-data callers are responsible for sanitizing first (documented with a NOTE comment).
3. Define _PICKLE_MARKER/_TYPE_MARKER as local constants with import-time assertions against core's values; this decouples them from private names while failing loudly if core ever changes them.
4. Update tests to reflect the new responsibility boundaries.
Pull request overview
This PR hardens the Azure Functions HITL (human-in-the-loop) HTTP path against malicious payloads that attempt to trigger pickle.loads() via checkpoint marker keys, by introducing a recursive sanitizer and applying it at key trust boundaries.
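For context on why reaching `pickle.loads()` with untrusted bytes matters, the following harmless sketch shows that unpickling invokes whatever callable the pickle stream names. The `Crafted` class is hypothetical and calls the built-in `int`; a real attack would name a far more dangerous callable.

```python
import pickle


class Crafted:
    """Hypothetical object whose pickle form says 'call int("42") to rebuild me'."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair; loads() will call it.
        return (int, ("42",))


blob = pickle.dumps(Crafted())

# The receiver never gets a Crafted instance back. It gets the result of
# the call that the pickle stream told it to make.
result = pickle.loads(blob)
print(type(result).__name__, result)  # int 42
```

Because the choice of callable belongs to whoever produced the bytes, the only safe policy for untrusted input is to keep it away from the pickle decoder altogether, which is what the marker stripping in this PR aims to do.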
Changes:
- Add strip_pickle_markers() sanitizer to remove checkpoint pickle/type marker dicts from untrusted data structures.
- Apply sanitization at the HITL HTTP ingress (send_hitl_response) and again during HITL response deserialization (_deserialize_hitl_response).
- Update/extend unit tests to assert that marker-bearing payloads are blocked/neutralized.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| python/packages/azurefunctions/agent_framework_azurefunctions/_serialization.py | Introduces strip_pickle_markers() and integrates it into reconstruct_to_type. |
| python/packages/azurefunctions/agent_framework_azurefunctions/_workflow.py | Sanitizes HITL responses before type resolution/reconstruction. |
| python/packages/azurefunctions/agent_framework_azurefunctions/_app.py | Sanitizes untrusted HITL HTTP request bodies before raising durable events. |
| python/packages/azurefunctions/tests/test_func_utils.py | Adds/updates security-focused tests validating marker stripping behavior. |
Comments suppressed due to low confidence (1)
python/packages/azurefunctions/agent_framework_azurefunctions/_app.py:528
- If strip_pickle_markers() detects markers it returns None, but the endpoint still raises the external event with event_data=None and returns 200 "Response delivered successfully". This silently converts a (potentially malicious or malformed) response into null and may cause confusing workflow behavior. Consider rejecting the request (e.g., 400/422) when markers are detected and avoid calling raise_event in that case so clients get a clear failure signal.

    # Sanitize untrusted HTTP input before it reaches pickle.loads().
    # See strip_pickle_markers() docstring for details on the attack vector.
    response_data = strip_pickle_markers(response_data)

    # Send the response as an external event
    # The request_id is used as the event name for correlation
    await client.raise_event(
        instance_id=instance_id,
        event_name=request_id,
        event_data=response_data,
    )
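The rejection behavior suggested in this comment could look roughly like the sketch below. It is not code from the PR: `contains_markers` and `handle_hitl_response` are hypothetical helpers, the handler returns a plain `(status, message)` tuple instead of an Azure Functions response object, and the choice of 400 over 422 is arbitrary.

```python
from typing import Any, Tuple

# Marker keys named in the PR text.
MARKERS = ("__pickled__", "__type__")


def contains_markers(data: Any) -> bool:
    """Return True if any dict nested anywhere in `data` carries a marker key."""
    if isinstance(data, dict):
        if any(marker in data for marker in MARKERS):
            return True
        return any(contains_markers(value) for value in data.values())
    if isinstance(data, list):
        return any(contains_markers(item) for item in data)
    return False


def handle_hitl_response(body: Any) -> Tuple[int, str]:
    """Hypothetical handler core: reject marker-bearing bodies, accept the rest."""
    if contains_markers(body):
        # Fail before any event is raised, so the caller sees a clear error.
        return 400, "Response rejected: reserved checkpoint marker keys are not allowed."
    # A real handler would call client.raise_event(...) at this point.
    return 200, "Response delivered successfully"


print(handle_hitl_response({"approved": True}))
print(handle_hitl_response({"data": [{"__pickled__": "x"}]}))
```

Detecting first and rejecting keeps the sanitizer's silent replacement as a backstop for deeper layers, while the HTTP caller gets an explicit failure instead of a 200 for a payload that was delivered as null.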
- Use cast() for dict/list comprehensions in strip_pickle_markers (pyright)
- type: ignore for narrowed dict return in _workflow.py (pyright)
- Simplify marker imports: use core constants directly, remove local copies
- Remove duplicate pyright ignore comment
larohra approved these changes on Mar 9, 2026
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
python/packages/azurefunctions/agent_framework_azurefunctions/_app.py:528
- If strip_pickle_markers() strips the top-level payload (returns None), this endpoint still raises the external event and returns 200 "Response delivered successfully". That’s misleading to the caller and may cause confusing workflow behavior (handlers receiving None due to sanitization). Consider rejecting the request (e.g., 400/422) when sanitization removes the entire response payload, with an explicit error message, instead of silently delivering None.

    # Sanitize untrusted HTTP input before it reaches pickle.loads().
    # See strip_pickle_markers() docstring for details on the attack vector.
    response_data = strip_pickle_markers(response_data)

    # Send the response as an external event
    # The request_id is used as the event name for correlation
    await client.raise_event(
        instance_id=instance_id,
        event_name=request_id,
        event_data=response_data,
    )
dmytrostruk approved these changes on Mar 10, 2026
Description
Introduces strip_pickle_markers(), a recursive sanitizer that replaces any dict containing __pickled__ or __type__ checkpoint marker keys with None, neutralizing the attack vector before data can reach pickle.loads().

Applied at two trust boundaries:
- HTTP boundary (_app.py): Sanitizes req.get_json() immediately after parsing, before the data enters raise_event(). The durable event payload is stored clean.
- HITL deserializer (_workflow.py): Sanitizes in _deserialize_hitl_response() as a second barrier before any type reconstruction.

Additional hardening:
- Removed unsafe fallback: The deserialize_value() fallback in _deserialize_hitl_response has been replaced. When no type hint is available, the sanitized dict is returned as-is rather than flowing into the pickle-capable decoder.
- Clean separation of trust levels: reconstruct_to_type() remains general-purpose (works with both trusted checkpoint data and pre-sanitized untrusted data), with a documentation note that untrusted-data callers must sanitize first.
- Decoupled marker constants: _PICKLE_MARKER/_TYPE_MARKER are defined as local constants with import-time assertions against core's values, avoiding silent breakage if core renames them.

Contribution Checklist