Skip to content

Fix/bound compound count#46953

Open
j7nw4r wants to merge 8 commits into
Azure:mainfrom
j7nw4r:fix/bound-compound-count
Open

Fix/bound compound count#46953
j7nw4r wants to merge 8 commits into
Azure:mainfrom
j7nw4r:fix/bound-compound-count

Conversation

@j7nw4r
Copy link
Copy Markdown
Member

@j7nw4r j7nw4r commented May 18, 2026

Summary

Validate the COUNT field of compound AMQP types (array / list / map) during decode in pyamqp's _decode.py. Without these checks a malicious peer can send an Array32 / List32 / Map32 whose header advertises a multi-billion element count paired with a zero-width or 1-byte element constructor, driving the decoder into an unbounded loop and/or excessive list allocation before the body is even consumed.

Both vendored copies of pyamqp are updated so eventhub and servicebus receive the fix in lockstep:

  • sdk/eventhub/azure-eventhub/azure/eventhub/_pyamqp/_decode.py
  • sdk/servicebus/azure-servicebus/azure/servicebus/_pyamqp/_decode.py

Matching test modules are added under each SDK's tests tree.

Behavior

Two checks are applied before any per-element work:

  1. Hard cap — counts above a well-defined upper bound (far above realistic message-annotation / application-properties sizes) are rejected.
  2. Body-length check — COUNT is rejected if it cannot physically fit in the remaining encoded body, which incidentally rejects the zero-width-element case (e.g. Array32 of null / true / false). This is intentional.

The cap is enforced at every large-variant compound decode site and at the outer decode_frame() list32 path, so a hostile peer cannot bypass it via the control-frame field list.

Commits

  1. Bound AMQP compound element count during decode — applies the two checks to the list / map / array decoders in both vendored copies, plus negative-path coverage:
    • sdk/eventhub/azure-eventhub/tests/pyamqp_tests/unittest/test_decode_bounds.py
    • sdk/servicebus/azure-servicebus/tests/unittests/test_decode_bounds.py
  2. Document AMQP compound count bound and per-decoder rationale — per-decoder docstrings explaining the AMQP 1.0 wire layout context, the two attacker-controlled COUNT shapes the cap defends against, and why zero-width-element arrays are intentionally rejected — so a future maintainer doesn't "fix" the body-length check back into a vulnerability.
  3. Bound list32 element count in decode_frame()decode_frame() decoded the list32 COUNT as a signed int and skipped the _MAX_COMPOUND_COUNT cap, leaving the outer frame field-list path open to the same allocation attack. Switches to c_unsigned_long (matching the AMQP 1.0 wire spec for list32 COUNT) and applies the cap. Addresses Copilot review feedback.
  4. Reject odd-count maps and use integer division_decode_map_small / _decode_map_large used int(raw_count / 2), which silently floored odd counts and left a trailing key with no value, leaking bytes into the next decoder. Now rejects odd counts explicitly and uses integer division. Addresses Copilot review feedback.

Test plan

  • pytest sdk/eventhub/azure-eventhub/tests/pyamqp_tests/unittest/test_decode_bounds.py
  • pytest sdk/servicebus/azure-servicebus/tests/unittests/test_decode_bounds.py
  • No change to any successful decode path; cap is far above legitimate compound sizes observed in Event Hubs / Service Bus traffic

Why duplicate the fix in two trees

_pyamqp is currently vendored separately into eventhub and servicebus rather than shared. Touching both copies in one PR keeps the two decoders from drifting on a security-relevant check. Happy to split into two PRs if Event Hubs and Service Bus owners prefer.

Risk

Low. Strictly additional rejection of malformed frames; well-formed traffic decodes identically.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the internal PyAMQP decoders in azure-servicebus and azure-eventhub by introducing a maximum allowed element count for large AMQP compound types (list32/map32/array32) and adding unit tests to validate the new bounds.

Changes:

  • Added _MAX_COMPOUND_COUNT = 65536 and enforced it in _decode_list_large, _decode_map_large, and _decode_array_large for both Service Bus and Event Hubs.
  • Added new unit tests covering oversized COUNT rejection and boundary acceptance in both packages.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
sdk/servicebus/azure-servicebus/azure/servicebus/_pyamqp/_decode.py Adds _MAX_COMPOUND_COUNT and enforces it for list32/map32/array32 decoding.
sdk/servicebus/azure-servicebus/tests/unittests/test_decode_bounds.py New tests validating oversized COUNT rejection and boundary behavior for decoders.
sdk/eventhub/azure-eventhub/azure/eventhub/_pyamqp/_decode.py Same compound COUNT cap and validations as Service Bus.
sdk/eventhub/azure-eventhub/tests/pyamqp_tests/unittest/test_decode_bounds.py Same decoder bounds tests as Service Bus, for Event Hubs.
Comments suppressed due to low confidence (2)

sdk/servicebus/azure-servicebus/azure/servicebus/_pyamqp/_decode.py:277

  • _decode_map_large claims the pre-halving check "catches hostile odd values", but the code doesn’t validate that the on-wire count is even. For an odd COUNT the current int(raw_count / 2) will silently floor and leave trailing bytes in the buffer, which can corrupt subsequent decoding. Consider rejecting odd counts (e.g., raw_count % 2 != 0) and using integer division (raw_count // 2) to avoid float truncation.
    if raw_count > _MAX_COMPOUND_COUNT:
        raise ValueError(
            f"AMQP map element count {raw_count} exceeds maximum {_MAX_COMPOUND_COUNT}"
        )
    count = int(raw_count / 2)

sdk/eventhub/azure-eventhub/azure/eventhub/_pyamqp/_decode.py:277

  • _decode_map_large claims the pre-halving check "catches hostile odd values", but the code doesn’t validate that the on-wire count is even. For an odd COUNT the current int(raw_count / 2) will silently floor and leave trailing bytes in the buffer, which can corrupt subsequent decoding. Consider rejecting odd counts (e.g., raw_count % 2 != 0) and using integer division (raw_count // 2) to avoid float truncation.
    if raw_count > _MAX_COMPOUND_COUNT:
        raise ValueError(
            f"AMQP map element count {raw_count} exceeds maximum {_MAX_COMPOUND_COUNT}"
        )
    count = int(raw_count / 2)

Comment thread sdk/servicebus/azure-servicebus/azure/servicebus/_pyamqp/_decode.py
Comment thread sdk/eventhub/azure-eventhub/azure/eventhub/_pyamqp/_decode.py
@j7nw4r j7nw4r force-pushed the fix/bound-compound-count branch from 271e834 to c4f0192 Compare May 19, 2026 13:00
@j7nw4r
Copy link
Copy Markdown
Member Author

j7nw4r commented May 20, 2026

The two failing CI checks on this PR are unrelated to the diff and tracked in separate fixes:

Once #46987 and #46988 land, I'll rebase this branch onto main and rerun CI.

j7nw4r added a commit that referenced this pull request May 20, 2026
…#46987)

The test previously measured wall-clock time around real time.sleep(0.8) calls and asserted the duration was strictly less than 1.6 seconds. On slow CI agents (notably macOS hosts) the elapsed time can exceed the budget by a few ms (e.g., 1.6399s observed in PR #46953), causing intermittent failures.

Mock time.sleep in azure.servicebus._base_handler and assert the value passed to sleep instead. This is deterministic and ~80x faster.

Co-authored-by: Johnathan Walker <johwalker@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
j7nw4r and others added 7 commits May 20, 2026 11:16
decode_frame() decoded the list32 COUNT field as a signed int and skipped
the _MAX_COMPOUND_COUNT cap that already protects the per-type
_decode_list_large / _decode_map_large / _decode_array_large sites. A
malicious peer could send a frame advertising a multi-billion field
count, driving `fields = [None] * count` into a multi-gigabyte allocation
before the body is even consumed.

Switch to c_unsigned_long (the AMQP 1.0 wire definition for list32 COUNT)
and apply the existing _MAX_COMPOUND_COUNT cap on the same path. Add
negative-path coverage to both vendored copies.
_decode_map_small and _decode_map_large both used `int(raw_count / 2)`,
which silently floors odd counts and leaves a trailing key with no value.
The half-decoded pair leaks bytes into the next decoder and corrupts
subsequent values on the wire.

Reject odd raw counts explicitly and switch to integer division. The cap
in _decode_map_large already protects against resource exhaustion; this
patch is purely a correctness fix for the small variant and a robustness
fix for the large variant. Negative-path tests added to both vendored
copies.
Resolves 16 mypy errors that surface in CI (e.g., PR Azure#46953) without changing runtime behavior:

- _common.py: widen message_id/content_type/correlation_id setters to Optional[str] so EventData.__init__ can clear them, narrow Optional accesses on annotations/application_properties, and silence one residual update() arg-type with a typed ignore.
- _transport/_base.py, aio/_transport/_base_async.py: keep_alive_interval is Optional[int] (None means 'no keep-alive') and create_source.offset is Optional[Union[int, str, datetime.datetime]] (matches event_position).
- _transport/_pyamqp_transport.py, _uamqp_transport.py, and async equivalents: align keep_alive_interval signatures with the new abstract type to satisfy Liskov.
- _utils.event_position_selector: accept Optional value (already handled at runtime via the else branch).

No source files imported by user-facing API surfaces change behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@j7nw4r j7nw4r force-pushed the fix/bound-compound-count branch from c4f0192 to 466e8de Compare May 20, 2026 16:17
@j7nw4r j7nw4r enabled auto-merge (squash) May 20, 2026 16:40
j7nw4r pushed a commit that referenced this pull request May 21, 2026
Resolves 16 mypy errors that surface in CI (e.g., PR #46953) without changing runtime behavior:

- _common.py: widen message_id/content_type/correlation_id setters to Optional[str] so EventData.__init__ can clear them, narrow Optional accesses on annotations/application_properties, and silence one residual update() arg-type with a typed ignore.
- _transport/_base.py, aio/_transport/_base_async.py: keep_alive_interval is Optional[int] (None means 'no keep-alive') and create_source.offset is Optional[Union[int, str, datetime.datetime]] (matches event_position).
- _transport/_pyamqp_transport.py, _uamqp_transport.py, and async equivalents: align keep_alive_interval signatures with the new abstract type to satisfy Liskov.
- _utils.event_position_selector: accept Optional value (already handled at runtime via the else branch).

No source files imported by user-facing API surfaces change behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants