Fix large field number parsing by using unsigned right shift #3509

Open
xi0yu wants to merge 3 commits into square:master from xi0yu:fix-large-field-numbers

xi0yu commented Feb 3, 2026

This fixes an issue where Wire fails to parse protobuf data containing very large field numbers (for example, 290,848,974).
The tag is extracted from the combined tag-and-wire-type value with a signed right shift (shr). For field numbers of 0x10000000 (2^28) or more, that 32-bit value has its sign bit set, so shr sign-extends it and produces a negative, incorrect tag.
Changing shr to ushr in the nextTag() and skipGroup() methods makes these field numbers decode correctly.

Only two lines modified:

  • Line 184: tag = tagAndFieldEncoding shr TAG_FIELD_ENCODING_BITS -> tag = (tagAndFieldEncoding ushr TAG_FIELD_ENCODING_BITS)
  • Line 250: val tag = tagAndFieldEncoding shr TAG_FIELD_ENCODING_BITS -> val tag = (tagAndFieldEncoding ushr TAG_FIELD_ENCODING_BITS)
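
To illustrate the failure mode, here is a minimal standalone sketch. It is not Wire's code: it only assumes that the field number and wire type are packed into a 32-bit Int with the wire type in the low 3 bits. TAG_FIELD_ENCODING_BITS mirrors the constant named above; the helper function names are hypothetical.

```kotlin
// Standalone sketch; every name except TAG_FIELD_ENCODING_BITS is hypothetical.
const val TAG_FIELD_ENCODING_BITS = 3

// Pack a field number and a wire type into one 32-bit value.
fun packTag(fieldNumber: Int, wireType: Int): Int =
  (fieldNumber shl TAG_FIELD_ENCODING_BITS) or wireType

// Signed shift: copies the sign bit into the vacated high bits.
fun tagWithShr(tagAndFieldEncoding: Int): Int =
  tagAndFieldEncoding shr TAG_FIELD_ENCODING_BITS

// Unsigned shift: fills the vacated high bits with zeros.
fun tagWithUshr(tagAndFieldEncoding: Int): Int =
  tagAndFieldEncoding ushr TAG_FIELD_ENCODING_BITS

fun main() {
  val packed = packTag(fieldNumber = 290_848_974, wireType = 0)
  println(packed < 0)          // true: bit 31 is set after the left shift
  println(tagWithShr(packed))  // -246021938 (sign-extended, wrong)
  println(tagWithUshr(packed)) // 290848974 (original field number)
}
```

For small field numbers the two shifts agree, which is presumably why the problem only shows up with very large tags.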

This fixes an issue where Wire fails to parse protobuf data with large field numbers
(2^28 or greater) due to signed right shift operations that cause
sign extension. Changed shr to ushr in nextTag() and skipGroup() methods to ensure
proper handling of large field numbers.

Only two lines modified:
- Line 184: tag = tagAndFieldEncoding shr TAG_FIELD_ENCODING_BITS -> tag = (tagAndFieldEncoding ushr TAG_FIELD_ENCODING_BITS)
- Line 250: val tag = tagAndFieldEncoding shr TAG_FIELD_ENCODING_BITS -> val tag = (tagAndFieldEncoding ushr TAG_FIELD_ENCODING_BITS)

oldergod (Member) commented Feb 3, 2026

Thanks for the PR. Is writing a test for this gonna be difficult?

- Change signed right shift to unsigned right shift in ProtoReader
  to fix parsing of large field numbers (0x10000000 and above)
- Add test case and fixture for large field number validation

xi0yu (Author) commented Feb 4, 2026

Thanks! I’ve confirmed locally that the issue occurs when field numbers are >= 0x10000000. Using signed right shift (shr) causes sign extension, which makes tag/tagAndField decode incorrectly. This is exactly what the PR fixes by switching to unsigned right shift (ushr).

I’ve also added a unit test specifically covering field numbers >= 0x10000000 to verify the fix.
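
The boundary described above can be checked with a small round trip through varint bytes. This is only a sketch under stated assumptions: encodeVarint32, decodeVarint32, and roundTripFieldNumber are hypothetical helpers written for illustration, not Wire's ProtoReader or the test added in this PR.

```kotlin
// Hypothetical helpers for illustration; this is not Wire's ProtoReader.

// Encode a 32-bit value as a protobuf varint (7 bits per byte, low bits first).
fun encodeVarint32(value: Int): ByteArray {
  val out = ArrayList<Byte>()
  var v = value
  while ((v and 0x7f.inv()) != 0) {
    out.add(((v and 0x7f) or 0x80).toByte())
    v = v ushr 7
  }
  out.add(v.toByte())
  return out.toByteArray()
}

// Decode a varint back into a 32-bit value.
fun decodeVarint32(bytes: ByteArray): Int {
  var result = 0
  var shift = 0
  for (b in bytes) {
    result = result or ((b.toInt() and 0x7f) shl shift)
    if ((b.toInt() and 0x80) == 0) break
    shift += 7
  }
  return result
}

// Write a field key with wire type 0, read it back, and extract the field
// number with either the signed or the unsigned shift.
fun roundTripFieldNumber(fieldNumber: Int, unsigned: Boolean): Int {
  val key = decodeVarint32(encodeVarint32(fieldNumber shl 3))
  return if (unsigned) key ushr 3 else key shr 3
}

fun main() {
  println(roundTripFieldNumber(0x0FFFFFFF, unsigned = false)) // 268435455
  println(roundTripFieldNumber(0x10000000, unsigned = false)) // -268435456
  println(roundTripFieldNumber(0x10000000, unsigned = true))  // 268435456
}
```

The signed shift still works for 0x0FFFFFFF and first breaks at 0x10000000, which matches the threshold reported in the comment.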

oldergod (Member) commented Feb 4, 2026

Comment on lines 26 to 31
    private val adapter = createRuntimeMessageAdapter(
      LargeFieldMessage::class.java,
      "square.github.io/wire/unknown",
      Syntax.PROTO_2,
      LargeFieldNumberTest::class.java.classLoader,
    )

You could access the generated adapter as well, I think?

Suggested change
- private val adapter = createRuntimeMessageAdapter(
-   LargeFieldMessage::class.java,
-   "square.github.io/wire/unknown",
-   Syntax.PROTO_2,
-   LargeFieldNumberTest::class.java.classLoader,
- )
+ private val adapter = LargeFieldMessage.ADAPTER

xi0yu (Author) commented Feb 4, 2026

It's done! Thanks for the opportunity. I'm really happy to contribute to this project. :P

refactor: use direct adapter reference instead of factory 

oldergod (Member) commented Feb 4, 2026

Hmm, is this actually a problem? It looks like this PR addresses a tag value which is outside of the supported bound for the wire format anyway?

https://github.com/protocolbuffers/protobuf/blob/6fcc1b6d16db029c219083042fd9e4238d32faf3/src/google/protobuf/edition_unittest.proto#L615-L618

xi0yu (Author) commented Feb 5, 2026

> Hmm, is this actually a problem? It looks like this PR addresses a tag value which is outside of the supported bound for the wire format anyway?
>
> https://github.com/protocolbuffers/protobuf/blob/6fcc1b6d16db029c219083042fd9e4238d32faf3/src/google/protobuf/edition_unittest.proto#L615-L618

Thanks for raising the question of whether such large field numbers are supported. The limit stated in the comment you linked from Google's test proto prompted me to look into it more closely.

I found some key points:

  1. Calculation error in Google's comment: The comment says "The largest possible tag number is 2^28 - 1, since the wire format uses three bits to communicate wire type". The reasoning is right (three bits are reserved for the wire type), but the value is wrong. Of the 32 bits, 3 go to the wire type and 29 remain for the field number, so the maximum is 0x1FFFFFFF (2^29 - 1 = 536,870,911), not 2^28 - 1 (268,435,455).

  2. Legitimacy of field number 290848974: This field number (hexadecimal 0x115600CE) is below 2^29 - 1, so it is a legitimate Protobuf field number.

  3. Hard limit verification: I confirmed that field numbers above 2^29 - 1 (such as 2^29 = 536,870,912) are rejected by the compiler, so 2^29 - 1 (536,870,911) is the true hard limit. The Wire project itself defines the same limit in Util.kt#L88: MAX_TAG_VALUE = (1 shl 29) - 1 // 536,870,911

So your question was very valuable: it helped me discover that Google's own comment contains a calculation error. The actual limit is 2^29 - 1, not 2^28 - 1, so our fix is reasonable.
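
The arithmetic behind these points can be spelled out in a few lines. This is a sketch assuming a 32-bit field key; the constant and function names are hypothetical (Wire's own constant is the MAX_TAG_VALUE quoted above).

```kotlin
// Hypothetical names; a sketch of the arithmetic, assuming a 32-bit field key.
val WIRE_TYPE_BITS = 3                          // low bits reserved for the wire type
val FIELD_NUMBER_BITS = 32 - WIRE_TYPE_BITS     // 29 bits remain for the field number
val MAX_FIELD_NUMBER = (1 shl FIELD_NUMBER_BITS) - 1

// Range check only; the reserved range 19000..19999 is deliberately ignored here.
fun isValidFieldNumber(n: Int): Boolean = n in 1..MAX_FIELD_NUMBER

fun main() {
  println(MAX_FIELD_NUMBER)                 // 536870911
  println(MAX_FIELD_NUMBER == 0x1FFFFFFF)   // true
  println((1 shl 28) - 1)                   // 268435455, the value in the linked comment
  println(0x115600CE)                       // 290848974
  println(isValidFieldNumber(290_848_974))  // true
  println(isValidFieldNumber(1 shl 29))     // false
}
```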
