Skip to content

ProtoParser fidelity PR 1: split leading/trailing comments, fix blank-doc-line rendering#3579

Draft
efirestone wants to merge 1 commit intomasterfrom
efirestone/proto-parser-fidelity
Draft

ProtoParser fidelity PR 1: split leading/trailing comments, fix blank-doc-line rendering#3579
efirestone wants to merge 1 commit intomasterfrom
efirestone/proto-parser-fidelity

Conversation

@efirestone
Copy link
Copy Markdown
Collaborator

@efirestone efirestone commented Apr 23, 2026

Summary

Today, if you go .proto file -> ProtoParser -> print out .proto file the resulting file isn't identical to the original because some metadata isn't preserved within ProtoParser. The goal of this and follow-on PRs is to add that metadata into ProtoParser so that we can faithfully reproduce the original file.

This is the first of a 4-part series. This PR covers the two fidelity gaps that share the same code path:

  • Issue 1 — leading vs. trailing comment split. FieldElement, EnumConstantElement, and ReservedElement gain a new trailingDocumentation: String = "" field. The parser populates it, and toSchema() emits same-line trailing comments inline ( // ..., or /* ... */ for multi-line block trailing) instead of merging them into the leading block above the declaration.
  • Issue 6 — blank line inside a doc paragraph. Util.appendDocumentation now emits //\n instead of // \n for empty lines, eliminating the trailing-whitespace artifact.

API shape

Additive and non-breaking. The existing documentation: String field keeps its current merged semantics ("$leading\n$trailing" when both are present), so consumers that only read documentation see no change. Parser-constructed elements always populate the new field; elements constructed by other means default to "".

Plumbing:

  • SyntaxReader.tryAppendTrailingDocumentation now returns a small DocumentationWithTrailing(merged, trailing) value (and is internal, since the new return type is internal).
  • Shared helpers leadingDocumentation(...) and appendTrailingDocumentation(...) in Util.kt are reused across the three elements to keep printer logic uniform.

Test plan

  • ./gradlew :wire-schema:jvmTest — all pass (including new unit + round-trip tests)
  • Core subproject suite (:wire-compiler, :wire-java-generator, :wire-kotlin-generator, :wire-swift-generator, :wire-runtime, :wire-golden-files, :wire-gradle-plugin, :wire-moshi-adapter, :wire-gson-support, :wire-reflector) — all pass, no golden file drift
  • Back-compat: existing assertions on documentation (messageFieldLeadingAndTrailingCommentAreCombined, trailingCommentNotAssignedToFollowingField, trailingCommentNotCombinedWhenEmpty) remain green
  • New tests cover: leading-only, trailing-only, leading+trailing, multi-line block trailing, trailing inside indented context, and the blank-doc-line fix

Out of scope (follow-up PRs in the series)

  • Issue 2: floating / section comments absorbed into the next declaration
  • Issue 3: nested-type reordering
  • Issue 4: blank-line layout between declarations
  • Issue 5: banner comment prepended on every re-emit

Commits

10 commits, each small and self-contained. Reviewable top-to-bottom.

🤖 Generated with Claude Code

Adds round-trip fidelity for two ProtoParser gaps that share the same
code path:

1. Leading vs. trailing comment split. FieldElement, EnumConstantElement,
   and ReservedElement gain a trailingDocumentation: String = "" field.
   The parser populates it; toSchema() emits same-line trailing comments
   inline (// ... for single-line, /* ... */ for multi-line block
   trailing) instead of merging them into the leading block above the
   declaration.

2. Blank line inside a doc paragraph. Util.appendDocumentation now emits
   "//\n" instead of "// \n", eliminating the trailing-whitespace
   artifact.

The API is additive and non-breaking: the existing documentation field
keeps its current merged semantics ("$leading\n$trailing" when both are
present), so consumers that only read documentation see no change.
Parser-constructed elements always populate the new field; elements
constructed by other means default to "".

SyntaxReader.tryAppendTrailingDocumentation now returns a small
DocumentationWithTrailing(merged, trailing) value. Shared helpers
leadingDocumentation(...) and appendTrailingDocumentation(...) in
Util.kt keep the printer logic uniform across the three elements.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@efirestone efirestone force-pushed the efirestone/proto-parser-fidelity branch from 5306de5 to 814c268 Compare April 23, 2026 18:17
@oldergod
Copy link
Copy Markdown
Member

What's the use case?

@efirestone
Copy link
Copy Markdown
Collaborator Author

efirestone commented Apr 23, 2026

What's the use case?

We want to be able to programmatically strip out unused fields/types on a .proto file which can then later be converted to whatever codegen'd language we want. It's similar to the existing allow/denylist behavior, but the result is another copy of the .proto file rather than Kotlin/Java/Swift.

We can mostly do it today, but the comments, whitespace, and ordering don't come through faithfully, so the goal is to preserve those accurately.

@oldergod
Copy link
Copy Markdown
Member

oldergod commented Apr 24, 2026

Treeshaking at the proto level to generate another proto schema is already heavily used by all modules/services on the backend of Cash App. Everytime a wire {} has protoLibrary = true, Wire will rewrite the proto files.

Since we currently consider the way generated code is formatted not part of the API, I'm curious as to why we would wanna enforce that to .proto files. Seems like a lot of work for almost zero gain, almost a constraint instead?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants