Add binary serialization of the parsed AST (JRuby support, step 2) #2999

Open: soutaro wants to merge 1 commit into claude/practical-mendel-hnqws8 from claude/practical-mendel-hnqws8-2
soutaro commented Jun 16, 2026

Stacked on #2998 (step 1). Review/merge that first; this PR's diff is against it.

Why

For JRuby, the parser runs inside WebAssembly and the AST has to come back to Ruby without the Ruby C API. This PR adds that bridge: the parser serializes the AST to a compact binary buffer, and pure-Ruby code rebuilds the same RBS::AST objects on the other side.

Both ends are generated from config.yml, right alongside the existing C→Ruby translation (ast_translation.c), so they can't drift apart.

What's here

  • src/serialize.c (rbs_serialize_node) — generated C encoder that walks the C AST into the binary format. Self-contained C, so it compiles into both the extension and the .wasm.
  • lib/rbs/wasm/serialization_schema.rb — generated table describing every node.
  • lib/rbs/wasm/deserializer.rb — pure-Ruby decoder, the counterpart of ast_translation.c. Locations are rebuilt through the public RBS::Location API, so the same decoder works whether Location is C-backed (CRuby) or pure Ruby (JRuby, step 3).
  • docs/wasm_serialization.md — the wire format.
  • The template generator now also emits .rb files (with a Ruby-style header).
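
To illustrate the schema-driven idea behind the encoder/decoder pair, here is a minimal round-trip sketch. Everything in it is an assumption made for illustration: the node classes (`Ident`, `Pair`, `Seq`), the `SCHEMA` table layout, the one-byte tags, and the little-endian length prefixes are hypothetical and are not the format described in docs/wasm_serialization.md. In this PR the encoder is generated C; the toy encoder below is written in Ruby only so the example is self-contained.

```ruby
# Hypothetical node classes standing in for RBS::AST objects.
Ident = Struct.new(:name)
Pair  = Struct.new(:left, :right)
Seq   = Struct.new(:items)

# Hypothetical schema table: tag => [class, [[field name, field kind], ...]].
# Both the encoder and the decoder follow this one table, so they cannot disagree.
SCHEMA = {
  1 => [Ident, [[:name, :string]]],
  2 => [Pair,  [[:left, :node], [:right, :node]]],
  3 => [Seq,   [[:items, :list]]],
}.freeze
TAG_OF = SCHEMA.to_h { |tag, (klass, _fields)| [klass, tag] }.freeze

# Toy encoder: one tag byte, then each field in schema order.
def encode(node, out = String.new(encoding: Encoding::BINARY))
  tag = TAG_OF.fetch(node.class)
  out << [tag].pack("C")
  SCHEMA.fetch(tag)[1].each do |name, kind|
    value = node[name]
    case kind
    when :string
      bytes = value.b
      out << [bytes.bytesize].pack("V") << bytes # u32 length, then raw bytes
    when :node
      encode(value, out)
    when :list
      out << [value.size].pack("V")              # u32 element count
      value.each { |child| encode(child, out) }
    else
      raise ArgumentError, "unknown field kind: #{kind}"
    end
  end
  out
end

# Toy decoder: reads a tag, looks up the schema entry, reads each field in order.
class Decoder
  def initialize(bytes)
    @bytes = bytes.b
    @pos = 0
  end

  def read_node
    klass, fields = SCHEMA.fetch(read_u8)
    klass.new(*fields.map { |_name, kind| read_field(kind) })
  end

  private

  def read_field(kind)
    case kind
    when :string then read_bytes(read_u32).force_encoding(Encoding::UTF_8)
    when :node   then read_node
    when :list   then Array.new(read_u32) { read_node }
    else raise ArgumentError, "unknown field kind: #{kind}"
    end
  end

  def read_u8
    read_bytes(1).unpack1("C")
  end

  def read_u32
    read_bytes(4).unpack1("V")
  end

  def read_bytes(n)
    chunk = @bytes.byteslice(@pos, n)
    raise EOFError, "truncated buffer" if chunk.nil? || chunk.bytesize < n
    @pos += n
    chunk
  end
end

tree  = Pair.new(Ident.new("Foo"), Seq.new([Ident.new("a"), Ident.new("b")]))
bytes = encode(tree)
puts Decoder.new(bytes).read_node == tree
```

Because the decoder only ever consults the table, adding a node type in this sketch means adding one `SCHEMA` entry, which mirrors why generating both ends from a single source keeps them aligned.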

How it's validated (on CRuby, no WASM needed)

The extension exposes _parse_signature_to_bytes / _parse_type_to_bytes / _parse_method_type_to_bytes. test/rbs/wasm/serialization_test.rb parses each input twice — once through the normal C→Ruby translation, once through serialize→deserialize — and asserts the two trees are deeply identical, down to locations and string encodings (stricter than RBS ==, which ignores location/comment).

Coverage: the entire bundled RBS corpus (core + stdlib + sig, ~340 files) plus type and method-type batteries. All green locally (4 tests, 389 assertions).

This de-risks the hardest part of the JRuby work entirely on CRuby, before any WASM/JVM is involved.
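
As a rough sketch of what "deeply identical, stricter than ==" can mean in practice, the helper below walks two trees in parallel and reports the first difference, including locations and string encodings. The `Node` and `Loc` classes and the `first_difference` helper are hypothetical stand-ins, not the code in test/rbs/wasm/serialization_test.rb; `Node#==` deliberately ignores location to mimic the looser equality mentioned above.

```ruby
# Hypothetical location and node classes for illustration only.
Loc = Struct.new(:start_pos, :end_pos)

class Node
  attr_reader :name, :children, :location

  def initialize(name, children, location)
    @name = name
    @children = children
    @location = location
  end

  # Loose equality: ignores location, like the == described in the text.
  def ==(other)
    other.is_a?(Node) && name == other.name && children == other.children
  end
end

# Returns nil when the trees are identical, otherwise a string naming the
# path to the first difference.
def first_difference(a, b, path = "root")
  return "#{path}: #{a.class} vs #{b.class}" unless a.class == b.class

  case a
  when String
    # String#== can be true across encodings for ASCII-only text, so check both.
    return "#{path}: encoding #{a.encoding} vs #{b.encoding}" unless a.encoding == b.encoding
    return "#{path}: #{a.inspect} vs #{b.inspect}" unless a == b
    nil
  when Array
    return "#{path}: length #{a.size} vs #{b.size}" unless a.size == b.size
    a.each_index do |i|
      diff = first_difference(a[i], b[i], "#{path}[#{i}]")
      return diff if diff
    end
    nil
  when Node
    %i[name children location].each do |attr|
      diff = first_difference(a.public_send(attr), b.public_send(attr), "#{path}.#{attr}")
      return diff if diff
    end
    nil
  when Struct
    a.members.each do |m|
      diff = first_difference(a[m], b[m], "#{path}.#{m}")
      return diff if diff
    end
    nil
  else
    a == b ? nil : "#{path}: #{a.inspect} vs #{b.inspect}"
  end
end

direct  = Node.new("Foo", [Node.new("bar", [], Loc.new(4, 7))], Loc.new(0, 10))
rebuilt = Node.new("Foo", [Node.new("bar", [], Loc.new(4, 8))], Loc.new(0, 10))

puts direct == rebuilt                  # loose equality misses the location change
puts first_difference(direct, rebuilt)  # the strict walk reports where it differs
```

Reporting a path rather than a bare boolean is a design choice for this sketch: when a corpus-wide round trip fails, the path points at the field that was decoded incorrectly.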

Notes

  • Found a latent bug while doing this: rbs_hash_t never updates its length field (unlike rbs_node_list_t, which does in rbs_node_list_append). Nothing currently reads it, so it's harmless today, but the serializer can't trust it — it counts hash entries by walking the list. Worth a separate fix to rbs_hash_set if you'd like.
  • Heads up, unrelated to this PR: a local steep check currently dies with RBS::DuplicatedMethodDefinitionError: Module#ruby2_keywords — it's defined in both core/module.rbs:1701 and steep/patch.rbs:4. That blocks the type-checker from even building its environment here.

Next step (#step 3): load rbs_parser.wasm with Chicory on JRuby, feed its output to this deserializer, and branch lib/rbs.rb on RUBY_ENGINE.
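
The engine branch mentioned for step 3 might take a shape like the following. This is only a sketch under assumptions: the `parser_backend` helper and the `:wasm` / `:c_extension` names are hypothetical, and the actual layout of lib/rbs.rb in step 3 may differ. `RUBY_ENGINE` itself is a standard Ruby constant ("ruby" on CRuby, "jruby" on JRuby).

```ruby
# Hypothetical helper: picks a parser backend from the engine name.
def parser_backend(engine = RUBY_ENGINE)
  case engine
  when "jruby"
    :wasm        # assumed: parse in rbs_parser.wasm, then deserialize the bytes
  else
    :c_extension # assumed: direct C -> Ruby translation via the extension
  end
end

puts parser_backend
```

Taking the engine name as a parameter keeps the choice testable on CRuby without a JVM.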

https://claude.ai/code/session_01LTveMt3NLbYHEboXuzAKpA


Generated by Claude Code

This is the bridge that lets the WebAssembly build hand a parsed AST back to
Ruby without the Ruby C API: the parser serializes the tree to a compact binary
buffer, and pure-Ruby code rebuilds the same RBS::AST objects on the other side.
This is what will let RBS run on JRuby.

Both ends are generated from config.yml, alongside the existing C -> Ruby
translation, so they stay in sync:

- src/serialize.c (rbs_serialize_node): walks the C AST into the binary format.
- lib/rbs/wasm/serialization_schema.rb: the table the decoder follows.
- lib/rbs/wasm/deserializer.rb: pure-Ruby decoder, the counterpart of
  ast_translation.c. Locations go through the public RBS::Location API so it
  works whether Location is C-backed (CRuby) or pure Ruby (JRuby).

The format is documented in docs/wasm_serialization.md.

To validate it on CRuby, the extension exposes `_parse_*_to_bytes`, and
test/rbs/wasm/serialization_test.rb round-trips the whole bundled RBS corpus
(core/stdlib/sig) plus type/method-type batteries, asserting the rebuilt tree is
deeply identical to the direct C -> Ruby translation, down to locations and
string encodings.

Notably, rbs_hash_t does not maintain its `length` field (unlike
rbs_node_list_t), so the serializer counts hash entries by walking the list.

The template generator now also emits Ruby files (with a Ruby-style header).

https://claude.ai/code/session_01LTveMt3NLbYHEboXuzAKpA