Add binary serialization of the parsed AST (JRuby support, step 2) #2999
Open
soutaro wants to merge 1 commit into
This is the bridge that lets the WebAssembly build hand a parsed AST back to Ruby without the Ruby C API: the parser serializes the tree to a compact binary buffer, and pure-Ruby code rebuilds the same RBS::AST objects on the other side. This is what will let RBS run on JRuby.

Both ends are generated from config.yml, alongside the existing C -> Ruby translation, so they stay in sync:

- src/serialize.c (rbs_serialize_node): walks the C AST into the binary format.
- lib/rbs/wasm/serialization_schema.rb: the table the decoder follows.
- lib/rbs/wasm/deserializer.rb: pure-Ruby decoder, the counterpart of ast_translation.c. Locations go through the public RBS::Location API so it works whether Location is C-backed (CRuby) or pure Ruby (JRuby).

The format is documented in docs/wasm_serialization.md.

To validate it on CRuby, the extension exposes `_parse_*_to_bytes`, and test/rbs/wasm/serialization_test.rb round-trips the whole bundled RBS corpus (core/stdlib/sig) plus type/method-type batteries, asserting the rebuilt tree is deeply identical to the direct C -> Ruby translation, down to locations and string encodings.

Notably, rbs_hash_t does not maintain its `length` field (unlike rbs_node_list_t), so the serializer counts hash entries by walking the list.

The template generator now also emits Ruby files (with a Ruby-style header).

https://claude.ai/code/session_01LTveMt3NLbYHEboXuzAKpA
4ce3226 to
2f4e0c4
Stacked on #2998 (step 1). Review/merge that first; this PR's diff is against it.
Why
For JRuby, the parser runs inside WebAssembly and the AST has to come back to Ruby without the Ruby C API. This PR adds that bridge: the parser serializes the AST to a compact binary buffer, and pure-Ruby code rebuilds the same `RBS::AST` objects on the other side.

Both ends are generated from `config.yml`, right alongside the existing C→Ruby translation (`ast_translation.c`), so they can't drift apart.

What's here

- `src/serialize.c` (`rbs_serialize_node`): generated C encoder that walks the C AST into the binary format. Self-contained C, so it compiles into both the extension and the `.wasm`.
- `lib/rbs/wasm/serialization_schema.rb`: generated table describing every node.
- `lib/rbs/wasm/deserializer.rb`: pure-Ruby decoder, the counterpart of `ast_translation.c`. Locations are rebuilt through the public `RBS::Location` API, so the same decoder works whether `Location` is C-backed (CRuby) or pure Ruby (JRuby, step 3).
- `docs/wasm_serialization.md`: the wire format.
- The template generator now also emits `.rb` files (with a Ruby-style header).

How it's validated (on CRuby, no WASM needed)

The extension exposes `_parse_signature_to_bytes` / `_parse_type_to_bytes` / `_parse_method_type_to_bytes`. `test/rbs/wasm/serialization_test.rb` parses each input twice, once through the normal C→Ruby translation and once through serialize→deserialize, and asserts the two trees are deeply identical, down to locations and string encodings (stricter than RBS `==`, which ignores location/comment).

Coverage: the entire bundled RBS corpus (`core` + `stdlib` + `sig`, ~340 files) plus type and method-type batteries. All green locally (4 tests, 389 assertions).

This de-risks the hardest part of the JRuby work entirely on CRuby, before any WASM/JVM is involved.
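For readers who want the shape of the round-trip check without opening the test file, here is a minimal toy sketch. Everything in it is an assumption made for illustration: `Loc`, `Node`, `encode`, `decode`, and `deeply_identical?` are hypothetical names, and the byte layout is invented here rather than taken from `docs/wasm_serialization.md`. It only shows why a strict comparison catches differences that a location-ignoring `==` would let through.

```ruby
# Toy model only: these names and this byte layout are illustrative
# assumptions, not the format described in docs/wasm_serialization.md.
Loc  = Struct.new(:start_pos, :end_pos)
Node = Struct.new(:name, :location) do
  # Mimics an equality that ignores location.
  def ==(other)
    other.is_a?(Node) && name == other.name
  end
end

# Encode: u32 start, u32 end, u32 byte length (little-endian), then the raw name bytes.
def encode(node)
  bytes = node.name.b
  header = [node.location.start_pos, node.location.end_pos, bytes.bytesize].pack("L<L<L<")
  header + bytes
end

# Decode: read the three u32 fields, then restore the name as a UTF-8 string.
def decode(buffer)
  start_pos, end_pos, size = buffer.unpack("L<L<L<")
  name = buffer.byteslice(12, size).force_encoding(Encoding::UTF_8)
  Node.new(name, Loc.new(start_pos, end_pos))
end

# Strict comparison: names, their encodings, and locations must all match.
def deeply_identical?(a, b)
  a.name == b.name &&
    a.name.encoding == b.name.encoding &&
    a.location == b.location
end

direct  = Node.new("Array", Loc.new(6, 11)) # stands in for the direct translation
rebuilt = decode(encode(direct))            # stands in for serialize -> deserialize
shifted = Node.new("Array", Loc.new(0, 5))  # same name, different location
```

In this toy, `direct == shifted` holds because the overridden `==` ignores location, while `deeply_identical?(direct, shifted)` does not, which is the kind of difference the strict assertion is meant to surface. The real test compares full `RBS::AST` trees, not a single node.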
Notes
- `rbs_hash_t` never updates its `length` field (unlike `rbs_node_list_t`, which does in `rbs_node_list_append`). Nothing currently reads it, so it's harmless today, but the serializer can't trust it: it counts hash entries by walking the list. Worth a separate fix to `rbs_hash_set` if you'd like.
- `steep check` currently dies with `RBS::DuplicatedMethodDefinitionError: Module#ruby2_keywords`: it's defined in both `core/module.rbs:1701` and `steep/patch.rbs:4`. That blocks the type-checker from even building its environment here.

Next step (#step 3): load `rbs_parser.wasm` with Chicory on JRuby, feed its output to this deserializer, and branch `lib/rbs.rb` on `RUBY_ENGINE`.

https://claude.ai/code/session_01LTveMt3NLbYHEboXuzAKpA
Generated by Claude Code