Run the RBS parser on JRuby via WebAssembly (JRuby support, step 3) #3000

Open
soutaro wants to merge 6 commits into claude/practical-mendel-hnqws8-2 from claude/practical-mendel-hnqws8-3

@soutaro soutaro commented Jun 16, 2026

Member

Stacked on #2999 (step 2), which is stacked on #2998 (step 1). Review/merge those first; this PR's diff is against #2999.

This is the payoff: RBS runs on JRuby, with the parser in WebAssembly and the AST rebuilt in pure Ruby. lib/rbs.rb branches on RUBY_ENGINE, so CRuby is completely unaffected.

JRuby ── Chicory (pure-JVM Wasm) ── rbs_parser.wasm ── serialize ─┐
                                                                  │ identical bytes
RBS::WASM::Deserializer ◄── error blob / serialized AST ◄─────────┘
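
The engine branch described above could look roughly like the sketch below. `parser_backend_for` and the feature names it returns are illustrative assumptions, not the actual contents of lib/rbs.rb.

```ruby
# Hypothetical helper: decide which parser backend to load for an engine.
# The returned feature names are illustrative only.
def parser_backend_for(engine = RUBY_ENGINE)
  case engine
  when "jruby"
    "rbs/wasm"       # WebAssembly parser plus pure-Ruby deserializer
  else
    "rbs_extension"  # C extension, the existing path
  end
end

# A caller would then do something like:
#   require parser_backend_for
```

Keeping the decision in one place means the CRuby path never touches any of the WebAssembly code.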

What's here

WebAssembly ABI (wasm/rbs_wasm.c): rbs_wasm_parse_signature / _parse_type / _parse_method_type parse a character range of a source buffer and leave the serialized AST — or, on a parse error, an error blob (positions + token type + message) — in linear memory, read back via rbs_wasm_result_ptr/_len.
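
As a rough illustration of the host side of this ABI, the sketch below reads a result out of linear memory after a parse call. `FakeInstance` is a stand-in for a real Wasm instance, and the zero / non-zero status convention is an assumption; the PR does not spell out how the host distinguishes an AST from an error blob.

```ruby
# Stand-in for a Wasm instance. `memory` models linear memory as a binary
# String; the three methods mimic exported functions. All names here are
# hypothetical.
class FakeInstance
  attr_reader :memory

  def initialize(status, payload)
    @status = status
    @ptr = 16                                 # arbitrary offset of the result
    @memory = ("\0".b * @ptr) + payload.b
    @len = payload.bytesize
  end

  def parse_signature
    @status
  end

  def result_ptr
    @ptr
  end

  def result_len
    @len
  end
end

# Run a parse, then copy the result bytes out of linear memory.
# Returns [:ok, bytes] for a serialized AST, [:error, bytes] for an error blob.
def read_parse_result(instance)
  status = instance.parse_signature
  bytes = instance.memory.byteslice(instance.result_ptr, instance.result_len)
  [status.zero? ? :ok : :error, bytes]
end
```

Copying the bytes out before doing anything else keeps the host independent of whatever the next parse call does to linear memory.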

Ruby side (lib/rbs/wasm, loaded only on JRuby):

  • Runtime — loads rbs_parser.wasm into Chicory, wires up WASI, and drives the parse functions. Chicory is pure Java, so there's no native dependency — only the .wasm and the jars ship.
  • Parser — implements RBS::Parser._parse_signature/_parse_type/_parse_method_type on top of Runtime + the step-2 Deserializer, raising RBS::ParsingError just like the C extension. (_lex, _parse_type_params, and the inline-annotation entries raise NotImplementedError for now — follow-up work.)
  • Location — a pure-Ruby implementation of the primitives behind RBS::Location (the C extension's legacy_location.c), so rbs/location_aux.rb works unchanged.

Packaging & CI:

  • rake wasm:jruby_setup assembles lib/rbs/wasm/ (the .wasm plus the Chicory jars from Maven Central). The gemspec ships these in the java-platform gem and skips the C extension there.
  • A JRuby CI job assembles the runtime and runs test/rbs/wasm/jruby_parser_test.rb (which parses the whole bundled corpus + checks types, method types, variables, and error handling).

Validation

Verified locally with JRuby 10.0.6.0 + Chicory 1.7.5 that JRuby and CRuby produce byte-identical ASTs across the entire bundled corpus: a SHA-256 over the JSON of every declaration in core + stdlib + sig (342 files) matches exactly between the two engines. parse_type/parse_method_type (including type variables) and ParsingError reporting were also verified.
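
A digest of this kind can be sketched as follows. This is not the script used for the validation above; `corpus_digest` and its input shape (a Hash from file path to an array of objects responding to `#to_json`) are assumptions for illustration.

```ruby
require "digest"
require "json"

# Fold the JSON form of every declaration into a single SHA-256.
# Files are visited in sorted order so the digest does not depend on
# directory traversal order.
def corpus_digest(decls_by_file)
  digest = Digest::SHA256.new
  decls_by_file.sort_by { |path, _decls| path }.each do |path, decls|
    digest << path
    decls.each { |decl| digest << decl.to_json }
  end
  digest.hexdigest
end
```

Running the same function under both engines and comparing the two hex strings gives a single pass/fail answer for the whole corpus.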

Scope / follow-ups

  • _lex, _parse_type_params, and inline Ruby annotations aren't wired through the WASM ABI yet — they raise a clear NotImplementedError on JRuby. The main signature-loading path is complete.
  • Encoding is fixed to UTF-8 in the WASM parser (fine for UTF-8/ASCII RBS, which is the convention).

https://claude.ai/code/session_01LTveMt3NLbYHEboXuzAKpA


Generated by Claude Code

claude added 6 commits June 16, 2026 15:09
JRuby cannot load the MRI C extension, so on JRuby RBS now runs the parser
inside WebAssembly (Chicory, a pure-Java runtime) and rebuilds the AST in
pure Ruby. `lib/rbs.rb` branches on RUBY_ENGINE.

WebAssembly ABI (wasm/rbs_wasm.c):
- rbs_wasm_parse_signature / _parse_type / _parse_method_type parse a
  character range of a source buffer and leave the serialized AST (or, on a
  parse error, an error blob) in linear memory for the host to read via
  rbs_wasm_result_ptr / _len.

Ruby side (lib/rbs/wasm, loaded only on JRuby):
- Runtime loads rbs_parser.wasm into Chicory, wires up WASI, and drives the
  parse functions.
- Parser implements RBS::Parser._parse_signature/_parse_type/
  _parse_method_type on top of the runtime and RBS::WASM::Deserializer,
  raising RBS::ParsingError on failure just like the C extension. _lex,
  _parse_type_params and the inline annotation entries are not supported yet.
- Location is a pure-Ruby implementation of the primitives behind
  RBS::Location (the C extension's legacy_location.c), so rbs/location_aux.rb
  works unchanged.

Packaging and CI:
- `rake wasm:jruby_setup` assembles lib/rbs/wasm/ (the .wasm plus the Chicory
  jars from Maven Central); the gemspec ships them in the `java` platform gem
  and skips the C extension there.
- A JRuby CI job parses the whole bundled corpus and runs
  test/rbs/wasm/jruby_parser_test.rb.

Verified that JRuby and CRuby produce byte-identical ASTs across the entire
bundled corpus (core + stdlib + sig).

https://claude.ai/code/session_01LTveMt3NLbYHEboXuzAKpA

Add `omit_on_jruby!` (class- and instance-level), mirroring
`omit_on_truffle_ruby!`, for tests that depend on the C extension or on parser
features not yet wired through the WebAssembly bridge.

- parser_test: omit `test__lex` and `test_parse_type_params` on JRuby (those
  primitives raise NotImplementedError there for now).
- serialization_test: omit the class on JRuby; its round-trip is driven by the
  C extension's `_parse_*_to_bytes`.
- jruby_parser_test: qualify Test::Unit as ::Test::Unit so the file also loads
  under the full suite, where RBS::Test would otherwise shadow it.

https://claude.ai/code/session_01LTveMt3NLbYHEboXuzAKpA

The WASM parser was hardcoded to UTF-8. Pass the buffer's Ruby encoding name
through the ABI and resolve it with rbs_encoding_find (falling back to UTF-8),
so non-UTF-8 sources (EUC-JP, Windows-31J, ...) lex correctly — matching the C
extension, which uses the buffer's encoding.

Verified that an EUC-JP signature produces byte-identical locations and a
correctly-encoded comment string on JRuby and CRuby, and that the UTF-8 corpus
digest is unchanged.
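
The fallback behaviour can be illustrated in Ruby terms. The real lookup happens in C via rbs_encoding_find; `resolve_encoding` below is a hypothetical analogue that uses Ruby's own encoding table.

```ruby
# Resolve an encoding by name, falling back to UTF-8 when the name is unknown.
def resolve_encoding(name)
  Encoding.find(name)
rescue ArgumentError
  Encoding::UTF_8
end

# The host would pass the buffer's encoding name across the ABI:
source = "# コメント\nclass Foo\nend\n".encode("EUC-JP")
encoding_name = source.encoding.name
resolve_encoding(encoding_name)
```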

https://claude.ai/code/session_01LTveMt3NLbYHEboXuzAKpA

parser_test surfaced two behaviors the WebAssembly path didn't match:

- A reversed `byte_range` (e.g. 1..0) made the lexer loop forever inside
  WebAssembly, hanging the host. RBS::Parser now validates the position range
  (matching validate_position_range in the C extension) and raises
  ArgumentError, and the wasm shim guards against invalid ranges so a stray
  caller can never wedge the VM.
- `variables:` that is not nil or an Array of Symbols now raises TypeError,
  matching declare_type_variables in the C extension (it used to raise
  NoMethodError).
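
A minimal sketch of the two checks, assuming the bounds and messages shown here (the exact conditions in the C extension may differ):

```ruby
# Reject ranges that would make the lexer scan backwards or out of bounds.
def validate_position_range(start_pos, end_pos, length)
  unless start_pos >= 0 && start_pos <= end_pos && end_pos <= length
    raise ArgumentError, "invalid position range: #{start_pos}...#{end_pos}"
  end
end

# Accept nil or an Array of Symbols; anything else raises TypeError
# rather than failing later with NoMethodError.
def validate_variables(variables)
  return [] if variables.nil?
  unless variables.is_a?(Array) && variables.all? { |v| v.is_a?(Symbol) }
    raise TypeError, "variables: must be nil or an Array of Symbols"
  end
  variables
end
```

Validating on the Ruby side before calling into WebAssembly means a bad argument surfaces as an ordinary exception instead of a hung VM.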

With these, test/rbs/parser_test.rb passes on JRuby (only `_lex` and
`parse_type_params`, which aren't wired through the bridge yet, are omitted), so
the JRuby CI job now runs it too.

https://claude.ai/code/session_01LTveMt3NLbYHEboXuzAKpA

Drive the module through Chicory's AOT compiler (wasm -> JVM bytecode) instead
of the interpreter. On the bundled corpus this cuts a full parse pass from
~18.2s to ~2.1s (about 9x), while producing byte-identical ASTs.

- runtime.rb uses MachineFactoryCompiler when its jars are present and falls
  back to the interpreter otherwise (so the parser still works with only the
  base Chicory jars).
- rake wasm:vendor_jars now also fetches the Chicory `compiler` jar and the ow2
  ASM libraries it needs (pinned via ASM_VERSION), bundled into the JRuby gem.

Compilation happens at runtime, once per process (~0.4s), against whichever
Chicory runtime is loaded, so the compiler and runtime versions can never drift
apart.
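
The compile-or-fall-back shape can be sketched without touching Chicory's API. The two lambdas are hypothetical stand-ins for building a compiled or interpreted machine factory, and the list of rescued exceptions is an assumption.

```ruby
# Try the AOT-compiled path first; fall back to the interpreter if the
# compiler classes cannot be loaded.
def pick_machine_factory(compiled:, interpreter:, rescued: [LoadError, NameError])
  [:compiled, compiled.call]
rescue *rescued
  [:interpreter, interpreter.call]
end

kind, _factory = pick_machine_factory(
  compiled:    -> { raise LoadError, "compiler jar not found" },
  interpreter: -> { :interpreter_factory }
)
```

Because the choice is made once per process, the rest of the runtime does not need to know which mode is active.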

https://claude.ai/code/session_01LTveMt3NLbYHEboXuzAKpA

…parser

These were the last RBS::Parser primitives still raising NotImplementedError on
JRuby. With them in place, test/rbs/parser_test.rb passes on JRuby with no
omissions (42 tests, same as CRuby).

- serialize.c gains rbs_serialize_node_list for the bare node list that
  parse_type_params returns; rbs_wasm.c adds parse_type_params, the two inline
  annotation entries, and lex (a stream of [type, start, end] records with no
  leading count, read until exhausted).
- The deserializer gains deserialize_node_list and deserialize_tokens; the
  parser implements _parse_type_params, _lex and the inline annotation methods
  on top of them, with the same range/variable validation as the others.
- The omit_on_jruby! markers on test__lex and test_parse_type_params are gone,
  and jruby_parser_test covers lex and type params directly.
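
Reading a record stream until it is exhausted might look like the sketch below. The byte layout (three unsigned 32-bit little-endian integers per record) is an assumption made for illustration; the commit does not specify it.

```ruby
# Assumed record layout: type, start, end as three uint32 little-endian values.
RECORD_SIZE = 12

# Yield each [type, start, end] triple until the bytes run out.
def each_token_record(bytes)
  return enum_for(:each_token_record, bytes) unless block_given?

  offset = 0
  while offset + RECORD_SIZE <= bytes.bytesize
    yield bytes.byteslice(offset, RECORD_SIZE).unpack("V3")
    offset += RECORD_SIZE
  end
end

stream = [1, 0, 5, 2, 6, 9].pack("V*")
records = each_token_record(stream).to_a
```

With no count prefix, the producer can emit tokens as it lexes without knowing the total up front.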

Also harden the AOT fallback: machine_factory now rescues LinkageError too, so
an incompatible ASM jar set degrades to the interpreter instead of crashing.

https://claude.ai/code/session_01LTveMt3NLbYHEboXuzAKpA