
Implement Chicory WASM backend for the Ruby API #4017

Draft
headius wants to merge 3 commits into ruby:main from jruby:wasm_gem


headius (Contributor) commented Mar 19, 2026

This PR will provide a new Ruby API backend (in addition to the C extension and the FFI wrapper) based on the Chicory WASM runtime for the JVM.

This version of the gem will be published for JRuby users as the default cross-platform gem. Future gems will make platform-specific binary builds available for users who want the additional performance.

The goal here is to get a native-code-free version of the prism gem released for JRuby users.

kddnewton added the java label (Pull requests that update Java code) on Mar 19, 2026

headius (Contributor, Author) commented Mar 19, 2026

This depends upon changes in #3944 and will be rebased once that merges to main.

eregon (Member) commented Mar 19, 2026

I wonder if it would make sense to support the WASM case directly in lib/prism/ffi.rb, e.g. by defining LibRubyParser as WASM::PRISM in that case, or at least to share some of the logic.
This looks like an early prototype: much of the logic is copied and probably doesn't work yet (it refers to LibRubyParser, which isn't set in the new file).

headius (Contributor, Author) commented Mar 19, 2026

"This looks like an early prototype"

Yes, that's why it's a draft.

headius force-pushed the wasm_gem branch 3 times, most recently from afd64c3 to 1b02ef4 (April 2, 2026 08:30)
Rather than templating two versions of sources with and without
non-semantic fields, we can make that determination at build time.
Passing -DPRISM_SERIALIZE_ONLY_SEMANTICS_FIELDS=1 to `make` will
force that macro to be literally true, and the compiler will
eliminate `if` blocks that use it to conditionally serialize non-
semantic data.

This simplifies the templates by removing that variation and allows
building both forms of the library from a single generated set of
sources.
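A minimal sketch of the pattern this commit describes, assuming the macro defaults to 0 when the build does not pass -D. The buffer type and the demo_* functions are hypothetical stand-ins, not prism's real serialization API; the point is only that `if (!PRISM_SERIALIZE_ONLY_SEMANTICS_FIELDS)` tests a compile-time constant, so an optimizing compiler can drop the branch in a semantic-only build.

```c
#include <stddef.h>
#include <stdint.h>

/* Default to 0 (serialize everything) unless the build passes
 * -DPRISM_SERIALIZE_ONLY_SEMANTICS_FIELDS=1. */
#ifndef PRISM_SERIALIZE_ONLY_SEMANTICS_FIELDS
#define PRISM_SERIALIZE_ONLY_SEMANTICS_FIELDS 0
#endif

/* Hypothetical stand-in for a serialization buffer. */
typedef struct {
    uint8_t data[64];
    size_t length;
} demo_buffer_t;

static void demo_append_byte(demo_buffer_t *buffer, uint8_t byte) {
    if (buffer->length < sizeof(buffer->data)) {
        buffer->data[buffer->length++] = byte;
    }
}

/* Hypothetical node serializer: `type` is semantic, `non_semantic`
 * stands in for data such as location information. */
static void demo_serialize_node(demo_buffer_t *buffer, uint8_t type, uint8_t non_semantic) {
    demo_append_byte(buffer, type);

    /* The condition is a constant, so this branch is dead code (and can be
     * eliminated by the optimizer) when the macro is 1. */
    if (!PRISM_SERIALIZE_ONLY_SEMANTICS_FIELDS) {
        demo_append_byte(buffer, non_semantic);
    }
}
```

Compiling the same file with -DPRISM_SERIALIZE_ONLY_SEMANTICS_FIELDS=1 makes the second demo_append_byte call unreachable, so only the type byte is written. Both library forms come from one set of sources.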
headius (Contributor, Author) commented Apr 2, 2026

The most recent commit is an attempt to move PRISM_SERIALIZE_ONLY_SEMANTICS_FIELDS from a template flag into a C macro. Here's why.

The JVM WASM builds need to take two forms:

  • A semantic-only version: the WASM build for JRuby's internal parser, which uses the Loader API.
  • A full version that also serializes non-semantic fields: the WASM build for the JRuby gem, which uses the same logic as FFI.

These two artifacts should be sibling modules under java/, built together as part of the Maven build lifecycle. Ideally, building them would take just three steps:

  • Generate the templated sources
  • Build the two WASM files
  • Build the Maven artifacts

But because the templated sources must be generated twice, once for semantic-only and again for non-semantic, the build process ends up being much more complicated:

  • Generate the semantic-only sources
  • Build the semantic-only WASM
  • Generate the non-semantic sources
  • Build the non-semantic WASM
  • Build the Maven artifacts

By moving the semantic-only flag into a C macro, a single set of sources can be generated and customized at C compile time, allowing a single make target to build both WASM forms.

Ideally the other two major C templating flags could also move into build-time macros:

  • Pretty printing could be omitted at C compile time and eliminated from the resulting binary.
  • Node ID could be omitted for builds that do not want it. (I had to tweak the INCLUDE_NODE_ID template flag because it was intertwined with the templated PRISM_SERIALIZE_ONLY_SEMANTICS_FIELDS flag.)
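To illustrate the node ID bullet, here is a rough sketch of omitting a node ID at C compile time. DEMO_INCLUDE_NODE_ID is a hypothetical macro (not an existing prism flag), and the varuint encoder is a simplified stand-in written for this example: 7 bits per byte, low bits first, with the high bit set on every byte except the last.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical switch: builds that do not want node IDs would pass
 * -DDEMO_INCLUDE_NODE_ID=0. By default node IDs are included. */
#ifndef DEMO_INCLUDE_NODE_ID
#define DEMO_INCLUDE_NODE_ID 1
#endif

/* Hypothetical stand-in for a serialization buffer. */
typedef struct {
    uint8_t data[64];
    size_t length;
} demo_buffer_t;

static void demo_append_byte(demo_buffer_t *buffer, uint8_t byte) {
    if (buffer->length < sizeof(buffer->data)) {
        buffer->data[buffer->length++] = byte;
    }
}

/* Variable-length unsigned integer: low 7 bits first; the high bit marks
 * that more bytes follow. */
static void demo_append_varuint(demo_buffer_t *buffer, uint32_t value) {
    while (value >= 0x80) {
        demo_append_byte(buffer, (uint8_t) ((value & 0x7F) | 0x80));
        value >>= 7;
    }
    demo_append_byte(buffer, (uint8_t) value);
}

/* Writes the start of a node: its type, then (optionally) its ID. */
static void demo_serialize_node_start(demo_buffer_t *buffer, uint8_t type, uint32_t node_id) {
    demo_append_byte(buffer, type);
#if DEMO_INCLUDE_NODE_ID
    demo_append_varuint(buffer, node_id);
#else
    (void) node_id; /* unused when node IDs are compiled out */
#endif
}
```

With the default build, a node of type 5 with ID 300 serializes as the bytes 05 AC 02; with -DDEMO_INCLUDE_NODE_ID=0 only 05 would be written, so any reader would need some other way to know not to expect an ID.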

Really, the only tricky part is the Java API, which has two forms (with node ID and without node ID). I think we can handle that a different way, perhaps by adding another flag to the serialization header?
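One way the header-flag idea might look, sketched in C: the serializer records which build options were used in a flags byte, and a reader (such as the Java loader) checks those bits instead of having to be generated in a matching form. The bit assignments and all names below are hypothetical and are not part of prism's current serialization format.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments for a header flags byte. */
enum {
    DEMO_HEADER_SEMANTICS_ONLY = 1 << 0,
    DEMO_HEADER_HAS_NODE_ID    = 1 << 1
};

/* Build options that affect the layout of the serialized nodes. */
typedef struct {
    bool semantics_only;
    bool has_node_id;
} demo_header_options_t;

/* Writer side: pack the options into a single byte for the header. */
static uint8_t demo_encode_header_flags(demo_header_options_t options) {
    uint8_t flags = 0;
    if (options.semantics_only) flags |= DEMO_HEADER_SEMANTICS_ONLY;
    if (options.has_node_id) flags |= DEMO_HEADER_HAS_NODE_ID;
    return flags;
}

/* Reader side: recover the options so the loader knows, for example,
 * whether to read a node ID after each node type. */
static demo_header_options_t demo_decode_header_flags(uint8_t flags) {
    demo_header_options_t options = {
        .semantics_only = (flags & DEMO_HEADER_SEMANTICS_ONLY) != 0,
        .has_node_id = (flags & DEMO_HEADER_HAS_NODE_ID) != 0
    };
    return options;
}
```

A single Java loader could then branch on has_node_id at runtime rather than existing in two templated variants, at the cost of one extra header byte and a per-node check.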

 private Nodes.Node loadNode() {
     int type = buffer.get() & 0xFF;
-<%- if Prism::Template::INCLUDE_NODE_ID -%>
+<%- unless Prism::Template::OMIT_NODE_ID -%>
Review comment (Member):

Could you keep the constant as Prism::Template::INCLUDE_NODE_ID?
That way:

  • The diff is much smaller and focused on the semantic fields change
  • There is no double negation (unless ... OMIT)

The ENV var can be named either way.

 JAVA_BACKEND = ENV["PRISM_JAVA_BACKEND"] || "default"
 JAVA_IDENTIFIER_TYPE = JAVA_BACKEND == "truffleruby" ? "String" : "byte[]"
-INCLUDE_NODE_ID = !SERIALIZE_ONLY_SEMANTICS_FIELDS || JAVA_BACKEND == "jruby"
+OMIT_NODE_ID = ENV.fetch("PRISM_OMIT_NODE_ID", false)
Review comment (Member):

Suggested change:
-OMIT_NODE_ID = ENV.fetch("PRISM_OMIT_NODE_ID", false)
+INCLUDE_NODE_ID = ENV.fetch("PRISM_INCLUDE_NODE_ID", "true") != "false"

(mentioned on Slack)

<%- if node.flags -%>
pm_buffer_append_varuint(buffer, (uint32_t) node->flags);
<%- else -%>
if (!PRISM_SERIALIZE_ONLY_SEMANTICS_FIELDS) {
Review comment from eregon (Member), Apr 2, 2026:

Suggested change:
-if (!PRISM_SERIALIZE_ONLY_SEMANTICS_FIELDS) {
+#ifndef PRISM_SERIALIZE_ONLY_SEMANTICS_FIELDS

would be better, because the code then goes away during preprocessing rather than later in the compiler. That could potentially make a difference, e.g. in inlining decisions, since the function stays bigger until the dead branch is removed.
It also makes it clearer that this is not a runtime check, and that it must be set with -D.

That would mean PRISM_SERIALIZE_ONLY_SEMANTICS_FIELDS must not be defined by default.
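A sketch of the suggested #ifndef form, using hypothetical demo_* stand-ins rather than prism's real serialization functions. Presumably the closing brace of the original if block would become #endif.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a serialization buffer. */
typedef struct {
    uint8_t data[64];
    size_t length;
} demo_buffer_t;

static void demo_append_byte(demo_buffer_t *buffer, uint8_t byte) {
    if (buffer->length < sizeof(buffer->data)) {
        buffer->data[buffer->length++] = byte;
    }
}

/* PRISM_SERIALIZE_ONLY_SEMANTICS_FIELDS is deliberately given no default
 * definition; a semantic-only build passes -DPRISM_SERIALIZE_ONLY_SEMANTICS_FIELDS. */
static void demo_serialize_node(demo_buffer_t *buffer, uint8_t type, uint8_t non_semantic) {
    demo_append_byte(buffer, type);
#ifndef PRISM_SERIALIZE_ONLY_SEMANTICS_FIELDS
    /* Removed by the preprocessor in semantic-only builds, so the
     * compiler never sees this call there. */
    demo_append_byte(buffer, non_semantic);
#else
    (void) non_semantic;
#endif
}
```

Because #ifndef tests only whether the macro is defined, not its value, -DPRISM_SERIALIZE_ONLY_SEMANTICS_FIELDS=0 would still compile the non-semantic field out. That is why the macro must not be defined by default.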


Labels: java (Pull requests that update Java code)


3 participants