
Experimental AutoBatch pass #8530

Draft

kripken wants to merge 53 commits into WebAssembly:main from kripken:autobatch

Conversation


@kripken kripken commented Mar 26, 2026

Background:

WebAssembly/component-model#371 (comment)

We pay a cost each time we cross the wasm-JS boundary, and if that happens often the cost can be significant. One way to avoid such boundary crossings is to batch calls: build up a buffer of serialized instructions, then call into JS once to read the buffer from linear memory and execute it. This approach is taken by Emscripten's GL proxying and by webcc. When there are many short calls, this can speed things up in some cases.
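To make the batching idea concrete, here is a minimal sketch in plain JS. It simulates linear memory with an Int32Array and uses an invented instruction format of [opcode, argCount, ...args]; all names (OP_*, emit, runBatch) are hypothetical, not the PR's actual encoding.

```javascript
// Hypothetical opcodes for two void calls we want to batch.
const OP_SET_PIXEL = 1;
const OP_CLEAR = 2;

// "Wasm side": instead of crossing the boundary per call,
// append [opcode, argCount, ...args] to a buffer in linear memory.
const batch = new Int32Array(1024); // stands in for linear memory
let batchLen = 0;
function emit(op, ...args) {
  batch[batchLen++] = op;
  batch[batchLen++] = args.length;
  for (const a of args) batch[batchLen++] = a;
}

// "JS side": a single boundary crossing decodes and executes
// everything that was queued since the last flush.
const calls = [];
function runBatch() {
  let i = 0;
  while (i < batchLen) {
    const op = batch[i++];
    const n = batch[i++];
    const args = Array.from(batch.subarray(i, i + n));
    i += n;
    if (op === OP_CLEAR) calls.push(`clear(${args})`);
    else if (op === OP_SET_PIXEL) calls.push(`setPixel(${args})`);
  }
  batchLen = 0; // buffer fully consumed
}

emit(OP_CLEAR, 0);
emit(OP_SET_PIXEL, 3, 4, 255);
runBatch(); // one crossing executes both queued calls, in order
```

The point is that N short calls cost one boundary crossing plus a tight decode loop, rather than N crossings.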

This PR does something related but more general: it takes an input wasm and automatically applies batching to every call where it can. Calls that do not return a value are batched, while calls that do return a value flush the buffer and then run normally. The pass also autogenerates JS deserialization code that matches the serialization; you paste that into the JS side and that's it.
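The flush rule above can be sketched as follows: void imports are merely queued, while any value-returning import must first flush the queue so that the observable order of effects is preserved. All names here (batchedVoidCall, directCall, the GL function names) are illustrative, not the PR's generated code.

```javascript
const queue = [];  // pending void calls, oldest first
const log = [];    // records actual execution order

function execute(name) { log.push(name); }

// Drain everything queued so far, in order.
function flush() {
  for (const name of queue) execute(name);
  queue.length = 0;
}

// A call with no return value: just record it, no boundary crossing yet.
function batchedVoidCall(name) { queue.push(name); }

// A call that returns a value: flush pending work first, then run
// synchronously so the caller gets its result with correct ordering.
function directCall(name, result) {
  flush();
  execute(name);
  return result;
}

batchedVoidCall("glUniform1f");
batchedVoidCall("glDrawArrays");
const err = directCall("glGetError", 0); // forces a flush before running
```

Because the flush happens before the value-returning call executes, the earlier void calls cannot be reordered past it.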

This is not safe in general, because of issues like reentrancy (wasm->js->wasm->js) and stale data (if a pointer is serialized for later use, the data it points to must not be modified in the meantime). If we decide to productionize this, there would need to be user control over what is autobatched and what is not, etc. (In Emscripten specifically we could, for example, enable this by default on all proxy: async methods; other toolchains might have similar options.) For now, however, this makes it easy to get benchmark numbers.

I measured three things:

  • A trivial microbenchmark. This becomes 2x faster.
  • The webcc benchmark. This uses embind, so it is actually going through an inefficient and unrecommended path for speed-intensive code, but still interesting I think. It becomes 1.5x faster.
  • A glgears benchmark which tests WebGL. This shows no speedup, and I confirmed in the profiler that there isn't really significant js/wasm boundary overhead here.

(These measurements are total time - I didn't measure the cost of individual wasm->js calls. But obviously this reduces that overhead to essentially 0, if you have enough calls being batched.)

So this does show a large speedup, as expected, when making large numbers of js/wasm boundary crossings for small amounts of work. However, I don't know how common that is in practice - the last benchmark I tested, the WebGL one where I saw no speedup, is probably representative of most WebGL code out there (where proper shader and buffer usage avoids js/wasm overhead anyway).
