Handle sharedness in Heap2Local's Array2Struct by tlively · Pull Request #8515 · WebAssembly/binaryen

tlively · 2026-03-24T05:28:58Z

Heap2Local uses an internal utility called Array2Struct to turn non-escaping array allocations into struct allocations so they can subsequently be optimized by the Heap2Local struct optimizations. It
previously used a non-shared struct type, even for shared array types, but this could cause problems when the non-shared struct type became a non-shared null in places that required a shared type to
validate. This could specifically occur when the allocation flowed into a StructCmpxchg expected field that needed to be a subtype of shared eqref.

As a drive-by to make the regression test parse correctly, fix child-typer for StructCmpxchg to correctly handle sharedness in the expected field as well.

Heap2Local uses an internal utility called Array2Struct to turn non-escaping array allocations into struct allocations so they can subsequently be optimized by the Heap2Local struct optimizations. It previously used a non-shared struct type, even for shared array types, but this could cause problems when the non-shared struct type became a non-shared null in places that required a shared type to validate. This could specifically occur when the allocation flowed into a StructCmpxchg `expected` field that needed to be a subtype of shared eqref. As a drive-by to make the regression test parse correctly, fix child-typer for StructCmpxchg to correctly handle sharedness in the expected field as well.

aheejin

I'm not familiar with this pass (#3866 said I reviewed it, which I don't have any memory of... 🫠)

This could specifically occur when the allocation flowed into a StructCmpxchg expected field that needed to be a subtype of shared eqref.

Why is the expected operand in StructCmpxchg special?

As a drive-by to make the regression test parse correctly, fix child-typer for StructCmpxchg to correctly handle sharedness in the expected field as well.

How is this fix related to the Heap2Local fix? What happens if this child-typer is not fixed here?

aheejin · 2026-03-24T22:36:24Z

src/ir/child-typer.h

+    auto expectedType = type;
+    if (expectedType.isRef()) {
+      expectedType =
+        Type(HeapTypes::eq.getBasic(type.getHeapType().getShared()), Nullable);


This is preexisting, but why does this have to be Nullable?

When the type of the accessed struct field is a reference, the expected operand is allowed to be any subtype of (ref null eq) (i.e. eqref), or (ref null (shared eq)) if the field type is a shared reference. In either case, the expected operand can be nullable.

aheejin · 2026-03-24T22:38:25Z

test/lit/passes/heap2local-rmw.wast

+  ;; CHECK-NEXT:   (struct.atomic.rmw.cmpxchg $struct 0
+  ;; CHECK-NEXT:    (local.get $struct)
+  ;; CHECK-NEXT:    (block (result (ref null (shared none)))
+  ;; CHECK-NEXT:     (ref.null (shared none))


I get that Heap2Local can extract fields like i32s out of a struct if that struct doesn't escape, but that would just extract the fields into locals and wouldn't create a random i32 values (like 0) to set them, right? Why can we replace (array.new_fixed $array 0) with (ref.null (shared none)) here? Where did this value none come from?

The way Heap2Local works in general is that it places struct fields in locals and also replaces the struct itself with a null value. This generally makes it easier to update the IR than it would be if we removed the value entirely. For example, in this test it would not be valid to leave the expected operand with type none, so we would have had to fix up the struct.atomic.rmw.cmpxchg to maintain validity even though it is unreachable. But making the expected operand null is still valid, so we can less work to fix up the IR.

Why is it OK to create a value (ref.null (shared none) here) out of thin air? It wouldn't be OK to arbitrarly assign 0 for i32s, no? What if we get to use the value? In this case I guess that's not the case because the third operand is unreachable, but that's not always the case.

Also I thought this pass creates locals for each field. Why do we not create a local here?
Sorry, maybe I should read the pass..

It's ok to create the null out of thin air because we know it cannot escape the function (otherwise we would not be optimizing the allocation we are replacing with the null), and we know we will update all the expressions that the value flows to so that they no longer use it.

We don't create a local here because the allocated array has size zero, so there are no elements to put into locals.

So what I was confused about is, ok, it may not escape the function, but why is it allowed to create a value out of thin air and replace another value? What if that changes the result of this instruction? Here we have unreachable so it may not matter, but what if we don't have the unreachable here? Even in that case, the expected operand doesn't escape, no?

It looks, regardless of the third operand, the ref and expected cannot possibly match, because a new array cannot be the same as the reference loaded from $struct local, so maybe we can replace the expected value with anything because of that? But the explanation here is "because it does not escape", which I keep failing to understand. Sorry for holding this (seemingly trivial) PR off for days. Maybe we should just move on.

So what I was confused about is, ok, it may not escape the function, but why is it allowed to create a value out of thin air and replace another value? What if that changes the result of this instruction?

Right, this is only safe because the pass guarantees that it will visit every expression that the new null value could reach and will update those expressions so that they have the same behavior they had before, even now that there is a null. If the pass fails to update one of those instructions and the null value ends up making a difference, then that would be a bug.

It looks, regardless of the third operand, the ref and expected cannot possibly match, because a new array cannot be the same as the reference loaded from $struct local, so maybe we can replace the expected value with anything because of that?

Yes, that's exactly right, and this is the fact we depend on when optimizing allocations that reach the expected operand of reachable cmpxchg instructions, although in that case we also replace the cmpxchg with an atomic read, since we know it will not write anything. The only reason we leave the cmpxchg in the output here is because of the early return when it is unreachable. We similarly do not optimize other unreachable instructions more than necessary to maintain validity.

Sorry for holding this (seemingly trivial) PR off for days.

No worries :) One of the goals for spreading reviews around more is to make sure more of the team understands more of the code, so I think this is working as intended.

tlively · 2026-03-24T23:20:17Z

This could specifically occur when the allocation flowed into a StructCmpxchg expected field that needed to be a subtype of shared eqref.

Why is the expected operand in StructCmpxchg special?

StructCmpxchg is special because it can take 3 reference operands (ref, expected, and replacement), which is more than most other instructions. That allows for more complex combinations of operands being optimized or not or being unreachable or not than are possible for other types. This particular bug depends on ref having a struct type but not being optimizable, expected having an array type and being optimizable, and replacement being unreachable so that the cmpxchg parent is unreachable and therefore not optimized by this pass.

As a drive-by to make the regression test parse correctly, fix child-typer for StructCmpxchg to correctly handle sharedness in the expected field as well.

How is this fix related to the Heap2Local fix? What happens if this child-typer is not fixed here?

The original fuzzer test case depending on running --gufa before --heap2local to set up the IR that triggers the bug. But for checked-in regression test, we want to be able to parse the Wasm text directly into the IR that triggers the bug. Without the fix to child-typer, IRBuilder sees the unreachable replacement and thinks that preceding expected value has an invalid type, so it parses the text into this IR instead:

(drop
 (local.get $shared-struct)
)
(drop
 (struct.new_default $shared-struct)
)
(struct.atomic.rmw.cmpxchg <unprintable type> 0
 (unreachable)
 (unreachable)
 (unreachable)
)

But this IR does not trigger the bug.

aheejin · 2026-03-25T22:43:07Z

StructCmpxchg is special because it can take 3 reference operands (ref, expected, and replacement), which is more than most other instructions. That allows for more complex combinations of operands being optimized or not or being unreachable or not than are possible for other types. This particular bug depends on ref having a struct type but not being optimizable, expected having an array type and being optimizable, and replacement being unreachable so that the cmpxchg parent is unreachable and therefore not optimized by this pass.

Why is this particular condition necessary?
What happens if this field in question of being shared or not is not expected but ref or replacement?

The original fuzzer test case depending on running --gufa before --heap2local to set up the IR that triggers the bug. But for checked-in regression test, we want to be able to parse the Wasm text directly into the IR that triggers the bug. Without the fix to child-typer, IRBuilder sees the unreachable replacement and thinks that preceding expected value has an invalid type, so it parses the text into this IR instead:
(drop
 (local.get $shared-struct)
)
(drop
 (struct.new_default $shared-struct)
)
(struct.atomic.rmw.cmpxchg <unprintable type> 0
 (unreachable)
 (unreachable)
 (unreachable)
)
But this IR does not trigger the bug.

We don't reject the invalid IR, and just make them unreachables? Why? Was the parser this way before, or is this a new behavior of the new parser?

tlively · 2026-03-26T01:30:16Z

StructCmpxchg is special because it can take 3 reference operands (ref, expected, and replacement), which is more than most other instructions. That allows for more complex combinations of operands being optimized or not or being unreachable or not than are possible for other types. This particular bug depends on ref having a struct type but not being optimizable, expected having an array type and being optimizable, and replacement being unreachable so that the cmpxchg parent is unreachable and therefore not optimized by this pass.

Why is this particular condition necessary? What happens if this field in question of being shared or not is not expected but ref or replacement?

replacement being unreachable is important because that makes the cmpxchg unreachable, which means that the pass returns early and does not try to optimize it, leaving the cmpxchg in the output. ref having a struct type (with a shared field) sets up the validation requirement that expected needs to be a subtype of (ref null (shared eq)). This is obviously true in the input, but it needs to be true in the output as well. expected being optimizable and having an array type in the input triggers the bug and means that it had an unshared reference type in the output. All these elements came together to create a validation failure in the pass output before the fix.

If the shared array reference flowed in to ref and became an unshared null reference, that would not cause a validation failure because then we would have no further constraints on the types of expected or replacement. If the shared array reference flowed into replacement, OTOH, we might be able to catch the same bug by making expected unreachable instead. But I don't think we need to test it both ways.

We don't reject the invalid IR, and just make them unreachables? Why? Was the parser this way before, or is this a new behavior of the new parser?

This is a new behavior of the new parser. The old parser would have parsed invalid IR, but the new parser does extra work to parse this into valid IR because it is valid WebAssembly. This is good in general, but means bugs in the parser can prevent us from parsing IR we need for specific tests involving unreachability.

aheejin · 2026-03-26T23:17:27Z

We don't reject the invalid IR, and just make them unreachables? Why? Was the parser this way before, or is this a new behavior of the new parser?

This is a new behavior of the new parser. The old parser would have parsed invalid IR, but the new parser does extra work to parse this into valid IR because it is valid WebAssembly. This is good in general, but means bugs in the parser can prevent us from parsing IR we need for specific tests involving unreachability.

Ah the wasm itself is valid because of polymorphic stack. Thanks for the explanation

aheejin

I understand why the fixes were necessary. I still don't 100% get why that specific test output should be that way, but I don't think there's a point holding this off. Sorry for the delay and thanks for the explanation.

tlively requested a review from a team as a code owner March 24, 2026 05:28

tlively requested review from aheejin and removed request for a team March 24, 2026 05:28

tlively added 2 commits March 24, 2026 11:35

fix test

225628d

Merge branch 'main' into heap2local-shared-array

5ba399c

aheejin reviewed Mar 24, 2026

View reviewed changes

Merge branch 'main' into heap2local-shared-array

e40859d

aheejin approved these changes Mar 26, 2026

View reviewed changes

tlively merged commit df5c0e0 into main Mar 26, 2026
16 checks passed

tlively deleted the heap2local-shared-array branch March 26, 2026 23:29

Conversation

tlively commented Mar 24, 2026

Uh oh!

aheejin left a comment

Choose a reason for hiding this comment

Uh oh!

aheejin Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tlively Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

aheejin Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

tlively Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

aheejin Mar 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tlively Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

aheejin Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

tlively Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

tlively commented Mar 24, 2026

Uh oh!

aheejin commented Mar 25, 2026

Uh oh!

tlively commented Mar 26, 2026

Uh oh!

aheejin commented Mar 26, 2026

Uh oh!

aheejin left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

aheejin Mar 24, 2026 •

edited

Loading

aheejin Mar 25, 2026 •

edited

Loading