Commit b1df2d2
feat: add WASM SIMD128 path for f_pixel::diff()
Add a wasm32+simd128 implementation of the hot diff() function using
safe core::arch::wasm32 intrinsics (f32x4 constructor, no unsafe).
Translates the existing SSE/NEON pattern:
- f32x4() to pack ARGB into a v128 (safe, no pointer load)
- f32x4_sub/add/mul/max for packed arithmetic
- f32x4_extract_lane + scalar add for horizontal RGB sum
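The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the committed code: `Pixel` is a hypothetical stand-in for `f_pixel`, the `a, r, g, b` field order is assumed, and the formula (per-channel maximum of the squared difference on black and on white, summed over RGB) is an assumption about what `diff()` computes. A scalar version is included so the sketch also builds on non-WASM targets.

```rust
/// Hypothetical stand-in for the crate's pixel type (field order is an assumption).
#[derive(Clone, Copy, Debug)]
#[repr(C, align(16))]
pub struct Pixel {
    pub a: f32,
    pub r: f32,
    pub g: f32,
    pub b: f32,
}

impl Pixel {
    /// SIMD path: only compiled for wasm32 with the simd128 feature enabled.
    #[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
    #[inline]
    pub fn diff(&self, other: &Pixel) -> f32 {
        use core::arch::wasm32::{
            f32x4, f32x4_add, f32x4_extract_lane, f32x4_max, f32x4_mul, f32x4_sub,
        };
        // Pack both pixels with the safe constructor (no pointer load).
        let px = f32x4(self.a, self.r, self.g, self.b);
        let py = f32x4(other.a, other.r, other.g, other.b);
        // Broadcast the alpha difference to all four lanes.
        let d = other.a - self.a;
        let alphas = f32x4(d, d, d, d);
        // Packed arithmetic: difference on black, difference on white, squared.
        let on_black = f32x4_sub(px, py);
        let on_white = f32x4_add(on_black, alphas);
        let max = f32x4_max(f32x4_mul(on_black, on_black), f32x4_mul(on_white, on_white));
        // Horizontal sum over the RGB lanes only; lane 0 (alpha) is skipped.
        f32x4_extract_lane::<1>(max) + f32x4_extract_lane::<2>(max) + f32x4_extract_lane::<3>(max)
    }

    /// Scalar version of the same (assumed) formula for every other target.
    #[cfg(not(all(target_arch = "wasm32", target_feature = "simd128")))]
    #[inline]
    pub fn diff(&self, other: &Pixel) -> f32 {
        let alpha_delta = other.a - self.a;
        let channel = |x: f32, y: f32| {
            let on_black = x - y;
            let on_white = on_black + alpha_delta;
            (on_black * on_black).max(on_white * on_white)
        };
        channel(self.r, other.r) + channel(self.g, other.g) + channel(self.b, other.b)
    }
}
```

The SIMD branch is only selected when the crate is compiled with `-C target-feature=+simd128` for a wasm32 target; otherwise the scalar branch is used.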
Also adds wasm32+simd128 to the repr(C, align(16)) cfg and excludes
it from the scalar fallback guard.
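The cfg changes described above might look something like the sketch below. The exact predicates in the crate are not shown in this commit page, so the target list (x86_64 and aarch64 as the existing SSE/NEON targets) and the names `Pixel`, `USES_SIMD_PATH`, and `layout` are assumptions made for illustration. The point is that the alignment attribute and the scalar guard use the same predicate, negated, so exactly one path exists per target.

```rust
use core::mem::{align_of, size_of};

/// Hypothetical pixel type: 16-byte aligned only on targets that have a SIMD path.
#[cfg_attr(
    any(
        target_arch = "x86_64",
        target_arch = "aarch64",
        all(target_arch = "wasm32", target_feature = "simd128")
    ),
    repr(C, align(16))
)]
#[cfg_attr(
    not(any(
        target_arch = "x86_64",
        target_arch = "aarch64",
        all(target_arch = "wasm32", target_feature = "simd128")
    )),
    repr(C)
)]
#[derive(Clone, Copy)]
pub struct Pixel {
    pub a: f32,
    pub r: f32,
    pub g: f32,
    pub b: f32,
}

/// True when one of the SIMD implementations would be compiled in.
#[cfg(any(
    target_arch = "x86_64",
    target_arch = "aarch64",
    all(target_arch = "wasm32", target_feature = "simd128")
))]
pub const USES_SIMD_PATH: bool = true;

/// The scalar fallback guard is the negation of the SIMD predicate,
/// so wasm32 with simd128 no longer falls through to it.
#[cfg(not(any(
    target_arch = "x86_64",
    target_arch = "aarch64",
    all(target_arch = "wasm32", target_feature = "simd128")
)))]
pub const USES_SIMD_PATH: bool = false;

/// Returns (size, alignment) of `Pixel` on the current target.
pub fn layout() -> (usize, usize) {
    (size_of::<Pixel>(), align_of::<Pixel>())
}
```

Keeping the two predicates as exact negations of each other avoids both a duplicate definition and a target with no definition at all.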
Measured ~1.9x end-to-end speedup on a 256x256 quantization benchmark
running in wasmtime (scalar: 260ms/iter → simd128: 135ms/iter).
1 parent 21a6396 commit b1df2d2
1 file changed
Lines changed: 27 additions & 2 deletions
| Original file lines | Diff lines | Change |
|---|---|---|
| 27-41 | 27-41 | Lines 30 and 38 each replaced (2 deletions, 2 additions) |
| 51-56 | 51-81 | 25 lines added as new lines 54-78 |