fix: harden reassociation barriers for fast-math by DiamonDinoia · Pull Request #1276 · xtensor-stack/xsimd

DiamonDinoia · 2026-03-19T20:06:39Z

There is a bug in nearbyint: -fassociative-math breaks it, because that flag does not define __FAST_MATH__.
Also, the barrier that was used caused a stack spill.

I centralized the barrier in a single function that can be used everywhere in the code, and applied it in the places where I know it helps.

Now we can simply call reassociation_barrier to prevent the compiler from reordering instructions across that point.

Let me know whether you like the internal API and whether you would like any changes to it. I find this is the solution that minimizes #ifdef boilerplate and allows dispatching to all archs (with C++17 this will be simpler).

Cheers,
Marco
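To illustrate the nearbyint problem described above, here is a minimal scalar sketch, assuming nearbyint uses the classic 2^52 "magic number" rounding trick. The names `reassociation_barrier` (a hypothetical scalar stand-in, not xsimd's actual code) and `nearbyint_sketch` are illustrative; the inline-asm bodies follow the register constraints named in this PR ("+x" on x86, "+w" on ARM, address plus "memory" clobber as fallback) but are assumptions, not the merged implementation.

```cpp
#include <cmath>

// Hypothetical scalar stand-in for xsimd's reassociation_barrier. The
// (value, reason) signature follows the snippet quoted in the review; the
// bodies are an assumption based on the constraints listed in the PR.
template <class T>
inline void reassociation_barrier(T& x, const char* /*reason*/) noexcept
{
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ volatile("" : "+x"(x)); // x stays in an XMM register but becomes opaque
#elif defined(__GNUC__) && defined(__aarch64__)
    __asm__ volatile("" : "+w"(x)); // x stays in a NEON/FP register but becomes opaque
#elif defined(__GNUC__)
    __asm__ volatile("" : : "r"(&x) : "memory"); // fallback: forces a store/reload
#else
    (void)x; // no GNU-style inline asm (e.g. MSVC): this sketch provides no barrier
#endif
}

// Round to the nearest integer (ties to even under the default rounding mode)
// with the 2^52 "magic number" trick. When reassociation is enabled, the
// compiler may simplify (a + magic) - magic to a, turning the function into
// the identity; the barrier between the two operations blocks that rewrite.
double nearbyint_sketch(double x) noexcept
{
    const double magic = 4503599627370496.0; // 2^52: every double >= this is an integer
    const double ax = std::fabs(x);
    if (!(ax < magic))
        return x; // already integral, infinite, or NaN
    double t = ax + magic; // the rounding to an integer happens in this addition
    reassociation_barrier(t, "keep (|x| + 2^52) - 2^52 from folding to |x|");
    t -= magic;
    return std::copysign(t, x); // restore the sign, including for zero results
}
```

Built without any fast-math flags the barrier changes no values; its only job is to hide `t` from the optimizer so the add/subtract pair cannot be cancelled.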

DiamonDinoia · 2026-03-23T20:29:52Z

@serge-sans-paille this is also ready for review.


DiamonDinoia · 2026-03-27T20:33:07Z

I think I can simplify this a bit more. A is not needed anymore if we go this route.

Use arch-specific register constraints to prevent -ffast-math from reassociating arithmetic without forcing a register spill to the stack.

Each platform's base arch header provides a reassociation_barrier overload using the tightest register constraint for that target:
- x86 (sse2.hpp): "+x" (XMM/YMM/ZMM)
- ARM (neon.hpp): "+w" (NEON vector)
- ARM (sve.hpp): "+w" (SVE Z-register)
- PPC (vsx.hpp): "+wa" (VS register)
- fallback (common): "r"(&x) + "memory" clobber

The x86 overload uses template<T,A> to catch all x86 arches (sse2, avx, avx512f and descendants) via overload resolution against the common fallback's requires_arch<common>.

Also adds a mandatory const char* reason parameter to document why each barrier exists at each call site, and removes the now-unused memory_barrier_tag.
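The overload-resolution trick in the commit message can be sketched in isolation. This is a minimal illustration, assuming arch tags form an inheritance chain and that `requires_arch<A>` is `A const&`; the tag structs below are hypothetical stand-ins rather than xsimd's real definitions, and the overloads return an int only so the chosen one is observable.

```cpp
// Hypothetical arch tags: each arch derives from the one it extends.
// These are illustrative stand-ins, not xsimd's actual definitions.
struct common {};
struct sse2 : common {};
struct avx : sse2 {};
struct avx512f : avx {};

template <class Arch>
using requires_arch = Arch const&; // assumption: arch tags are passed by const reference

// Common fallback. Returns an int only so the selected overload is visible;
// a real barrier would return void and use "r"(&x) + "memory".
template <class T>
int reassociation_barrier(T&, const char*, requires_arch<common>) noexcept
{
    return 0;
}

// "x86" overload. A is deduced as the exact tag type, so binding A const& is
// an identity conversion, which beats the derived-to-base conversion needed
// to bind an sse2/avx/avx512f tag to common const&. For a common tag both
// overloads match exactly, and partial ordering prefers the fallback above
// because it is more specialized.
template <class T, class A>
int reassociation_barrier(T&, const char*, requires_arch<A>) noexcept
{
    return 1; // real code would use the "+x" register constraint here
}
```

In this sketch the second template is unconstrained, so it would win for any non-common tag; the PR text does not show how it is limited to x86 targets (declaring it only in the x86 base header is one possibility), so that part is left out.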

serge-sans-paille · 2026-03-29T06:54:25Z

include/xsimd/arch/common/xsimd_common_details.hpp

+            template <class T>
+            XSIMD_INLINE void reassociation_barrier(T& x, const char*) noexcept
+            {
+#if XSIMD_WITH_INLINE_ASM


Shouldn't we make this empty if we're not under fast-math?

DiamonDinoia · 2026-03-29T15:00:47Z

Well, if we are not under fast-math this should essentially be a no-op. Also, the fast-math check does not detect the case where only associative math (-fassociative-math) is enabled, which also breaks it. Guarding the barrier would then mean checking every compiler flag that can reorder floating-point operations.

On Sunday, March 29, 2026, serge-sans-paille commented on this pull request, in include/xsimd/arch/common/xsimd_common_details.hpp:

> +            // exists at each call site; it is unused at runtime.
> +            //
> +            // Two overloads:
> +            //   reassociation_barrier(reg, reason)   – raw register
> +            //   reassociation_barrier(batch, reason) – extracts .data
> +            //
> +            // Uses the tightest register-class constraint for the target so
> +            // the value stays in its native SIMD register (no spill):
> +            //   x86 (SSE/AVX/AVX-512) : "+x"  – XMM / YMM / ZMM
> +            //   ARM (NEON / SVE)      : "+w"  – vector / SVE Z-reg
> +            //   PPC (VSX)             : "+wa" – VS register
> +            //   other / MSVC          : address + memory clobber (fallback)
> +            template <class T>
> +            XSIMD_INLINE void reassociation_barrier(T& x, const char*) noexcept
> +            {
> +#if XSIMD_WITH_INLINE_ASM

> Shouldn't we make this empty if we're not under fast-math?

serge-sans-paille · 2026-03-29T18:12:39Z

Well, I don't think we can say that

__asm__ volatile("" : : "r"(&x) : "memory");

is a no-op. Should we allow a user-defined XSIMD_REASSOCIATIVE, which would be forced on by __FAST_MATH__ but could also be set by the user?
I don't want fast-math to impact code generation for non-fast-math builds.
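The XSIMD_REASSOCIATIVE idea can be sketched as follows. This is a minimal illustration, assuming the macro is forced to 1 under __FAST_MATH__, defaults to 0 otherwise, and can be set to 1 by the user (for example when building with -fassociative-math alone, which does not define __FAST_MATH__). XSIMD_REASSOCIATIVE is only the name proposed in this thread, and the scalar `reassociation_barrier` here is a hypothetical stand-in.

```cpp
// Hypothetical opt-in switch along the lines proposed above: always on under
// __FAST_MATH__, otherwise off unless the user defines it to 1.
// XSIMD_REASSOCIATIVE is a proposal, not an existing xsimd macro.
#if defined(__FAST_MATH__)
#undef XSIMD_REASSOCIATIVE
#define XSIMD_REASSOCIATIVE 1
#elif !defined(XSIMD_REASSOCIATIVE)
#define XSIMD_REASSOCIATIVE 0
#endif

template <class T>
inline void reassociation_barrier(T& x, const char* /*reason*/) noexcept
{
#if XSIMD_REASSOCIATIVE && defined(__GNUC__)
    // Reassociation may be enabled: make x opaque to the optimizer.
    __asm__ volatile("" : : "r"(&x) : "memory");
#else
    // Strict IEEE build (or no GNU-style inline asm): a true no-op, so
    // non-fast-math code generation is unaffected.
    (void)x;
#endif
}
```

The trade-off is that a build using only -fassociative-math stays unprotected unless the user remembers to define the macro, which is the detection gap raised in the reply that follows.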

DiamonDinoia · 2026-03-29T20:47:05Z

Let me think about this for a second. MSVC does not support inline asm on x64 or ARM, so I should update the comment there. There may also be better alternatives for scalars, WASM, and RISC-V that don't impact performance.

DiamonDinoia force-pushed the fix/nearbyint-fastmath branch 5 times, most recently from f9a9992 to 2f9a431 (March 23, 2026 19:55)

DiamonDinoia changed the title from "fix: harden reassociation barriers for fast-math nearbyint" to "fix: harden reassociation barriers for fast-math" (Mar 23, 2026)

DiamonDinoia mentioned this pull request Mar 23, 2026 in: test(s) fail for x86_64 musl #1244 (Open)

serge-sans-paille requested changes Mar 27, 2026, with review comments (now outdated and resolved) on:
- .github/workflows/windows.yml (three comments)
- include/xsimd/arch/xsimd_common_fwd.hpp

DiamonDinoia force-pushed the fix/nearbyint-fastmath branch 5 times, most recently from 3685ff0 to c4dec73 (March 27, 2026 19:57)

DiamonDinoia mentioned this pull request Mar 27, 2026 in: ci: add clang-cl Windows CI jobs #1283 (Open)

DiamonDinoia force-pushed the fix/nearbyint-fastmath branch from c4dec73 to 6d6bc61 (March 28, 2026 22:41)

DiamonDinoia force-pushed the fix/nearbyint-fastmath branch from 6d6bc61 to d571f2f (March 28, 2026 22:42)

serge-sans-paille reviewed Mar 29, 2026



DiamonDinoia wants to merge 1 commit into xtensor-stack:master from DiamonDinoia:fix/nearbyint-fastmath

2 participants
