
Sadly, even SSE vs. AVX is often enough to give different results, as SSE has no fused multiply-add instructions, which compute a*b + c with a single rounding at the end instead of rounding the product first. (Strictly speaking, FMA is its own extension that arrived alongside AVX2 rather than part of AVX itself.) Even though x86 CPUs from roughly 2013 onward could all use FMA, gcc/clang don't enable AVX or FMA by default for x86-64 targets. And even if they did, results are only guaranteed identical if implementations have chosen the exact same polynomial approximation method and no compiler optimizations alter the instruction sequence.

Unfortunately, floating point results will probably continue to differ across platforms for the foreseeable future.
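
To make the single-rounding point concrete, here is a minimal sketch in Python rather than C, so that it doesn't depend on compiler flags. It assumes IEEE 754 doubles, and `emulated_fma` is a hypothetical helper (not a library function) that imitates a hardware FMA by computing a*b + c exactly with `fractions.Fraction` and rounding once at the end. The inputs are picked by hand so that the product lands exactly halfway between two doubles.

```python
from fractions import Fraction


def mul_then_add(a, b, c):
    """Two roundings: a*b is rounded to a double, then the sum is rounded again."""
    return a * b + c


def emulated_fma(a, b, c):
    """One rounding: evaluate a*b + c exactly, then round once to a double.

    Hypothetical stand-in for a hardware FMA; finite inputs only
    (Fraction cannot represent inf or nan).
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# a*b is exactly 1 - 2**-54, which sits halfway between two adjacent doubles
# and rounds (ties-to-even) to 1.0 when the product is rounded on its own.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

unfused = mul_then_add(a, b, c)  # 0.0
fused = emulated_fma(a, b, c)    # -2**-54, about -5.55e-17

print(unfused, fused)
print("identical:", unfused == fused)
```

The two answers differ in every significant digit, not just the last bit, because the subtraction cancels everything except the rounding error of the product. That is the sort of discrepancy you can see between a build that contracts a*b + c into an FMA and one that doesn't.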



That's a bit of a different problem IMO.

Barring someone doing a "check if AVX is available" check inside their code, binaries are generally compiled targeting either SSE or AVX and not both. You can reasonably expect that the same binary thrown against multiple architectures will have the same output.

This, of course, doesn't apply if we are talking about a JIT. All bets are off if you are talking about JavaScript or the JVM.

That is to say, you can expect a C++ binary blob from the Ubuntu repo to produce the same numbers regardless of the machine, since distro packages generally target fairly old architectures.


> Barring someone doing a "check if AVX is available" check inside their code

Afaik that is exactly what glibc does internally


GCC won't use FMA without fast-math though. Even when AVX is otherwise enabled.


Sure it will:

> -ffp-contract=fast enables floating-point expression contraction such as forming of fused multiply-add operations if the target has native support for them

> The default is -ffp-contract=off for C in a standards compliant mode (-std=c11 or similar), -ffp-contract=fast otherwise.

https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#ind...


Oh, wow, forgot about fp-contract. It says it is off in C by default, what about C++?


Read closer, it defaults to fast, not off


I would have expected that to be a bug in the documentation. Why would they turn FMA off for standards-compliant C mode, but not for standards-compliant C++ mode?

But the documentation does appear to be correct: https://godbolt.org/z/3bvP136oc

Crazy.


It defaults to off in standards-compliant mode, which in my mind was the default mode, as that's what we use everywhere I have worked in the last 15 years. But of course that's not the case.

In any case, according to the sibling comment, the default is 'fast' even in std-compliant mode in C++, which I find very surprising. I'm not very familiar with that corner of the standard, but it must be looser than the equivalent wording in the C standard.



