Comparing Fortran & C++ assembler for int = floor(sqrt(…))


I have implemented the same function in both C++ and Fortran:

#include <math.h>

void dbl_sqrt_c(double *x, double *y){
   *y = sqrt(*x - 1.0);
   return;
}

and

pure subroutine my_dbl_sqrt(x,y) bind(c, name="dbl_sqrt_fort")
   USE, INTRINSIC :: ISO_C_BINDING
   implicit none
   real(kind=c_double), intent(in)  :: x
   real(kind=c_double), intent(out) :: y

   y = sqrt(x - 1d0)
end subroutine my_dbl_sqrt

I compared them in the compiler explorer:

Fortran: https://godbolt.org/z/froz4rx97
C++: https://godbolt.org/z/45aex99Yz

The way I read the assembly, they do basically the same thing, except that the C++ version checks whether the argument of sqrt is negative and the Fortran version does not. I compared their performance using Google Benchmark, and they are pretty evenly matched:

--------------------------------------------------------
Benchmark              Time             CPU   Iterations
--------------------------------------------------------
bm_dbl_c/8          2.07 ns         2.07 ns    335965892
bm_dbl_fort/8       2.06 ns         2.06 ns    338643106
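
As a side note on the negative-argument check: under many C and C++ library implementations, sqrt reports a domain error for a negative argument by setting errno to EDOM, which would account for a branch that the Fortran code does not need. Below is a minimal sketch to observe this. The helper sqrt_sets_errno is hypothetical and not part of the benchmark, and whether errno is actually set depends on the C library and the compiler flags in use.

```cpp
#include <cerrno>
#include <cmath>

// Hypothetical helper (not from the original post): returns true if calling
// sqrt on x reported a domain error through errno.
bool sqrt_sets_errno(double x) {
    volatile double input = x;   // volatile keeps the compiler from folding the call
    errno = 0;                   // clear any earlier error
    volatile double result = std::sqrt(input);
    (void)result;                // the value itself is not needed here
    return errno == EDOM;        // EDOM signals a domain error
}
```

Calling sqrt_sets_errno(-1.0) is the interesting case. It would typically return true when the library reports math errors through errno and false when the code is built with errno reporting disabled.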

Here is the interesting part: if I turn these into integer-based functions:

void int_sqrt_c(int *x, int *y){
   *y = floor(sqrt(*x - 1.0));
   return;
}

and

pure subroutine my_int_root(x,y) bind(c, name="int_sqrt_fort")
   USE, INTRINSIC :: ISO_C_BINDING
   implicit none
   integer(kind=c_int), intent(in)  :: x
   integer(kind=c_int), intent(out) :: y

   y = floor(sqrt(x - 1d0))
end subroutine my_int_root

Then this is where they start to diverge:

--------------------------------------------------------
Benchmark              Time             CPU   Iterations
--------------------------------------------------------
bm_int_c/8          3.05 ns         3.05 ns    229239198
bm_int_fort/8       2.13 ns         2.13 ns    328933185

The Fortran code is not significantly slower after this change (2.06 ns to 2.13 ns), but the C++ code slowed down by roughly 50% (2.07 ns to 3.05 ns). This seems quite large. These are the assemblies:

Fortran: https://godbolt.org/z/axqqrc5E1
C++: https://godbolt.org/z/h7K75oKbn

The Fortran assembly seems pretty straightforward. It just adds a conversion between double and int and not much else, but the C++ version seems to do a lot more, which I don't fully understand.

Why is the C++ assembly so much more complicated? How could I improve the C++ code to achieve matching performance?

  • You're hobbled by bad defaults and compatibility with obsolete machines. The bad default is gcc setting errno for floating-point computations (despite this not being required by the C language); the compatibility issue is support for x86 machines that have no SSE instructions newer than SSE2. If you want decent code generation, add -fno-math-errno -msse4 to the compiler flags. – EOF Apr 11 at 15:38
  • This works almost perfectly: bm_int_c/8 2.08 ns; bm_int_fort/8 2.09 ns. If you write an answer, I'll accept it. – Stein Apr 11 at 15:48
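
To make the suggestion in the comments concrete, here is a minimal sketch. The first function is the int_sqrt_c from the question, unchanged; only the build line in the comment differs, and that exact command is an assumption since the post does not show how the benchmark was compiled. The second function, int_sqrt_trunc, is a hypothetical variant that is not from the post. It relies on truncation and floor giving the same result for non-negative values, so it only matches the original for x >= 1 (for smaller x the square root is NaN in both versions). It has not been benchmarked here.

```cpp
#include <cmath>

// Same source as in the question. Assumed build line (not shown in the post):
//   g++ -O2 -fno-math-errno -msse4 -c int_sqrt.cpp
void int_sqrt_c(int *x, int *y) {
    *y = std::floor(std::sqrt(*x - 1.0));
}

// Hypothetical variant: a double-to-int conversion truncates toward zero,
// which equals floor for non-negative values, so the floor call is dropped.
void int_sqrt_trunc(int *x, int *y) {
    *y = static_cast<int>(std::sqrt(*x - 1.0));
}
```

Both functions should agree for every x >= 1. Whether the second one produces shorter assembly without the extra flags is something to check in the compiler explorer rather than take for granted.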
