How fast? I gather, for example, that Intel's Fortran is a frontend to LLVM. So...

m_mueller · on May 16, 2023

The main question is: how fast, and after how much learning and optimization on top of the 'naive' version of the application?

Just an example of how to declare a couple of input float pointers in a fully optimized way, avoiding aliasing and marking them read-only (if I remember correctly; it's been a few years):

Fortran:

    real(4), intent(in) :: foo, bar, baz

C:

    const float *const restrict foo,
    const float *const restrict bar,
    const float *const restrict baz

Of course, what happens then is that people do a typedef and hide it away... and then every application invents its own standards and becomes harder to interoperate with on a common basis. In Fortran it's just baked in.
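To make the C side concrete, here is a minimal sketch of a function taking parameters like that. `add_arrays` and its signature are hypothetical, made up for illustration; the only point is where `const` and `restrict` go and what they promise the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n).
 * `restrict` promises the compiler that the three buffers do not overlap;
 * `const` marks the inputs as read-only through these pointers. */
void add_arrays(const float *restrict a,
                const float *restrict b,
                float *restrict out,
                size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

Calling it with overlapping buffers would break the `restrict` promise and is undefined behavior, which is roughly the contract Fortran imposes on dummy arguments by default.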

Another example: multidimensional arrays. In C/C++ they either don't exist, aren't a flat memory space (and are thus very slow for grid applications), or come from an external library (again restricting your interop with other scientific applications). In Fortran:

    integer, parameter :: n = 10, m = 5
    real(4), dimension(n, m) :: foo, bar, baz

Again, reasonably simple to understand, and it's already reasonably close to what you want. There are still ways to make it better, like memory alignment, but I'd claim you're already at 80% of optimal as long as you understand the memory layout and its impact on caching when accessing it (which is usually very low-hanging fruit; you just have to know in which order to loop over it).
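On loop order: Fortran stores arrays column-major, so the innermost loop should run over the first index; C is row-major, so there it is the last index. A rough C sketch of the cache-friendly order over a flat buffer (`sum_grid` is an invented name, not from any library):

```c
#include <assert.h>
#include <stddef.h>

/* Sum a rows x cols grid stored row-major in one flat buffer.
 * The inner loop walks the last index, so memory is read sequentially. */
double sum_grid(const float *grid, size_t rows, size_t cols)
{
    assert(grid != NULL);
    double total = 0.0;
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < cols; j++) {
            total += grid[i * cols + j];
        }
    }
    return total;
}
```

Swapping the two loops gives the same result but strides through memory `cols` elements at a time, which is the access pattern that tends to hurt on large grids.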

messe · on May 16, 2023

> Another example: multidimensional arrays. In C/C++ it [...] is not flat memory space (and thus for grid applications very slow)

Can you clarify what you mean by this? A naively defined multidimensional array,

    float foo[32][32] = ...

is stored in sizeof(float) * 32 * 32 = 4096 consecutive bytes of memory. There's no in-built support for array copies or broadcasting, I'll give you that, but I'm still trying to understand what you mean by "not flat memory space".

simiones · on May 16, 2023

C++ doesn't support something like:

  int a, b;

  int arr[a][b]; // ERROR: all array dimensions except for the first must be constant

  //or even: 

  auto arr = new int[a][b]; // ERROR: array size in new-expression must be constant

All you can do if you want a dynamic multidimensional array is:

  int a, b;

  auto arr = new int*[a];
  for (int i = 0; i < a; i++) {
    arr[i] = new int[b];
  }

But now it's a jagged array, it's no longer flat in memory.
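For comparison, the usual workaround is a single allocation plus manual index arithmetic. A minimal sketch in plain C (the `Grid` type and the `grid_*` functions are made up for illustration, not taken from any library):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical flat grid: one allocation, element (i, j) lives at i * cols + j. */
typedef struct {
    size_t rows, cols;
    int *data;
} Grid;

/* Returns 1 on success, 0 if the allocation failed. */
int grid_init(Grid *g, size_t rows, size_t cols)
{
    g->rows = rows;
    g->cols = cols;
    g->data = calloc(rows * cols, sizeof *g->data);
    return g->data != NULL;
}

/* Address of element (i, j); a single offset computation, no second pointer read. */
int *grid_at(Grid *g, size_t i, size_t j)
{
    assert(i < g->rows && j < g->cols);
    return &g->data[i * g->cols + j];
}

void grid_free(Grid *g)
{
    free(g->data);
    g->data = NULL;
}
```

This is flat in memory, but you lose the `arr[i][j]` syntax and every codebase ends up with its own version of the wrapper.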

drdeca · on May 17, 2023

[disclaimer: I don’t know what I’m talking about in this comment]

Could you say like,

  auto arr = new int[a*(b+1)];
  int * arr2 = (int*)arr;
  for (int i = 0; i < a; i++) {
    arr2[i] = arr + (i*(b+1)) + 1;
  }

and then be able to say like arr2[j][k] ?

(Assuming that an int and a pointer have the same size).

It still has to do two dereferences rather than doing a little more arithmetic before doing a single dereference, but (other than the interspersed pointers) it’s all contiguous? But maybe that doesn’t count as being flat in memory.

enriquto · on May 17, 2023

Yes, you can do all sorts of tricks in C++. You can also package them into a "matrix" class, or use one of the thousands that are publicly available.

The thing is that, in Fortran (and even in C), you don't need any of this because the construction is part of the language itself.

tsimionescu · on May 18, 2023

You can do some trick like that, though arr2 would have to have type int** for that to work, and it wouldn't really work for an int array (though there's no real reason to store the pointers into arr inside arr itself - they could easily go into a separate place).

However, this would still mean that you need to do 2 pointer reads to get to an element (one to get the value of arr2[i], then another to get the value of arr2[i][j]). In C or Fortran or C++ with compile-time known dimensions, say an N by M array, multiArr[i][j] is a single pointer read, as it essentially translates to *(multiArr + i*M + j).
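A small C sketch of that offset claim, checked in bytes. The dimensions `N` and `M` and the function name `byte_offset` are arbitrary choices for the example:

```c
#include <assert.h>
#include <stddef.h>

enum { N = 3, M = 4 };

/* Byte distance from the start of the array to element (i, j).
 * With compile-time dimensions this is just (i * M + j) * sizeof(int). */
size_t byte_offset(int i, int j)
{
    static int multiArr[N][M];
    assert(i >= 0 && i < N && j >= 0 && j < M);
    return (size_t)((char *)&multiArr[i][j] - (char *)multiArr);
}
```

So `byte_offset(2, 3)` comes out as `(2 * M + 3) * sizeof(int)`: the compiler computes one address and does one load, with no intermediate row pointer.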

m_mueller · on May 16, 2023

Ok, I was wrong about it not being flat. But in my experience, because language support is so thin, it usually isn't used to represent grids in a simulation: working with it tends to involve explicit loops, which may or may not be optimised away. In Fortran, multidimensional arrays are so well supported that they are used everywhere. Even debuggers let you display arbitrary slices of multidimensional arrays in local scope.

int_19h · on May 16, 2023

I think they meant dynamically sized multidimensional arrays (and arrays of pointers often used to emulate those).

However, even then it's not quite true, given C99 VLAs.

simiones · on May 16, 2023

I should note that VLAs are no longer a guaranteed feature of even a standards-compliant C compiler (they were made an optional feature in C11).

int_19h · on May 16, 2023

Indeed, but they're still in the Standard, so if an implementation does them, it does them in a portable way - and gcc and Clang both support them, so there's no practical issue with access to the feature.
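For anyone who hasn't seen it, a sketch of what that looks like with a heap allocation, assuming a compiler that supports C99 variably modified types (gcc and Clang do). The function name `fill_and_sum` is invented for the example:

```c
#include <assert.h>
#include <stdlib.h>

/* Fill an n x m grid whose dimensions are only known at run time, then sum it.
 * `grid` is a pointer to rows of m floats, so grid[i][j] is plain offset
 * arithmetic on one contiguous block. Returns -1.0f if allocation fails. */
float fill_and_sum(size_t n, size_t m)
{
    assert(n > 0 && m > 0);
    float (*grid)[m] = malloc(n * sizeof *grid);
    if (grid == NULL) {
        return -1.0f;
    }
    float total = 0.0f;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < m; j++) {
            grid[i][j] = (float)(i * m + j);
            total += grid[i][j];
        }
    }
    free(grid);
    return total;
}
```

The array itself lives on the heap here; only the type is "variable length", so there is no stack-size concern. This does not compile as standard C++.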

Georgelemental · on May 17, 2023

In Rust, those `noalias` read-only pointer arguments are

    foo: &f32,
    bar: &f32,
    baz: &f32,

There are no dynamically-sized multi-dimensional flat memory arrays though, at least not in the core language or standard library.

cscheid · on May 16, 2023

The language semantics matter greatly.

GHC with `-fllvm` is not going to make Haskell any easier to compile just because it's targeting LLVM. Fortran is (relatively!) easy to make fast because the language semantics allow it to be. Lack of pointer aliasing is one of the canonical examples; C's pointer aliasing makes systems programming easier, but high-performance code harder.

pklausler · on May 16, 2023

Pointers in Fortran can alias with each other and with valid pointer targets. What makes Fortran potentially easier to optimize are its rules against dummy argument aliasing in most (but not all) cases.

(See https://github.com/llvm/llvm-project/blob/main/flang/docs/Al... for the full story.)

cscheid · on May 16, 2023

> dummy argument aliasing

Thanks for the precise Fortran terminology; that's what I meant to say in my head but you're correct. From the linked website for everyone else:

"Fortran famously passes actual arguments by reference, and forbids callers from associating multiple arguments on a call to conflicting storage when doing so would cause the called subprogram to write to a bit of that storage by means of one dummy argument and read or write that same bit by means of another."

bee_rider · on May 17, 2023

You have to caveat “faster than C.” It is basically impossible to beat C with sprinkled in assembly or intrinsics as appropriate.

Most people, even good C programmers, don’t write that kind of C, though.

The point of Fortran is that you can write Fortran code that is almost as fast as that nightmare C, but you can do it and still get your degree on time.

GeompMankle · on May 16, 2023

Negative. Fortran is in a better position to guarantee certain facts about sequential access in arrays, and to provide solid information about pointer aliasing, that are generally not available in C or C++ unless the author of the C/C++ code is extremely aware of compiler quirks and pragmas. Fortran has a "just works" attitude to high-speed array processing, whereas other languages are focused on edge cases that are good for PhD theses on general computing optimization but rarely work in the easiest case without extensive pragmas.

See also CUDA. Sure, you can write a C-to-CUDA auto-vectorizing converter, but it's likely to have all sorts of bugs and to never work right except in rare hand-tuned cases. You may as well just write CUDA from scratch if it is to be performant. Same for array processing: want to process arrays? Use a compiler built for array processing, like Fortran or ISPC.