How fast? I gather, for example, that Intel's Fortran is a frontend to LLVM. So the speed of that Fortran can't be much different from anything else compiled by LLVM, such as Rust or C. Correct?
The main question is: how fast, after how much learning and optimization has gone into the 'naive' version of the application?
Just an example of how to declare a couple of input float pointers in a fully optimized way, avoiding aliasing and declaring them read-only (if I remember correctly, it's been a few years):
Of course, what happens then is that people write a typedef and hide it away... and then every application invents its own conventions, and interoperating on a common basis becomes harder. In Fortran it's just baked in.
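A minimal sketch of the kind of C declaration this describes, assuming C99's `restrict` qualifier. The function name `scaled_sum` and its parameters are hypothetical, not taken from the comment above:

```c
#include <stddef.h>

/* Hypothetical example: x and y are read-only inputs, out is the output.
 * `restrict` is a promise by the caller that the three arrays do not
 * overlap; `const float` marks the inputs as not written through these
 * pointers. The second `const` makes the pointer itself unmodifiable. */
void scaled_sum(size_t n,
                const float *const restrict x,
                const float *const restrict y,
                float *const restrict out)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = 2.0f * x[i] + y[i];
    }
}
```

In Fortran, the read-only part would be an `intent(in)` attribute on the dummy arguments, with no extra annotation needed for the no-overlap promise.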
Another example: multidimensional arrays. In C/C++ it either doesn't exist, is not flat memory space (and thus for grid applications very slow), or is an external library (again restricting your interop with other scientific applications). In Fortran:
use, intrinsic :: iso_fortran_env, only: real32
integer, parameter :: n = 10, m = 5
real(real32), dimension(n, m) :: foo, bar, baz
Again, reasonably simple to understand, and already reasonably close to what you want. There are still ways to improve it, such as memory alignment, but I'd claim you're already at 80% of optimal as long as you understand memory layout and its impact on caching during access (which is usually very low-hanging fruit: you just have to know which order to loop over it in).
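To illustrate the loop-order point: Fortran stores arrays column-major, so the innermost loop should run over the first index, while C stores them row-major, so the innermost loop should run over the last index. A small C sketch of the cache-friendly order (the function name `grid_sum` is made up for illustration):

```c
#include <stddef.h>

/* Sums a rows x cols grid stored row-major in one flat buffer.
 * The inner loop walks the last index, so consecutive iterations
 * touch adjacent memory. Swapping the two loops visits the same
 * elements but strides through memory by `cols` elements per step. */
double grid_sum(const double *grid, size_t rows, size_t cols)
{
    double total = 0.0;
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < cols; j++) {
            total += grid[i * cols + j];
        }
    }
    return total;
}
```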
> Another example: multidimensional arrays. In C/C++ it [...] is not flat memory space (and thus for grid applications very slow)
Can you clarify what you mean by this? A naively defined multidimensional array,
float foo[32][32] = ...
is stored in sizeof(float) * 32 * 32 = 4096 consecutive bytes of memory. There's no in-built support for array copies or broadcasting, I'll give you that, but I'm still trying to understand what you mean by "not flat memory space".
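A quick way to check the contiguity claim, as a C sketch with a hypothetical helper name. It uses `sizeof(float)` rather than assuming 4 bytes, so the 4096 figure above corresponds to the common case of a 4-byte float:

```c
#include <stddef.h>

/* Returns 1 if foo[1][0] sits exactly one row (32 floats) after
 * foo[0][0] and the whole array occupies 32*32 floats, i.e. the
 * rows are laid out back to back with nothing in between. */
int rows_are_adjacent(void)
{
    static float foo[32][32];
    ptrdiff_t gap = (char *)&foo[1][0] - (char *)&foo[0][0];
    return sizeof foo == 32 * 32 * sizeof(float)
        && gap == (ptrdiff_t)(32 * sizeof(float));
}
```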
int a, b;
int arr[a][b]; // ERROR: all array dimensions except for the first must be constant
//or even:
auto arr = new int[a][b]; // ERROR: array size in new-expression must be constant
All you can do if you want a dynamic multidimensional array is:
int a, b;
auto arr = new int*[a];
for (int i = 0; i < a; i++) {
  arr[i] = new int[b];
}
But now it's a jagged array, it's no longer flat in memory.
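One common workaround, sketched here in C with hypothetical names (`Grid`, `grid_new`, `grid_at`, `grid_free`), is to allocate a single block of `a * b` elements and compute the index by hand:

```c
#include <stdlib.h>

/* Hypothetical flat 2D grid: one allocation, manual index arithmetic. */
typedef struct {
    size_t rows, cols;
    int *data;          /* rows * cols ints, row-major */
} Grid;

/* Returns a zero-initialised grid; data is NULL if allocation fails. */
Grid grid_new(size_t rows, size_t cols)
{
    Grid g = { rows, cols, calloc(rows * cols, sizeof(int)) };
    return g;
}

/* Address of element (i, j): a single offset computation, with no
 * second pointer load. No bounds checking is done. */
int *grid_at(Grid *g, size_t i, size_t j)
{
    return &g->data[i * g->cols + j];
}

void grid_free(Grid *g)
{
    free(g->data);
    g->data = NULL;
}
```

The memory is flat, but the `arr[i][j]` syntax is lost, which is the ergonomic gap being discussed.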
[disclaimer: I don’t know what I’m talking about in this comment]
Could you say like,
auto arr = new int[a*(b+1)];
int *arr2 = (int*)arr;
for (int i = 0; i < a; i++) {
  arr2[i] = arr + (i*(b+1)) + 1;
}
and then be able to say like arr2[j][k] ?
(Assuming that an int and a pointer have the same size).
It still has to do two dereferences rather than doing a little more arithmetic before doing a single dereference, but (other than the interspersed pointers) it’s all contiguous?
But maybe that doesn’t count as being flat in memory.
You can do some trick like that, though arr2 would have to have type int** for that to work, and it wouldn't really work for an int array (though there's no real reason to store the pointers into arr inside arr itself - they could easily go into a separate place).
However, this would still mean that you need to do 2 pointer reads to get to an element (one to get the value of arr2[i], then another to get the value of arr2[i][j]). In C or Fortran or C++ with compile-time known dimensions, say an N by M array, multiArr[i][j] is a single pointer read, as it essentially translates to *(multiArr + i*M + j).
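A sketch of the separate-pointer-table variant described above, in C with made-up names: the elements live in one contiguous block, and a second array holds a pointer to the start of each row, so `rows[i][j]` syntax works at the cost of the extra pointer read.

```c
#include <stdlib.h>

/* Builds a table of row pointers into one contiguous block of a*b ints.
 * On success *block_out receives the block; free both the returned
 * table and *block_out when done. Returns NULL on allocation failure. */
int **make_row_table(size_t a, size_t b, int **block_out)
{
    int *block = calloc(a * b, sizeof *block);
    int **rows = malloc(a * sizeof *rows);
    if (block == NULL || rows == NULL) {
        free(block);
        free(rows);
        return NULL;
    }
    for (size_t i = 0; i < a; i++) {
        rows[i] = block + i * b;   /* row i starts i*b elements in */
    }
    *block_out = block;
    return rows;
}
```

Here `rows[i][j]` and `block[i * b + j]` name the same element; the first form loads `rows[i]` before the element, the second only does arithmetic.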
Ok, I was wrong about it not being flat. But in my experience, because language support is so limited, it is usually not used to represent grids in a simulation: dealing with it tends to involve explicit loops, which may or may not be optimised away. In Fortran, multidimensional arrays are so well supported that they are used everywhere. Even debuggers allow you to display arbitrary slices of multidimensional arrays in local scope.
Indeed, but they're still in the Standard, so if an implementation does them, it does them in a portable way - and gcc and Clang both support them, so there's no practical issue with access to the feature.
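Assuming the feature under discussion is C99 variable-length arrays (optional since C11, but implemented by gcc and Clang), here is a sketch of how a pointer to a variably modified type gives a dynamically sized, contiguous 2D array with ordinary `arr[i][j]` indexing. The function name is hypothetical:

```c
#include <stdlib.h>

/* Fills an a x b heap-allocated grid with i*b + j and returns the sum.
 * `int (*arr)[b]` is a pointer to rows of b ints, where b is only
 * known at run time; the storage is one contiguous block.
 * b must be nonzero. Returns -1 if allocation fails. */
long vla_grid_demo(size_t a, size_t b)
{
    int (*arr)[b] = malloc(a * sizeof *arr);
    if (arr == NULL) {
        return -1;
    }
    long sum = 0;
    for (size_t i = 0; i < a; i++) {
        for (size_t j = 0; j < b; j++) {
            arr[i][j] = (int)(i * b + j);
            sum += arr[i][j];
        }
    }
    free(arr);
    return sum;
}
```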
GHC with `-fllvm` is not going to make Haskell any easier to compile just because it's targeting LLVM. Fortran is (relatively!) easy to make fast because the language semantics allow it to be. Lack of pointer aliasing is one of the canonical examples; C's pointer aliasing makes systems programming easier, but high-performance code harder.
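The C side of this can be seen in a small sketch (function names are hypothetical). Because `a` and `b` may legally point to the same int, the compiler has to assume the store through `b` can change `*a`; with `restrict`, the caller promises they do not overlap.

```c
/* a and b may alias: after the store through b, *a must be re-read. */
int store_both(int *a, int *b)
{
    *a = 1;
    *b = 2;
    return *a;   /* 1 if a and b are distinct, 2 if they alias */
}

/* With restrict the caller promises no overlap, so the compiler is
 * free to return 1 without reloading. Calling this with a == b is
 * undefined behaviour. */
int store_both_restrict(int *restrict a, int *restrict b)
{
    *a = 1;
    *b = 2;
    return *a;
}
```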
Pointers in Fortran can alias with each other and with valid pointer targets. What makes Fortran potentially easier to optimize are its rules against dummy argument aliasing in most (but not all) cases.
Thanks for the precise Fortran terminology; that's what I meant to say in my head but you're correct. From the linked website for everyone else:
"Fortran famously passes actual arguments by reference, and forbids callers from associating multiple arguments on a call to conflicting storage when doing so would cause the called subprogram to write to a bit of that storage by means of one dummy argument and read or write that same bit by means of another."
You have to caveat “faster than C.” It is basically impossible to beat C with sprinkled in assembly or intrinsics as appropriate.
Most people, even good C programmers, don’t write that kind of C, though.
The point of Fortran is that you can write Fortran code that is almost as fast as that nightmare C, but you can do it and still get your degree on time.
Negative. Fortran is in a better position to guarantee certain facts about sequential access in arrays, and to provide solid information about pointer aliasing that is generally not available in C or C++ unless the author of the C/C++ code is extremely aware of compiler quirks and pragmas. Fortran has a "just works" attitude to high-speed array processing, whereas other languages are focused on edge cases that are good for PhD theses on general computing optimization but rarely work in the easiest case without extensive pragmas.
See also CUDA. Sure, you can write a C-to-CUDA converter or auto-vectorizer, but it's likely to have all sorts of bugs and usually never works right except in rare hand-tuned cases. You may as well just write CUDA from scratch if it is to be performant. Same for array processing: wanna process arrays? Use a compiler built for array processing, like Fortran or ISPC.