It's not really specific to NumPy; it handles all kinds of properly typed Python, as long as no external packages are used. In fact, you don't need Numba if you're already writing proper NumPy code.
Yes, the Numba use case is a subset of LPython. We want to support what Numba does, that is, you decorate your function and JIT it. But in addition, we also want to compile ahead of time to binaries that have no CPython dependency, and support high-performance optimizations. Numba speeds up Python a lot, but sometimes doesn't quite reach Fortran/C++ levels of performance. One of our main goals is to get maximum performance, so that eventually, as a user, you can depend on LPython: if it compiles your code, the result will run at least as fast as C++ or Fortran would.
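To make the decorate-and-JIT workflow concrete, here is a minimal sketch using Numba's real `njit` decorator (the `fib` function itself is just an illustrative example; the try/except lets the sketch also run under plain CPython when Numba is not installed):

```python
def fib(n: int) -> int:
    """Iteratively compute the n-th Fibonacci number."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

try:
    from numba import njit
    fib = njit(fib)  # JIT-compile to machine code via LLVM
except ImportError:
    pass  # Numba not installed: fall back to interpreted execution

print(fib(10))  # 55
```

LPython targets the same kind of decorator-driven JIT usage, with the additional option of compiling the same typed function ahead of time into a standalone binary.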
That obviously has to be the goal, but is it really feasible to be faster than good C/C++ or Fortran? I did some research into the Python compiler landscape and came to the conclusion that it almost always boils down to LLVM. So, if you want fast code, just help the compiler make the most of your code and you'll be 99% of the way there, and about as fast as possible without significantly more effort.
Would you agree with my layman's understanding of this topic?
It's harder to imagine for a Python compiler, so let's just focus on LFortran (since LPython delivers exactly the same performance, due to sharing the middle end and backends). Fortran compilers were traditionally faster than C++, and almost always (even today) are at least as fast as C++, because the Fortran language is simpler and higher level, designed to allow good optimizations. LFortran competes with other Fortran compilers as well as C++ compilers, and our goal indeed has to be at least matching the competition. We are currently sometimes faster, sometimes slower, but we are in the same league, so that's a good start.
Regarding LLVM: my experience so far is that LLVM is indeed amazing in what it can do in terms of optimizations. It's very, very good. However, it is not all LLVM. As our benchmarks in the blog post show, we compare Numba, Clang and LPython, all three of which use LLVM, and we get vastly different performance for what looks like identical initial code. To know exactly why, we would have to meet with the Numba and Clang developers and study this. I suspect Clang lowers to LLVM too soon, and uses C++ to build abstractions (like `std::vector` or `std::unordered_map`), and perhaps it can't quite get top performance that way. Numba perhaps doesn't get all the types as tight as LPython, or perhaps implements some things less efficiently, or perhaps doesn't apply as good optimizations before lowering to LLVM. I suspect LLVM gets the best performance when the compiler generates the most straightforward LLVM IR possible, without layers and layers of abstractions that might not turn out to be "zero cost" in practice.
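For context, the kind of kernel such benchmarks compare looks something like this (illustrative only; the actual benchmark code is in the blog post). Written with plain scalars and a simple loop, it is the sort of source that Numba, LPython, and a line-for-line C translation under Clang can all lower to LLVM IR, which is what makes the performance differences between them so interesting:

```python
def harmonic(n: int) -> float:
    """Partial sum of the harmonic series: 1/1 + 1/2 + ... + 1/n.
    A tight numeric loop with fully known scalar types, so there
    are no abstraction layers for the compiler to see through."""
    s = 0.0
    for k in range(1, n + 1):
        s += 1.0 / k
    return s

print(harmonic(3))  # ≈ 1.8333
```

Even for a loop this simple, the LLVM IR each frontend emits (type tightness, loop structure, vectorization hints) can differ enough to produce the performance gaps described above.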