It's not really specific to NumPy; it handles all kinds of properly typed Python, as long as no external packages are used. In fact, you don't need Numba if you're already writing proper NumPy code.
Yes, the Numba use case is a subset of LPython. We want to support what Numba does, that is, you decorate your function and JIT it. But in addition, we also want to compile ahead of time to binaries that have no CPython dependency, and support high-performance optimizations. Numba speeds up Python a lot, but sometimes doesn't quite reach Fortran/C++ levels of performance. One of our main goals is to get maximum performance, so that eventually, as a user, you can depend on LPython: if it compiles your code, the result will run at least as fast as C++ or Fortran would.
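To make the decorate-and-JIT workflow concrete, here is a minimal sketch using Numba's real `njit` decorator (the `fib` function itself is just an illustrative example; the try/except lets the sketch also run under plain CPython when Numba is not installed):

```python
def fib(n: int) -> int:
    """Iteratively compute the n-th Fibonacci number."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

try:
    from numba import njit
    fib = njit(fib)  # JIT-compile to machine code via LLVM
except ImportError:
    pass  # Numba not installed: fall back to interpreted execution

print(fib(10))  # 55
```

LPython targets the same kind of decorator-driven JIT usage, with the additional option of compiling the same typed function ahead of time into a standalone binary.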
That obviously has to be the goal, but is it really feasible to be faster than good C/C++ or Fortran? I did some research into the Python compiler landscape and came to the conclusion that it almost always boils down to LLVM. So, if you want fast code, just help the compiler make the most of your code and you'll be 99% of the way there, and about as fast as possible without significantly more effort.
Would you agree with my layman's understanding of this topic?
It's harder to imagine for a Python compiler, so let's just focus on LFortran (since LPython delivers exactly the same performance, due to sharing the middle end and backends). Fortran compilers were traditionally faster than C++, and almost always (even today) are at least as fast as C++, because the Fortran language is simpler and higher level, designed to allow good optimizations. LFortran competes with other Fortran compilers as well as C++ compilers, and our goal indeed has to be at least matching the competition. We are currently sometimes faster, sometimes slower, but we are in the same league, so that's a good start.
Regarding LLVM: my experience so far is that LLVM is indeed amazing in what it can do in terms of optimizations. It's very, very good. However, it is not all LLVM. As our benchmarks in the blog post show, we compare Numba, Clang and LPython, all three of which use LLVM, and we get vastly different performance for what looks like identical initial code. To know exactly why, we would have to meet with the Numba and Clang developers and study this. I suspect Clang lowers to LLVM too soon, and uses C++ to build abstractions (like `std::vector` or `std::unordered_map`), and perhaps it can't quite get top performance that way. Numba perhaps doesn't get all the types as tight as LPython, or perhaps implements some things less efficiently, or perhaps doesn't apply as good optimizations before lowering to LLVM. I suspect LLVM gets the best performance when the compiler generates the most straightforward LLVM IR possible, without layers and layers of abstractions that might not turn out to be "zero cost" in practice.
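For context, the kind of kernel such benchmarks compare looks something like this (illustrative only; the actual benchmark code is in the blog post). Written with plain scalars and a simple loop, it is the sort of source that Numba, LPython, and a line-for-line C translation under Clang can all lower to LLVM IR, which is what makes the performance differences between them so interesting:

```python
def harmonic(n: int) -> float:
    """Partial sum of the harmonic series: 1/1 + 1/2 + ... + 1/n.
    A tight numeric loop with fully known scalar types, so there
    are no abstraction layers for the compiler to see through."""
    s = 0.0
    for k in range(1, n + 1):
        s += 1.0 / k
    return s

print(harmonic(3))  # ≈ 1.8333
```

Even for a loop this simple, the LLVM IR each frontend emits (type tightness, loop structure, vectorization hints) can differ enough to produce the performance gaps described above.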