Some of you have heard this already, but to lure in any hibernating C/Assembly lovers, here is an announcement:
More than a year ago, we were discussing a faster Scientific Python in this thread, and I mentioned my unicorn dream there (Towards a Faster Python project for scientific Python - #26 by ilayn). It seems it was on my bingo card this year.
The main motivation is to take any vendor's BLAS, use its CBLAS interface, and put a C-based LAPACK layer on top with no Fortran dependency.
After experimenting with LLMs, and with the “fun” experience I got from the SciPy translations, I went ahead and translated LAPACK. The easiest entry point is the documentation (a bunch of rst files for now); the GitHub link is at the top right.
Everything is potentially broken but somehow not, mostly because the tests are only partially ported. Currently the entire focus is on porting the full Reference LAPACK test suite and making everything pass.
I have already modified the dgetrf.c implementation, with some inspiration from the faer Rust project, and it is now neck and neck with OpenBLAS' internally optimized version. That is to say, it is certainly possible to accelerate the LAPACK layer, just as many vendors did for the BLAS layer.
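For anyone who has not looked inside the LAPACK layer before, here is a rough sketch of the kind of routine that lives there: an unblocked LU factorization with partial pivoting, in the spirit of LAPACK's dgetf2 (the building block under dgetrf). This is not the code from the project and not the faer-inspired version. All `sketch_*` names are made up for illustration, and the four small kernels are plain-C stand-ins so the snippet compiles on its own; in a real build they would presumably be the vendor's `cblas_idamax`, `cblas_dswap`, `cblas_dscal` and `cblas_dger`. Also note I use 0-based pivots here, while LAPACK's are 1-based.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for cblas_idamax: 0-based index of max |x[i]|. */
static int sketch_idamax(int n, const double *x, int incx) {
    int best = 0;
    double vmax = -1.0;
    for (int i = 0; i < n; ++i) {
        double v = x[(size_t)i * incx];
        if (v < 0.0) v = -v;
        if (v > vmax) { vmax = v; best = i; }
    }
    return best;
}

/* Hypothetical stand-in for cblas_dswap: exchange x and y. */
static void sketch_dswap(int n, double *x, int incx, double *y, int incy) {
    for (int i = 0; i < n; ++i) {
        double t = x[(size_t)i * incx];
        x[(size_t)i * incx] = y[(size_t)i * incy];
        y[(size_t)i * incy] = t;
    }
}

/* Hypothetical stand-in for cblas_dscal: x = alpha * x. */
static void sketch_dscal(int n, double alpha, double *x, int incx) {
    for (int i = 0; i < n; ++i) x[(size_t)i * incx] *= alpha;
}

/* Hypothetical stand-in for column-major cblas_dger: A += alpha * x * y^T. */
static void sketch_dger(int m, int n, double alpha,
                        const double *x, int incx,
                        const double *y, int incy,
                        double *a, int lda) {
    for (int j = 0; j < n; ++j) {
        double s = alpha * y[(size_t)j * incy];
        for (int i = 0; i < m; ++i)
            a[i + (size_t)j * lda] += s * x[(size_t)i * incx];
    }
}

/* Unblocked LU with partial pivoting on a column-major m x n matrix.
 * On exit, A holds the multipliers of unit-lower L below the diagonal
 * and U on and above it. ipiv[k] is the 0-based row swapped with row k.
 * Returns 0 on success, or k+1 for the first exactly-zero pivot U(k,k). */
int sketch_dgetf2(int m, int n, double *a, int lda, int *ipiv) {
    assert(m >= 0 && n >= 0 && lda >= (m > 1 ? m : 1));
    int info = 0;
    int kmax = m < n ? m : n;
    for (int k = 0; k < kmax; ++k) {
        double *col = a + (size_t)k * lda;
        /* Pick the largest entry on or below the diagonal as pivot. */
        int p = k + sketch_idamax(m - k, col + k, 1);
        ipiv[k] = p;
        if (col[p] == 0.0) {          /* whole column is zero: record, skip */
            if (info == 0) info = k + 1;
            continue;
        }
        if (p != k) sketch_dswap(n, a + k, lda, a + p, lda); /* swap rows */
        if (k + 1 < m) {
            /* Multipliers: divide the sub-column by the pivot. */
            sketch_dscal(m - k - 1, 1.0 / col[k], col + k + 1, 1);
            if (k + 1 < n) {
                /* Rank-1 update of the trailing submatrix. */
                sketch_dger(m - k - 1, n - k - 1, -1.0,
                            col + k + 1, 1,
                            a + k + (size_t)(k + 1) * lda, lda,
                            a + (k + 1) + (size_t)(k + 1) * lda, lda);
            }
        }
    }
    return info;
}
```

Nearly all the time here is spent in the rank-1 update, which is a memory-bound level-2 BLAS call. As far as I understand it, the usual way to speed things up is to restructure the loop (blocked or recursive) so that the bulk of the work lands in level-3 calls like `cblas_dgemm` and `cblas_dtrsm`, and that is where the tuning fun starts.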
So if you trust your C/Assembly skillz, let's get to work on a faster LAPACK. All feedback is welcome.