@ilayn brought up this topic again (xref RFC: Deprecating `scipy.odr` - #19 by ilayn) and suggested integrating RandBLAS. I had a look at it, and concluded it has two major issues that are blocking for us:
- It requires OpenMP in its CMakeLists.txt. Possibly easy to undo, since in the code it does seem optional (`#if defined(RandBLAS_HAS_OpenMP)`).
- It depends on BLAS++. That looks like a hard blocker; BLAS++ does its own BLAS/LAPACK detection in a poorer way than we need, it doesn’t yet have support for Apple Accelerate, doesn’t handle symbol suffixes, etc.
It also required C++20 until a month ago; that now looks fixed, and C++17 works too. Hooking up the random number generation correctly might also be nontrivial, not sure.
We’d probably have to implement a shim layer that exposes a C++ API like BLAS++ (`blas::layout`, `blas::gemm`, etc.) and is backed by our `npy_cblas.h` machinery. And then convince RandBLAS to use that instead of BLAS++.
All that probably isn’t worth it - doing what scikit-learn does and having a pure Python implementation looks like the way to go, given that the performance characteristics will largely be coming from BLAS itself, and that we can have GPU support that way as @ogrisel pointed out.
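
To make the "pure Python" option a bit more concrete, here is a minimal sketch of a randomized truncated SVD written with NumPy only. It is not scikit-learn's actual code; the function name `randomized_svd_sketch` and its parameters are made up for illustration, and the defaults are arbitrary. The point is just that nearly all of the work is matrix products, QR and a small SVD, i.e. BLAS/LAPACK calls:

```python
import numpy as np


def randomized_svd_sketch(A, n_components, n_oversamples=10, n_iter=4, seed=None):
    """Approximate the top `n_components` singular triplets of A.

    Hypothetical helper for illustration only (name and defaults are
    assumptions, not an existing SciPy or scikit-learn API).
    """
    rng = np.random.default_rng(seed)
    n_random = n_components + n_oversamples

    # Gaussian test matrix used to sample the range of A.
    Q = rng.standard_normal((A.shape[1], n_random))

    # Power iterations, re-orthonormalized with QR for numerical stability.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(A @ Q)
        Q, _ = np.linalg.qr(A.T @ Q)

    # Orthonormal basis for the (approximate) range of A.
    Q, _ = np.linalg.qr(A @ Q)

    # Project A onto that basis and take the SVD of the small matrix.
    B = Q.T @ A
    U_hat, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_hat

    return U[:, :n_components], s[:n_components], Vt[:n_components]


# Usage: recover an exactly rank-5 matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 60))
U, s, Vt = randomized_svd_sketch(A, n_components=5, seed=0)
```

Since everything here is matmul, QR and SVD, the same structure should in principle carry over to other array libraries that provide those operations, which is presumably where the GPU support would come from.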