Hi Developers!
-
This query is about running BLAS and LAPACK functions in parallel, using threads, in SciPy.
-
I built the OpenBLAS dynamic library for Windows with the number of threads set to 12. I then built SciPy against that OpenBLAS build and used delvewheel to bundle the library into the wheel for runtime use.
-
Initially I built SciPy against an OpenBLAS build with no thread count set (NUM_THREADS=0). The performance of that SciPy wheel was poor in SciPy functions that rely on the BLAS and LAPACK routines from OpenBLAS, compared with the prebuilt SciPy wheel available on PyPI (I could see this from the benchmark results).
-
I see the same behavior even when I use the OpenBLAS built with threads=12. Is there anything I am missing in the configuration? Is there anything specific I need to configure to make sure SciPy is using the threaded OpenBLAS?
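In case it helps anyone reproduce this, here is a minimal sketch of how the BLAS thread count seen at runtime could be inspected. It assumes the third-party `threadpoolctl` package is installed (it is not part of SciPy), and `blas_thread_report` is just a name made up for this example.

```python
import os


def blas_thread_report():
    """Return one dict per BLAS library loaded in this process.

    Hypothetical helper for illustration. Returns an empty list when
    threadpoolctl (an assumed, optional dependency) is not installed.
    """
    try:
        # Importing scipy.linalg should load the BLAS/LAPACK library
        # that the wheel was linked against.
        import scipy.linalg  # noqa: F401
    except ImportError:
        pass

    try:
        from threadpoolctl import threadpool_info
    except ImportError:
        return []

    # Keep only BLAS entries and the fields relevant to threading.
    return [
        {
            "internal_api": lib.get("internal_api"),
            "num_threads": lib.get("num_threads"),
            "threading_layer": lib.get("threading_layer"),
            "version": lib.get("version"),
            "filepath": lib.get("filepath"),
        }
        for lib in threadpool_info()
        if lib.get("user_api") == "blas"
    ]


if __name__ == "__main__":
    # An environment override, if set, caps the threads OpenBLAS uses.
    print("OPENBLAS_NUM_THREADS =", os.environ.get("OPENBLAS_NUM_THREADS"))
    for entry in blas_thread_report():
        print(entry)
```

If `num_threads` comes back as 1, or `filepath` does not point at the DLL that delvewheel bundled, that would suggest the wheel is not picking up the threaded build. Comparing the output between my wheel and the PyPI wheel seems like a reasonable first check.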
Thanks!!!