Scipy.special: Faster, real-only versions of special functions?

From what I can see [1], lambertw is computed using complex values, even when the input is real. When the input is real, using complex arithmetic is presumably performing many unnecessary operations. The same algorithm using real-only arithmetic would be meaningfully faster, I expect.

However, I am no expert. Would it be technically practical to have a real-only implementation of a special function alongside the current complex version, and have scipy.special.foo somehow dispatch appropriately based on the type of the input?

I have not been able to find previous discussion of this topic.

[1] github. com/scipy/xsf/blob/main/include/xsf/lambertw.h#L67 (account is too new to post links)

Hi @kandersolar. Yes, NumPy ufuncs allow dispatching to different implementations based on dtype. Almost all ufuncs in special that have complex implementations also have real implementations (see special.iv for instance), but lambertw is kind of an odd one out. There’s no reason that there couldn’t be a real-only implementation though if you have an application that would benefit from one.

1 Like

Thank you very much @steppi! We do have an application (solving for points on a solar panel’s simulated current-voltage curve) that has no need for complex numbers and would benefit from a faster lambertw function. We are considering switching to our own implementation [1] using numpy for a ~3x speedup over the current scipy.special.lambertw, but I’d much rather use scipy if there was no major speed penalty.

I would like to contribute this. I’ve poked around scipy/special and see roughly how the pieces fit together. What procedural steps should be taken before I get started – open a GH issue, mailing list post?

[1] github. com/pvlib/pvlib-python/pull/2723

Thanks @kandersolar. I think a nice way this could be done is to change std::complex<double> z here xsf/include/xsf/lambertw.h at 163e4d43bc4afa7f95701f30725b0aede420ef3b · scipy/xsf · GitHub to a generic template parameter T z, and keep basically the same implementation, using an if constexpr (std::is_floating_point<T>::value) check anytime the behavior needs to be different for real vs complex inputs. cevalpoly could be replaced with an evalpoly that has real and complex overloads, and wraps polevl from xsf/cephes/polevl.h in the real case. That should get us there pretty quickly. Then it would be straightforward to adjust the ufunc generation in SciPy to add the real overload.

What’s currently lambertw could be renamed to lambertw_impl then and placed in the detail namespace, and then we could add explicit float, double, std::complex<float> and std::complex<double> overloads for lambertw, with the single precision float and std::complex<float> overloads casting to the respective double precision types before computing, and then cast the result back to float, just as the std::complex<float> overload is doing now. Not making the template public and shipping fixed overloads will help prevent ambiguous template resolution problems.

I found an alternative calculation method: https://www.researchgate.net/profile/Toshio-Fukushima/publication/346309410_Precise_and_fast_computation_of_Lambert_W_function_by_piecewise_minimax_rational_function_approximation_with_variable_transformation/links/5fbe04fd458515b7976a3395/Precise-and-fast-computation-of-Lambert-W-function-by-piecewise-minimax-rational-function-approximation-with-variable-transformation.pdf

It is much faster than the method used in scipy. In a separate branch, I implemented it as a ufunc for scipy.special and observe a > 10x speed-up. Here is a comparison, where “base scipy” is scipy v1.16.0 installed from PyPI, “numpy” is our own numpy-based lambertw calculation, “PR116” is the double-precision implementation in xsf PR 116, and “fukushima” is Fukushima’s “high precision” method:

The method is quite simple, although it does require large coefficient tables. @steppi what do you think about using this method for the real-only calculation path instead of extending the existing method via templates?

P.S. I think my posts were hidden because I posted too many github links :frowning:

I think that’s a great idea. Minimax approximations are pretty much the gold standard approach for creating floating point approximations to real-valued functions on intervals, and you’ll find lots of coefficient tables for other function implementations in xsf.

1 Like

PR opened for the minimax approach! I suggest reviewing that one (119) instead of the original approach (116).