Thanks @kandersolar. I think a nice way this could be done is to change std::complex<double> z here xsf/include/xsf/lambertw.h at 163e4d43bc4afa7f95701f30725b0aede420ef3b · scipy/xsf · GitHub to a generic template parameter T z, and keep basically the same implementation, using an if constexpr (std::is_floating_point<T>::value) check anytime the behavior needs to be different for real vs complex inputs. cevalpoly could be replaced with an evalpoly that has real and complex overloads, and wraps polevl from xsf/cephes/polevl.h in the real case. That should get us there pretty quickly. Then it would be straightforward to adjust the ufunc generation in SciPy to add the real overload.
What’s currently lambertw could be renamed to lambertw_impl then and placed in the detail namespace, and then we could add explicit float, double, std::complex<float> and std::complex<double> overloads for lambertw, with the single precision float and std::complex<float> overloads casting to the respective double precision types before computing, and then cast the result back to float, just as the std::complex<float> overload is doing now. Not making the template public and shipping fixed overloads will help prevent ambiguous template resolution problems.