In RFC: Naming convention for generalised ufuncs in special (scipy/scipy issue #20448 on GitHub), @izaid proposes a naming convention for certain pairs of functions in scipy.special.
First I’ll give the general background. Here is a prototypical example for this kind of pair of functions:
The function scipy.special.sph_harm for computing spherical harmonics has signature sph_harm(m, n, theta, phi, out=None), where m and n are array_likes of integers giving the order and degree of the spherical harmonic respectively, and theta and phi are angles giving spherical coordinates for a point on the surface of a sphere.
This is an instance of a rather common situation where there are integer parameters and the result is computed through a recurrence relation on these parameters, so that, e.g., computing sph_harm(m, n, ...) requires computing sph_harm(i, j, ...) for all 0 <= i <= m, 0 <= j <= n. For the ufunc sph_harm, if arrays are passed in for m and n, redundant work is done to recompute the recurrence for each pair of values from these arrays.
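To make the redundancy concrete, here is a minimal pure-Python sketch. It uses Legendre polynomials as a simpler, one-parameter stand-in for spherical harmonics; the names legendre and legendre_steps are hypothetical and are not part of scipy.special.

```python
def legendre(n, x):
    """Return P_n(x), keeping only the last two terms of the recurrence."""
    if n == 0:
        return 1.0
    prev, curr = 1.0, x  # P_0(x) and P_1(x)
    for k in range(1, n):
        # (k + 1) P_{k+1}(x) = (2k + 1) x P_k(x) - k P_{k-1}(x)
        prev, curr = curr, ((2 * k + 1) * x * curr - k * prev) / (k + 1)
    return curr


def legendre_steps(n):
    """Number of recurrence steps legendre(n, x) performs."""
    return max(n - 1, 0)


# Evaluating elementwise for n = 0..N restarts the recurrence for every n,
# which is what a ufunc-style kernel does when handed an array of degrees.
N = 50
values = [legendre(n, 0.3) for n in range(N + 1)]
elementwise_steps = sum(legendre_steps(n) for n in range(N + 1))

# A single pass up to N already visits every intermediate value.
single_pass_steps = legendre_steps(N)
```

In this toy case the elementwise approach performs on the order of N^2 / 2 recurrence steps, whereas one pass up to N performs about N.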
@izaid introduced a gufunc version of sph_harm called sph_harm_all, whose scalar kernel returns the table of all values computed up to sph_harm(m, n, ...). The gufunc version only takes integers for m and n and computes the entire table of values for 0 <= i <= m, 0 <= j <= n.
The gufunc version does not supersede the ufunc version: if one only needs the results for one or a small number of (m, n) pairs with large m and n, storing and returning the entire table of results uses excessive memory. The ufunc version does not store the entries of the table during the recurrence; it keeps only what is needed to produce the final result. Both versions of the function are useful.
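The trade-off can be sketched with a one-parameter toy example in pure Python. Legendre polynomials stand in for spherical harmonics here, and legendre_all is a hypothetical name chosen to mirror the proposed convention; it is not an existing scipy.special function.

```python
def legendre_all(n, x):
    """Return [P_0(x), ..., P_n(x)] from a single pass through the recurrence."""
    table = [1.0]  # P_0(x)
    if n >= 1:
        table.append(x)  # P_1(x)
    for k in range(1, n):
        # (k + 1) P_{k+1}(x) = (2k + 1) x P_k(x) - k P_{k-1}(x)
        table.append(((2 * k + 1) * x * table[k] - k * table[k - 1]) / (k + 1))
    return table


# The table costs n + 1 stored values per evaluation point ...
table = legendre_all(1000, 0.3)
stored_values = len(table)

# ... even if the caller only wanted the final entry. A version that keeps
# just the last two terms would need constant storage for that one value.
last_value = table[-1]
```

For a two-parameter recurrence such as the one behind sph_harm, the table grows with the product of the two parameter ranges, which is why the memory cost matters for large m and n.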
@izaid proposes the convention of naming these pairs of functions like sph_harm and sph_harm_all, with the all suffix signifying that results are returned for all parameter values visited by the recurrence on the way to the final result. There has been some objection to all on the grounds that it’s not clear from the name “all of what?”, but no one has been able to think of a better name. I think this one is good enough, particularly if it is well documented and becomes a convention for all such pairs of functions.
We have some existing pairs of functions like this, such as pbvv and pbvv_seq, but the seq seems specific to the 1d case where there is only one parameter involved in the recurrence. (pbvv_seq is also not a gufunc, and doesn’t take array arguments for any of its parameters.)
Another suggestion has been to have only a single function, e.g. sph_harm, and to change the behavior based on a keyword-only flag. I don’t think there is anything inherently wrong with this approach, but @izaid has pointed out that it leads to data-dependent output shapes, which the array API standard recommends avoiding because they can cause problems for array libraries that build compute graphs, such as Dask and JAX. This breaks the tie in my mind between two API options for which I otherwise have no real preference.
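As a rough illustration of the two API options, the sketch below uses powers of x as a trivial stand-in recurrence. The function names and the return_all flag are hypothetical and only meant to show how the shape of the result is decided in each design.

```python
def powers(n, x, *, return_all=False):
    """Single-function option: a flag's value decides the shape of the result."""
    table = [1.0]
    for _ in range(n):
        table.append(table[-1] * x)  # x**k obtained from x**(k - 1)
    return table if return_all else table[-1]


def powers_last(n, x):
    """Two-function option, first half: always returns one value."""
    return powers(n, x)


def powers_all(n, x):
    """Two-function option, second half: always returns the whole table."""
    return powers(n, x, return_all=True)


# With the flag, the same call site can yield a scalar or a table depending
# on a runtime value, so the result's shape is not known from the call alone.
flagged_scalar = powers(3, 2.0)
flagged_table = powers(3, 2.0, return_all=True)

# With two names, which kind of result comes back is fixed by the function.
only_last = powers_last(3, 2.0)
whole_table = powers_all(3, 2.0)
```

With separate names, a reader or a graph-building library can tell which kind of output to expect without inspecting the value of an argument.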
Feel free to post in gh-20448 if you’re interested in joining the discussion!