Naming convention for generalized ufuncs in special

In RFC: Naming convention for generalised ufuncs in special · Issue #20448 · scipy/scipy · GitHub, @izaid proposes a naming convention for certain pairs of functions in scipy.special.

First I’ll give the general background. Here is a prototypical example for this kind of pair of functions:

The function scipy.special.sph_harm for computing spherical harmonics has signature sph_harm(m, n, theta, phi, out=None) where m and n are array_likes of integers giving the order and degree of the spherical harmonic respectively, and theta and phi are angles giving spherical coordinates for a point on the surface of a sphere.

This is an instance of a rather common situation where there are integral parameters, and the result is computed through a recurrence relation on these parameters, so e.g. computing sph_harm(m, n, ...) requires computing sph_harm(i, j, ...) for all 0 <= j <= n, 0 <= i <= m. For the ufunc sph_harm, if arrays are passed in for m and n, redundant work is done to recompute the recurrence for each pair of values from these arrays.

@izaid introduced a gufunc version of sph_harm called sph_harm_all whose scalar kernel returns the table of all values computed up until sph_harm(m, n, ...). The gufunc version only takes integers for m and n and computes the entire table of values for 0 <= i <= m, 0 <= j <= n.

The gufunc version does not supersede the ufunc version, because if one only needs the results for one or a small number of (m, n) pairs for large m and n, storing and returning the entire table of results will result in excessive memory use. The ufunc version does not store the entries of the table during the recurrence, only storing what is needed to produce the final result. Both versions of the function are useful.
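To make this concrete, here is a rough sketch using the existing scipy.special.sph_harm ufunc; the loop just emulates the kind of table the proposed gufunc would produce (the layout and names here are only illustrative, not the actual API):

    import numpy as np
    from scipy.special import sph_harm  # the existing "at"-style ufunc

    theta, phi = 0.3, 1.1

    # ufunc behavior: one value per (m, n) pair, broadcast like any other ufunc.
    m = np.array([0, 1, 1])
    n = np.array([1, 1, 2])
    print(sph_harm(m, n, theta, phi).shape)  # (3,)

    # "_all"-style behavior emulated with a Python loop: the whole table of
    # values up to (m_max, n_max). A gufunc can fill this table in a single
    # pass over the recurrence instead of restarting it for every entry.
    m_max, n_max = 2, 4
    table = np.array([[sph_harm(i, j, theta, phi) if i <= j else 0.0
                       for j in range(n_max + 1)]
                      for i in range(m_max + 1)])
    print(table.shape)  # (3, 5), i.e. (m_max + 1, n_max + 1)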

@izaid proposes the convention of naming these pairs of functions like sph_harm and sph_harm_all, the all signifying that results are returned for all values of the input parameters visited by the recurrence used to obtain the final result. There has been some objection to all on the grounds that it's not clear from the name "all of what?", but no one has been able to think of a better name, and I think this one is good enough, particularly if it is well documented and becomes a convention for all such pairs of functions.

We have some existing pairs of functions like this, such as pbvv and pbvv_seq, but the seq seems specific to the 1d case where there is only one parameter involved in the recurrence. (pbvv_seq is also not a gufunc, and doesn’t take array arguments for any of its parameters).

Another suggestion has been to have only a single function, e.g. sph_harm, and to change the behavior based on a keyword-only flag. I don't think there is anything inherently wrong with this approach, but @izaid has pointed out that it leads to data-dependent output shapes, which the array API standard recommends avoiding because this can cause problems for array libraries which build compute graphs, such as Dask and Jax. This settles the tie in my mind between two API options for which I have no real preference.

Feel free to post in gh-20448 if you’re interested in joining the discussion!

Just to mention one particular case about the array API asking for not changing output arguments. I am not particularly in agreement with that one.

If we start adding new functions just because an output argument is changing shape, that's not Python anymore. That's pretty much C++ or any other language that we ran away from to seek refuge in Python.

In fact that's quite an unrealistic demand coming from the array API, and I am starting to think that the array API did not keep the array-consuming libraries in focus as much as it did the array-generating libraries.

In this particular case, since the plan is to separate out the specfun parts of scipy.special, we can still use a proper shape-changing public API while the underlying compiled code uses two separate functions.

In short, this is too restrictive on the rest of the SciPy users just because the intersection of (JAX && SciPy && Array API) users wants to optimize the code.

Just to mention one particular case about the array API asking for not changing output arguments. I am not particularly in agreement with that one.

Yeah, I’m not sure if avoiding input dependent output shapes is a particularly strong argument, but in the absence of a strong preference between separate functions or a keyword argument, I think it’s reasonable to use it as a tie breaker.

I think the most important thing is to have a consistent convention for naming these two versions of a function. The separate function convention already exists to distinguish between these two versions of a function in SciPy, for pbdn, pbdv, pbvv, pro_cv, obl_cv, vs pbdn_seq, pbdv_seq, pbvv_seq, pro_cv_seq, and obl_cv_seq. @izaid suggests _all because seq sounds like it's specific to one dimension. I think the idea is we could for example create pbvv_all and make pbvv_seq an alias for it (and perhaps eventually deprecate this alias).

@ilayn, if we go with the keyword argument approach, would you recommend, for example, adding the keyword argument to pbvv to expose the behavior that now comes from pbvv_seq?

Also, I want to hear about the benefits of the keyword arg approach. I don't have a strong preference, but you might have really good reasons to prefer it that seem obvious to you, which aren't obvious to me because I'm not a Python API expert. I think the arguments in favor of the separate function approach can be summarized as:

  1. Seeking to avoid data-dependent output shapes for the sake of Jax, Dask, etc.
  2. This convention already exists in special for some functions, just with different function names, and no one has complained about it.
  3. We have a champion in @izaid who is putting in the work to make both versions of these functions available when relevant and who has put a lot of thought into developing a consistent API, who prefers the separate functions approach.

Taken on their own, I don't think any of these things would be enough to decide based on, but if both options are valid choices and neither is particularly better than the other, these factors tilt things in favor of separate functions for me.

If this happens in the compiled code and does not bubble up to the public API, I think nobody can object to it. My objection is strictly scoped to public API.

However, I would also consider the possibility of using actual names and not the 8.3 naming scheme from the 70s, which existed because there were no other options. I have no idea what any of these mean anyway, and attaching seq or all is not making it any better. I think this whole special library is just a giant cryptic naming mess that old systems forced on us, and we've developed a learned helplessness around it.

There is nothing stopping us from naming things properly, especially in this multi-consumer separate library. What is pbvv, or better, pbwa anyway? What's wrong with parabolic_cylinder_vv or whatever the typical verbal name for it is? What is a in pbwa? Why is everything four letters? Is this expected? And so on.

I think (1) is really not a big deal for us yet, as there are many other places where Jax and Dask won't be able to operate anyway. (2) does indeed happen here and elsewhere, but SciPy is very heterogeneous by nature, so I'm not sure we can use that as an argument for the API.

Item (3) is definitely correct and seems to me the important item; however, note that the mess we got into came from making hasty decisions and getting these libraries into this state. So I don't think we need to hurry this much just after we got rid of the F77 mess. It seems to me that we have the tell-tale signs of the programmatic error of adopting things without giving serious thought to the big picture, I must say. mdspan is another one; I do think C++ is a mistake as big as F77 that the CUDA folks are making. But like I said, I don't have any dog in that race and I have other hills to die on.

How many functions will we enable for this _all behavior? If we have 20 more _all functions, are we going to add all of them? Who is going to add them? How much work is that? How many exceptions are there? Which ones are amenable to such generalizations? Can they all be named _all? Are they all generalizable to nd, or limited to some number of dimensions?

If we do not answer all these things, and then after three or four of them we realize there are mistakes, we cannot go back because they are already in. As @izaid is experiencing, I can see that the original authors made such decisions in the F77 code but never managed to fix them because it was too late.

This also happens to be what I do for living in large enterprises, fixing all kinds of “Giraffe’s recurrent laryngeal nerve” structures, often caused by not spending sufficient time on the API design.

TL;DR: I am not particularly against any particular naming scheme if it makes sense to everyone in that domain.

But changing behaviour via a keyword is pretty much common practice; think of eigenvectors or singular vectors. If you don't want them you can turn them off to save time, and the corresponding output is an empty array. Nobody is questioning those, and Jax and Dask will definitely have a problem with that, because that behaviour is not going anywhere anytime soon. And this is already in their catalogue, returning Array | SVDResult.
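For example, something like the following is completely standard today (just illustrating existing NumPy/SciPy keywords, not anything from the proposal):

    import numpy as np
    from scipy import linalg

    a = np.random.default_rng(0).standard_normal((4, 4))
    a = a + a.T  # symmetric, so eigh applies

    # compute_uv switches the return from a 3-tuple to just the singular values.
    u, s, vh = np.linalg.svd(a, compute_uv=True)
    s_only = np.linalg.svd(a, compute_uv=False)

    # eigvals_only similarly switches between (eigenvalues, eigenvectors)
    # and eigenvalues alone.
    w, v = linalg.eigh(a)
    w_only = linalg.eigh(a, eigvals_only=True)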

Is the shape-dependence any different from, say, sum?

In this case, for an API that covers both use cases, the logic to determine the output shape is straightforward. It feels like our job is to create an API that makes sense, not to be overly concerned by how the API will be implemented. (Sure, don’t be obnoxious in making it harder for them, but here the _all convention looks confusing from a user perspective.)

Thanks for the example. I looked further into Jax, and found np.sum has no such issues. The examples they give for data dependent output shapes are np.unique, and np.nonzero, and they solve this by adding a size parameter. No such parameter is needed for np.sum. So I think Jax shouldn’t actually have an issue with a keyword that changes the behavior of the function for these cases. Maybe it would be slightly more work for them? I think we can probably strike the data dependent output shape argument from the list.
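For reference, this is roughly how that size parameter works, as far as I can tell (the numbers here are just an example):

    import jax
    import jax.numpy as jnp

    @jax.jit
    def first_nonzero(x):
        # Without a static size, the output shape of nonzero would depend on
        # the data and couldn't be traced; size pads (with fill_value) or
        # truncates so the shape is known at trace time.
        (idx,) = jnp.nonzero(x, size=3, fill_value=-1)
        return idx

    print(first_nonzero(jnp.array([0, 2, 0, 5])))  # [ 1  3 -1]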

So I’ve been thinking about this a lot, as the primary person who is working through these issues. It matters to me to get a good API for special, and especially to correct (as well as we can) what has been ignored and left unresolved in the past. To be honest, I have considered both keywords or not, and I remain open to both.

I think part of the issue comes from wanting to use ufuncs directly. With keywords, essentially what we’re doing is simply using the keyword value to select a ufunc. Is it better to hide the ufuncs behind these keywords or use them directly? I don’t know.

If we’re going to do a keyword, I’m thinking of something like mode='all' or mode='at'. It would default to mode='at'. Even in this case, all we are really going to do is something like:

    if mode == 'all':
        return _sph_harm_all(...)
    elif mode == 'at':
        return _sph_harm_at(...)

where _sph_harm_all and _sph_harm_at are ufuncs. They are different ufuncs, and have to be so as they are actually doing quite different things with the values, even if the computation is based off the same recurrence. There is actually no shared behaviour at the Python level.

Another thing to keep in mind is that, for the moment, all the special functions based off recurrence relations share a property: their signature can take either specific values of the indices or a maximum value of the index to compute. For the spherical harmonics, sph_harm_all(m, n, ...) computes an array of size (2m + 1, n + 1, ...) for all orders and degrees up to m and n, whereas sph_harm_at(m, n, ...) computes an array at the values given in the index arrays m and n. So the number of arguments and their order remains unchanged in either case, even though sph_harm_all takes single integers for m and n while sph_harm_at takes integer arrays.
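To sketch the shape contract (the function names are the proposed ones and don't exist yet; broadcast_shapes just stands in for the usual ufunc broadcasting of the angle arguments):

    import numpy as np

    m_max, n_max = 3, 5
    angle_shape = np.broadcast_shapes((64, 1), (1, 128))  # theta against phi

    # sph_harm_all(m_max, n_max, theta, phi): the full table at every point.
    print((2 * m_max + 1, n_max + 1) + angle_shape)  # (7, 6, 64, 128)

    # sph_harm_at(m, n, theta, phi): one value per broadcast element.
    print(np.broadcast_shapes((7,), (7,), (), ()))   # (7,) with scalar angles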

It may be there are special functions for which we can’t simply substitute an integer indicating size for the indices, and the number of arguments or their order changes between the functions. In this case, a keyword makes less sense. But also I don’t have a ready example of this.

We talked about this before, but I'm just bringing it to everyone's attention. Whether we can change functions in special to not be ufuncs came up before when adding an experimental option to make special dispatch to different array API backends. The original PR was gh-19023, and there was a follow-up gh-19252 which didn't get merged because the discussion was unresolved. There was also a mailing list thread here. I think the general consensus was leaning towards it being OK to make some things in special no longer ufuncs, but that important ufunc methods and attributes should be supported in the wrappers, and that we should update the documentation here to clarify the status of functions as ufuncs.

Thanks for pointing out those references. Another point I'm going to clarify here is that the ufunc attributes of the "all" functions versus the "at" functions are different. In particular, the "at" ufunc accepts a different number of arguments than the "all" ufunc. So if we want to preserve ufunc attributes on wrappers, using a keyword will not allow us to do so.

It’s not about JAX or a particular library (which may or may not work, if it does they had to put effort into it), it’s a general design principle. For example Numba may have problems too, or Mypy. Adding type annotations is something that is pretty close to orthogonal to adding support for other arrays. But it runs into the same type of issues when you add a change_output_types=... type keyword. You can look at the number of @overload’s for np.unique in the NumPy code base for an example - and that still isn’t actually covering the full semantics of the return_xxx keywords but only literal True/False values.
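To make that concrete, here is a minimal sketch with a hypothetical function mirroring the np.unique(return_index=...) situation (the real NumPy stubs need far more overloads than this, and non-literal flag values still can't be typed precisely):

    from typing import Literal, overload

    import numpy as np

    @overload
    def unique_like(x: np.ndarray, return_index: Literal[False] = ...) -> np.ndarray: ...
    @overload
    def unique_like(x: np.ndarray, return_index: Literal[True]) -> tuple[np.ndarray, np.ndarray]: ...
    def unique_like(x, return_index=False):
        # The keyword changes the *type* of the result, so every combination
        # of literal values needs its own overload.
        values, index = np.unique(x, return_index=True)
        return (values, index) if return_index else values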

tl;dr just don’t use such keywords if you have a choice, it’s considered bad practice nowadays

Agreed with that assessment. However, if there are two otherwise roughly equivalent choices, then "don't break it" is clearly a good rule. Ufuncs also have some nice extra properties, so if that comes for free, then why not :)

+1, there are a lot of really bad names in special. I'm not worried about things like gamma* having some inconsistencies ("some kind of gamma function" is okay), but a name like pbvv is meaningless, so if that could change to a descriptive name when the function is touched anyway, that'd be great.

Sorry, a scipy issue, but this whole thing may need a somewhat clearer-cut explanation/guidance on when this is such a bad thing.
(Not saying that I wouldn't generally avoid such flags, but asking when it is really obviously better to prefer multiple functions.)

To give some examples:

  • I would not want to split np.quantile(method="...") into 9 functions (or even 2, for discrete and not, because it matters for the result dtype!).
  • axis= and keepdims= must also be literals/constant to deduce result shapes. Presumably, numba uses the constant information here.

There is a trade-off:

  • Potential API bloat/confusion, because multiple functions that do almost the same thing are also not great (and it only works if you are limited to maybe 3 choices).
  • Accepting that typing or JAX/Dask/… will warn or error if you don’t use literals/constants.

Now, in many cases two functions may just be clearer, but most users use literals anyway, unless that is also problematic?
Those that don't use literals can avoid any warning either by making it a true literal (probably a cleanup) or by writing if cond: func(cond=True) else: func(cond=False), which is really the same as if cond: func_cond() else: func_nocond().

It feels like most of this is because we don't like to accept the second bullet point in libraries that copy or need to understand the API. I understand that dislike, but I am not sure we are doing anyone much of a favor compared to, say, adding guidance somewhere saying that nobody should expect numba/typing to support everything in these cases.


A rule of thumb for keywords may be that they are for cases where they select in and out different parts of an algorithm within a function. I think that is often where they get used, and why we are used to them and why we like them.

In the particular examples being discussed here, the only thing we are doing is using them to select one complete function or another with no shared behaviour. That is what marks a distinction.

Thanks @rgommers and @seberg for weighing in. I’d like to consider this matter settled then. Thanks @ilayn for providing counterarguments, helping to make sure the separate function approach can stand up to scrutiny.

Also, I’m a big +1 for replacing inscrutable special function names like pbvv with names that are more descriptive.

@steppi Just so we’re all on the same page: how do you consider the matter to be resolved?

@seberg’s post makes a lot of sense to me. You don’t want to have a keyword argument change output types, or number of outputs. But, e.g., controlling the number of eigenvalues returned etc. should be perfectly acceptable.

Sorry, I tried to choose my words carefully when I said “I’d like to consider this matter settled.” I don’t consider it settled yet, but was hoping there would be no further objections to the two function pair approach after @rgommers’ comments.

My updated understanding is that in this case we are considering a convention where a keyword argument switches between two separate but related ufuncs, with the understanding that such wrappers for ufuncs should preserve important ufunc methods and attributes, but with the caveat @izaid mentioned that these attributes are different between the two ufuncs in question. Going with the keyword approach seems to introduce extra complexity on the implementer's side with little added benefit, so my hope is that we can all agree to accept the two function approach, where we get the ufuncness for free.

Consensus is easier than agreement, fortunately :slight_smile: Given the complexity of the switching, I agree that we should move forward with that approach.

If you’re still nameshedding, you may consider sph_harm_complete or sph_harm_full, to indicate that you are calculating the entire set of basis vectors.


Consensus is good, and regardless of what we choose, the end result will be a big improvement compared to what we have in special now. Thanks.

Those are good names too. @izaid had suggested this list: _all, _seq, _table, _array, _multi, and _full. We had _full already but _complete is another good one to consider. If we’re moving forward with the two function approach, deciding on this suffix is next. I think from that list, I’d be happy with any of _all, _table, _full, or _complete.

If I understand the discussion correctly, the keyword matter is about changing the shape of the resulting array, not the type or the number of output arguments. I would also be against that, should that be the case.

If we go with any of these, is it going to apply to the rest of the catalogue? Because then I think lpmn and clpmn (terrible names) are next in line. So if we select complete, is it going to be a suitable suffix for all of them?

Yes, whichever name we go with would apply to the rest of the catalogue. I think having terrible names which we really should replace anyway presents an opportunity too. Right now clpmn has the behavior of clpmn_complete, and instead of the pain of trying to make clpmn have what @izaid is calling the at behavior and making clpmn_complete be the name of the function with the existing behavior, we could introduce new, sane, names for all of the functions in this catalog, and add them in regular and _complete pairs. We could then eventually deprecate the inscrutable names for these functions.