A proposed design for supporting multiple array types across SciPy, scikit-learn, scikit-image and beyond

This is exactly what I meant; once you leave the NumPy array land you are left with very scarce tools. And SciPy is not answering any of your problems. Because, not to iterate myself, you are replacing the whole API so whatever backend is registered is going to be available will now process those array. SciPy has nothing to do with it because everything it has now gone and new backend is going to do this. So this is actually an argument against your premise. All your examples are asking for NumPy/SciPy itself receiving and working on these arrrays natively which is Stefan’s theme number (2).

But anyways just wanted to mention this and I need to shut up now :smile: since Stefan’s arguments are way more eloquent than mine.

Thanks for the thoughtful reply Stefan!

I agree, it’s not trivial. I think that with NumPy’s previous attempt we did have some success, because RAPIDS did adopt it and has largely happy with it. However other libraries didn’t adopt it; I think the initial approach was fairly crude because it thought only about adding a dispatch layer but didn’t consider what to dispatch on at all (it was just “everything in NumPy, even the parts that don’t make much sense”) or whether other libraries provide actually compatible semantics (and they don’t). I think we learned a lot, and spent a lot of effort solving these issues - so we now have a better chance of success.

This one I have a strong opinion on: do not support (or allow) this! It really does not make much sense, and it’s not something sensible users would even write code for. All input arrays to a single function call must be able to be handled by a single backend. And no other backend besides NumPy will accept any “foreign” array objects.

The only reason we’re even thinking about this is because of NumPy being overly flexible and accepting too much, and as a result the proliferation of asarray throughout the ecosystem, which then resulted in us recommending against using ndarray subclasses etc. (that’s not just due to `np.matrix). It’s kind of baked in by now for devs who only work with NumPy; once you start working with CuPy/PyTorch/JAX et al. you’ll gain a better sense for how much nicer it is to not auto-convert all inputs.

I wrote some more about this already, but I’d say it’s a very minor concern in practice. If you look at how much we actually deprecate in a minor release, it’s typically a small handful of things - not really a major burden for backend providers.

That’s up to the backend providers. The main reason for changes in accuracy would be things like GPU libraries defaulting to float32 rather than float64. Or bugs. But I’d say that’s their responsibility, just like it is today - inserting a backend system doesn’t change anything here.

Yes indeed. You answered your own question well I’d say. Maybe the only additional aspect is that we’d get backend providers more involved in reviewing critical new features, so they could say things like “this algorithm choice doesn’t work well on a GPU, can we change it like [this]?”. Which would be a win imho, compared to how things are working today (copy the API without any communication, then modify as needed and end up with incompatibilities).

This is the most interesting of your questions I think - there’s multiple ways to go about this. We need a good and consistent way of documenting what is supported with a backend, perhaps cross-linking in docs. And perhaps some debug mode or way to query/log what is happening indeed. This shouldn’t be too difficult to add I imagine, but it doesn’t exist today. I’m also not aware of any other library with internal dispatching doing this - so it makes sense to do, but perhaps can be done based on real-world experience rather than as up front design.

For uarray we have examples, like the scipy.fft one (implemented) and the scipy.ndimage one (PR linked in my initial post). That also includes testing via a mock backend. That should be enough, I wouldn’t expect SciPy to run GPU tests for example.

I think this is indeed the most hairy part, and is easier to discuss based on example code/PRs rather than in the abstract.

It may have to be multiple SPECs/docs. Between the blog post, forum posts, worked out LIGO example, existing backends for SciPy, the array API standard docs, etc. we already have significantly more content than would go into a typical NEP for a large feature. So it’d be good if we could sketch out what a dedicated (set of) design docs here would look like - where does it need to provide more detail, and where does it just link out to other content?

And thanks for your thoughts on process here! I know it’s a challenge.

This list makes sense to me, except for (2) - that’d be a mistake. We should only override modules or sets of functionality that we are happy with and that make sense to override. It should be coherent of course, but to add new backend functionality to things we don’t even want people to use because there’s better alternatives, or because we plan to redesign/deprecate seems counterproductive.

Thanks for your answer, Ralf. All your responses make sense to me.

I think an important part of the discussion will be to figure out what is the minimum we can do (the least flexible system) to support this proposal, implement that, and then learn before we commit ourselves too deeply on additional capabilities. So, I like all the answers where you say “let’s not do that” :slight_smile:

I’ll just clarify my point there: it wasn’t about whether a person should do it in logical chunks (you should!), but about which chunks you can override. What I was saying is that it would be limiting to only allow, say, scipy.linalg to be overridden and not scipy.ndimage. Of course, you wouldn’t include experimental APIs, but I worry about libraries choosing to expose only certain functions for override when they could not know what would be useful for practitioners to override.

Thanks Stefan

I think this point will be much easier to converge on when we make it concrete. Both scipy.linalg and scipy.ndimage are clearly important and should be included. But let’s go in a bit more detail:

  • ndimage is fine, all functionality can be included at once.
  • linalg is mostly okay, but for example the low-level functionality where you can retrieve BLAS/LAPACK function handles (see get_blas_funcs and direct interface in linalg.blas should be excluded.
  • For scipy.signal it’s harder, for example we should probably not include wavelets functions at all (they’re there for backwards compat, but users should prefer PyWavelets), and the B-splines functionality is also very questionable because there’s a much better spline interface in scipy.interpolate.

I’d say it’s fine to make such decisions. Would you agree, or would you advocate for then trying to clean up or deprecate those sets of functionality like wavelets and B-splines?

Yes, perfectly fine to define what is part of our public API! Preferably, the rest should be hidden/made private: that’s the most obvious way to signal its intent. Module getattr also makes it much easier to hide API that is technically public, but without wanting to advertise it.

The distinction seems to be: generic public API vs library/implementation-specific API.

A (maybe) different way to think about this is that we can divide functions we pass array-like things into into the categories of “care about what is inside the array-box” and “does not care about what is inside the array box”.

For some functions (e.g. those that drop to C / GPU / fortran) the function opens the box and those obviously have to be implemented on per-array-provider basis and are “easy” (but time consuming and tedious) to deal with.

There are other functions that really do not care what is in the box because they can be expressed entirely in terms of functions that also do not care are methods on the arrays are injected into the function (hence how we can smuggle in the backend-specific details). If the set of functions that gets smuggled is big enough you can get pretty far with this!

However it seems the really hard case is where we have a function that only sometimes cares what is in the box. I’m thinking of problems where you might pick a different algorithm if you are in CPU-but-main-memory land, dask-multi-machine land, or GPU land or problems where you need to specialize the implementation to the exact data structure (I am not deep enough into algorithm development to have one off the top of my head, but am confident they exist or we would not be having this discussion :wink: ).

One solution to this is the uarray / dispatch model where functions that think they might want to be over-ridden can opt-in to being hookable with a default “duck type” implementation available (if one exists, I have a suspicion that a bunch of scipy code is going to turn into numpy-specific implementations).

The other solution is what I think @ilayn is advocating for is that implementations build a scipy-identical namespace, simply import an functions that are of “do not care what is in the box” variety (assuming that scipy picks up enough of the duck-array functionality and maybe even marks those that should “just work”), and then re-implements the functions that it needs to (or I guess leave holes for the function it has not implemented yet).

I think both can be made to technically work and for scipy it is probably a wash (because scipy is big enough that it will get re-implemented), but I think the dispatch case is still a bit stronger. Consider the case where some array implementation has not left a hole in scipy. If a third party now implements that and wants to distribute it, in the case of dispatch they “just” have to use the registration mechanism (and I’m assuming the conflicts have some way of getting worked out), but in the separate namespace option they will be very strongly tempted to monkey patch :scream: their implementation into the mock-scipy namespace of the array implementation. A little bit of ceremony seems like it is worth it to me to prevent documenting “…and the way to distribute a third-party extension is to monky patch …”

I think that the argument for dispatch only gets stronger in domain specific / research group specific / grad student specific code bases where the implementation is likely to be more fragmented.

Silently overriding API should never be an option. There must always be a way for the user to see what is truly happening underneath the hood.

The biggest concern I have with duck-type implementations is that it requires us (library maintainers) to bear the brunt of ensuring that executions succeed. With the “you-replace-our-function-entirely” approach, the backend implementers deal with it. That mechanism also deals with your “specialize the implementation to the exact data structure problem”—in that we simply do not have to care how external parties achieve that.

There seems to be some concern that we will lose “control” over execution. We lose most control when we provide no hooks (and, e.g., force external implementers to monkey patch)—because then you cannot even track execution. But, lightweight hooks give us future options on improving the workflow, while providing a “sanctioned” mechanism for external implementers to participate.

Agreed with both of you that monkey-patching is bad / not a real option.

I share the concern to some extent. It may be good to lean more towards the “replace-entirely” side because of that.

I do think that functionality which is completely pure Python will be fine - because it will be using vectorized expressions only (in the vast majority of cases), and those are the ones that typically do not need any specialized implementation. Once you add algorithms that are iterative, or do global operations (e.g. median, unique) then yes you do get different implementations for GPU or distributed libraries.

Thanks Ralf and others for diving deep into this. I really appreciate your time and effort here!

I am a little skeptical about the likelihood for writing good “duck array” versions of fully-featured library code (e.g., for Scipy/Scikit-Image) with the exact same Python implementation. The new array API standard is a great tool to have to facilitate this for simple cases, but every array type has its own quirks. There would be enough compromises in re-writing non-trivial SciPy or sklearn routines like an ODE solver in array API compatible way that I would guess this would be most suitable for entirely new libraries, rather than re-implementing complete modules like scipy.ndimage via duck arrays.

I do think that such new “duck array libraries” could play an important role in the scientific Python ecosystem. To the extent we can define shared core routines, even beyond those found in NumPy, we should absolutely do so. I can imagine this playing out as optional extensions of the array API standard, e.g., a new sub-module with functions useful for building neural nets.

The other option for making a library “duck array ready” is supporting full reimplementations. I’m definitely supportive of this sort of work, but I’m not sure how much we really need standardization with a tool like uarray. For end-user use cases, just importing a separate library (e.g., from jax.scipy import ndimage) works fine and is nicely very explicit. The exception are cases where you want to monkey-patch some new backend into scipy itself, for use inside a third-party library. This can work sometimes, but probably not reliably unless the replacement is exactly compatible with SciPy, which is very limiting.

I think we also need to keep in mind what the ergonomics of using the end result of this is going to be for users not just how hard / messy it is going to be for us (library developers) to implement. I think one of the responsibilities of a library is to absorb complexity on behalf of our users so they can get on with what ever it is they are really doing when they use our tools.

A down side of the “just re-implement” is that while it is easy from the library point of view but for down stream, particularly down stream users that are writing their own domain libraries, I think we will be shifting a bunch of the burden to them.

I think something like the array api, but allowing for submodules as @shoyer suggested.

import numpy
import jax

import scipy
import scipy_jax

# tell the array classes that there are some namespace that
# work with them.  This should be recursive etc.  Maybe
# this should be automatic on import?  
numpy.array.register_extra('scipy', scipy)
jax.array.register_extra('scipy', scipy_jax)

def my_function(a, b):
    ns = get_array_function_namespace(a, b)
    return ns.scipy.some_func(a) + ns.scipy.other_func(b)

numpy.array.register_extra('my_lib.my_function', my_function)
jax.array.register_extra('my_lib.my_function', my_function)


# maybe a helper that also lets you pass the ns in if you know it!
def ns_wrapper(func)):
    def inner(*args, **kwargs, ns=None):
        if ns is None:
            ns = get_array_function_namspace(*args, **kwargs)
        return func(*args, **kwargs, ns=ns)
    return inner

@ns_wrapper
def my_other_func(a, b, *, ns):
    return ns.my_lib.my_function(a, b)

I think this avoids having too much global state (there is state stuffed onto the Array class objects but they are already global), keeps it clear(ish) when you run the code where it is coming, but keeps the details of “how do I sort out which version to use with which arrays” to be pretty contained.

There is obviously a conflict on if two libraries want to claim the same name, but we arleady have that problem at the module level in Python and seem to have mostly been able to cope with that.

Having the ns either need to be fetched or passed into every is a bit annoying, but it avoids having a process global (and/or dealing with thread + context vars) and having to put down-stream code that wants to care what array it is consuming each in their own module.

(I had a longer reply started, but trying to be shorter.)

First, I like the namespace approach (especially as a concept), but I always thought of it to be mainly useful for library authors rather than end-users. For the end-user, I am operating under the assumption that the desired end-result is:

x1 = scipy.fft.fft(dask_arr)  # uses a dask implementation
x2 = scipy.fft.fft(cupy_arr)  # uses a cupy implementation

I.e. “implicit” type-dispatching which looks much as if scipy.fft.fft had a duck-implementation that works for all inputs. For the end-user that at least seems more convenient than having to get the ns state.
If this is not what libraries want/need to provide to users that is fair, but changes the design space a lot!

To be clear: Just because we support implicit dispatching doesn’t mean we could not organize it around such a namespace, but it seems to me that it becomes largely an implementation detail.
However, I agree that such a namespace object would have great properties for libraries, since it is not just explicit but also allows to pass the explicit state around conveniently (thus NEP 37)!

Second, the namespace concept only covers type-dispatching and not backend-selection. For backend-selection some form of non-local control (and thus state) seems to have a fairly clear user story (unless we decide we do not need that):

with pyfftw:
    scipy.fft.fft(...)  # uses pyfftw (assuming input is not a numpy array)
scipy.fft.fft(...)  # uses the default version

That is a very different problem, so it is likely just as well to solve it in a fully distinct dispatching step, but we do need to add that second step somewhere.
(There should be some advantages to “merging” the two steps under a single umbrella/API. Although, the only real one I can think of is that it would be convenient to query all possible implementations we end up with in a single API. Another is as an optimization, but that actually matters we should first think twice about how we do type-dispatching efficiently.)

Cross-linking: [Draft] Engine plugin API with a "unified API" by betatim ¡ Pull Request #24826 ¡ scikit-learn/scikit-learn ¡ GitHub

2 Likes

Hi All,

Interesting discussion. We encounter the same issue/questions with dipy / cudipy

Today, what is the status of this discussion? What is the status of uarray (last commit was 2 years ago) ?

I try to track all discussion on PRs/issues in different project but it seems everything suddenly stop.

Thank you all for your feedback

Hi @skoudoro, yes there is a lot going on in many places at the moment, hard to follow along. I’d say uarray didn’t make it over the finish line, mostly because of its implementation complexity. For the most recent status, it’s probably best to look at what SciPy and scikit-learn have actually merged and decided regarding support for PyTorch & co. There are a lot of PRs, and there are docs that are hopefully not too far behind:

If you are specifically interested in dealing with compiled code within DiPy rather than Python-level / array API standard code, then:

  • SciPy is doing two things:
    • library-specific support when there’s a matching API in PyTorch/CuPy (and any day now, JAX), just calling straight to that. E.g. if you pass a CuPy array or PyTorch tensor to scipy.special.logit, it calls cupyx.scipy.special.logit or torch.special.logit directly.
    • if there’s no matching API, then: convert to numpy.ndarray (on CPU, zero-copy), go through the compiled code, then convert back with xp.asarray
  • scikit-learn is doing the second thing above as well I believe, and a scikit-learn specific pluging framework is in progress (not sure of the exact status).

If you have a more specific question I can probably point to some relevant PR or discussion.

Thank you very much for your feedback @rgommers!

I will take the time to read more, understand and experiment what scipy and scikit-learn are doing.

I might come back with question if I have any doubt

1 Like