A proposed design for supporting multiple array types across SciPy, scikit-learn, scikit-image and beyond

Let me rephrase: they are mimicking the function names and some of the arguments. Just to make sure we are on the same page: an API, for me, means that you can send your own object to SciPy and it works, as long as you abide by certain rules (which we don’t have). That is a very strict set of rules, and very difficult to change later.

Here we are calling dask or cupy by rerouting the import mechanism internally, which is not an API at all but, as I mentioned, a namespace replacement - and I cannot see any benefit in that over calling dask’s or cupy’s linalg directly.

I would say one would need to separate the arguments into ones that affect implementation details versus those that affect behaviour. Then one could easily choose where to implement, where to raise an error, and where to ignore.

I’m not quite sure what you mean by very strict rules. It seems to me like you’re mixing API and implementation details (as also indicated by your comments on LAPACK - LAPACK is internal implementation detail invisible to the user of the API, and hence irrelevant here). What I mean by API is the interface bit only (the “I” in API), so two things:

  1. syntax → function signature
  2. semantics → what results do you get back given a set of inputs (e.g., svd returns 3 arrays, with a specific numerical content as given by the relevant math)

Now currently the array inputs that we accept are “np.ndarray and anything convertible to np.ndarray, as determined by calling np.asarray on it”. And then it typically returns one or more np.ndarrays. With this proposed design that gets extended with other array objects (like cupy.ndarray, torch.Tensor, etc.) - and “array type in == array type out”. Other syntax and semantics remain unchanged. And the backends are responsible for ensuring that that is the case. Or, if they don’t support something (e.g. a particular keyword), raising a clear exception.
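To make “array type in == array type out” concrete, here is a minimal sketch (all names here are hypothetical, not actual SciPy API) of a function whose signature and semantics stay fixed while a registered backend does the actual work:

```python
# Hypothetical sketch only: a registry mapping array types to backend
# modules that implement the same function signatures and semantics.
_backends = {}

def register_backend(array_type, module):
    _backends[array_type] = module

def svd(a, full_matrices=True):
    """Same signature and semantics regardless of which backend runs."""
    backend = _backends.get(type(a))
    if backend is None:
        raise TypeError(f"no backend registered for {type(a).__name__}")
    # The backend is responsible for returning the same array type it
    # was given, or raising a clear error for anything it does not support.
    return backend.svd(a, full_matrices=full_matrices)
```

The point of the sketch is that the dispatch layer never inspects the array’s contents; it only routes to an implementation that has promised to honor the SciPy signature.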

No, I think you are using the word API a bit too loosely. If you define an API specification, just like a REST API, then as long as you satisfy certain requirements you get the results; if not, the API needs fixing. Example: imagine all the packages mentioned above decide on certain functions to be defined in an API. So NumPy or Dask or other arrays are going to go into some linalg function func. Every array object has certain things - ndim, shape, etc. - that will be used internally. What the API protocol would require is certain methods and properties (shape, ndim, .T, and so on), and it would only call these particular LAPACK functions. That way, SciPy or any other scikit does not need to know what that array was: it satisfied the requirements, and the results come back in that array type. That means we would be constraining our code to use only the common intersection of all these arrays, such that they remain operational.

Sending some array and having it internally find its own library because you “registered” it is just importing a library in disguise. Internally we imported that package and used its own function on it. It doesn’t matter whether you used uarray for it or manually wrote

solve = dask.linalg.solve
eig = ...

at the top of your file. Convenient? sure I like the uarray design, API? no not really.

Let me put it this way: if CuPy’s solve has an option that SciPy’s solve doesn’t have, that won’t matter, because we are completely replacing the function - hence they have no responsibility to stay compatible. You could even replace solve with eig for that matter. As long as the backend user knows this, it will be OK. So I can’t see how this is going to unify SciPy and the scikits, or whether that is even the intention.

I have many more comments/thoughts, but let me start with hopefully helping to add some structure for discussing further points. The discussion – in my opinion – spans three main things:

  1. Duck-typing protocol: Implemented using the array-api (choosing it over __array_function__), to ease the writing of type generic implementations.
    • Targeted at the core array functionality. Itself a form of type dispatching (2) but the array-api has a bit of a special standing compared to libraries.
    • This duck-typing protocol can help with scipy.linalg being implemented so it works for (almost) all array-likes. There are subtleties: e.g. scipy.linalg should be able to use it to provide a generic implementation for all CPU arrays, which are only a subset of the full array-api implementations.
  2. Type dispatching: When generic implementations (using 1) don’t work or are slow, libraries may want to allow the array-object (or a 3rd party) to provide a type specific implementation which is dispatched to implicitly.
    • An example here is “enabling” scipy.fft for cupy.
    • Ideally, for the end-user, this makes it work as if there was a generic duck-typed implementation available. (E.g. __array_function__ is a duck-typing protocol for downstream use but is itself implemented as type dispatcher of NumPy functions.)
    • Type dispatching is probably usually provided e.g. by cupy.ndarray itself, although likely as a distinct library (making it much like a 3rd party).
  3. Backend selection: In some cases, we may want to explicitly swap out an implementation, usually for speed (or maybe special hardware).
    • The example here is using pyfftw instead of pocketfft for scipy.fft.
    • Additional features could be to swap it out context-local using a with statement or only provide a faster version for CPU backed arrays.
    • Backends are typically provided by a 3rd party.
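As a toy illustration of (3), context-local backend selection can be done with a with statement, roughly in the spirit of the existing scipy.fft.set_backend. Everything below is an illustrative sketch, not the real uarray/SciPy machinery:

```python
# Minimal sketch of context-local backend selection; the backend names
# and the fft function here are stand-ins, not real implementations.
import contextlib
import contextvars

_active_backend = contextvars.ContextVar("backend", default="pocketfft")

@contextlib.contextmanager
def set_backend(name):
    # Swap the backend only within this context, then restore the old one.
    token = _active_backend.set(name)
    try:
        yield
    finally:
        _active_backend.reset(token)

def fft(x):
    # Dispatch to whichever implementation is active in this context.
    impls = {
        "pocketfft": lambda x: ("pocketfft", x),
        "pyfftw": lambda x: ("pyfftw", x),
    }
    return impls[_active_backend.get()](x)
```

Using a ContextVar rather than a global means the selection composes correctly with threads and nested with blocks, which is one reason backend selection (3) tends to want with-statement semantics while type dispatching (2) does not.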

The specific proposal combines 2 and 3 into a single dispatching step (implemented by uarray). I expect there are good reasons to see them as a single step, but I think it is still good to distinguish them.

For further discussions it might be good to split up 1 and 2+3. A library that wants to provide 2 will probably want to use 1 for its own (generic) implementation. But that may be the only overlap. How duck-typing is done for the core array functionality is distinct from type dispatching and backend selection.
When discussing requirements/features, I think we should also clearly distinguish 2 and 3, even if there are good reasons to do them in a single step. For example, type dispatching has absolutely no need for context-local with statements, but with statements are likely desired for backend selection.

Thanks @seberg, I like your structure, what you wrote all makes sense to me (even if I had a slightly different structure in my head). A few short thoughts:

Re duck-typing protocol: mostly agree, it’s about a form of duck typing for the core array functionality. The change I’d like to make to your description is that it’s not array-api vs. __array_function__ but (a) we need a well-defined API which only the array API standard provides, and (b) we need some kind of dispatch/access mechanism - where we choose __array_namespace__ over __array_function__. In principle though, the functions in the array API standard could be combined with __array_function__.
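For concreteness, the __array_namespace__ mechanism looks roughly like this; MyArray and its tiny namespace are toy stand-ins for a real array library implementing the array API standard:

```python
# Toy stand-ins: a minimal array type exposing __array_namespace__, and a
# generic function that uses only the namespace it gets back.
import types

mynamespace = types.SimpleNamespace()

class MyArray:
    def __init__(self, data):
        self.data = data

    def __array_namespace__(self, *, api_version=None):
        # A real library returns its array-API-compatible module here.
        return mynamespace

mynamespace.sum = lambda a: MyArray(sum(a.data))

def total(x):
    # Generic implementation: works for any array type that provides
    # __array_namespace__, and returns the same array type it was given.
    xp = x.__array_namespace__()
    return xp.sum(x)
```

The library function never imports NumPy, CuPy, or anything else; it only calls into whatever namespace the input hands it, which is what makes “array type in == array type out” fall out naturally.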

In my mind (2) and (3) are combined because it’s a single dispatch layer put in front of compiled code (unlike (1)), and hence no Python code reuse is possible. (2) isn’t really “one library, one array object”, for example JAX has multiple array objects, and PyTorch is also growing new things like MaskedTensor which may or may not be subclasses of the main array/tensor object.

I’m not sure the with statement is the critical part here (nor is it something we necessarily want to keep - see the other opt-in vs. opt-out thread); you’re asking “is it possible to unambiguously choose an implementation?”, I think. In the backend selection part, we have multiple implementations of a function for the same array object - so we need some kind of rule to select the desired backend, whether a with statement or the order in which backends were registered.

The first half of this is correct, when applied to functionality using the array API standard (but not uarray). The LAPACK part is wrong, that is exactly the opposite of what is proposed. We will not be trying to take some array object that’s not numpy.ndarray and pass it to a LAPACK function - that will not work (because LAPACK implementations are compiled code).

Non-NumPy arrays are not going to touch the inside of the implementation of scipy.linalg.func; I think you are misunderstanding what uarray does (or singledispatch, multipledispatch, __array_function__, etc.).

Sending some array and having it internally find its own library because you “registered” it is just importing a library in disguise.

The API and dispatch terminology is completely standard, not something I/we recently invented. When we talk about “public API” in SciPy or any other library it is about function signatures and semantics, and not about implementation details. And qualifying dispatching as “importing in disguise” is more confusing than helpful I think.

It may be more productive to jump on a call if you can spare 20-30 minutes. I really would like to separate your actual concerns (which for scipy.linalg can indeed be valid concerns, because it’s not all very polished functionality that we are 100% happy with) and requests for further clarification and use cases from an argument around terminology, and from the parts where I think you have misunderstood what is proposed.

I see. Apparently there is quite a fixation on “implementation details” in this thread, through which I can’t make my point across. But if a backend registry leads to SciPy calling cupy or dask for a function, that is not an implementation detail. That is very obviously importing dask and pushing the arguments to another library - hence just a common namespace. That is not an API; that IS the namespace itself. Because if a library implements a different eig with the keyword squirrel in the signature, there is no reason why it shouldn’t be accepted through scipy.linalg.eig, because I already registered that backend and that backend knows what to do with the squirrel argument. Then the question becomes: why are they even using the SciPy namespace instead of properly calling the right library with the right arguments? This is not motivated anywhere at all - not the blog post, not the uarray motivation, not this thread. I can’t find any support or even demand for it anywhere yet.

On the other hand, if you say no, they cannot use different signatures, then every time we change something in scipy.linalg, all backends will have to adjust - which seems like a total disaster to me, because we are not even close to a good SciPy API and we will need to change things here and there. It seems like I’m having difficulty articulating this point.

Anyways, I am not able to see the benefit nor the reasoning behind it, let alone how the scikits will do this. However, as we discussed in the email exchange, this motion seems to have moved forward already, so there is no point getting stuck on my arguments. Hence I’ll concede the discussion with no reservations. I’ll let the scikit authors and others take over the discussion.

They should have signatures that are matching indeed. Or are a subset of what the SciPy API provides (can easily happen if functionality is incomplete, or if a new SciPy release adds a new keyword).

This is a key point. I don’t think it’s a disaster, nor do we change signatures all the time. However, if SciPy devs are unhappy with a particular (set of) function(s) and plan to change them, then indeed we should not add a backend for them until those changes are made. That’s totally fair. I wouldn’t say everything is terrible in scipy.linalg though, I’m a bit more optimistic:)

To make changes to an API with a backend, not much changes:

  • If a backwards-compatible change is made, like adding a new keyword, this will not break a backend. The backend can be updated in a future release. If a user explicitly uses the new keyword with that backend, they will get a clear exception.
  • If a backwards-incompatible change is made (these are expected to be rare), the same considerations apply for SciPy as they already do today. Deprecations and changes in behavior can be mirrored in the backend.
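As a sketch of the first bullet (all names made up, including the placeholder keyword), a backend can surface a keyword it doesn’t support yet as a clear error rather than silently ignoring it:

```python
# Illustrative only: a backend's implementation of a hypothetical eig-like
# function, predating a keyword that a newer SciPy release added.
_UNSET = object()  # sentinel, so we can tell "not passed" from any real value

def eig_backend(a, new_keyword=_UNSET):
    if new_keyword is not _UNSET:
        # Better a loud, clear error than silently computing the wrong thing.
        raise NotImplementedError(
            "the 'new_keyword' argument is not yet supported by this backend"
        )
    return ("eigenvalues-of", a)
```

A sentinel rather than None is used so that a user explicitly passing the keyword's default still works, while any attempt to actually use the new feature fails clearly.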

I’m sure I can find more in other issue trackers, on Twitter, etc. There’s a reason why GPU support and parallelism were the top two responses in the NumPy survey. Those are not all confused users who don’t know about CuPy, PyTorch, JAX, etc. - the problem is that if they switch away from NumPy they lose access to SciPy, scikit-learn, scikit-image, and so on. Users express that they want something like this, there is a ton of history here.

I think it’s fine to say “I don’t see the point” and also say “I still want to make changes to some functionality” (which is possible) and “I don’t want a lot of extra maintenance overhead from this” (which I’d agree with). However, I don’t think it’s fair to say there is no motivation.

Ralf, thank you for your post. This is an important and complex topic. Important, because it is often requested and can help keep the ecosystem viable in the face of new technologies being introduced, and complex because it involves coordination and fairly technical modifications to many different places.

I see three themes here:

  1. How to override APIs in libraries/modules whole-sale.
  2. How to override implementations for specific object types (array types, e.g.).
  3. How to replace computational engine backends inside a library.

An example of (1) would be where NVIDIA provides a faster version of skimage.
An example of (2) would be where skimage algorithms handle dask arrays and NumPy arrays.
An example of (3) would be SciPy’s PocketFFT vs FFTW backends.

In terms of complexity & viability, (3) has been demonstrated, (1) can be done even using monkeypatching (not ideal, but still) and (2) has proved to be quite difficult.

I don’t know how easily we will be able to sort out (2). It seems to apply mainly to pure Python code, and can also be addressed, if somewhat cumbersomely, with (1). But this is where NumPy has made some attempt, and where we got stuck before. It also brings up the spectre of the cross-product of all input types, and how to handle those without an (oft inefficient) common messenger, like an in-memory array.

Thinking of whole-sale API override, as well as internal computational backend replacement, both are clearly feasible, but the devil is in the details. Questions that arise, some of which you’ve answered:

  1. How hard do we work to ensure that API re-implementations stay in sync?
  2. How do we ensure that computational backend replacements are accurate (testing)? And, if they’re not, is this acceptable? For numeric code accuracy is important, so users may need to be informed about the backend in play - but that is tricky in a library.
  3. Who carries the maintenance burden? It seems clear that API overrides will be externally implemented, and that we can only expect libraries to provide the “hooks”.
  4. Tying into (2), how do we present this to our users so that they know what’s going on during execution? Error handling, notifications of backend switching, etc.

It would be worth figuring out the expectations on the libraries. I.e., how much work will they have to do? Are we talking decorators on functions, or an import during init, to set up hooks? Or do we need to work with the backends to ensure some semblance of sanity during execution of the pipeline? Do we set up large test matrices, or is this the responsibility of implementers?

I would love to discuss (2) more, but I find that a very hairy problem to reason about, and don’t quite know where to begin, especially considering the mix of Python/C code we have.

Anyway, just some initial thoughts, I’m sure there’ll be more :slight_smile:

P.S. W.r.t. process, I am confident that we can get this done through community discussion (meetings, video calls) and a SPEC that brings together the core projects. It would be a good test for our community, too!


As an aside, here is a list of features I thought that a dispatching mechanism for skimage would need + an ensuing discussion.

  1. It must be possible to get a log of executions, so that you can monitor which backends are being used.
  2. All functions must be overridable (we don’t want to manually curate which functions do or do not make sense to speed up).
  3. We determine the user facing API; this is never determined by external libraries.
  4. All external implementations must conform to the scikit-image test suite. Not implementing full functionality (e.g., N-D, all values of a flag, etc.) is okay, in which case the plugin should raise NotImplementedException and the next plugin should be tried.
  5. Return images must be coercible into NumPy arrays.

I still think that’s a reasonable list (but debatable, of course, and definitely incomplete).
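For what it’s worth, point 4 could look something like the following sketch (hypothetical names throughout; Python’s built-in NotImplementedError is used for the NotImplementedException mentioned in the list):

```python
# Sketch of a plugin fallback chain: try registered plugins in order, and
# fall through to the next one when a plugin raises NotImplementedError.
def dispatch(plugins, func_name, *args, **kwargs):
    for plugin in plugins:
        impl = getattr(plugin, func_name, None)
        if impl is None:
            continue  # plugin doesn't provide this function at all
        try:
            return impl(*args, **kwargs)
        except NotImplementedError:
            continue  # partial plugin (e.g. no N-D support): try the next one
    raise NotImplementedError(f"no plugin could handle {func_name}")
```

With the reference implementation registered last, users always get an answer, and a plugin that only supports a subset of the functionality (point 4's "N-D, all values of a flag" case) degrades gracefully.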


This is exactly what I meant; once you leave NumPy-array land you are left with very scarce tools. And SciPy is not answering any of your problems, because - not to repeat myself - you are replacing the whole API, so whatever backend is registered will now process those arrays. SciPy has nothing to do with it, because everything it has is now gone and the new backend is going to do this. So this is actually an argument against your premise. All your examples are asking for NumPy/SciPy itself to receive and work on these arrays natively, which is Stefan’s theme number (2).

But anyways just wanted to mention this and I need to shut up now :smile: since Stefan’s arguments are way more eloquent than mine.

Thanks for the thoughtful reply Stefan!

I agree, it’s not trivial. I think that with NumPy’s previous attempt we did have some success, because RAPIDS did adopt it and is largely happy with it. However, other libraries didn’t adopt it; I think the initial approach was fairly crude, because it thought only about adding a dispatch layer but didn’t consider what to dispatch on at all (it was just “everything in NumPy, even the parts that don’t make much sense”) or whether other libraries actually provide compatible semantics (and they don’t). I think we learned a lot, and spent a lot of effort solving these issues - so we now have a better chance of success.

This one I have a strong opinion on: do not support (or allow) this! It really does not make much sense, and it’s not something sensible users would even write code for. All input arrays to a single function call must be able to be handled by a single backend. And no other backend besides NumPy will accept any “foreign” array objects.

The only reason we’re even thinking about this is because of NumPy being overly flexible and accepting too much, and as a result the proliferation of asarray throughout the ecosystem, which then resulted in us recommending against using ndarray subclasses etc. (that’s not just due to `np.matrix`). It’s kind of baked in by now for devs who only work with NumPy; once you start working with CuPy/PyTorch/JAX et al. you’ll gain a better sense for how much nicer it is to not auto-convert all inputs.

I wrote some more about this already, but I’d say it’s a very minor concern in practice. If you look at how much we actually deprecate in a minor release, it’s typically a small handful of things - not really a major burden for backend providers.

That’s up to the backend providers. The main reason for changes in accuracy would be things like GPU libraries defaulting to float32 rather than float64. Or bugs. But I’d say that’s their responsibility, just like it is today - inserting a backend system doesn’t change anything here.

Yes indeed. You answered your own question well I’d say. Maybe the only additional aspect is that we’d get backend providers more involved in reviewing critical new features, so they could say things like “this algorithm choice doesn’t work well on a GPU, can we change it like [this]?”. Which would be a win imho, compared to how things are working today (copy the API without any communication, then modify as needed and end up with incompatibilities).

This is the most interesting of your questions I think - there’s multiple ways to go about this. We need a good and consistent way of documenting what is supported with a backend, perhaps cross-linking in docs. And perhaps some debug mode or way to query/log what is happening indeed. This shouldn’t be too difficult to add I imagine, but it doesn’t exist today. I’m also not aware of any other library with internal dispatching doing this - so it makes sense to do, but perhaps can be done based on real-world experience rather than as up front design.

For uarray we have examples, like the scipy.fft one (implemented) and the scipy.ndimage one (PR linked in my initial post). That also includes testing via a mock backend. That should be enough, I wouldn’t expect SciPy to run GPU tests for example.

I think this is indeed the most hairy part, and is easier to discuss based on example code/PRs rather than in the abstract.

It may have to be multiple SPECs/docs. Between the blog post, forum posts, worked out LIGO example, existing backends for SciPy, the array API standard docs, etc. we already have significantly more content than would go into a typical NEP for a large feature. So it’d be good if we could sketch out what a dedicated (set of) design docs here would look like - where does it need to provide more detail, and where does it just link out to other content?

And thanks for your thoughts on process here! I know it’s a challenge.

This list makes sense to me, except for (2) - that’d be a mistake. We should only override modules or sets of functionality that we are happy with and that make sense to override. It should be coherent of course, but to add new backend functionality to things we don’t even want people to use because there’s better alternatives, or because we plan to redesign/deprecate seems counterproductive.

Thanks for your answer, Ralf. All your responses make sense to me.

I think an important part of the discussion will be to figure out what is the minimum we can do (the least flexible system) to support this proposal, implement that, and then learn before we commit ourselves too deeply on additional capabilities. So, I like all the answers where you say “let’s not do that” :slight_smile:

I’ll just clarify my point there: it wasn’t about whether a person should do it in logical chunks (you should!), but about which chunks you can override. What I was saying is that it would be limiting to only allow, say, scipy.linalg to be overridden and not scipy.ndimage. Of course, you wouldn’t include experimental APIs, but I worry about libraries choosing to expose only certain functions for override when they could not know what would be useful for practitioners to override.

Thanks Stefan

I think this point will be much easier to converge on when we make it concrete. Both scipy.linalg and scipy.ndimage are clearly important and should be included. But let’s go in a bit more detail:

  • ndimage is fine, all functionality can be included at once.
  • linalg is mostly okay, but for example the low-level functionality where you can retrieve BLAS/LAPACK function handles (see get_blas_funcs and the direct interface in linalg.blas) should be excluded.
  • For scipy.signal it’s harder, for example we should probably not include wavelets functions at all (they’re there for backwards compat, but users should prefer PyWavelets), and the B-splines functionality is also very questionable because there’s a much better spline interface in scipy.interpolate.

I’d say it’s fine to make such decisions. Would you agree, or would you advocate for then trying to clean up or deprecate those sets of functionality like wavelets and B-splines?

Yes, perfectly fine to define what is part of our public API! Preferably, the rest should be hidden/made private: that’s the most obvious way to signal its intent. Module __getattr__ also makes it much easier to hide API that is technically public but that we don’t want to advertise.
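Module __getattr__ here refers to PEP 562. A small sketch of using it to keep technically-public API reachable while signalling it is unsupported (shown on a synthetic module object for self-containedness; in a real library the function is defined at the top level of the .py file):

```python
# PEP 562: if normal attribute lookup on a module fails, Python falls back
# to a __getattr__ function found in the module's namespace.
import types
import warnings

mod = types.ModuleType("mylib")  # stand-in for a real module file

def _module_getattr(name):
    if name == "old_func":
        # Still reachable, but loudly flagged as unadvertised API.
        warnings.warn(
            "mylib.old_func is not part of the supported public API",
            DeprecationWarning, stacklevel=2)
        return lambda: "still works"
    raise AttributeError(f"module 'mylib' has no attribute {name!r}")

mod.__getattr__ = _module_getattr  # in a .py file: just `def __getattr__(name):`
```

This way `dir(mylib)` and the docs only show the curated API, while existing user code that reaches for the hidden name keeps working and gets a warning.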

The distinction seems to be: generic public API vs library/implementation-specific API.

A (maybe) different way to think about this is that we can divide the functions that we pass array-like things to into two categories: those that “care about what is inside the array-box” and those that “do not care about what is inside the array-box”.

For some functions (e.g. those that drop to C / GPU / fortran) the function opens the box and those obviously have to be implemented on per-array-provider basis and are “easy” (but time consuming and tedious) to deal with.

There are other functions that really do not care what is in the box, because they can be expressed entirely in terms of functions that also do not care, or are methods on the arrays, or are injected into the function (hence how we can smuggle in the backend-specific details). If the set of functions that gets smuggled in is big enough, you can get pretty far with this!

However it seems the really hard case is where we have a function that only sometimes cares what is in the box. I’m thinking of problems where you might pick a different algorithm if you are in CPU-but-main-memory land, dask-multi-machine land, or GPU land or problems where you need to specialize the implementation to the exact data structure (I am not deep enough into algorithm development to have one off the top of my head, but am confident they exist or we would not be having this discussion :wink: ).

One solution to this is the uarray / dispatch model where functions that think they might want to be over-ridden can opt-in to being hookable with a default “duck type” implementation available (if one exists, I have a suspicion that a bunch of scipy code is going to turn into numpy-specific implementations).

The other solution - what I think @ilayn is advocating for - is that implementations build a scipy-identical namespace, simply import the functions that are of the “do not care what is in the box” variety (assuming that scipy picks up enough of the duck-array functionality, and maybe even marks those that should “just work”), and then re-implement the functions they need to (or, I guess, leave holes for the functions they have not implemented yet).

I think both can be made to work technically, and for scipy it is probably a wash (because scipy is big enough that it will get re-implemented), but I think the dispatch case is still a bit stronger. Consider the case where some array implementation has left a hole in its scipy coverage. If a third party now implements that and wants to distribute it, then in the case of dispatch they “just” have to use the registration mechanism (and I’m assuming the conflicts have some way of getting worked out), but in the separate-namespace option they will be very strongly tempted to monkey patch :scream: their implementation into the mock-scipy namespace of the array implementation. A little bit of ceremony seems like it is worth it to me, to prevent documenting “…and the way to distribute a third-party extension is to monkey patch…”

I think that the argument for dispatch only gets stronger in domain specific / research group specific / grad student specific code bases where the implementation is likely to be more fragmented.

Silently overriding API should never be an option. There must always be a way for the user to see what is truly happening underneath the hood.

The biggest concern I have with duck-type implementations is that it requires us (library maintainers) to bear the brunt of ensuring that executions succeed. With the “you-replace-our-function-entirely” approach, the backend implementers deal with it. That mechanism also deals with your “specialize the implementation to the exact data structure problem”—in that we simply do not have to care how external parties achieve that.

There seems to be some concern that we will lose “control” over execution. We lose most control when we provide no hooks (and, e.g., force external implementers to monkey patch)—because then you cannot even track execution. But, lightweight hooks give us future options on improving the workflow, while providing a “sanctioned” mechanism for external implementers to participate.

Agreed with both of you that monkey-patching is bad / not a real option.

I share the concern to some extent. It may be good to lean more towards the “replace-entirely” side because of that.

I do think that functionality which is completely pure Python will be fine - because it will be using vectorized expressions only (in the vast majority of cases), and those are the ones that typically do not need any specialized implementation. Once you add algorithms that are iterative, or do global operations (e.g. median, unique) then yes you do get different implementations for GPU or distributed libraries.

Thanks Ralf and others for diving deep into this. I really appreciate your time and effort here!

I am a little skeptical about the likelihood of writing good “duck array” versions of fully-featured library code (e.g., for SciPy/scikit-image) with the exact same Python implementation. The new array API standard is a great tool for facilitating this in simple cases, but every array type has its own quirks. There would be enough compromises in re-writing non-trivial SciPy or sklearn routines, like an ODE solver, in an array-API-compatible way that I would guess this is most suitable for entirely new libraries, rather than re-implementing complete modules like scipy.ndimage via duck arrays.

I do think that such new “duck array libraries” could play an important role in the scientific Python ecosystem. To the extent we can define shared core routines, even beyond those found in NumPy, we should absolutely do so. I can imagine this playing out as optional extensions of the array API standard, e.g., a new sub-module with functions useful for building neural nets.

The other option for making a library “duck array ready” is supporting full reimplementations. I’m definitely supportive of this sort of work, but I’m not sure how much we really need standardization with a tool like uarray. For end-user use cases, just importing a separate library (e.g., from jax.scipy import ndimage) works fine and is nicely explicit. The exception is cases where you want to monkey-patch some new backend into scipy itself, for use inside a third-party library. This can work sometimes, but probably not reliably unless the replacement is exactly compatible with SciPy, which is very limiting.