A proposed design for supporting multiple array types across SciPy, scikit-learn, scikit-image and beyond

Hi all, Ivan Yashchuk and I - with input from maintainers of all the libraries involved - wrote a blog post about a topic that is hopefully exciting to many of you: making support for multiple (non-NumPy) array types in SciPy, scikit-learn, scikit-image and other such libraries a reality. Here is the post: A vision for extensibility to GPU & distributed support for SciPy, scikit-learn, scikit-image and beyond | Quansight Labs

Because this topic spans multiple libraries at the core of the PyData ecosystem, we figured this forum is the best place to discuss the big picture. Topics specific to one project can be moved to that project’s mailing list or GitHub repo. We’d like to pose a couple of questions we think are important to discuss, and we invite feedback on the overall vision and proposed design.

Questions we’d like to cover:

  1. Decision making on this proposal
  2. How to delineate what is supported functionality for “multiple array types work” for a given library
  3. A few key technical questions
  4. Testability & maintainability

Decision making on this proposal

Meta-question regarding decision making: what would you like to see in terms of further design docs, and how do we agree those are good enough? For example, we could write a single SPEC document after the initial discussion here settles, then give that a “go ahead”, merge PRs that make sense for a project, and revisit the overall design once we have working code covering SciPy, scikit-learn and scikit-image before finalizing the SPEC.

How to delineate what is supported functionality

Let’s start with a statement that is hopefully uncontroversial: it must be easy for users to understand what parts of a library support multiple array types. It shouldn’t be a patchwork of things that do and don’t work.

There are multiple ways to go about this. What I was thinking is:

  • Implement support for a logical group of functionality completely before merging it into a library (e.g., “all of scipy.ndimage is now supported”)
  • For a very large submodule this may be more granular, but it should still make sense. E.g., “decompositions in scipy.linalg are supported, matrix functions are not” (see the scipy.linalg docs)
  • Documentation should have a uniform way of noting whether a function or submodule supports dispatching, for example via a custom Sphinx directive that can be put in the Notes section of a docstring (see the sketch after this list). This could become part of the numpydoc format.
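
To make that concrete, here is a minimal sketch of what such a docstring could look like. The supports-array-types directive name is hypothetical; it would have to be defined as a custom Sphinx directive and added to the numpydoc conventions:

def some_func(x):
    """Do something with an array.

    Parameters
    ----------
    x : array_like
        Input array.

    Notes
    -----
    .. supports-array-types::

       Dispatches to other array libraries via __array_namespace__;
       alternative backends can be selected via uarray.
    """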

A few key technical questions

As already mentioned in the blog post, the most tricky question is probably what the default behavior should be for the dispatching that’s happening, and how users opt into or opt out of that behavior. Ivan will start a separate thread on this forum to discuss that, because it’s a complex topic where there will very likely be different opinions.

The second key question is the choice of dispatching methods, namely __array_namespace__ and uarray. The blog post already goes into a bit of detail, but focuses more on what the overall design should achieve than on detailed arguments for or against these methods. So let’s give a couple of reasons for each. For __array_namespace__ (a short usage sketch follows this list):

  • Let’s start by linking to NEP 37, which gives a number of reasons why __array_function__ hasn’t been enough. __array_namespace__ is basically the worked-out version of __array_module__ from that NEP.
  • The array API standard (see its docs) has been designed with this portability in mind: it has a well-specified API that will work for GPU execution, JIT compiling, etc., and a reasonably sized, clearly specified API surface. It is supported in the upcoming releases of NumPy and CuPy, and PyTorch will follow (others are still thinking/discussing).
  • It is not tied to a single preferred array library. For example, libraries that are now built on top of PyTorch, JAX or CuPy can adopt __array_namespace__ too (which is not the case for dispatch methods that live only in NumPy).
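
Here is that short usage sketch. It assumes the input array comes from a library that implements __array_namespace__ (per the bullet above: upcoming NumPy and CuPy releases, with others to follow):

def scale_to_unit_variance(x):
    # Retrieve the array API namespace from the input array; this works
    # for any array library implementing __array_namespace__.
    xp = x.__array_namespace__()
    # mean() and std() are part of the array API standard, so this runs
    # unchanged on NumPy, CuPy, or any other conforming library.
    return (x - xp.mean(x)) / xp.std(x)

The result stays in the input’s array library (and on its device); no conversion to NumPy happens anywhere.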

For uarray:

  • It is the only method that meets all the goals/wishes outlined in the blog post. The other multiple-dispatch methods cannot (a) provide a way to select a different backend for the same array type, or (b) support “array duck types”.
  • It is already used by scipy.fft; over the summer a PR for scipy.ndimage was written (still unmerged), as well as the start of one for scipy.linalg. A minimal sketch of the pattern follows this list.
  • We could use another method for some modules where we are confident we don’t need the functionality in the first bullet (if such a method exists), but in any case it seems undesirable to have two separate systems doing the same job.
  • It has other functionality that we are likely to need, such as coercing scalar/sequence inputs to arrays, which was implemented with use cases like these in mind. Libraries that dispatch only on types look simpler in comparison, but lack such features.
  • It was the joint preference of the group of people we did some initial brainstorming and planning with, including multiple maintainers: Gregory Lee (SciPy, scikit-image), Thomas Fan (scikit-learn), Peter Bell (SciPy), and myself (NumPy, SciPy).
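
Here is the promised minimal sketch of the uarray pattern, modeled on scipy.fft; the domain name, multimethod and backend below are illustrative, not actual SciPy code:

import numpy as np
import uarray as ua

def _x_replacer(args, kwargs, dispatchables):
    # Put the (possibly coerced) array back into the arguments that are
    # passed on to the backend implementation.
    return (dispatchables[0],) + args[1:], kwargs

def _fft_extractor(x, n=None):
    # Mark x as the argument that backends dispatch (and may convert) on.
    return (ua.Dispatchable(x, np.ndarray),)

# The dispatching front-end function, registered under an example domain.
fft = ua.generate_multimethod(_fft_extractor, _x_replacer, domain="example.fft")

class NumpyBackend:
    __ua_domain__ = "example.fft"

    @staticmethod
    def __ua_function__(method, args, kwargs):
        # Trivial backend: there is only one multimethod in this sketch.
        return np.fft.fft(*args, **kwargs)

with ua.set_backend(NumpyBackend()):
    out = fft(np.ones(8))

A CuPy or PyTorch backend would look the same, with __ua_function__ routing to its own implementation (and optionally a __ua_convert__ method to coerce inputs).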

We should mention some downsides of uarray as well; these come to mind:

  • more lines of code per API function covered
  • the docs are too terse and need improving

The opt-in vs. opt-out question is not really a downside of uarray at this point. For context, it was discussed a lot ~2 years ago when comparing uarray with __array_function__, but that framing is the wrong way around: we should first decide what the desired semantics are (in the other thread on this forum, as mentioned above), and then uarray can be made to fit those semantics.

Testability

It seems like a given that this code must be testable. For the array API standard, there’s a portable test suite: GitHub - data-apis/array-api-tests: Test suite for the PyData Array APIs standard. It is already used by CuPy in CI, and NumPy testing is done on the array-api-tests repo itself. That test suite should be able to run against any conforming library, and given that the API is well-defined and relatively small, this is probably fine as is. When putting a dispatch layer in front of a set of functions though, that needs separate testing. Testing the dispatching itself is not difficult; for example, it takes ~150 lines of code for scipy.fft.
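
To illustrate, a dispatch test in the style of scipy.fft can be as simple as registering a mock backend and asserting that it gets used. The mock below is a sketch, not SciPy’s actual test code:

import numpy as np
from scipy import fft

class _MockBackend:
    # Records which multimethods were dispatched to this backend,
    # and returns a sentinel value instead of computing anything.
    __ua_domain__ = "numpy.scipy.fft"
    called = []

    @staticmethod
    def __ua_function__(method, args, kwargs):
        _MockBackend.called.append(method.__name__)
        return "mock-output"

def test_fft_dispatches_to_backend():
    with fft.set_backend(_MockBackend(), only=True):
        assert fft.fft(np.ones(8)) == "mock-output"
    assert "fft" in _MockBackend.called

The more interesting question though is: is that enough? Will it cover real-world code where one, for example, feeds CuPy arrays to SciPy and scikit-learn code?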

The tentative answer, based on experience and the relative lack of issues with the __array_ufunc__, __array_function__ and scipy.fft machinery, is yes - at least for dispatching on the top-level function call. For something like this:

def some_sklearn_func(x):
    # A bunch of pure Python code
    # ...
    # Some array API usage (__array_namespace__)
    # ...
    # Maybe a call to some other code dispatching via uarray
    return result

Are we completely confident that we can refactor that code and not break anything? It looks like that needs some dedicated testing, because if new code that isn’t yet covered by CuPy is introduced somewhere in the middle, then existing user code like some_sklearn_func(x_cupy) will break. This was always a potential problem with __array_function__ et al., and in practice it hasn’t been much of an issue as far as I know. But it remains technically undertested, so perhaps more is needed here?

Maintainability

None of us have a crystal ball, so this is a tricky topic. We suspect that the code will be relatively low-overhead to maintain, and worth it for the benefits we get - but it’s hard to be sure. So this is more of an open question; based on experience with other dispatching methods, and on the code and PRs linked above, are there general or specific worries about longer-term maintainability?

One thing that may be worth discussing is API stability. Projects implementing a backend for a particular API effectively commit to following the choices of the project that owns that API, including backwards-incompatible changes and API extensions. This seems to be working well for CuPy and Dask already, but for other libraries that like making their own choices there may be some trade-offs. Or perhaps they should get more involved in the design of the base library?

Ralf, thanks very much for writing this up. This post triggered a lot of conversation, and I think we’re all gathering our thoughts before responding. Just wanted you to know that it’s been seen and is being thought about!

Thanks @stefanv.

I’ve gotten a few questions in private as well, about this topic and also recently about the NumPy SIMD work - both involving a meta topic: collaboration between companies and the community. So I wanted to follow up with a few thoughts on that.

Companies are often interested in code/features/packages that help them - they typically need a good business reason to spend resources or funds. For multiple array types and the kind of dispatch/backend system we’re talking about here, that reason is often hardware-related: supporting new hardware, or making widely used code run faster on their hardware. This work often ends up being done in isolation or as a new project, because it’s nontrivial and time-consuming to interact with community projects. A few examples (not picking on these projects specifically, they’re just the first that came to mind):

  • RAPIDS reimplements large parts of scikit-learn, scikit-image and SciPy in separate packages to support CUDA. An NVIDIA project to support NVIDIA hardware.
  • scikit-learn-intelex re-implements parts of scikit-learn - an Intel project optimized for Intel CPUs.
  • NEP 36 was written largely because Anaconda defaults contained modified numpy (and IIRC also scikit-learn) packages for optimized performance on x86_64.

The above projects were done without much interaction with the relevant community projects, and a similar argument can be made about the earlier reimplementations of NumPy-like libraries. Sometimes that’s for good reason: if you have big plans and a few to a few dozen engineers on a project (even 100+ engineers in a few cases), it’s not going to work well to route everything through a project with a few handfuls of mostly-volunteer maintainers. But it doesn’t have to be that way - in many cases it will make more sense to collaborate.

Recently I’ve seen encouraging engagement from Intel, Huawei, IBM and Apple in NumPy - with performance optimizations for the specific hardware each company cared about. This is made possible by a structure like this:

  • Generic infrastructure
    • Backend 1 for hardware type 1
    • Backend 2 for hardware type 2
    • etc.

The NumPy universal SIMD framework proved to work nicely for that case: a lot of code is general-purpose and fast enough, and where needed, architecture-specific (AVX512, Neon, VSX, etc.) optimizations can be added incrementally. In the case of SciPy/sklearn/skimage APIs, something similar should be possible. Where there is compiled code, it’s hardware-specific and therefore needs to be reimplemented; where it’s Python, it can often be reused. In the end that allows higher-level packages and end users to write code using only their familiar PyData project APIs, rather than:

if device == CPU:
    from scipy import submodule
    submodule.func(x)
elif device == CUDA:
    import cusubmodule
    cusubmodule.func(x)
# Maybe need to add ROCm, SYCL, etc. - and then deal with
# optimized CPU backends too. Too complex, so won't happen.
else:
    raise TypeError("this thing is not supported!")

(the above can also be written as if check_isnumpy(x): ... elif check_istorch(x): ... elif check_iscupy(x): ... etc.)

My hope is that if we get more “generic infrastructure + hardware-specific backends”, we will see (a) more contributions - both engineering time and direct funding - from companies to community projects, and (b) more higher-level projects that work out of the box with multiple array types. That seems preferable to an ecosystem where the largest tech companies continue to build their own tech stacks in isolation.

I hope the above is helpful. If it seems of interest and broader than the initial post in this thread, happy to split it off as a separate conversation.

Cross-linking the scikit-image thread on this.