Support for array types other than NumPy

I wanted to bring this post to your attention. It discusses a plan for allowing scikit-image (and other packages) to support more than just NumPy arrays.

3 Likes

I created a repo that has a demo of dispatching to different backends with a uarray-based approach

Benchmark results and some discussion is given the README there. I mostly autogenerated the uarray multimethods on the scikit-image side using a script provided in the repository above, but it does create an additional layer of indirection that can make things a little harder for new contributors to understand. The Dask backend there is a more user-friendly way of using apply_parallel where a user would just have to register a dask_backend and never have to call apply_parallel (or the equivalent dask.array.map_blocks) directly.

We should make some corresponding demos of possibilities with the array_namespace approach that will become possible with NumPy 1.22. That approach is relatively simple to understand, but doesn’t help us with all of our functions that currently use Cython code.

2 Likes

Thanks, Greg, it is great to have a concrete piece of code to reason about. This may be worth mentioning on the main thread as well.

Cool, it is very useful to have something to try out and experiment!

Just in case it helps anyone: I tried uskimage-demo (at least the CPU backends). I ended up using your uarray-v3 branch though, and adding an empty uskimage-demo/uskimage_demo/morphology/__init__.py.

One thing that would interest me to play around with the implicit dispatching (multiple-dispatching) side of it: If the input is already a dask array, maybe we want to return a dask array, and automatically choose the dask backend?

As an example (sorry using blobs and footprints from the demo):

skimage.morphology.binary_erosion(
        da.asarray(blobs, chunks=1000), footprint)

with skimage.set_backend(dask_numpy_backend, only=True):
    skimage.morphology.binary_erosion(
            da.asarray(blobs, chunks=1000), da.asarray(footprint, chunks=100))

The first takes much longer than the second one. So even though all inputs are dask arrays, we don’t actually use the dask backend automatically. (Maybe you should not here!)

But: For the more general case, I would be very interested in having an example that allows e.g. shows how the user might pass a cupy/dask array, and get out a cupy/dask array even though skimage itself doesn’t even have that backend?

This case of automatically dispatching to Dask or CuPy backends without needing to use a context manager is being discussed in Auto-register fft backend to avoid context manager · Issue #14266 · scipy/scipy · GitHub. It involves having the library set the global backend with try_last=True so that any other registered backends would get tried first. Then if we had scikit-image call register_backend on those by default the Dask one would have been used in the first case you propose as well. I am not sure we should do that on the library side, though, or leave it up to users. As long as you have registered the backend you can avoid the explicit use of the context manager which is convenient.

A separate shortcoming of the Dask backend as implemented in uskimage_demo is that it is currently assuming the Dask array chunks are NumPy arrays on the host. Given that Dask arrays can also have CuPy arrays as chunks, we should handle that case as well. I will see about making an update for that scenario.

Another design difference between the skimage uarray branch and SciPy’s uarray use is that the SciPy case currently has set_backend, register_backend, etc. in each subpackage (i.e. there scipy.fft.set_backend and a proposed scipy.ndimage.set_backend). This makes some sense in that FFT-specific backends like pyFFTW or mkl_fft are FFT-specific, linalg backends are likely to be linear algebra specific, etc.

In the sckit-image case, though, I currently just have a single top-level skimage.set_backend that is used to set the backend for all subpackages. I assumed this is likely more convient as there are probably few cases where one wants different backends for say filters vs. morphology in the same usage scenario, but am interested if there is disagreement on that point?

fwiw and from a high level (haven’t had time to delve into the details of the proposals), I very much would like things to work implicitly with multiple dispatch by default, at least in the simple cases. It’s great that np.mean(my_dask_array) Just Works now, and it would be equally great if skimage.filters.gaussian also Just Worked in the same way.

Sorry about this, I found I had forgotten to commit a couple of the files in the more recent uarray branch! (I was running an in-place install so things were working okay when I had tested it prior to posting). I just tested that it now works for a clean install. There is still a separate issue about getting a more recent release of uarray uploaded to PyPI so pip can meet that requirement (we specify uarray>=0.8.2, but PyPI only has 0.6.0 at the moment).