Typing scikit-image

We discussed this at the Scientific Python Developer Summit yesterday, and realized that a “not” operator could also work; but I don’t think such a thing exists.

I found some related discussions, but none concluded:

Here is the post I am planning on making on the Python discord. Any feedback?

In scikit-image, we have many functions that accept two arguments, an image and a mask:

def operation(image, mask): ...

Both image and mask are NumPy ndarrays. Swapping the two arguments will not work, and we want a way to help users avoid that class of error.

To do that, we can add Image and Mask types and have users annotate their code. But not everyone will want to do this, so we also want to keep supporting the case where plain ndarrays are passed in.

Here’s a code example of what I’d like to see:

import typing
import numpy as np

Image = typing.NewType('Image', np.ndarray)
Mask = typing.NewType('Mask', np.ndarray)

image_array = np.random.random((50, 50))
image = Image(image_array)

# Boolean mask, so that image[mask] below is valid boolean indexing
mask_array = np.random.random((50, 50)) > 0.5
mask = Mask(mask_array)

def zero(image : Image, mask : Mask) -> Image:
    if not image.shape == mask.shape:
        raise ValueError("Image and mask shapes must match")

    image[mask] = 0
    return image

zero(image, mask)

# Fails mypy: plain ndarrays are neither Image nor Mask
zero(image_array, mask_array)

I could include NDArray in the input types using a union, but then I can no longer enforce the mask-vs-image distinction:

def zero(image : Image | np.ndarray, mask : Mask | np.ndarray) -> Image: ...

# This should fail in mypy, but doesn't
zero(mask, image)
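For anyone wondering why the union loses the distinction: as far as I understand it, a type checker treats a NewType as a subtype of its base, so a Mask is also an np.ndarray and therefore matches Image | np.ndarray. At runtime NewType does nothing at all. Here is a minimal sketch of that; accepts_image is a hypothetical helper, not a scikit-image function, and I spell the union as typing.Union only so the snippet also runs on older Python versions.

```python
import typing

import numpy as np

Image = typing.NewType("Image", np.ndarray)
Mask = typing.NewType("Mask", np.ndarray)

mask_array = np.zeros((2, 2), dtype=bool)
mask = Mask(mask_array)

# NewType is an identity function at runtime: no wrapper object is created,
# so the "Mask" is still the very same ndarray.
same_object = mask is mask_array
runtime_type = type(mask)


def accepts_image(image: typing.Union[Image, np.ndarray]) -> np.ndarray:
    """Hypothetical function using the union annotation from the post."""
    return image


# A type checker accepts this call: Mask is a subtype of np.ndarray,
# and np.ndarray is one member of the union.
returned = accepts_image(mask)
```

So the union is only as strict as its most permissive member, which here is plain np.ndarray.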

A possible solution we thought of is a not operator:

def zero(image : ~Mask, mask : ~Image) -> Image: ...

But from what I have read, implementing this is problematic.

Do you have a recommendation on how we should handle this situation?

xref: Typing scikit-image - #13 by stefanv

Drive-by comment: Adding types would be a very useful way to avoid user mistakes. In this case, if getting the order correct is a common problem for users, then making the keywords required would also be a big help. E.g.

def operation(*, image, mask):

Then a user must write operation(image=image_array, mask=mask_array) (and operation(mask=mask_array, image=image_array) also works). If a user must give the parameter name, they are much less likely to incorrectly switch the arguments.

Even making just the mask parameter name required could be useful, i.e.

def operation(image, *, mask):

I know this is probably not possible because it breaks backwards compatibility, but perhaps for new functions it could be useful.
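A small sketch of the keyword-only variant, in case it helps. The function body is made up for illustration (it just zeroes the masked pixels); only the signature is the point.

```python
import numpy as np


def operation(image, *, mask):
    """Illustrative body: return a copy of `image` with masked pixels zeroed."""
    return np.where(mask, 0, image)


image_array = np.ones((2, 2))
mask_array = np.array([[True, False], [False, True]])

# The mask must be passed by name.
result = operation(image_array, mask=mask_array)

# Passing the mask positionally is rejected by Python itself.
positional_error = None
try:
    operation(image_array, mask_array)
except TypeError as exc:
    positional_error = exc
```

The error is raised at call time by the interpreter, so this works even for users who never run a type checker.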

Thanks @Warren. Yes, this is a good idea, and is on the skimage 2 roadmap. @lagru can confirm?


Yes, we’ve been working towards making all parameters keyword-only except for maybe the first one or two where it makes sense.

Yeah, in an image processing library, where the first parameter is typically an image, requiring the keyword image=my_array would be too much. It would be on par with requiring that the sin function be written with a keyword, such as sin(angle=pi/6); nobody would want that.


Some thoughts:

  1. not Union[Mask, Segmentation, Coordinates, WhoKnowsWhatElse] is just ugly. I really hope we are not forced to go there.
  2. Having said this, you could say _Image = NewType('_Image', np.ndarray), _Mask = ... etc, then Image = not Union[...]. But I still don’t love it. For one, it would allow the type of the input to be Bananas, which is probably an error.
  3. We’ve gone down the NewType rabbit hole pretty far without very good answers, but an alternative to NewType is to use Image = Annotated[np.ndarray, {'kind': 'image'}] together with some code that would look at the annotations and make sure they either match or are missing. And, even if it’s not possible to write that code currently, it would be enough to (a) not fail with all existing code, (b) indicate intent for future improvements, and (c) inspect things at runtime, which would be useful for e.g. protocol implementation discovery.
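To make point 3 a bit more concrete, here is a rough sketch of what "look at the annotations and make sure they either match or are missing" could look like at runtime. The helpers kind_of and kinds_compatible are hypothetical names I made up; nothing like them exists in scikit-image, and a static checker would not run this for you. The only standard pieces are typing.Annotated, get_origin, get_args, and get_type_hints(..., include_extras=True).

```python
from typing import Annotated, get_args, get_origin, get_type_hints

import numpy as np

# Aliases in the style proposed above; the metadata dict is plain data
# that static type checkers ignore.
Image = Annotated[np.ndarray, {"kind": "image"}]
Mask = Annotated[np.ndarray, {"kind": "mask"}]


def kind_of(annotation):
    """Return the 'kind' metadata of an Annotated type, or None if absent."""
    if get_origin(annotation) is Annotated:
        # get_args returns (underlying_type, *metadata) for Annotated types.
        for meta in get_args(annotation)[1:]:
            if isinstance(meta, dict) and "kind" in meta:
                return meta["kind"]
    return None


def kinds_compatible(given, expected):
    """True when the kinds match or when either side carries no kind."""
    given_kind, expected_kind = kind_of(given), kind_of(expected)
    return given_kind is None or expected_kind is None or given_kind == expected_kind


def zero(image: Image, mask: Mask) -> Image:
    image[mask] = 0
    return image


# include_extras=True keeps the Annotated metadata instead of stripping it.
hints = get_type_hints(zero, include_extras=True)

same_kind = kinds_compatible(Image, hints["image"])      # Image where Image is expected
unannotated = kinds_compatible(np.ndarray, hints["image"])  # plain ndarray, no kind
swapped = kinds_compatible(Mask, hints["image"])         # Mask where Image is expected
```

With this rule, an unannotated ndarray is accepted everywhere, a Mask passed as an Image is rejected, and so would be any unrelated kind such as "bananas".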

I’d also like to add a perspective. While I’m fine with using terms such as “image”, “mask”, and “labels” to make things clearer to users, I really only want these to be about the properties of arrays. If we have a function that accepts a Mask, a user should definitely be able to pass an Image to it, as long as it is of boolean type!

Anything else would only be us making life harder for users, IMO.
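One way to express "only the properties of the array matter" is to annotate by dtype with numpy.typing.NDArray instead of introducing named types. This is only a sketch under my own assumptions: the float64 image dtype is picked for illustration, and how much a type checker can actually verify depends on how precisely the caller's arrays are typed.

```python
import numpy as np
import numpy.typing as npt


def zero(
    image: npt.NDArray[np.float64], mask: npt.NDArray[np.bool_]
) -> npt.NDArray[np.float64]:
    """Set masked pixels to zero in place; any boolean array works as a mask."""
    if image.shape != mask.shape:
        raise ValueError("Image and mask shapes must match")
    image[mask] = 0
    return image


image = np.ones((2, 2))
# Any boolean array qualifies, including one derived from an image.
bool_image = image > 0.5

result = zero(image, bool_image)
```

Here a thresholded image is accepted as a mask without any conversion, because the annotation talks about the dtype and not about where the array came from.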

Is a Mask anything more than a boolean Image? But anyway, there is always a safety valve: morphology.skeletonize(np.asarray(bool_image)). And, as always, typing is optional, so none of what we are discussing should have runtime consequences.


Re Annotated, just (re)came across Typer, which uses Python type annotations to generate CLIs. I just saw that it supports Annotated:

It’s probably worth looking at the Typer source code to see how they use Annotated, and whether it helps inspire paths forward for us.