Typing scikit-image

stefanv · June 5, 2024, 7:12pm

We discussed this at the Scientific Python Developer Summit yesterday, and realized that a “not” operator could also work; but I don’t think such a thing exists.

I found some related discussions, but none concluded:

Type intersection and negation in type annotations - #43 by PythonCHB - Ideas - Discussions on Python.org
Possible to distinguish between Sequence[str]/Iterable[str] and str? · Issue #256 · python/typing · GitHub

stefanv · June 5, 2024, 7:38pm

Here is the post I am planning on making on the Python discord. Any feedback?

In scikit-image, we have many functions that accept two arguments, an image and a mask:

def operation(image, mask): ...

image and mask are both NumPy NDArrays. Swapping these two arguments will not work, and we want a way to help users avoid that class of error.

To do that, we can add Image and Mask types and have users annotate their code. But not everyone will want to do this, so we also want to keep supporting the case where plain ndarrays are passed in.

Here’s a code example of what I’d like to see:

import typing
import numpy as np

Image = typing.NewType('Image', np.ndarray)
Mask = typing.NewType('Mask', np.ndarray)

image_array = np.random.random((50, 50))
image = Image(image_array)

mask_array = np.random.random((50, 50)) > 0.5  # boolean, so it can index the image
mask = Mask(mask_array)


def zero(image: Image, mask: Mask) -> Image:
    if image.shape != mask.shape:
        raise ValueError("Image and mask shapes must match")

    image[mask] = 0
    return image


zero(image, mask)

# Fails mypy
zero(image_array, mask_array)

I could include NDArray in the input types using a union, but then I can no longer enforce the mask-vs-image distinction:

def zero(image: Image | np.ndarray, mask: Mask | np.ndarray) -> Image: ...

# This should fail in mypy, but doesn't
zero(mask, image)
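The reason the union cannot tell the two apart is that a type checker treats every NewType as a subtype of its base, so a Mask already matches the np.ndarray arm of `Image | np.ndarray`. A minimal sketch, reusing the Image and Mask definitions from the example above (the `arr` variable is only for illustration):

```python
import typing
import numpy as np

Image = typing.NewType('Image', np.ndarray)
Mask = typing.NewType('Mask', np.ndarray)

# Each NewType records the base type it was derived from.
print(Image.__supertype__ is np.ndarray)
print(Mask.__supertype__ is np.ndarray)

# At runtime a wrapped value is still just an ndarray, which mirrors
# what the type checker assumes: a Mask is acceptable wherever an
# np.ndarray is.
arr = np.zeros((2, 2), dtype=bool)
mask = Mask(arr)
print(isinstance(mask, np.ndarray))
```

So `Mask | np.ndarray` and `Image | np.ndarray` both accept the same set of values as far as the checker is concerned.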

A possible solution we thought of is a not operator:

def zero(image : ~Mask, mask : ~Image) -> Image: ...

But from what I’ve read, implementing this is problematic.

Do you have a recommendation on how we should handle this situation?

xref: Typing scikit-image - #13 by stefanv

Warren · June 5, 2024, 8:00pm

Drive-by comment: Adding types would be a very useful way to avoid user mistakes. In this case, if getting the order correct is a common problem for users, then making the keywords required would also be a big help. E.g.

def operation(*, image, mask):
    ...

Then a user must write operation(image=image_array, mask=mask_array) (and operation(mask=mask_array, image=image_array) also works). If a user must give the parameter name, they are much less likely to incorrectly switch the arguments.

Even making just the mask parameter name required could be useful, i.e.

def operation(image, *, mask):
    ...

I know this is probably not possible because it breaks backwards compatibility, but perhaps for new functions it could be useful.
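As a small illustration of the keyword-only variant, the sketch below uses a hypothetical stand-in for `operation` and plain nested lists instead of arrays, so it runs without NumPy. Passing the mask positionally is rejected by Python itself, before any type checker is involved:

```python
def operation(image, *, mask):
    """Hypothetical stand-in for a function taking an image and a mask."""
    return image, mask


image_array = [[1, 2], [3, 4]]
mask_array = [[True, False], [False, True]]

# The mask must be named, so it cannot be swapped with the image by accident.
result = operation(image_array, mask=mask_array)

# A purely positional call raises TypeError at call time.
try:
    operation(image_array, mask_array)
except TypeError as exc:
    error = str(exc)
    print(error)
```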

stefanv · June 5, 2024, 11:22pm

Thanks @Warren. Yes, this is a good idea, and is on the skimage 2 roadmap. @lagru can confirm?

lagru · June 5, 2024, 11:24pm

Yes, we’ve been working towards making all parameters keyword-only except for maybe the first one or two where it makes sense.

Warren · June 5, 2024, 11:34pm

Yeah, in an image processing library, where the first parameter is typically an image, requiring the keyword image=my_array would be too much (on par with requiring that the sin function be written with a keyword such as sin(angle=pi/6); nobody would want that).

jni · June 6, 2024, 8:15am

Some thoughts:

not Union[Mask, Segmentation, Coordinates, WhoKnowsWhatElse] is just ugly. I really hope we are not forced to go there.
Having said this, you could say _Image = NewType('_Image', np.ndarray), _Mask = ... etc, then Image = not Union[...]. But I still don’t love it. For one, it would allow the type of the input to be Bananas, which is probably an error.
We’ve gone down the NewType rabbit hole pretty far without very good answers, but an alternative to NewType is to use Image = Annotated[np.ndarray, {'kind': 'image'}] together with some code that would look at the annotations and make sure they either match or are missing. And, even if it’s not possible to write that code currently, it would be enough to (a) not fail with all existing code, (b) indicate intent for future improvements, and (c) inspect things at runtime, which would be useful for e.g. protocol implementation discovery.
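The Annotated idea can be sketched roughly as follows. This is only an illustration under assumptions: the `{'kind': ...}` metadata convention comes from the suggestion above, while `declared_kinds` and `kinds_compatible` are hypothetical helper names, not part of scikit-image or any existing tool. It relies on `typing.get_type_hints(..., include_extras=True)` to keep the Annotated metadata.

```python
import typing
from typing import Annotated

import numpy as np

Image = Annotated[np.ndarray, {'kind': 'image'}]
Mask = Annotated[np.ndarray, {'kind': 'mask'}]


def zero(image: Image, mask: Mask) -> Image:
    image[mask] = 0
    return image


def declared_kinds(func):
    """Map each annotated name to its 'kind' metadata, or None if absent."""
    hints = typing.get_type_hints(func, include_extras=True)
    kinds = {}
    for name, hint in hints.items():
        kind = None
        if typing.get_origin(hint) is Annotated:
            # For Annotated hints, get_args returns (base_type, *metadata).
            for meta in typing.get_args(hint)[1:]:
                if isinstance(meta, dict) and 'kind' in meta:
                    kind = meta['kind']
        kinds[name] = kind
    return kinds


def kinds_compatible(expected, actual):
    """Accept matching kinds, and anything left unannotated (None)."""
    return expected is None or actual is None or expected == actual


kinds = declared_kinds(zero)
print(kinds)
```

Because both aliases are plain np.ndarray to a static checker, existing code keeps type-checking; the metadata only matters to code that chooses to look at it.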

lagru · June 7, 2024, 10:03pm

I’d also like to add a perspective. While I’m fine with using terms such as “image”, “mask”, “labels” to make things more clear to users, I really only want these to be about the properties of arrays. If we have a function that accepts a Mask, a user should definitely be able to pass an Image to it as long as it is of boolean type!

Anything else would only be us making life harder for users, IMO.
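One way to read "about the properties of arrays" is to annotate by dtype rather than by role. The sketch below assumes that reading; `BoolArray` and `skeletonize_like` are hypothetical names, and the runtime dtype check is only there to make the example observable, not a proposal for scikit-image's behavior.

```python
import numpy as np
import numpy.typing as npt

# A "mask" described purely by a property of the array: boolean dtype.
BoolArray = npt.NDArray[np.bool_]


def skeletonize_like(mask: BoolArray) -> BoolArray:
    """Hypothetical function that only requires a boolean array."""
    if mask.dtype != np.bool_:
        raise TypeError("expected an array of boolean dtype")
    return mask.copy()


image = np.random.random((50, 50))
bool_image = image > 0.5  # conceptually still an image, but boolean

# Accepted: nothing distinguishes it from a "mask" except its dtype.
out = skeletonize_like(bool_image)
print(out.dtype, out.shape)
```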

jni · June 8, 2024, 7:55am

Is a Mask anything more than a boolean Image? But anyway, there is always an escape hatch: morphology.skeletonize(np.asarray(bool_image)). And, as always, typing is optional, so none of what we are discussing should have runtime consequences.

jni · June 26, 2024, 5:26am

Re Annotated, I just came across Typer (again), which uses Python type annotations to generate CLIs. I just saw that it supports Annotated.

It’s probably worth looking at the Typer source code to see how they use Annotated and whether it helps inspire paths forward for us.

saulshanabrook · July 23, 2024, 6:10pm

That looks like a good post to me! You could also tag Alex Waygood on it since he had a few thoughts in person about it.

saulshanabrook · July 25, 2024, 8:22pm

I wanted to circle back to the idea that maybe it’s fine that, if someone is using a static type checker with scikit-image, they are forced to wrap any ndarrays they pass in with the proper types, i.e. Image or Mask.

It requires them to be explicit about the semantics of the array they are using if they are using static type checking. It seems hard to know if this would be overly laborious or problematic without just trying it and seeing if anyone complains?

It reminds me a bit of this question, which asks for a way to disallow certain literal strings but still allow a string that is unknown statically. It’s kinda the same question you have here: you want to allow the parent type ndarray but not allow certain subtypes of it.
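For a sense of what explicit wrapping would look like at a call site, here is a minimal sketch reusing the NewType definitions and the `zero` function from earlier in the thread (the `raw_image` and `raw_mask` names are just for illustration). The wrappers return their argument unchanged, so the cost is in the extra typing, not at runtime:

```python
import typing
import numpy as np

Image = typing.NewType('Image', np.ndarray)
Mask = typing.NewType('Mask', np.ndarray)


def zero(image: Image, mask: Mask) -> Image:
    image[mask] = 0
    return image


raw_image = np.ones((4, 4))
raw_mask = np.eye(4, dtype=bool)

# Users of a static type checker state the role of each array explicitly.
# At runtime, Image(...) and Mask(...) hand back the very same objects.
image = Image(raw_image)
mask = Mask(raw_mask)

result = zero(image, mask)
print(image is raw_image, result.sum())
```

Users who do not run a type checker can keep calling `zero(raw_image, raw_mask)` directly, since nothing is enforced at runtime.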