Type hinting / annotating SPEC

jarrodmillman · February 17, 2021, 5:02pm

Many ecosystem projects are beginning to discuss how and when to implement type hinting / annotating. For example, see

Type hinting / annotation (PEP 484) for ndarray, dtype, and ufunc · Issue #7370 · numpy/numpy · GitHub
type stubs for matplotlib · Issue #17991 · matplotlib/matplotlib · GitHub
Python 3.7 Typing · Issue #3153 · scikit-image/scikit-image · GitHub
Support typing · Issue #16705 · scikit-learn/scikit-learn · GitHub
Add type annotations for Graph and DiGraph by NeilGirdhar · Pull Request #4014 · networkx/networkx · GitHub

Type hinting / annotation seems like a generally desirable feature and one that most projects in the ecosystem will need to sort out sooner or later. However, this functionality is new and seems to be evolving and improving. For developers who are not closely following the situation, it is unclear how mature this functionality is, what the current costs and benefits of implementing it now are, and what the best way to implement this functionality is. For example, does it make it easier or harder for first-time contributors? Is there anything special needed to handle shared data structures like NumPy arrays? How important is it for the ecosystem projects to implement this functionality in a similar way? Does (and, if so, how does) this impact shared tooling (e.g., numpydoc)?

Rather than having each project spend time researching this topic and independently figuring out how and when to implement it, this SPEC could provide much needed guidance about why, when, and how ecosystem projects should consider implementing this functionality. It could also pool the limited expertise across projects. This would help those experts form a community to discuss this issue and would provide maintainers who have not had the time to learn about this functionality yet a group of colleagues whose collective opinion can be relied upon.

I don’t know enough about this issue to suggest exactly what this SPEC should look like and I am not the right person to draft such a SPEC. But speaking as a NetworkX developer, I believe something like the following would be very helpful for my project and I am certain many other projects would similarly benefit.

Certain projects are investigating and starting to implement type hints / annotations. Perhaps the contributors working on those features would coauthor a new SPEC. In order for the new SPEC to be accepted, the authors would need to provide some basic details about the type hinting and how it would benefit the ecosystem (particularly focusing on how coordinating this would benefit the ecosystem). It should also identify a few projects and people who will be prototyping this functionality. After it is accepted, the coauthors from the various Core and other ecosystem projects would coordinate with one another, review one another’s work in the various project repositories, and distill from this experience guidance for why, when, and how other ecosystem projects should follow. Much of the actual work would take place in the individual projects prototyping this feature. But the coauthors who in the past would have been siloed in their individual projects would form a mini-community that can review one another’s work and leverage each other’s experience.

During this pre-Endorsement period, projects would decide for themselves whether they also want to join the effort to prototype type hinting in the ecosystem. If so, they should consider asking one of their team to join the SPEC as a coauthor to help in the collaborative process. The SPEC would list the projects prototyping this feature as well as the individual contributors working on it. This will help individual projects interested in prototyping type hinting / annotating by increasing the pool of expertise available to discuss and review new PRs.

For projects not ready to prototype type hinting / annotating, it would provide a better way to respond to new PRs adding type hinting / annotating support to their project. Currently, for example, NetworkX core developers aren’t prepared to decide how we should move forward with type hinting / annotating. So when contributors submit PRs with type hinting / annotating we have to ask them to remove it. When those contributors ask what our plan is, we currently don’t have a good answer. This risks discouraging these contributors from continuing to contribute to the project. However, if a SPEC like this is accepted and is being actively worked on by others in the ecosystem, we would be able to explain to those contributors that we are waiting for the SPEC to be endorsed by some of the other Core Projects. We could also ask those contributors to review the SPEC and, if interested, suggest changes and improvements. We could even ask them to consider coauthoring the SPEC to ensure that it will take our project’s needs into consideration. Then once the prototyping effort is finished and the SPEC has been endorsed by the interested Core Projects, the NetworkX core development team can review the SPEC and start accepting PRs to add support for type hinting / annotating based on the plan and guidance provided by the SPEC. This would make it easier for our core developers to have confidence that they are moving in the correct direction without requiring us to do the research and prototyping ourselves. Hopefully, this would also speed up the process by which NetworkX implements type hinting / annotating. Moreover, it would ensure that type hinting / annotating is implemented in a consistent way across projects.

While in draft form, the SPEC should provide information both for projects interested in joining the prototyping effort and for projects that are waiting for the prototyping effort to finalize an endorsed SPEC.

rossbar · February 18, 2021, 12:36am

There are a lot of great points above. I think type annotations are an excellent topic for a SPEC. It would be particularly valuable to have a conversation from the perspective of the scientific Python ecosystem. From the Rationale and Goals of PEP 484:

Of these goals, static analysis is the most important. This includes support for off-line type checkers such as mypy, as well as providing a standard notation that can be used by IDEs for code completion and refactoring.

A bit later in the document…

… third party packages would have to be developed to implement specific runtime type checking functionality, for example using decorators or metaclasses. Using type hints for performance optimizations is left as an exercise for the reader.

Though not the emphasis in the PEP, I suspect that the latter would be a stronger driver for adoption amongst scientific Python packages.

Looking at the benefits and challenges from the perspective of scientific Python projects would be very valuable and (eventually) developing some sort of best-practices or procedure for consistent adoption across projects would definitely help push the pace for feature adoption.
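As a rough illustration of the runtime-checking route that the PEP excerpt leaves to third-party packages, here is a minimal sketch of a decorator that validates arguments against their annotations. The names check_types and scale are hypothetical, and only plain classes are checked (generic annotations are skipped), so this is a toy under stated assumptions rather than how any existing package does it.

```python
import functools
import inspect
import typing


def check_types(func):
    """Hypothetical decorator: validate plain-class annotations at call time."""
    hints = typing.get_type_hints(func)
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Skip unannotated parameters and generics such as list[int];
            # only ordinary classes can be passed to isinstance().
            is_plain_class = (
                isinstance(expected, type) and typing.get_origin(expected) is None
            )
            if is_plain_class and not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@check_types
def scale(values: list, factor: float) -> list:
    return [v * factor for v in values]
```

Calling scale([1, 2], 2.0) passes the check, while scale([1, 2], "x") raises TypeError before the function body runs. Note that this strict isinstance test also rejects an int where a float is annotated, which static checkers would accept.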

tupui · July 7, 2021, 7:24am

FYI, we had a lengthy discussion over at SciPy about static typing.

[SciPy-Dev] Static Typing

While it’s still an open debate, the current situation is that we are allowing developers to use type hints when they want to.

stefanv · July 15, 2021, 8:43pm

Could you summarize the expectation around typing arrays? I think that was the biggest hold-up so far, but I know Bas is making progress.

kne42 · June 17, 2022, 11:07pm

I think having widespread type hints across the ecosystem could prove very useful, especially for tools that rely on annotations, such as compilers like numba or perhaps some dispatching system. I remember NumPy creating an experimental typing library; I’m not sure of its current status, but that’s what I’d imagine we’d use for standardization across the ecosystem. With standardization come ways to automate typing more easily, such as deriving it from the documentation, and perhaps standardizing how types are represented in documentation as well.
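One existing case of a dispatching system driven by annotations is functools.singledispatch in the standard library, which can read the annotation of the first parameter when registering an implementation. The function name describe below is made up for illustration; this is only a small sketch of the idea, not a proposal for how ecosystem projects should dispatch.

```python
from functools import singledispatch


@singledispatch
def describe(obj) -> str:
    # Fallback used when no registered type matches.
    return "something else"


@describe.register
def _(obj: int) -> str:
    # Registered for int purely via the annotation on `obj`.
    return "an integer"


@describe.register
def _(obj: list) -> str:
    # Registered for list purely via the annotation on `obj`.
    return "a list"
```

Here describe(3) selects the int implementation and describe([1]) the list one, with no explicit type passed to register.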

martinberoiz · July 27, 2022, 6:54pm

Type hinting / annotation seems like a generally desirable feature

Strongly disagree

jarrodmillman · September 1, 2022, 12:51pm

@martinberoiz Thanks for your input. Could you expand a bit? Is this a personal opinion or one that represents a bigger group you’ve been working with? What specific concerns do you have?

(For some background, I personally haven’t been excited by typing and have resisted adopting it. But many of the projects I am involved with keep getting requests for typing from different people. So when I said “Type hinting / annotation seems like a generally desirable feature” maybe I should have said something like “Type hinting / annotation seems like a feature desired by many people”.)

JuanBC · September 29, 2022, 3:01pm

The problem is how the Python community is implementing type hinting.

Python is an OO language that uses classes to define which messages an object supports, and classes are not types.

A very simple but illustrative example

class A: ...
class B(A): ...

b = B()
assert type(b) is B      # the exact type is B, so type(b) == A is False
assert isinstance(b, A)  # True: B is a subclass of A

Now what matters in an object-oriented language is “what messages” an object receives and nothing else. Now how does that impact our design?

Let’s take as an example this code full of boilerplate in flask https://github.com/pallets/flask/blob/main/src/flask/typing.py

In particular

ResponseValue = t.Union[
    "Response",
    str,
    bytes,
    t.List[t.Any],
    # Only dict is actually accepted, but Mapping allows for TypedDict.
    t.Mapping[str, t.Any],
    t.Iterator[str],
    t.Iterator[bytes],
]

All this code refers to “objects” that support the messages I need in “response”, probably referring only to iteration. Also, from Python 3.10 onwards (the | union syntax of PEP 604; built-in generics such as list[...] arrived in 3.9), this is roughly equivalent to

import typing as t
from collections.abc import Iterator, Mapping

ResponseValue = t.Union[
    "Response",  # a quoted forward reference cannot be an operand of | at runtime
    str | bytes | list[t.Any] | Mapping[str, t.Any] | Iterator[str] | Iterator[bytes],
]

Now we have to import not only typing but also collections.abc and probably types. And all these imports are used at test time.

Another problem with snippets like Iterator[str] is that they validate not only that the value is an "Iterator" but also that the iterator can only yield strings. This is confusing: are we testing ResponseValue or the iterator?
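To make the two levels concrete, here is a small sketch (Python 3.9+ assumed, variable names invented for illustration). At runtime, isinstance can only ask whether an object supports the iterator protocol; the element type in Iterator[str] is information for static checkers and cannot be tested with isinstance.

```python
from collections.abc import Iterator

numbers = iter([1, 2, 3])

# The runtime check only asks: does this object support the iterator protocol?
is_iterator = isinstance(numbers, Iterator)

# The element type is not checkable this way: isinstance() rejects a
# parameterized generic such as Iterator[str] with a TypeError.
try:
    isinstance(numbers, Iterator[str])
    parameterized_check_allowed = True
except TypeError:
    parameterized_check_allowed = False
```

So an iterator of integers passes the only runtime test available, and whether it yields strings is left entirely to the static type checker.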

The Python type system is confusing, adds overhead to the application (because of imports), and is still immature. It is going to change and we are not sure how.

JuanBC · September 29, 2022, 3:48pm

If lazy imports (proposed in PEP 690 for Python 3.12) are adopted, the import overhead would be “almost” removed.

NeilGirdhar · October 10, 2023, 3:29pm

There are two types of annotations that projects should consider: annotating just the library’s API versus annotating the API and the library’s implementation.

Annotating the API has many benefits for users:

they are documentation that is up to date (because it is checked for correctness by type checkers),
the user interface of various IDEs exposes the annotations,
they allow users to type check their own code (untyped libraries make all annotations into Any), and
the annotations are machine-readable, which can allow programmatic generation of code.
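As a small sketch of the last point, annotations on an API function can be read back programmatically with the standard library. The function shortest_path_length below is a hypothetical stand-in, not the signature of any real library function.

```python
import typing


def shortest_path_length(graph: dict, source: str, target: str) -> int:
    """Hypothetical API function; only its signature matters here."""
    raise NotImplementedError


# Resolve the annotations into real type objects.
hints = typing.get_type_hints(shortest_path_length)

# Build a one-line parameter summary, e.g. for generated docs or wrappers.
summary = ", ".join(
    f"{name}: {tp.__name__}" for name, tp in hints.items() if name != "return"
)
```

A documentation generator or code generator could consume hints directly instead of parsing docstrings.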

Annotating the API can be done inline or with stubs. Stubs are mainly attractive to library maintainers who are hesitant to annotate their code. Inline annotations are almost universally preferred because it is much easier for library maintainers to change code and annotations at the same time.

Annotating the library implementation is more a benefit to library maintainers than users. The main benefit that users get from this is that the API annotations are more likely to be correct if the implementation is also annotated and verified. The library authors benefit from annotating the implementation because

type checking often finds many hidden bugs and design errors,
type annotations provide an extra level of security above testing, and
development is easier because you can run the type checker as you go to get an idea of what needs to be done next.

Annotating the implementation can only be done inline. It can induce developers to restructure code (e.g., replacing the pattern of disambiguating types by EAFP using exceptions with a pattern of LBYL using isinstance checks, which is more robust). It can induce developers to fix Liskov design errors. Of course, there’s a cost to this churn.
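The EAFP-to-LBYL restructuring mentioned above might look like the following sketch (Python 3.9+ assumed). The function names total_eafp and total_lbyl are invented for illustration.

```python
from typing import Union


def total_eafp(x):
    # EAFP: try to treat x as an iterable, fall back to a scalar on failure.
    # Any TypeError raised inside sum() is swallowed, not just "not iterable".
    try:
        return sum(x)
    except TypeError:
        return x


def total_lbyl(x: Union[float, list[float]]) -> float:
    # LBYL: the isinstance check lets a type checker narrow x in each branch.
    if isinstance(x, list):
        return sum(x)
    return x
```

Both versions agree on well-formed input, but total_eafp(["a", "b"]) silently returns the list unchanged because the TypeError from adding strings to 0 is caught, whereas total_lbyl lets that error surface and a type checker would flag the call in the first place.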

Even though I personally am a big proponent of type annotations, I hope I was able to give a fairly neutral overview.