Following NumPy and SciPy, pandas modified the __module__ values on various objects (mostly classes and functions) to point to the public API location rather than where they are defined in code. We then realized that this stopped many of our doctests from running. This happens when the __module__ of a class disagrees with the __module__ of its methods, and the latter were not modified. For more technical details, see the references at the bottom.
Even though we were able to hack around doctest discovery, the modification of __module__ breaks tools that rely on this dunder to point to the code where the object is defined. For example, the source link in the pandas docs no longer works; e.g. DataFrame now points to pandas/__init__.py rather than pandas/core/frame.py. Various NumPy objects (such as ufunc) also suffer from this. As for the doctests themselves, the output on failure used to point to the location where the doctest is defined, but it no longer can; you must grep the codebase to locate the doctest.
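To make the breakage concrete, here is a minimal sketch; it is not pandas' actual code. The modules mypkg and mypkg.core.frame are hypothetical stand-ins built in memory with made-up file paths. For a class, the standard-library inspect.getfile resolves the file through sys.modules[cls.__module__], so rewriting __module__ redirects it, while the methods keep the original value.

```python
import inspect
import sys
import types

# Hypothetical stand-ins for an implementation module and a public package.
impl = types.ModuleType("mypkg.core.frame")
impl.__file__ = "mypkg/core/frame.py"
public = types.ModuleType("mypkg")
public.__file__ = "mypkg/__init__.py"
sys.modules[impl.__name__] = impl
sys.modules[public.__name__] = public

# Define the class inside the implementation module's namespace so that the
# class and its methods both start with __module__ == "mypkg.core.frame".
source = (
    "class DataFrame:\n"
    "    def head(self):\n"
    "        return 'head'\n"
)
exec(source, impl.__dict__)
DataFrame = impl.DataFrame
public.DataFrame = DataFrame  # re-export from the public location

before = inspect.getfile(DataFrame)

# What the __module__ rewrite does: point the class at the public location.
DataFrame.__module__ = "mypkg"

after = inspect.getfile(DataFrame)
method_module = DataFrame.head.__module__  # the method was not modified

print(before)         # mypkg/core/frame.py
print(after)          # mypkg/__init__.py
print(method_module)  # mypkg.core.frame
```

After the rewrite, the class and its method disagree about their module, which is the mismatch described above.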
Due to this, I think it would be good to seek an alternative solution to the modification of __module__. I think we’ll need to propose a change to Python itself, but I wanted to first make and refine the proposal here (including consideration of any other orthogonal proposals).
My proposal is to add two new optional dunders, __public_module__ and __public_qualname__. The behavior is as follows.
- When not explicitly added, __public_module__ will use its parent’s __public_module__ (e.g. a class attribute falls back to the class) when one exists and will use __module__ when it does not.
- When not explicitly added, __public_qualname__ will use the object’s __name__ in conjunction with its parent’s __public_qualname__ if one exists.
- The determination of both __public_module__ and __public_qualname__ is done at runtime.
Tools that want to surface the user-facing location of the object (e.g. REPLs, docs) will use __public_module__ and __public_qualname__, whereas tools that require the location where the object is defined in code will continue to use __module__ and __qualname__.
The reason for the use of the parent’s dunder is so that projects do not need to modify every attribute of a class in order to point to a public API location. I also think this should be done at runtime to not have any impact on import time, as I believe the performance impact in cases where this is used (REPLs, docs) is negligible.
Part of this proposal is to not modify the behavior of the default __repr__. Specific tools will need to opt in to using __public_module__ and __public_qualname__, which is additional effort, but I do not think we should change the behavior of __repr__ for this.
Though the original issue is fully solved with __public_module__, @mbussonn identified that there can be cases where the name of the object is changed, giving rise to the addition of __public_qualname__.
References:
- Original pandas PR: DOC: Run all doctests by rhshadrach · Pull Request #62988 · pandas-dev/pandas
- Corresponding scipy-doctest PR: ENH: do not skip methods of objects with manually tweaked __module__ by ev-br · Pull Request #214 · scipy/scipy_doctest
- Discussion on IPython: Proposal to add `__public_module__` · Issue #15112 · ipython/ipython