SPEC 1 — Lazy Loading for Submodules

Recommends lazy-loading functionality for an easily accessible namespace, without compromising performance.

I think this is missing an important piece of the history with respect to scipy. Some of the earliest releases may have imported all of the sub-modules in scipy/__init__.py, but we very quickly moved to a lazy import mechanism (PackageLoader). We eventually dropped the lazy import mechanism because it failed too frequently in very confusing ways, especially at interactive prompts. So the history was “Greedy Imports → Lazy Imports → No Imports”.

I’m sure the modern tooling is better, as the Python import system has incorporated new functionality that we can rely on, but that should be acknowledged as something that might let us go back to something that we once abandoned.


Tensorflow may also be a useful precedent here, e.g., see:

https://github.com/tensorflow/tensorflow/blob/v2.5.0/tensorflow/api_template.init.py

In the past, my experience with the lazy module loader in TensorFlow was not great (it placed restrictions on how you could do imports) but that seems to have been fixed now.

Still, there are lots of potential subtle incompatibility issues. For example, the editor in Google’s own Colab IPython notebook environment doesn’t recognize some TensorFlow sub-modules as valid.

Thank you, Robert. I wasn’t aware of (or had forgotten about) this part of the SciPy history. The implementation I have now is very simple and uses no magic other than overriding __getattr__ on the module (module-level __getattr__, added in Python 3.7 via PEP 562, is the relevant development on the Python side). My hope is that this means a low likelihood of breakage.
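A minimal sketch of the module-level `__getattr__` mechanism (PEP 562, Python 3.7+). It is not the scikit-image code: the package `lazydemo` and its submodule `heavy` are hypothetical and are written to a temporary directory only so that the example runs on its own.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# __init__.py of a hypothetical package "lazydemo". The only special piece is
# a module-level __getattr__ (PEP 562), which Python calls only for names that
# normal attribute lookup cannot find.
INIT_SOURCE = textwrap.dedent('''
    import importlib

    submodules = ["heavy"]

    def __getattr__(name):
        if name in submodules:
            # Import on first access and return the real module object.
            return importlib.import_module(f"{__name__}.{name}")
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
''')


def build_demo_package(root: Path) -> None:
    """Write lazydemo/__init__.py and a stand-in for an expensive submodule."""
    pkg = root / "lazydemo"
    pkg.mkdir()
    (pkg / "__init__.py").write_text(INIT_SOURCE)
    (pkg / "heavy.py").write_text("VALUE = 42\n")


root = Path(tempfile.mkdtemp())
build_demo_package(root)
sys.path.insert(0, str(root))
importlib.invalidate_caches()  # make the freshly written files visible

lazydemo = importlib.import_module("lazydemo")
print("lazydemo.heavy" in sys.modules)  # False: nothing imported yet
print(lazydemo.heavy.VALUE)             # 42 (first access imports heavy)
print("lazydemo.heavy" in sys.modules)  # True
```

After the first access, the import system also binds `heavy` as a regular attribute of the package, so later lookups no longer go through `__getattr__`.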

After a suggestion by Jon Crall, I will also add an environment variable to enable greedy importing, for debugging purposes.
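One way the greedy-import switch could look, as a sketch. The variable name `EAGER_IMPORT` and the helper `eager_import_if_requested` are assumptions for illustration, and the standard library’s `json` package stands in for a real library’s submodules.

```python
import importlib
import os
from types import ModuleType
from typing import Iterable, List, Mapping

# Hypothetical variable name: the post does not say which name was chosen.
EAGER_ENV_VAR = "EAGER_IMPORT"


def eager_import_if_requested(
    package: str,
    submodules: Iterable[str],
    environ: Mapping[str, str] = os.environ,
) -> List[ModuleType]:
    """Import all lazily exposed submodules up front when the variable is set.

    Importing greedily makes a broken submodule fail at package import time,
    rather than at some later attribute access, which is easier to debug.
    """
    if not environ.get(EAGER_ENV_VAR):
        return []  # default: stay lazy
    return [importlib.import_module(f"{package}.{name}") for name in submodules]


# In a package's __init__.py this would be called once, after defining the
# lazy __getattr__, e.g. eager_import_if_requested(__name__, ["filters"]).
loaded = eager_import_if_requested("json", ["decoder", "encoder"], {"EAGER_IMPORT": "1"})
print([m.__name__ for m in loaded])  # ['json.decoder', 'json.encoder']
```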

Thanks for making me aware of TensorFlow’s approach, Stephan.

I can understand how installing lazy modules can cause all sorts of issues with editors. We started that way, but the latest implementation is a very simple override only of __dir__ and __getattr__. This means that editors can introspect the way they normally would, and that they would encounter the same objects they would with non-lazy imports (i.e., no proxy objects).
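A sketch of why overriding only `__dir__` and `__getattr__` stays editor-friendly: `dir()`, which interactive tab completion typically consults, lists the lazy names, and attribute access returns the very same module object found in `sys.modules`, not a proxy. The helper `attach_lazy` and the `facade` module are hypothetical, with `json` submodules standing in for a real package.

```python
import importlib
import sys
import types
from typing import Iterable


def attach_lazy(module: types.ModuleType, package: str, submodules: Iterable[str]) -> None:
    """Install PEP 562 __getattr__ and __dir__ hooks on `module`.

    Each name in `submodules` is resolved to the real module `package.name`
    on access; no proxy or placeholder object is ever created.
    """
    lazy_names = set(submodules)

    def __getattr__(name: str):
        if name in lazy_names:
            return importlib.import_module(f"{package}.{name}")
        raise AttributeError(f"module {module.__name__!r} has no attribute {name!r}")

    def __dir__():
        # Advertise the lazy names alongside whatever is already defined.
        return sorted(set(vars(module)) | lazy_names)

    module.__getattr__ = __getattr__
    module.__dir__ = __dir__


facade = types.ModuleType("facade")
attach_lazy(facade, "json", ["decoder", "encoder"])

print("decoder" in dir(facade))                       # True
print(facade.decoder is sys.modules["json.decoder"])  # True: the real module
```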

I have tested the skimage PR with IPython, but not yet with any of the other editors. If you have one of those set up and could give it a try, that would be great.

Also, would you like me to mention this in the SPEC, test the mechanism in some specific way, or are you expressing concern about the approach overall?

I commented primarily to clear up the history being told in the SPEC.

I’m not especially looking forward to reading code invoking scipy.linalg.whatever() all over the place (though I am looking forward to interactively tab-completing my way to the skimage functions I want but never remember if they are in skimage.filters or skimage.morphology).

@rkern I have started a PR to correct the history. I’ve used your words directly—is that OK? Maybe you would consider being co-author on the SPEC and letting me know of any other context I am missing.

W.r.t. the scipy.linalg.whatever, hopefully people will use sp.linalg.whatever at least. But, also, we could write a recommendation that submodules still be imported explicitly, unless there is a conflict of sorts (numpy.linalg, networkx.linalg, etc.).

Lazy loading has been merged into scikit-image, so now we have a living experiment. It is also integrated into Napari.


I updated the RAPIDS cuCIM library to also use lazy loading in v22.02.00.


Thanks for sharing, Greg! I don’t see it mentioned in the release notes from 16 days ago—is that the version I should be looking at?

Yes, it only shows up there indirectly via “Update cucim.skimage API to match scikit-image 0.19”. The lazy loading is at the top level, though, and not just in the cucim.skimage package.

FYI the Cinder project from Meta is looking to work on upstreaming their lazy import mechanism.


Thanks for the heads-up @brettcannon!

Our implementation has now also been refactored into its own package, which lives on GitHub at scientific-python/lazy_loader (“Populate library namespace without incurring immediate import costs”).

I think it would also be interesting to automate setting up the infrastructure: say, a tool that compiles the tree of imports into a YAML file, which the lazy loader then reads in. It probably wouldn’t be very useful for projects that have already gone through the effort of implementing it, but it could serve as a quick setup for new projects.
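A rough sketch of the first half of such a tool, assuming it only needs each package’s direct submodules: it discovers them with `importlib.util.find_spec` and `pkgutil.iter_modules`. `collect_submodules` is a hypothetical name, and the output is JSON rather than the YAML suggested above, to avoid depending on a third-party YAML library.

```python
import importlib.util
import json
import pkgutil


def collect_submodules(package_name: str) -> dict:
    """Describe a package's direct submodules.

    Uses find_spec, so a top-level package's __init__.py is not executed.
    A real tool would also recurse into subpackages and record which names
    each submodule exports; this sketch lists direct children only.
    """
    module_spec = importlib.util.find_spec(package_name)
    if module_spec is None or module_spec.submodule_search_locations is None:
        raise ValueError(f"{package_name!r} is not an importable package")
    names = sorted(
        info.name
        for info in pkgutil.iter_modules(module_spec.submodule_search_locations)
    )
    return {"package": package_name, "submodules": names}


layout = collect_submodules("json")
# JSON stands in for YAML here so the sketch needs no third-party packages.
print(json.dumps(layout, indent=2))
```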

I tried the lazy import implementation in my own project, and it works nicely. I already supported accessing submodules without importing them explicitly, but startup was slow because of a bug that triggered unnecessary Numba compilation. With lazy loading, that compilation only happens when it is needed (which is uncommon), so the startup time went from 4-5 seconds to instantaneous, which is great for the REPL.

My only gripe is that, in order to use type hint annotations for Mypy, you have to repeat the imports in a TYPE_CHECKING branch, as I reported in “Type hints/Mypy best practices?” (issue #28 in scientific-python/lazy_loader on GitHub).

I think that we should discuss the best approach to combine typing and lazy imports. Personally, I don’t see how we can avoid repeating things unless we have additions to the Python language (either first-class support for lazy imports, or some kind of macros/static-evaluable-functions for static analyzers), but maybe I am overlooking something. Another possibility is to create a Mypy plugin, but those cannot be used by other type checkers as far as I know.


Hi @vnmabus and welcome to the forum!

We just updated the SPEC to show how stub files can be used to avoid duplication with lazy loading. What do you think, would this work for your use-case?

PR — (rendered here)
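Roughly, the idea is that the `__init__.pyi` stub becomes the single source of truth: type checkers read it directly, and the lazy loader parses the same file at runtime to learn what to load. A simplified sketch of that parsing step, using the standard `ast` module, follows; it is not lazy_loader’s actual code, it ignores `as` aliases and star imports, and the names `filters`, `morphology`, `dilation`, and `erosion` are only illustrative.

```python
import ast


def parse_stub(source: str):
    """Extract lazy-loading tables from the text of an __init__.pyi stub.

    Returns (submodules, submod_attrs):
      `from . import x`        -> "x" is a submodule
      `from .mod import a, b`  -> attributes a and b live in submodule "mod"
    """
    submodules = []
    submod_attrs = {}
    for node in ast.parse(source).body:
        if not isinstance(node, ast.ImportFrom) or node.level != 1:
            continue  # only single-dot relative imports are considered here
        names = [alias.name for alias in node.names]
        if node.module is None:
            submodules.extend(names)
        else:
            submod_attrs.setdefault(node.module, []).extend(names)
    return submodules, submod_attrs


STUB = """
from . import filters
from .morphology import dilation, erosion
"""

print(parse_stub(STUB))
# (['filters'], {'morphology': ['dilation', 'erosion']})
```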


Hello! Astropy is interested in this SPEC. FYI.


I just wanted to revive this topic to mention that I tried an approach with better syntax that removes both the duplication and the need for stubs, but introduces a bit of implementation magic (which can probably be improved by someone with more knowledge about Python imports than me).

A draft PR is up for those who are not watching that repo: “Add context manager functionality” (pull request #70 in scientific-python/lazy_loader on GitHub). I would appreciate any feedback on whether it could be merged in some form, or whether it should be discarded or forked. Thank you!