SPEC 1 — Lazy Loading for Submodules

jarrodmillman · February 5, 2021, 1:22am

Recommends lazy loading functionality for an easily-accessible namespace, but without compromising performance.

rkern · May 21, 2021, 1:53am

I think this is missing an important piece of the history with respect to scipy. Some of the earliest releases may have imported all of the sub-modules in scipy/__init__.py, but we very quickly moved to a lazy import mechanism (PackageLoader). We eventually dropped the lazy import mechanism because it failed too frequently in very confusing ways, especially at interactive prompts. So the history was “Greedy Imports → Lazy Imports → No Imports”.

I’m sure the modern tooling is better, as the Python import system has incorporated new functionality that we can rely on, but that should be acknowledged as something that might let us go back to something that we once abandoned.

shoyer · May 27, 2021, 7:15am

Tensorflow may also be a useful precedent here, e.g., see:

tensorflow/tensorflow/blob/v2.5.0/tensorflow/python/util/lazy_loader.py

https://github.com/tensorflow/tensorflow/blob/v2.5.0/tensorflow/api_template.init.py

In the past, my experience with the lazy module loader in TensorFlow was not great (it placed restrictions on how you could do imports) but that seems to have been fixed now.

Still, there are lots of potential subtle incompatibility issues. For example, the editor in Google’s own Colab IPython notebook environment doesn’t recognize some TensorFlow sub-modules as valid.

stefanv · May 27, 2021, 8:39pm

Thank you, Robert. I wasn’t aware (or forgot about?) this part of the SciPy history. The implementation I have now is very simple, and does not make use of any magic other than overriding __getattr__ on the module (which is the relevant development on the Python side). My hope is that this would mean a low likelihood of breakage.

After a suggestion by Jon Crall, I will also add an environment variable to enable greedy importing, for debugging purposes.
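For readers unfamiliar with the mechanism, here is a minimal sketch of lazy loading through a module-level `__getattr__` (the hook added by PEP 562), together with an opt-in eager mode for debugging. It is not the implementation discussed in this thread: the helper `make_lazy_getattr`, the in-memory package `demo_pkg`, and the environment variable name `DEMO_EAGER_IMPORT` are all hypothetical, and standard-library modules stand in for real submodules.

```python
import importlib
import os
import types


def make_lazy_getattr(module, lazy_submodules):
    """Return a PEP 562 style __getattr__ for `module`.

    `lazy_submodules` maps attribute names to importable module names.
    """
    def __getattr__(name):
        if name in lazy_submodules:
            submodule = importlib.import_module(lazy_submodules[name])
            # Cache on the module so later lookups skip __getattr__ entirely.
            setattr(module, name, submodule)
            return submodule
        raise AttributeError(
            f"module {module.__name__!r} has no attribute {name!r}"
        )

    return __getattr__


# Hypothetical package built in memory; stdlib modules stand in for submodules.
pkg = types.ModuleType("demo_pkg")
lazy_map = {"stats": "statistics", "frac": "fractions"}
pkg.__getattr__ = make_lazy_getattr(pkg, lazy_map)

# Hypothetical debugging switch: import everything up front when it is set,
# so that import errors surface immediately instead of on first access.
if os.environ.get("DEMO_EAGER_IMPORT", ""):
    for attr in lazy_map:
        getattr(pkg, attr)

# First access triggers the import of the stand-in submodule.
print(pkg.stats.mean([1, 2, 3]))
```

In a real package the function would simply be defined as `__getattr__` at the top level of `__init__.py`; the in-memory module is used here only so the sketch runs as a single file.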

stefanv · May 27, 2021, 8:44pm

Thanks for making me aware of TensorFlow’s approach, Stephan.

I can understand how installing lazy modules can cause all sorts of issues with editors. We started that way, but the latest implementation is a very simple override only of __dir__ and __getattr__. This means that editors can introspect the way they normally would, and that they would encounter the same objects they would with non-lazy imports (i.e., no proxy objects).
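As a rough illustration of why this helps introspection, the sketch below overrides only `__dir__` and `__getattr__` on a module and checks that the lazily loaded attribute is the real module object rather than a proxy. The in-memory package `demo_pkg` and the use of the standard-library `json` module as a stand-in submodule are assumptions made for the example.

```python
import importlib
import json
import types

pkg = types.ModuleType("demo_pkg")
_lazy = {"json": "json"}  # attribute name -> module imported on first access


def _getattr(name):
    if name in _lazy:
        return importlib.import_module(_lazy[name])
    raise AttributeError(name)


def _dir():
    # Advertise lazy names so tab completion can find them before any import.
    return sorted(set(vars(pkg)) | set(_lazy))


pkg.__getattr__ = _getattr
pkg.__dir__ = _dir

# The name is listed by dir() even though it has not been accessed yet.
print("json" in dir(pkg))
# The attribute resolves to the ordinary module object, not a wrapper.
print(pkg.json is json)
```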

I have tested the skimage PR with IPython, but not yet with any of the other editors. If you have one of those set up and could give it a try, that would be great.

stefanv · May 27, 2021, 9:27pm

Also, would you like me to mention this in the SPEC, test the mechanism in some specific way, or are you expressing concern about the approach overall?

rkern · May 28, 2021, 12:06am

I commented primarily to clear up the history being told in the SPEC.

I’m not especially looking forward to reading code invoking scipy.linalg.whatever() all over the place (though I am looking forward to interactively tab-completing my way to the skimage functions I want but never remember if they are in skimage.filters or skimage.morphology).

stefanv · June 1, 2021, 10:06pm

@rkern I have started a PR to correct the history. I’ve used your words directly—is that OK? Maybe you would consider being co-author on the SPEC and letting me know of any other context I am missing.

W.r.t. the scipy.linalg.whatever, hopefully people will use sp.linalg.whatever at least. But, also, we could write a recommendation that submodules still be imported explicitly, unless there is a conflict of sorts (numpy.linalg, networkx.linalg, etc.).

stefanv · October 29, 2021, 10:12pm

Lazy loading has been merged into scikit-image, so now we have a living experiment. It is also integrated into Napari.

grlee77 · February 18, 2022, 4:39pm

I updated the RAPIDS cuCIM library to also use lazy loading in v22.02.00.

stefanv · February 18, 2022, 9:44pm

Thanks for sharing, Greg! I don’t see it mentioned in the release notes from 16 days ago—is that the version I should be looking at?

grlee77 · February 19, 2022, 3:30am

Yes, it only shows up there indirectly via “Update cucim.skimage API to match scikit-image 0.19”. The lazy loading is at the top level, though, and not just in the cucim.skimage package.

brettcannon · March 23, 2022, 6:22pm

FYI the Cinder project from Meta is looking to work on upstreaming their lazy import mechanism.

stefanv · March 23, 2022, 6:40pm

Thanks for the heads-up @brettcannon!

Our implementation has now also been refactored into its own package, which lives on GitHub at scientific-python/lazy_loader (“Populate library namespace without incurring immediate import costs”).

kne42 · June 17, 2022, 11:02pm

I think it would also be interesting if we had a way to automate the process of setting up the infrastructure: say, a tool that compiles the tree of imports into a YAML file, which is then read by the lazy loader. It probably wouldn’t be very useful for projects that have already gone through the effort of implementing it, but it could serve as a quick setup for new projects.
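A rough sketch of what such a tool might look like, assuming it only needs the top-level public functions and classes of each submodule. Nothing here exists in `lazy_loader`: `collect_public_names` and the throwaway package are hypothetical, and JSON is used in place of YAML only because the standard library has no YAML writer.

```python
import ast
import json
import tempfile
from pathlib import Path


def collect_public_names(package_dir):
    """Map each submodule name to its top-level public functions and classes."""
    mapping = {}
    for path in sorted(Path(package_dir).glob("*.py")):
        if path.name == "__init__.py":
            continue
        # Parse the source without importing it, so no import cost is paid.
        tree = ast.parse(path.read_text())
        mapping[path.stem] = [
            node.name
            for node in tree.body
            if isinstance(
                node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
            )
            and not node.name.startswith("_")
        ]
    return mapping


# Build a tiny throwaway package to scan (file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "demo_pkg"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("")
    (pkg_dir / "filters.py").write_text(
        "def gaussian(x):\n    return x\n\ndef _helper():\n    pass\n"
    )
    (pkg_dir / "morphology.py").write_text("class Disk:\n    pass\n")
    manifest = collect_public_names(pkg_dir)

# The manifest could then be written to disk and read by a lazy loader.
print(json.dumps(manifest, indent=2))
```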

vnmabus · September 7, 2022, 6:41am

I tried the lazy import implementation in my own project, and it works nicely. I already supported accessing submodules without importing them explicitly, but startup was slow because of a bug that triggered unnecessary Numba compilation. With lazy loading, that compilation happens only when it is needed (which is not common), so the startup time went from 4-5 seconds to effectively instantaneous, which is great for REPL use.

My only gripe is that, in order to use type hint annotations for Mypy, you have to repeat the imports in a TYPE_CHECKING branch, as I reported in issue #28 of scientific-python/lazy_loader on GitHub, “Type hints/Mypy best practices?”.
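The duplication looks roughly like the following sketch, which assumes a hand-written lazy loader based on a module-level `__getattr__`. Standard-library modules stand in for real submodules, and `_lazy_attrs`, `_lazy_getattr`, and `demo_pkg` are hypothetical names.

```python
import importlib
import types
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers such as Mypy; never executed at runtime.
    from fractions import Fraction
    from statistics import mean

# The same names have to be listed a second time for the runtime loader.
_lazy_attrs = {"Fraction": "fractions", "mean": "statistics"}


def _lazy_getattr(name):
    if name in _lazy_attrs:
        module = importlib.import_module(_lazy_attrs[name])
        return getattr(module, name)
    raise AttributeError(name)


# In a real package this code would live in __init__.py and the function
# would be named __getattr__; an in-memory module keeps the sketch runnable.
demo = types.ModuleType("demo_pkg")
demo.__getattr__ = _lazy_getattr

print(demo.mean([2, 4]))
```

Every name appears twice, once for the type checker and once for the loader, and the two lists must be kept in sync by hand.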

I think that we should discuss the best approach to combine typing and lazy imports. Personally, I don’t see how we can avoid repeating things unless we have additions to the Python language (either first-class support for lazy imports, or some kind of macros/static-evaluable-functions for static analyzers), but maybe I am overlooking something. Another possibility is to create a Mypy plugin, but those cannot be used by other type checkers as far as I know.

stefanv · September 14, 2022, 10:16pm

Hi @vnmabus and welcome to the forum!

We just updated the SPEC to show how stub files can be used to avoid duplication with lazy loading. What do you think, would this work for your use-case?

PR — (rendered here)
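One way a stub file can remove the duplication is to treat the `.pyi` file as the single source of truth and derive the runtime lazy-loading table from it. The following is only a sketch of that idea, not the SPEC's or `lazy_loader`'s actual implementation: `parse_stub` and the stub contents are hypothetical.

```python
import ast

# Contents of a hypothetical __init__.pyi; type checkers read this directly.
STUB_SOURCE = """\
from . import filters
from .morphology import disk, square
"""


def parse_stub(source):
    """Extract lazily loadable submodules and attributes from stub source."""
    submodules = []
    submod_attrs = {}
    for node in ast.parse(source).body:
        # Only single-dot relative imports describe the package's own contents.
        if isinstance(node, ast.ImportFrom) and node.level == 1:
            names = [alias.name for alias in node.names]
            if node.module is None:
                # `from . import x` names a whole submodule.
                submodules.extend(names)
            else:
                # `from .x import a, b` names attributes of a submodule.
                submod_attrs.setdefault(node.module, []).extend(names)
    return submodules, submod_attrs


submodules, submod_attrs = parse_stub(STUB_SOURCE)
print(submodules)
print(submod_attrs)
```

The resulting tables could then drive a `__getattr__`-based loader, so the imports are written once, in the stub.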

pllim · October 10, 2022, 8:14pm

Hello! Astropy is interested in this SPEC. FYI.

github.com/astropy/astropy

Adopt Lazy Loading

opened 07:59PM - 10 Oct 22 UTC

nstarman

Feature Request Upstream Fix Required Performance API change

Description: Adopt https://scientific-python.org/specs/spec-0001/ if the draft is accepted. This will hopefully speed up importing Astropy.

Additional context: This spec is still a draft, but is used by scikit-image (https://github.com/scikit-image/scikit-image/pull/5101) and NetworkX (https://github.com/networkx/networkx/pull/4909) and kind-of SciPy.

vnmabus · November 28, 2023, 10:46am

I just wanted to revive this topic to mention that I tried an approach with better syntax that removes both the duplication and the need for stubs, but introduces a bit of implementation magic (and it can probably be improved by someone with more knowledge about Python imports than me).

For those who are not watching that repo, a draft PR is available: “Add context manager functionality” by vnmabus, Pull Request #70 in scientific-python/lazy_loader on GitHub. I would appreciate any feedback on whether there is a way it can be merged, or whether it should be discarded or forked. Thank you!