Deprecate default values for spmatrix keyword in KDtree and io functions

I’d like to propose that we deprecate the default spmatrix=True for KDTree and io functions.

The idea is that these keyword options have been around for 2+ releases.
It is time to shift the default values from spmatrix to sparray
(from spmatrix=True to spmatrix=False).

For scipy.spatial: PR 24642
_KDTree.sparse_distance_matrix and.cKDTree.sparse_distance_matrix.
These do not have dok_array and coo_array options yet. So new code
supports the sparray forms and deprecation is added of the keyword value.

For scipy.io: PR 24644
The reader methods in io named mmread, hb_read, loadmat,
mmread (FFM format) have spmatrix=True and already support False.
No new code needed. Just deprecate current behavior.

The warnings provide a small reminder/incentive for libraries to
shift toward the sparray interface. This will help avoid side-cases
during the deprecation of spmatrix.

Any thoughts?
Best,
Dan

Thanks for continuing to move this along Dan!

First impression, it seems like a very reasonable strategy to continue introducing small DEP-shaped nudges to move in every release. The scipy.io PR seems fine. I’ll note that our regular deprecation policy is removal in 2 rather than 1 releases after introduction, so both PRs should say 1.20.0 rather than 1.19.0.

For the scipy.sparse PR, it doesn’t look right to deprecate the current behavior at the same moment as introducing the replacement - this makes it impossible for downstream packages to react to the deprecation without introducing version-dependent code. E.g., this code:

>>> import numpy as np
>>> from scipy import sparse, spatial
>>> x = spatial.KDTree(np.random.randn(50, 2), leafsize=2)
>>> y = spatial.KDTree(np.random.randn(50, 2), leafsize=2)
>>> res = x.sparse_distance_matrix(y, 0.5)
>>> type(res)
<class 'scipy.sparse._dok.dok_matrix'>

cannot be changed to use output_type=‘dok_array’ easily.

Taking a step back: it would be great to get a tracking issue for all the deprecations that are in the pipeline. A quick turned up DEP: sparse.spmatrix: deprecate `todense` · Issue #14494 · scipy/scipy · GitHub which links to a HackMD with a plan, but I don’t think we have a general “deprecate all spmatrix output” tracking issue yet?

You are correct that the deprecation period should be 2 releases, so the 1.19 should be made 1.20 in these warnings. Thanks!

And you are correct that we do not have a github issue to bring together the sparse array ideas and plans and discussions. I’ll try to put that together. It will need to be a work in progress though. :}

As for deprecating quickly:
Historically, many deprecations come out when a new feature is announced. In my experience typically, the new feature appears in a release and the old interface is deprecated at that time. The old interface is not removed. Only deprecated. The deprecation period is all about allowing people time to adjust their code. It seems disingenuous to know something is going away, and NOT tell people. When we don’t say it, then new code gets written with the old interface. And that makes more work for the user down the line.

But I understand what you mean – at least for the libraries using this feature and trying to adjust their code so it can work with both versions. Ideally we make it so that code can be easily built that works for both old and new.

That specific KDTree code you point to is actually pretty easy to get working across versions. Just wrap the call in dok_array. That will work with new and old versions. Perhaps I should say that somewhere. But my point is that not telling people a removal is coming makes the user experience worse because they write code for the old interface after the new interface is available. Then they have to change that code later.

I’d prefer to have the warning as soon as possible and lengthen the deprecation period if the change is happening too fast for folks.

I think what you are missing is that a lot of libraries run with “turn deprecation warnings into errors” in CI (something like filterwarnings = error in pytest.ini). This keeps the test suite warning-free, and ensures that the next release is forward-compatible with new releases of the upstream project. Hence, introducing a warning forces downstream libraries to migrate straight away, or silence the warning in some other way.

What I said is a very general rule, and only should be broken with very good reasons. There doesn’t seem urgency here.

I’ll go along with whatever – but I hope we all realize that there is a cost to doing it this way. Yes, we need to look out for library maintainers (nightly wheels should do this). But we also need to help users (and libraries) avoid writing new code with an old API.

What would you think about isolating the docs for spmatrix – those pages would still be there but accessible only after a click or two from the main sparse page? Then newbies would get the new interface, people could still look up how to use the old interface if needed, and it will be clear that new code should use the new API. It’s like deprecation without the warnings – followed by warnings in a later release and then removal after that.

Yes, docs-only guidance is perfectly fine as a first step, and it looks like a good idea here. I like your idea of a larger restructure of the scipy.sparse documentation, that could be quite helpful.

1 Like