A pragmatic pathway towards skimage2

stefanv · September 27, 2022, 8:59pm

Hi everyone,

As you may be aware, the team has been working on a way towards addressing some long-standing issues in the existing skimage API. The proposal, written up by @jni, is known as SKIP4.

Today, Matthew Brett and I had a long discussion on how to enact SKIP4 without losing a major part of our user community, something we cannot afford. The Python 2 to 3 transition is still fresh in our minds, along with the pain it caused (and some code has still not been ported to Python 3).

So, it may be helpful to write down some principles of a transition. I think these should include:

1. Do not estrange a large part of our existing user base. This implies making any porting of code straightforward (a challenge with the Py 2 to 3 port was that you had to read and think about the code you were porting; it was not just a matter of applying certain prescribed recipes).
2. Ensure that most users eventually land on skimage2. If, e.g., we provided a supported skimage1 and skimage2, there would be no incentive to transition.
3. Do not place an undue burden on maintainers. This would, e.g., preclude long term support of both skimage1 and skimage2.
4. Initially, allow users to use the old and the new APIs in conjunction (this should help greatly in porting code).

It is not trivial to satisfy all four principles simultaneously. After considering many options, the following potential route crystallized:

1. We release skimage2 (distributed as pip install scikit-image). This becomes the only supported release, but see (2).
2. In skimage2, maintain a version of skimage that utilizes skimage2 as its engine, but provides users with the old API experience.
3. Over time, deprecate the functions in the skimage1 interface: first with warnings, later with hard deprecations. Importantly, we provide clear instructions with examples on how to move from the old to the new code (we know what that looks like already, since we maintain the skimage1 “back-port”). Porting instructions also go into the migration guide.
4. Eventually (and there’s no rush on this), remove the skimage namespace entirely, raising an error explaining to the developer what to use instead.
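As a rough illustration of steps (2) and (3), here is a minimal sketch of an old-API function that delegates to a new implementation and warns on every call. Everything in it is an assumption made for illustration: the helper `deprecated_alias`, the function `scale`, and the dotted names `skimage.example.scale` and `skimage2.example.scale` are hypothetical and not part of any release. Plain lists stand in for image arrays.

```python
import functools
import warnings


def deprecated_alias(new_func, *, old_name, new_name, removed=False):
    """Expose `new_func` under an old name (hypothetical helper).

    While `removed` is False the wrapper warns and forwards the call.
    Once `removed` is True it raises instead, which models a hard deprecation.
    """
    status = "has been removed" if removed else "is deprecated"
    message = (
        f"`{old_name}` {status}; use `{new_name}` instead. "
        "See the migration guide for porting instructions."
    )

    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        if removed:
            # The choice of RuntimeError is illustrative only.
            raise RuntimeError(message)
        # stacklevel=2 points the warning at the caller's line, not this one.
        warnings.warn(message, FutureWarning, stacklevel=2)
        return new_func(*args, **kwargs)

    return wrapper


def _new_scale(values, factor=2.0):
    """Stand-in for a function that lives in the new namespace."""
    return [v * factor for v in values]


# What the old namespace would export during the warning phase...
scale = deprecated_alias(
    _new_scale,
    old_name="skimage.example.scale",
    new_name="skimage2.example.scale",
)

# ...and after the hard deprecation.
scale_removed = deprecated_alias(
    _new_scale,
    old_name="skimage.example.scale",
    new_name="skimage2.example.scale",
    removed=True,
)
```

Because the warning text names the replacement, the same string can double as the porting instruction in the migration guide.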

This repackages some of the ideas in SKIP3—so that’s worthwhile background reading.

Let me know your thoughts!

martinberoiz · September 28, 2022, 10:59pm

Thanks for the effort of making it a smooth transition.

I have a few questions.

What version do I get if I do pip install scikit-image and then do import skimage?

Can I do pip install scikit-image and then import skimage2?

Can I do pip install skimage2?

Sorry if this has been said before, I’m a bit confused by these skimage1 and skimage2.

stefanv · September 28, 2022, 11:11pm

@martinberoiz!

Very good questions. In the current SKIP4, mentioned above, I believe we intended to make two packages: scikit-image and skimage2. However, in the proposal above, I suggest we keep the package name scikit-image, but that it will contain both skimage and skimage2 namespaces.

This means that everyone using the scikit-image package right now will be upgraded to the new package including the skimage2 namespace, whether they use it or not.

Docs will be updated to use skimage2, and skimage namespace functions will gradually become harder to use as they complain more loudly about being deprecated.

jni · September 29, 2022, 12:12am

I like this approach! For the record, the motivation for the two separate packages was:

Eventually, depending on scikit-image without an upper bound such as <=1.0 will cause a library to break. With the separate packages for scikit-image and skimage2, you can be assured that scikit-image will continue to work.
I was really looking forward to having our package name and import name match!

Having said this, the maintenance burden is much lower if we have the two namespaces in the one package. So I think it’s a good plan, thanks @stefanv for sharing! (I am filled with regret about not having attended the NumFOCUS summit.)

stefanv · September 29, 2022, 12:19am

Thanks for the feedback @jni! Of course, nothing precludes us from registering skimage2, like we have skimage.

Or, perhaps skimage should become the de facto package, which depends on scikit-image. I haven’t thought this aspect through.

lagru · September 29, 2022, 9:12am

Great suggestion @stefanv! So comparing this suggestion to the points raised in SKIP4:

Although semantic versioning [6] technically allows API changes with major version bumps, we must acknowledge that (1) an enormous number of projects depend on scikit-image and would thus be affected by backwards incompatible changes, and (2) it is not yet common practice in the scientific Python community to put upper version bounds on dependencies, so it is very unlikely that anyone used scikit-image<1.* or scikit-image<2.* in their dependency list. This implies that releasing a version 2.0 of scikit-image with breaking API changes would disrupt a large number of users.

This disruption would happen in all proposed cases where we introduce a new API. However, if we can stretch out this disruption gradually, I’m all for it.

Additionally, such wide-sweeping changes would invalidate a large number of StackOverflow and other user guides.

Again, this will happen in all proposed cases where we introduce a new API.

Finally, releasing a new version with a large number of changes prevents users from gradually migrating to the new API: an old code base must be migrated wholesale because it is impossible to depend on both versions of the API. This would represent an enormous barrier of entry for many users.

This would be addressed by the new proposal.

I seem to remember a 5th principle, “somebody’s law”, which is to prevent subtle changes that go unnoticed but invalidate scientific results. I can’t find a reference to it anymore in SKIP4, but I remember it as a big concern? In any case, this is also prevented by never introducing these kinds of changes to the original skimage namespace but keeping them in skimage2.

All in all I’m cautiously in favor of this “package with two modules” approach.

jni · September 29, 2022, 11:39am

I forgot about this concern, and it does give me pause…

stefanv · September 29, 2022, 4:11pm

@lagru That was the Hinsen principle: the same code should not yield different results after a library upgrade.

Re: broken S/O examples, we can choose for how long we maintain the old version of skimage inside of skimage2. Also remember that breakage will always describe an upgrade path, which means S/O readers will be guided to port solutions to skimage2 (and perhaps even update S/O in response!).

I think the point here is that we’ll have to keep the fully deprecated skimage1 around for quite a while, perhaps forever.

lagru · September 30, 2022, 2:42pm

My view is that this will happen anyway over time. We are constantly changing our API, and @stefanv makes a good point that this proposal deals with it better than our normal deprecation cycles, which are invisible after a few releases.

I hope that “forever” won’t be the case. I don’t think the maintenance cost (build chain, keeping up with dependencies, …) will be worth it long-term. Especially because earlier versions will still be available to some degree.

One more thing. Once we’ve moved code to skimage2 and updated it, it might be impossible to emulate the old behavior or results. This is something we are willing to accept, right? We would still remove the old code eventually? On my part, I’d say yes to those questions.

stefanv · September 30, 2022, 4:36pm

After a while, the skimage namespace will be nothing more than a shell. There will be no image processing code inside, only the porting instructions remain. So, carrying that around won’t be much of a maintenance burden—it’s essentially just documentation.
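A shell namespace like this can be sketched with a module-level `__getattr__` (PEP 562), which Python calls when normal attribute lookup on a module fails. The sketch below is only an assumption about how it might look: the module names `old_api` and `new_api`, the table `PORTING_HINTS`, and the function `smooth` are hypothetical. To keep the example self-contained, the module is built with `types.ModuleType`; in a real package the function would simply be defined as `__getattr__` in `__init__.py`.

```python
import types

# Hypothetical porting table: removed attribute name -> advice for the new API.
PORTING_HINTS = {
    "smooth": "use `new_api.smooth` instead (see the migration guide)",
}


def _shell_getattr(name):
    """Raise an informative error for any name requested from the shell."""
    hint = PORTING_HINTS.get(name)
    if hint is None:
        # Unknown names keep the usual error message.
        raise AttributeError(f"module 'old_api' has no attribute {name!r}")
    raise AttributeError(f"`old_api.{name}` has been removed: {hint}.")


# Stand-in for the emptied-out package: no image processing code inside,
# only the lookup hook that carries the porting instructions.
old_api = types.ModuleType("old_api")
old_api.__getattr__ = _shell_getattr
```

Accessing `old_api.smooth` then fails with the porting hint in the error message, so the namespace carries documentation but no processing code.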

It is true, this may happen in some cases, in which case we will need to duplicate some code back into the old namespace. I hope that those cases are few, but there is no technical barrier to making this work.

rfezzani · October 11, 2022, 10:22pm

Thank you @stefanv, I like this proposition.

Concerning this famous Hinsen principle, as was already mentioned, we have already broken it on many occasions when fixing bugs (see https://github.com/scikit-image/scikit-image/issues/6510 for example or https://github.com/scikit-image/scikit-image/issues/6456).

A partial solution to this problem is to support pip installing any version of skimage using git branches, as proposed here.

We may also publish a timeline of the scikit-image release dates and use it as a reference to work out which version of the package an S/O question refers to, based on the date it was asked. This also applies to YouTube tutorials.

stefanv · October 11, 2022, 11:10pm

Unfortunately, most users won’t be able to install without wheels, and we only build those for releases. But, you can certainly specify versions as in pip install scikit-image==0.14.

This would be good to have as part of the migration docs.

lagru · February 26, 2023, 2:28pm

I’ve been recently wondering / worrying about this (though I still think it’s the best suggestion yet).

What will we do in cases where skimage2 returns different values because the updated API also changes an underlying assumption (e.g. in the case of scaling input)? Are we prepared to add and maintain a potentially (performance) expensive compatibility layer? In some cases it might be a lot easier to keep the old implementation around and document the differences.
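To make the cost concrete, here is a toy sketch of such a compatibility layer. It assumes, purely for illustration, that the old function rescales 8-bit integer input to [0, 1] while the new one uses values as given. The names `old_threshold` and `new_threshold` are hypothetical, and nested lists stand in for image arrays.

```python
def new_threshold(image, cutoff):
    """New-style function: compares the values exactly as given."""
    return [[value > cutoff for value in row] for row in image]


def old_threshold(image, cutoff=0.5):
    """Old-style wrapper that reproduces the assumed implicit rescaling."""
    is_integer_image = all(isinstance(v, int) for row in image for v in row)
    if is_integer_image:
        # Extra pass over the data (and an extra copy): this is the
        # performance cost of emulating the old behaviour.
        image = [[v / 255 for v in row] for row in image]
    return new_threshold(image, cutoff)


pixels = [[0, 100, 255]]
old_result = old_threshold(pixels)       # compares 0.0, ~0.39, 1.0 with 0.5
new_result = new_threshold(pixels, 0.5)  # compares 0, 100, 255 with 0.5
```

The same input and cutoff give different masks through the two entry points, which is exactly the kind of silent difference the wrapper has to absorb.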

We can keep the old tests around to help avoid breaking existing behavior. Nevertheless, wrapping skimage2, dependency updates and the transition will introduce bugs in skimage1. How long are we prepared to carry this additional maintenance burden so that it doesn’t become “long term support”?

The API rewrite, its documentation and the compatibility layer will take some time. I would still like to provide the user with bug fixes and new features in the meantime. I would also like to avoid the mess of maintaining and porting between two different git branches. One approach could be:

Once we add the skimage2 namespace and move implementations over, we include it in our distribution archives. Importing it will warn users that there is no feature parity yet and that the skimage2 API is unstable. A more extreme version would be to add the namespace as private _skimage2.
Once we consider skimage2 ready, we remove the warning, announce that skimage1 will not gain any new features and start deprecating skimage1.
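The import-time notice in the first step could look roughly like the sketch below. The warning class `ExperimentalAPIWarning` and the helper `warn_experimental` are hypothetical names chosen for illustration; a package would call the helper once from its `__init__.py`.

```python
import warnings


class ExperimentalAPIWarning(UserWarning):
    """Hypothetical category, so this notice can be filtered on its own."""


def warn_experimental(namespace):
    """Emit the 'unstable API' notice for the given namespace."""
    warnings.warn(
        f"`{namespace}` is experimental: its API may change without notice "
        "and it does not yet have feature parity with the stable namespace.",
        ExperimentalAPIWarning,
        stacklevel=2,
    )
```

A dedicated category lets early adopters opt out with `warnings.filterwarnings("ignore", category=ExperimentalAPIWarning)` without hiding other warnings. Removing the call is then the whole of the "remove the warning" step.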

jni · February 27, 2023, 12:54am

I don’t think we should reimplement skimage1 with skimage2 as the engine. I think we should simply keep skimage1 around (different package) and potentially update with critical bug fixes and maintenance (e.g. support new Python versions), but nothing else.

I recently read a post that helped to crystalise my ideas around this topic:

I don’t think rewriting skimage1 in terms of 2 is a valuable use of our time. Keeping it around, though, would be, because it would help so many users, and it doesn’t stop us from realising our dreams of skimage2 and even skimage3.

stefanv · February 27, 2023, 2:18am

Initially the two codebases will be the same, so I don’t see the benefit in duplicating until there’s divergence. At that point, why not simply reserve “copying” as a valid strategy, if it’s difficult to accommodate the differences introduced?

Anyway, if the preference of the core team is to ALWAYS copy code, that’s fine too. I’m not going to take a principled stance, despite my personal preference to not duplicate code until necessary.

lagru · February 27, 2023, 2:13pm

To clarify, you mean to keep skimage1 around in its own package on PyPI (e.g. scikit-image-old?) once we’ve officially deprecated the skimage namespace fully in our scikit-image package?

stefanv · February 28, 2023, 8:24am

I don’t feel excited about doubling the wheel builds to re-release with each Python version. So, unless @jni proposes that we don’t provide skimage1 for newer Pythons (which would make life quite hard for existing users), I don’t think that’s an option?

jni · February 28, 2023, 9:15am

pip install scikit-image will always give scikit-image 1.x. pip install skimage2 will be the new package. (We might want to also maintain aliases for scikit-image2 and scikit-image-2?). The repo can contain both packages.

Regarding the API, I think it would be easier to go in the forward direction: use the scikit-image1 API and implement skimage2 in terms of it. In fact, we could potentially make skimage2 depend on scikit-image1, at least initially.

lagru · February 28, 2023, 4:13pm

Huh? I’m starting to get confused. Isn’t that the proposed strategy from SKIP 4? The first post in this thread suggests modifying this strategy:

From your previous comments, @jni, I assumed that you were onboard with this new strategy?

I found the article you posted really interesting, though I think the lessons learned should be taken with a grain of salt. All examples in the article that maintain previous versions are backed by corporate resources and money, and all are GUI-focused products.

jni · March 2, 2023, 5:59am

You’re right, I diffused back to my original position and didn’t reread the whole thread. Sorry for the noise!

Anyway, I dunno. Reflecting more on it, I don’t see a good reason for pip install scikit-image to give the new package. If people want to try out skimage2, they can just pip install skimage2?

And either way I think it’s easier to implement skimage2 by depending on skimage(1) than vice versa. For example, the value scaling is trivial to implement, just pass preserve_range=False everywhere. (And this is no longer a kwarg in version 2, you have to do your own scaling.)