Pinning test dependencies with regular updates

Over at napari/napari pull request #5715 (“Pin test dependencies”, by Czaki), Grzegorz has updated our build infrastructure to:

  • place hard pins on our test dependencies (both direct and indirect!)
  • add an action that checks weekly whether any dependencies have been updated and, if so, opens a PR to update the dependency pins

This solves the problem of “the CI is failing, let me hunt down which package created an update recently and check whether that update is the culprit”. Instead, the weekly PR will explicitly show which packages have been updated.

How do we feel about copying that approach for scikit-image? @stefanv perhaps even a SPEC might be good here?

Testing with latest dependencies is useful, but we already do that. This feels like a lot of machinery just for knowing exactly where the breakage occurred?

On other projects, we’ve used dependabot, but there the pins were always fixed so the situation was a lot simpler.

An alternative approach that could provide the same information as above is to cache the versions of dependencies and, upon failure, to compare with what was used in the test run. That should be a <10 line change to our existing workflow.
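To make the comparison step concrete, here is a minimal sketch in Python. It is not the workflow change itself: it assumes the last passing run and the failing run each saved their pip freeze output as text, and the helper names (parse_freeze, diff_versions) and version numbers are made up for illustration.

```python
def parse_freeze(text):
    """Parse `pip freeze` output into {name: version}.

    Only `name==version` lines are kept; comments, editable installs and
    direct-URL requirements are skipped in this simplified sketch.
    """
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "-")) or "==" not in line:
            continue
        name, _, version = line.partition("==")
        versions[name.strip().lower()] = version.strip()
    return versions


def diff_versions(old, new):
    """Return (name, old_version, new_version) for every package that changed.

    None marks a package that is absent on that side.
    """
    changes = []
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes.append((name, before, after))
    return changes


# Illustrative data only: these version numbers are invented for the example.
last_passing = "numpy==1.24.2\nscipy==1.10.1\npytest==7.2.2\n"
failing = "numpy==1.24.3\nscipy==1.10.1\npytest==7.3.0\ntomli==2.0.1\n"

changes = diff_versions(parse_freeze(last_passing), parse_freeze(failing))
for name, before, after in changes:
    print(f"{name}: {before} -> {after}")
```

How the cached freeze output is stored and restored between runs is left open here, since that depends on the CI setup.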

I routinely add a pip freeze step to the CI jobs, so it’s extremely easy to compare versions. Then the usual recipe applies: have one job that runs the oldest supported versions of all dependencies and, most importantly, one job that runs the dev version of everything (nightly wheels where they are available, prereleases where they are not).

This usually smokes things out. If you expect issues with infrastructure packages (e.g. new releases of pytest or sphinx), then I would consider using their dev versions in the devtest job, too.

pip list for humans, pip freeze for machines.

Thinking out loud: can we write a script that grabs the test results of commits on master and shows where things broke and which package changes happened? All this should be quite a bit easier than complicated pinning.
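A rough sketch of what the core of such a script might look like, in Python. It assumes the test results and installed versions for each commit have already been collected into a chronological list; fetching them from CI is not shown, and the function name, record layout, commit ids and versions are all hypothetical.

```python
def first_breakage(runs):
    """Find the first pass-to-fail transition in a chronological list of runs.

    Each run is a dict with keys 'commit', 'passed' and 'versions'
    (a {package: version} mapping). Returns a tuple
    (last_good_commit, first_bad_commit, changes), where changes maps each
    package that differs to (old_version, new_version), or None if no
    passing run is followed by a failing one.
    """
    last_good = None
    for run in runs:
        if run["passed"]:
            last_good = run
            continue
        if last_good is None:
            # Failing run with no earlier passing run to compare against.
            continue
        old, new = last_good["versions"], run["versions"]
        changes = {
            name: (old.get(name), new.get(name))
            for name in sorted(set(old) | set(new))
            if old.get(name) != new.get(name)
        }
        return last_good["commit"], run["commit"], changes
    return None


# Illustrative history only: commit ids and versions are invented.
history = [
    {"commit": "c1", "passed": True, "versions": {"numpy": "1.24.2", "pytest": "7.2.2"}},
    {"commit": "c2", "passed": True, "versions": {"numpy": "1.24.2", "pytest": "7.2.2"}},
    {"commit": "c3", "passed": False, "versions": {"numpy": "1.24.2", "pytest": "7.3.0"}},
    {"commit": "c4", "passed": False, "versions": {"numpy": "1.24.3", "pytest": "7.3.0"}},
]

result = first_breakage(history)
print(result)
```

An empty changes mapping would suggest the breakage came from the commit itself rather than from a dependency update.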

At that point, using tools that help to reproduce locally is super valuable, too. E.g. using tox both in CI and locally, and then something like pip-timemachine or tox-timemachine could roll back the versions to the point of last passing, without the need of downloading and parsing CI logs.

“That should be a <10 line change to our existing workflow.”

Can I see it? :joy:

The important things here are:

  1. CI doesn’t break randomly on updates; it only breaks in the job updating the pins.
  2. At that point, only that job breaks, and we can take our time investigating.
  3. This means we don’t have to go around telling contributors “sorry your CI is broken, it’s not you, it’s all the builds”, followed by a scramble to find why the builds broke, followed by an issue, followed by telling all active contributors “issue is X, ignore it”, while also manually checking all the jobs because the pass/fail signal is no longer granular enough.

To me, all of that is pretty valuable, more so than just finding out quickly what packages updated recently and broke the build — which is also valuable!

Case in point: Comment in “Check if spacing parameter is tuple in regionprops” #6907 :roll_eyes: