Interest in GitHub Action for scipy-wheels-nightly uploads and removals

This topic is meant to gauge interest from the maintainers of the projects that currently upload their nightly wheels to the Anaconda Cloud scipy-wheels-nightly org, which @ogrisel maintains, in using a GitHub Action (yet to be built) to simplify the upload process. The action would also make it easier to automatically remove old uploads from the org so that the storage limits are not exceeded. For background context please see:

Motivation

At the moment the projects in the scipy-wheels-nightly org are:

  • statsmodels
  • dipy
  • scikit-learn
  • matplotlib
  • pandas
  • scipy
  • numpy
  • h5py
  • scikit-image

As @tacaswell has shown, not all of the projects end up using a similar amount of the storage space

[Image: storage space used by each project]

so to avoid situations where nightly wheel uploads fail because there is no storage allocation left, it would be helpful to ensure that old uploads are automatically deleted. matplotlib already does this as part of its upload GitHub Actions workflow, where only the last 5 uploads are kept and the rest are deleted. It would be helpful if all the projects did this (scikit-learn is a bit special, as Andreas Müller pointed out to me at SciPy 2022, since they only keep one dev0 wheel and just keep overwriting it with each upload).
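As a rough illustration of the "keep the last 5, delete the rest" rule (a minimal sketch, not matplotlib's actual workflow), the selection step could look like the following. The function name `stale_uploads` and the `UPLOAD_TIME NAME` input format are assumptions made for this example; the real listing would have to come from whatever the upload tooling reports for the project.

```shell
# Hypothetical helper, not part of any existing workflow.
# Reads lines of the form "UPLOAD_TIME NAME" on stdin, where UPLOAD_TIME is an
# ISO 8601 timestamp (so plain text sorting is chronological), and prints the
# entries that are NOT among the KEEP most recent ones.
stale_uploads() {
  keep="${1:-5}"
  # Sort newest first, then skip the first KEEP lines.
  LC_ALL=C sort -r | tail -n "+$((keep + 1))"
}
```

Piping a project's upload listing through `stale_uploads 5` would print only the candidates for removal; with five or fewer uploads it prints nothing.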

Idea

I wrote the nightly wheels upload/removal GitHub Actions workflow for matplotlib and the logic for it is pretty short and simple. I think it would be easy to turn this into a GitHub Action that could be parameterized so that instead of each project having to implement their own workflow or copy matplotlib’s they could just use a centralized action that is maintained. This also avoids everyone having to keep track of the status of the anaconda-client PyPI releases and what version they should be pinning to on GitHub until that Issue is resolved.

This won’t work for everyone in all cases, though, as @mattip has already pointed out:

Some of the [NumPy and SciPy] uploads are done from travis (for aarch64 and ppc64) so a github action will not work there.

For other projects is the centralization of nightly upload procedures interesting? Or would you prefer to maintain your own workflows? It would be great to hear from all the projects so I’ll take the liberty of tagging people from each (but I assume NumPy and SciPy are out unless @mattip and @rgommers think there’s any benefit in using a GHA for part of it, but I think splitting the process isn’t super attractive in general):

(Please tag people I’ve missed — mea culpa. I’m also not quite sure who all are maintainers on SciPy and NumPy these days so I’m sure that I’m missing others as well.)

(edit: Added SciPy and NumPy back to the list following Ralf’s comment.)

Thanks for working on this @matthewfeickert.

SciPy is in the middle of moving to cibuildwheel, and NumPy already uses that. It seems fine to include there. We’ll just leave TravisCI alone, that’s just a couple of aarch64 wheels per run, so that doesn’t increase space usage by a lot.

Sounds like a great thing to do! 5 seems like a good compromise. Although, do you have the visibility to tell if 5 is enough or too much? If 1 is fine for Scikit-Learn, maybe it’s fine for everyone?

The choice of 5 comes from a suggestion that @ogrisel made

Indeed I think it would be great to have shared script to automatically clean-up old nightly files and only keep the 5 most recent dev wheels for a given project and platform spec for instance. Assuming one dev build per day, that’s approximately one week of history which might be helpful to avoid deleting wheels that might still be used by automated systems with a bit of lag between successive steps.

Note that scikit-learn does not cause too much space usage because we use the fixed .dev0 suffix. But we might want to use a more precise number in the future.

and then matplotlib adopted that choice

Seems like a good number, I could see a case for going up to like 14 (2 weeks), but given that other projects are replacing their wheels nightly 5 seems pretty good!

I don’t think anyone has expressed strong feelings on it needing to be 5 though, so I think if we could get all the projects to try using 5 then we could see how much storage space there is leftover.
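For what it's worth, the "per project and platform spec" part of @ogrisel's suggestion could be sketched as below. This is only an illustration under stated assumptions: the function name `stale_per_platform` and the `UPLOAD_TIME FILENAME` input format are made up for the example, and the grouping key is taken to be the last three dash-separated fields of a wheel filename (Python tag, ABI tag, platform tag).

```shell
# Hypothetical helper, not an existing tool.
# Reads "UPLOAD_TIME FILENAME" lines on stdin (ISO 8601 timestamps) and prints
# the wheel filenames that fall outside the KEEP most recent uploads for each
# python-abi-platform combination.
stale_per_platform() {
  keep="${1:-5}"
  LC_ALL=C sort -r | awk -v keep="$keep" '{
    # Wheel names end in ...-{python tag}-{abi tag}-{platform tag}.whl
    n = split($2, parts, "-")
    key = parts[n-2] "-" parts[n-1] "-" parts[n]
    # Input is newest first, so anything past KEEP in its group is stale.
    if (++seen[key] > keep + 0) print $2
  }'
}
```

Grouping this way means a platform that builds less often does not have its only wheels removed just because another platform uploaded more recently.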

@ogrisel did you have any thoughts on what GitHub organization this GitHub Action should be hosted under? I realistically won’t have time to build this for a few weeks (maybe not until September), so there’s no rush. While I’m happy to build it at first under my own GitHub account, it should probably live somewhere where people beyond me have control over it.

We would be happy to host it under Scientific Python’s org :smiley:

That seems like a great spot! Unless anyone has strong objections let’s move forward with that. :rocket:

If you decide to host it under the scientific-python org, I am happy to set you up with a team and whatever permissions you need. We can also help (some) with the development. Although, I’ve never made a GH action before. :wink:

In consultation with @matthewfeickert , I created a nightly-wheels repo and a Nightly Wheels Developers team that “owns” the repo. I assigned @matthewfeickert the Maintainer role for the Nightly Wheels Developers team, so either of us should be able to invite/add team members among other things.

I am not sure when development will start given everyone’s time constraints. Feel free to watch here or the repo for more updates.

Thanks Jarrod! I was actually able to get an action mostly up and working on my personal GitHub in under an hour, and am in the process of porting it (cf. https://github.com/scientific-python/nightly-wheels/issues/1), so I think we can probably get something ready for use in a few weeks (it will mostly just need feedback from the possible users on what a viable API might be, and then we can cut a v0 release).

It would be good if we could set up a scientific-python Anaconda org in the same manner as the scipy-wheels-nightly org so that we can have dedicated targets for testing. As I mention in the Issue, I’m happy to set this up, but it probably makes more sense for a scientific-python admin to be the owner of such an org.

I created the scientific-python-nightly-wheels org on Anaconda.org.

What do I need to do next?

Good question and I’m not fully sure. If you have the ability to add me as an admin to the org then I assume that I’ll have permissions to create test projects. My plan was to initially have a test project to be able to store test wheels for use in the workflow (so as to not be directly reliant on other projects) and then another one that would be used to test the upload and removal components.

I wasn’t sure how to set up admins, but I was able to add you as an owner.

Shall we start drafting a SPEC to document the shared governance, tools and recommendations to upload nightly builds to the new https://anaconda.org/scientific-python-nightly-wheels/ org?

I started a stub for this SPEC:

It needs a lot of work, but I wanted to first just get all the links set up. It will be marked as a draft, so the fact that it is just a stub is OK at this point. The next steps include:

  1. Implementing the GH action (scientific-python/nightly-wheels: GitHub Action for uploads and removals of nightly wheels)
  2. Setting up the team and contributor guide for the GH action.
  3. Starting to create nightly wheels for several widely-used projects.
  4. Investigating whether we can do something similar for PyPI.
  5. Creating a GH action or example script to help downstream projects test against these nightly wheels via a cron job that creates an issue when the tests fail against the nightly wheels.

Quick notes

  1. Implementing the GH action (scientific-python/nightly-wheels: GitHub Action for uploads and removals of nightly wheels)

Is mostly done (I’m able to replicate everything I did for the matplotlib nightly upload), but I need to do some cleanup and ask for people’s input on API for the action. This will probably need to wait till next week as I’m at a physics workshop this week.

  3. Starting to create nightly wheels for several widely-used projects.

Switching matplotlib over should be easy. We’ll probably need to better understand how all the other projects are uploading their wheels to help them switch over to using the GHA.

  4. Investigating whether we can do something similar for PyPI.

In terms of uploading? If so, then pypa/gh-action-pypi-publish (https://github.com/marketplace/actions/pypi-publish), the blessed GitHub Action for publishing your distribution files to PyPI, already covers this.

  5. Creating a GH action or example script to help downstream projects test against these nightly wheels via a cron job that creates an issue when the tests fail against the nightly wheels.

In most situations people can get away with installing their dependencies as normal and then pip installing just the dependency they want to test from the nightly index. For example, if you want to test the nightly scipy but not the nightly numpy:

$ python -m pip install scipy
$ python -m pip install --upgrade --index-url https://pypi.anaconda.org/scipy-wheels-nightly/simple scipy

but if you wanted to test both then you could do something like what is shown in the matplotlib docs (the --extra-index-url in there would be for a situation in which there are dependencies that are not on the scipy-wheels-nightly index)

python -m pip install \
  --upgrade \
  --pre \
  --index-url https://pypi.anaconda.org/scipy-wheels-nightly/simple \
  --extra-index-url https://pypi.org/simple \
  scipy
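To make the pattern reusable for several packages at once (say both numpy and scipy), something like the following could work. It is only a sketch: the function name `nightly_install_cmd` and the `NIGHTLY_INDEX` variable are hypothetical, and it prints the pip command rather than running it so that it can be inspected first.

```shell
# Hypothetical helper: prints (does not run) a pip command that installs the
# given packages from the nightly index, while also consulting PyPI for
# anything that is not hosted there.
NIGHTLY_INDEX="https://pypi.anaconda.org/scipy-wheels-nightly/simple"

nightly_install_cmd() {
  printf 'python -m pip install --upgrade --pre'
  printf ' --index-url %s' "$NIGHTLY_INDEX"
  printf ' --extra-index-url %s' "https://pypi.org/simple"
  # Append each requested package name to the command line.
  for pkg in "$@"; do
    printf ' %s' "$pkg"
  done
  printf '\n'
}
```

Calling `nightly_install_cmd numpy scipy` prints a single command line that could be pasted into a CI step.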

So we can create some examples for people, but I don’t think a script is really needed as it is just a pip install. Examples I’ve implemented so far:

Opening an Issue on failure might be problematic as it could add noise to situations where problems are understood but just haven’t been resolved yet.

Hello,

I thought you might be interested in this GitHub Actions workflow that Astropy & SunPy collaborated on: OpenAstronomy/github-actions-workflows (reusable workflows for GitHub Actions).

Astropy and SunPy use it to publish final dists to PyPI, and Astropy uses it to upload nightlies to anaconda.org.

SunPy also already uses the mpl nightly wheel in our testing, so I am here for more nightlies!

Thanks!

Please see scientific-python/upload-nightly-action (an action used to upload nightly builds of your package) for more recent development.