This topic is meant to gauge interest from the maintainers of the projects that currently upload their nightly wheels to the Anaconda Cloud scipy-wheels-nightly org, which @ogrisel maintains. The idea is to build a GitHub Action that simplifies the upload process and also makes it easier to automatically remove old uploads from the org, so that the storage limits are not exceeded. For background context please see:
At the moment the projects in the scipy-wheels-nightly org are:
statsmodels
dipy
scikit-learn
matplotlib
pandas
scipy
numpy
h5py
scikit-image
As @tacaswell has shown, not all of the projects end up using a similar amount of the storage space,
so to avoid situations where nightly wheel uploads fail because there is no storage allocation left, it would be helpful to ensure that old uploads are automatically deleted. matplotlib already does this as part of their upload GitHub Actions workflow, where only the last 5 uploads are kept and the rest are deleted. It would be helpful if all the projects did this (scikit-learn is a bit special, as Andreas Müller pointed out to me at SciPy 2022, since they only keep one dev0 wheel and just keep overwriting it with each upload).
Some of the [NumPy and SciPy] uploads are done from travis (for aarch64 and ppc64) so a github action will not work there.
For the other projects, is centralizing the nightly upload procedure interesting? Or would you prefer to maintain your own workflows? It would be great to hear from all the projects, so I’ll take the liberty of tagging people from each (I assume NumPy and SciPy are out unless @mattip and @rgommers think there’s any benefit in using a GHA for part of it, though I think splitting the process isn’t super attractive in general):
statsmodels (not sure if any of the maintainers are on this Discourse?)
dipy (not sure if any of the maintainers are on this Discourse?)
(Please tag people I’ve missed — mea culpa. I’m also not quite sure who all are maintainers on SciPy and NumPy these days so I’m sure that I’m missing others as well.)
(edit: Added SciPy and NumPy back to the list following Ralf’s comment.)
SciPy is in the middle of moving to cibuildwheel, and NumPy already uses that. It seems fine to include there. We’ll just leave TravisCI alone, that’s just a couple of aarch64 wheels per run, so that doesn’t increase space usage by a lot.
Sounds like a great thing to do! 5 seems like a good compromise. Although, do you have the visibility to tell if 5 is enough or too much? If 1 is fine for Scikit-Learn, maybe it’s fine for everyone?
Indeed I think it would be great to have shared script to automatically clean-up old nightly files and only keep the 5 most recent dev wheels for a given project and platform spec for instance. Assuming one dev build per day, that’s approximately one week of history which might be helpful to avoid deleting wheels that might still be used by automated systems with a bit of lag between successive steps.
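A minimal sketch of that retention rule, assuming the cleanup script already has a list of uploaded wheel filenames with their upload times (how that list is fetched from the Anaconda org is left out here). The helper names `wheel_group_key` and `select_stale_wheels` are hypothetical, and grouping by the filename tags is just one possible reading of "project and platform spec":

```python
from collections import defaultdict
from datetime import datetime


def wheel_group_key(filename):
    """Return (project, python tag, abi tag, platform tag) for a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError(f"unexpected wheel filename: {filename}")
    # Wheel names look like name-version[-build]-python-abi-platform, so the
    # first field is the project and the last three fields are the tags.
    return (parts[0].lower(), parts[-3], parts[-2], parts[-1])


def select_stale_wheels(uploads, keep=5):
    """Given (filename, upload_time) pairs, return the filenames to delete.

    The `keep` most recent uploads in each group are retained.
    """
    groups = defaultdict(list)
    for filename, uploaded_at in uploads:
        groups[wheel_group_key(filename)].append((uploaded_at, filename))

    stale = []
    for entries in groups.values():
        entries.sort(reverse=True)  # newest upload first
        stale.extend(name for _, name in entries[keep:])
    return sorted(stale)


if __name__ == "__main__":
    # Seven made-up daily uploads for one project and platform.
    uploads = [
        (f"example-0.1.dev{day}-cp310-cp310-linux_x86_64.whl", datetime(2022, 8, day))
        for day in range(1, 8)
    ]
    for name in select_stale_wheels(uploads, keep=5):
        print("would delete", name)
```

Sorting by upload time rather than by version string keeps the sketch free of version parsing, and it also covers projects that reuse a fixed version suffix.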
Note that scikit-learn does not cause too much space usage because we use the fixed .dev0 suffix. But we might want to use a more precise number in the future.
Seems like a good number, I could see a case for going up to like 14 (2 weeks), but given that other projects are replacing their wheels nightly 5 seems pretty good!
I don’t think anyone has expressed strong feelings on it needing to be 5 though, so I think if we could get all the projects to try using 5 then we could see how much storage space there is leftover.
@ogrisel did you have any thoughts on what GitHub organization this GitHub Action should be hosted under? I realistically won’t have time to build this for a few weeks (maybe not until September, so no rush), and while I’m happy to build it at first under my own GitHub account, it should probably live somewhere where people beyond me have control over it.
If you decide to host it under the scientific-python org, I am happy to set you up with a team and whatever permissions you need. We can also help (some) with the development. Although, I’ve never made a GH action before.
Thanks Jarrod! I was actually able to get an action mostly up and working on my personal GitHub in under an hour, and am in the process of porting it (c.f. https://github.com/scientific-python/nightly-wheels/issues/1) so I think we can probably get something ready for use in a few weeks (will mostly just need feedback from the possible users on what a viable API might be and then can cut a v0 release).
It would be good if we could set up a scientific-python Anaconda org in the same manner as the scipy-wheels-nightly org so that we can have dedicated targets for testing. As I mention in the Issue, I’m happy to set this up, but it probably makes more sense for a scientific-python admin to be the owner of such an org.
Good question and I’m not fully sure. If you have the ability to add me as an admin to the org then I assume that I’ll have permissions to create test projects. My plan was to initially have a test project to be able to store test wheels for use in the workflow (so as to not be directly reliant on other projects) and then another one that would be used to test the upload and removal components.
It needs a lot of work, but I wanted to first just get all the links set up. It will be marked as a draft, so the fact that it is just a stub is OK at this point. The next steps include:
Setting up the team and contributor guide for the GH action.
Starting to create nightly wheels for several widely-used projects.
Investigating whether we can do something similar for PyPI.
Creating a GH action or example script to help downstream projects test against these nightly wheels via a cron job that creates an issue when the tests fail against the nightly wheels.
The GH action itself is mostly done (I’m able to replicate everything I did for the matplotlib nightly upload), but I need to do some cleanup and ask for people’s input on the API for the action. This will probably need to wait till next week as I’m at a physics workshop this week.
Starting to create nightly wheels for several widely-used projects.
Switching matplotlib over should be easy. We’ll probably need to better understand how all the other projects are uploading their wheels to help them switch over to using the GHA.
Investigating whether we can do something similar for PyPI.
Creating a GH action or example script to help downstream projects test against these nightly wheels via a cron job that creates an issue when the tests fail against the nightly wheels.
In most situations people can get away with installing their dependencies like normal and then just pip installing from the index the dependency they want to test. For example, if you want to test the nightly scipy but not the nightly numpy
but if you wanted to test both then you could do something like what is shown in the matplotlib docs (the --extra-index-url in there would be for a situation in which there are dependencies that are not on the scipy-wheels-nightly index)
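As a rough sketch of the two cases above, the pip invocations could be assembled as follows. The index URL `https://pypi.anaconda.org/scipy-wheels-nightly/simple` and the helper name `nightly_install_command` are assumptions made for illustration; the flags themselves (`--upgrade`, `--pre`, `--index-url`, `--extra-index-url`) are standard pip options.

```python
import shlex
import sys

# Assumed location of the nightly index; adjust if the org is served elsewhere.
NIGHTLY_INDEX = "https://pypi.anaconda.org/scipy-wheels-nightly/simple"
PYPI_INDEX = "https://pypi.org/simple"


def nightly_install_command(packages, fall_back_to_pypi=False):
    """Build a pip command that installs pre-release wheels from the nightly index.

    With fall_back_to_pypi=True, PyPI is added as an extra index so that
    dependencies not hosted on the nightly index can still be resolved.
    """
    cmd = [
        sys.executable, "-m", "pip", "install",
        "--upgrade", "--pre",
        "--index-url", NIGHTLY_INDEX,
    ]
    if fall_back_to_pypi:
        cmd += ["--extra-index-url", PYPI_INDEX]
    return cmd + list(packages)


if __name__ == "__main__":
    # Case 1: dependencies were already installed as normal, and only scipy
    # is swapped for its nightly build.
    print(shlex.join(nightly_install_command(["scipy"])))
    # Case 2: test several nightlies together, falling back to PyPI for
    # anything that is not on the nightly index.
    print(shlex.join(
        nightly_install_command(["numpy", "scipy", "matplotlib"], fall_back_to_pypi=True)
    ))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` in a CI step, or the printed line can be pasted into a workflow file directly.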
So we can create some examples for people, but I don’t think a script is really needed as it is just a pip install. Examples I’ve implemented so far:
Opening an Issue on failure might be problematic as it could add noise to situations where problems are understood but just haven’t been resolved yet.