The discussion hasn’t been settled yet, in part because there doesn’t seem to be a “standard” convention in the ecosystem to follow: e.g., workers is used by SciPy while scikit-learn uses n_jobs.
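To make the divergence concrete, here is a minimal sketch using real parameters from both libraries: SciPy’s differential_evolution takes workers, scikit-learn’s RandomForestClassifier takes n_jobs, and both use -1 to mean “all available cores”:

```python
# Minimal illustration of the naming divergence: both calls request all
# available CPU cores, but SciPy spells the knob "workers" while
# scikit-learn spells it "n_jobs".
from scipy.optimize import differential_evolution, rosen
from sklearn.ensemble import RandomForestClassifier

if __name__ == "__main__":  # guard for process-based parallelism
    # SciPy: parallel function evaluation via workers=-1
    # (updating='deferred' is required when evaluating in parallel)
    result = differential_evolution(
        rosen, bounds=[(0, 2), (0, 2)], workers=-1, updating="deferred"
    )

    # scikit-learn: parallel tree fitting via n_jobs=-1
    # (the parameter takes effect at fit/predict time)
    clf = RandomForestClassifier(n_estimators=100, n_jobs=-1)
```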
As this seems somewhat similar in concept to SPEC 7 — Seeding pseudo-random number generation, I’d be curious what people think of making an ecosystem-wide recommendation. In my mind the scope would deal more with naming conventions than with a full-blown “parallel computing API”.
n_jobs may not be the best parameter name, but changing it in scikit-learn, joblib, and a number of other projects that adopted the same name (MNE-Python, imbalanced-learn, xgboost, etc.) seems like a long-term endeavour, and the migration would cause some disruption.
Indeed, @thomasjpfan’s terrific blog post inspired our thinking on this topic. Ultimately, though, we’d rather opt for a workable option that the community is willing to adopt than for the “optimal” naming.
Hi, I’ve been working on nx-parallel (a parallel backend for NetworkX), and we had a little discussion on what the default value of n_jobs (or workers) should be.
I’m yet to add a config manager, so right now a user cannot modify any of the parallel configs. But I think we decided to go with joblib’s conventions (i.e., n_jobs) because nx-parallel depends heavily on joblib for all of its parallelization, so it seems fair to expect a user to know joblib if they want to play around with the parallel configuration in nx-parallel. Allowing users to set configs would also hopefully let them use other parallel backends (threading, multiprocessing, dask, ray, etc.) through joblib, so an ecosystem-wide recommendation would be even more helpful here. Having a SPEC for a whole parallel computing API would be great too, and I’d really like to know how I can contribute to this.
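For context, a hedged sketch of the joblib conventions mentioned above (not nx-parallel’s actual API): n_jobs names the worker count, and the parallel_config context manager (joblib >= 1.3) switches the backend without touching the parallel call itself, which is the kind of behaviour a joblib-based project would inherit:

```python
# Sketch of joblib's conventions (illustration only, not nx-parallel code).
from math import sqrt
from joblib import Parallel, delayed, parallel_config

# Default backend (loky, process-based). joblib's n_jobs semantics:
# -1 means "all CPUs"; None means 1 unless a surrounding config says otherwise.
results = Parallel(n_jobs=-1)(delayed(sqrt)(i) for i in range(10))

# The same call runs on a thread-based backend inside a config context;
# with the dask or ray integrations installed and registered, backend="dask"
# or backend="ray" work the same way.
with parallel_config(backend="threading", n_jobs=4):
    results = Parallel()(delayed(sqrt)(i) for i in range(10))
```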