I thoroughly enjoy working on projects where we never, ever have to debate style in PRs. A very effective way to achieve that is to use a formatter, such as black.
Now, I know that black is somewhat controversial (because the code it produces isn’t always what we would have written). BUT I think the benefit of not having to worry about formatting ever again is totally worth it.
In PR 6563 I’ve added some innocuous linters, but have left black out on purpose until we’ve had time to discuss.
One unfortunate side-effect is that black will make many older PRs unmergeable if applied wholesale to the project, but we could also just put it in place to check patches as they roll in. That way, you gradually transition the codebase (@jarrodmillman may have an opinion on which route works best?)
Are there any major concerns from the developer community about adding black as a linter?
P.S. For those of you who haven’t come across pre-commit before, it integrates very nicely into the development process:
pip install pre-commit
git commit ... # pre-commit runs automatically
EXACTLY the same linter is run during CI, so if your commit passes you don’t have to wait for the CI to tell you that you’ve made a formatting mistake (I’m looking at you, SciPy!)
As I mentioned in the NumPy/SciPy debate, I still prefer yapf’s customisability, and for the love of God if we use black can we please set the line length to 79, but either way, my vote is yapf > black >>> nothing.
I also like yapf’s customizability, but black is community-developed and seems more active. Also, I think the black authors would be willing to work with us on things like math formatting.
I think with modern terminals, 79 is quite restrictive. I’ve had good success with their default choice of 88, and it’s easy to adapt flake8 to respect that. Their rationale:
Black defaults to 88 characters per line, which happens to be 10% over 80. This number was found to produce significantly shorter files than sticking with 80 (the most popular), or even 79 (used by the standard library).
FWIW, I do not have a wide aspect-ratio monitor on my laptop, but I still run two editors side by side very comfortably at 88.
You’re right that it does not do patches, so once you touch a file you have to fix the whole file. But you can do only the files you touched.
Taking a step back I’d just like to reflect that these kind of assessments are a degree more subjective and the result of personal workflow, experience and bias than other arguments. And are therefore very likely to spark emotional responses. E.g. I’d say consistency enhances readability and some part of me is upset about seeing something described as “bad formatting” what I view as “good formatting”. We probably make this harder for ourselves if we focus on these kind of things.
Therefore, I would really like to focus the discussion on less “personal workflow” aspects to see where we can find common ground. E.g.
Status quo: we are expending energy on conflicts (whether silent or visible) around formatting style that we would like to use somewhere else.
Applying a code formatter will force a style that not everyone will be happy with. I want to highlight that this is already is the status quo anyway, e.g. while reviewing PRs we either decide to cope with formatting styles we would not use ourselves or we are forcing our opinion onto others. In some cases someone might not be happy (note that maintainers are used to a position of power here). A formatter will shift this around.
The applied style should create the least possible surprise and friction for the community as a whole.
black/darker has some flexibility but less than yapf. Is this a good or bad thing?
Keeping the diff small is a worthy goal everyone can agree on.
black’s AST checking is useful in increasing trust in the formatted output
Good integration into most common workflows is a must, e.g. a commit hook. Both yapf and black/darker support this.
I’d also like to point out Formatting Code with Black - APE20. I’m not sure whom to ping from the astropy community but they might have some useful experience with it already.
Also, I’d be willing to start and work on a SKIP for this.
That’s bound to happen and still a win in my book if we finally find a solution for this argument long-term.
You have used Black for scientific computing? It doesn’t seem to have been designed for this purpose, forcing arrays and math to be less readable, for instance.
The consensus of previous discussions was that there is no such problem:
Is there evidence that code formatting is a problem? What fraction of scipy, numpy, scikit-xxx PRs have had significant discussions (let alone “controversies”) about Python formatting? How many of those are not resolved by “let’s be sensible and mostly follow PEP8 when we can”?
In my experience, there aren’t many bikeshedding arguments, per se, (“I like it this way!”, “But I like it that way!”). The main review overhead is just getting the style into compliance with the existing guidelines. black’s distinguishing feature is, as you say, more about resolving the former.
AFAICT, we largely do not have endless discussions about styling in the actual code reviews. We only have endless discussions about styling when someone proposes to use black.
and even if there was, Black is not a good solution:
However, as many people quickly found out in the past (including us, at my company, after using it about 4 months) is that this standard is not written by scientific or number crunching people.
What black does today for math is bad, really bad. Something like hypot2 = 2 * x + 3 * y ** 2 is code no numerical Python person would write by hand.
@endolith Arguments about formatting come up all the time. Worse, many reviewers feel the need to make newcomer PRs conform to their formatting preferences, which wastes time and discourages contribution.
On all the projects I’ve worked on that use a formatter, these issues have simply evaporated.
I understand the unhappiness with black’s math formatting. It’s not good. There is another topic on this forum trying to identify what feedback to give to the black developers, and your input would be very much appreciated there.
Now, the question here is whether we can agree on some (any) system to use for automated formatting. It doesn’t need to be black, so if you happen to know yapf flags or another tool that will get us closer to what you’d like to see, there’s no reason we can’t use that instead.
Finally, I’d say that I am less devoted to the idea of getting a formatter in than I am into the pre-commit checks in my existing PR that will make things consistent and correctly formatted.
@lagru Thanks for bringing the conversation back to principles. Sorry for joining in the bike-shedding
One unfortunate side-effect is that black will make many older PRs unmergeable if applied wholesale to the project
It is possible to work around that: when we introduced black into the code base of xarray, we added a section to the contributing guide that described the process (see e.g. pydata/xarray#3195 and psf/black#967). As a summary:
prerequesite: the changes from the autoformatter are contained in a single commit (not sure if that’s actually necessary, but I guess it makes git blame a bit easier to use)
old PRs can then be updated by these steps:
merge the commit immediately before that commit
run the autoformatter on the PR (e.g. black . in the project root)
merge the autoformatter commit, resolving with the ours strategy: git merge <commit> -X ours
(the instructions from that PR also mention applying a set of other changes, but that’s because we made the mistake of adding other changes to that single commit.)
From what I remember, that process was pretty painless, and as maintainers you’re usually able to help contributors by pushing to their PR.
Edit: dask and pandas also have some experience with that transition
well, I was not a maintainer back then so I don’t know if I missed something, but no, I don’t think so. As far as I remember all that was done was to modify the PR template to point towards that guide, and then the contributors did the transition (and I actually don’t remember any PRs where someone needed help, but I might just have forgotten).
In fact, I think we still have some open (and maybe abandoned / superseded, not sure) PRs that didn’t do the transition yet.
Basically it’s been been fairy well received as an optional thing, but kind of moved into limbo when the possibility of making formatting required was brought up. I haven’t poked it in a while, don’t believe triage has come to any final conclusions though
Astropy is still in the planning stages for the transition to using Black, but you can find some discussion at the link below. This includes a plan to run Black on the previous release branches (our LTS branch in particular) along with applying Black for the upcoming v5.2 release at the end of October.