Adding black as linter?

I thoroughly enjoy working on projects where we never, ever have to debate style in PRs. A very effective way to achieve that is to use a formatter, such as black.

Now, I know that black is somewhat controversial (because the code it produces isn’t always what we would have written). BUT I think the benefit of not having to worry about formatting ever again is totally worth it.

In PR 6563 I’ve added some innocuous linters, but have left black out on purpose until we’ve had time to discuss.

One unfortunate side-effect is that black will make many older PRs unmergeable if applied wholesale to the project, but we could also just put it in place to check patches as they roll in. That way, you gradually transition the codebase (@jarrodmillman may have an opinion on which route works best?)

Are there any major concerns from the developer community about adding black as a linter?

P.S. For those of you who haven’t come across pre-commit before, it integrates very nicely into the development process:

  1. pip install pre-commit
  2. git commit ... # pre-commit runs automatically

EXACTLY the same linter is run during CI, so if your commit passes you don’t have to wait for the CI to tell you that you’ve made a formatting mistake (I’m looking at you, SciPy!)

6 Likes

I 100% agree with this.

As I mentioned in the NumPy/SciPy debate, I still prefer yapf’s customisability, and for the love of God if we use black can we please set the line length to 79, but either way, my vote is yapf > black >>> nothing.

1 Like

btw @stefanv you mention applying it only to patches but I thought black couldn’t do that?

I also like yapf’s customizability, but black is community-developed and seems more active. Also, I think the black authors would be willing to work with us on things like math formatting.

I think with modern terminals, 79 is quite restrictive. I’ve had good success with their default choice of 88, and it’s easy to adapt flake8 to respect that. Their rationale:

Black defaults to 88 characters per line, which happens to be 10% over 80. This number was found to produce significantly shorter files than sticking with 80 (the most popular), or even 79 (used by the standard library).

FWIW, I do not have a wide aspect-ratio monitor on my laptop, but I still run two editors side by side very comfortably at 88.

You’re right that it does not do patches, so once you touch a file you have to fix the whole file. But you can do only the files you touched.

I’m +1 on using black. I’ve had very good experiences with it on personal (scientific) projects in the past.

yapf’s customisability might actually be a point against it in my book. It can be one more point to argue about and also might indicate more maintenance compared to black.

There is Darker, which can solve that problem as a drop-in replacement. They also have a very good overview about the discussion (see the projects readme) about adding this feature to black itself.

+1. If we go this way, I think that using the defaults would have the largest benefit and lowest friction for the community, both in terms of using the formatter and reading its output.

Thank you, kicking off this topic again.

I can fit three terminals side by side with 80, but not 88.

You know what else produces shorter files? Not spilling over to 100 tiny lines when you have a long parameter list. :stuck_out_tongue_winking_eye:

Hey we might as well front-load our arguing about style, no? :stuck_out_tongue_winking_eye:

Anyway, I like the darker idea, and/or maybe we just use black in skimage2?

I’m opposed to Black; it forces bad formatting. Readability is more important than consistency.

Previous threads:

https://mail.python.org/pipermail/scipy-dev/2021-July/024924.html

2 Likes

Taking a step back I’d just like to reflect that these kind of assessments are a degree more subjective and the result of personal workflow, experience and bias than other arguments. And are therefore very likely to spark emotional responses. E.g. I’d say consistency enhances readability and some part of me is upset about seeing something described as “bad formatting” what I view as “good formatting”. :wink: We probably make this harder for ourselves if we focus on these kind of things.

Therefore, I would really like to focus the discussion on less “personal workflow” aspects to see where we can find common ground. E.g.

  • Status quo: we are expending energy on conflicts (whether silent or visible) around formatting style that we would like to use somewhere else.
  • Applying a code formatter will force a style that not everyone will be happy with. I want to highlight that this is already is the status quo anyway, e.g. while reviewing PRs we either decide to cope with formatting styles we would not use ourselves or we are forcing our opinion onto others. In some cases someone might not be happy (note that maintainers are used to a position of power here). A formatter will shift this around.
  • The applied style should create the least possible surprise and friction for the community as a whole.
  • black/darker has some flexibility but less than yapf. Is this a good or bad thing?
  • Keeping the diff small is a worthy goal everyone can agree on.
  • black’s AST checking is useful in increasing trust in the formatted output
  • Good integration into most common workflows is a must, e.g. a commit hook. Both yapf and black/darker support this.

I’d also like to point out Formatting Code with Black - APE20. I’m not sure whom to ping from the astropy community but they might have some useful experience with it already.

Also, I’d be willing to start and work on a SKIP for this.

That’s bound to happen and still a win in my book if we finally find a solution for this argument long-term. :sweat_smile:

1 Like

You have used Black for scientific computing? It doesn’t seem to have been designed for this purpose, forcing arrays and math to be less readable, for instance.

The consensus of previous discussions was that there is no such problem:

Is there evidence that code formatting is a problem? What fraction of scipy, numpy, scikit-xxx PRs have had significant discussions (let alone “controversies”) about Python formatting? How many of those are not resolved by “let’s be sensible and mostly follow PEP8 when we can”?

In my experience, there aren’t many bikeshedding arguments, per se, (“I like it this way!”, “But I like it that way!”). The main review overhead is just getting the style into compliance with the existing guidelines. black’s distinguishing feature is, as you say, more about resolving the former.

AFAICT, we largely do not have endless discussions about styling in the actual code reviews. We only have endless discussions about styling when someone proposes to use black.

and even if there was, Black is not a good solution:

However, as many people quickly found out in the past (including us, at my company, after using it about 4 months) is that this standard is not written by scientific or number crunching people.

What black does today for math is bad, really bad. Something like hypot2 = 2 * x + 3 * y ** 2 is code no numerical Python person would write by hand.

2 Likes

@endolith Arguments about formatting come up all the time. Worse, many reviewers feel the need to make newcomer PRs conform to their formatting preferences, which wastes time and discourages contribution.

On all the projects I’ve worked on that use a formatter, these issues have simply evaporated.

I understand the unhappiness with black’s math formatting. It’s not good. There is another topic on this forum trying to identify what feedback to give to the black developers, and your input would be very much appreciated there.

Now, the question here is whether we can agree on some (any) system to use for automated formatting. It doesn’t need to be black, so if you happen to know yapf flags or another tool that will get us closer to what you’d like to see, there’s no reason we can’t use that instead.

Finally, I’d say that I am less devoted to the idea of getting a formatter in than I am into the pre-commit checks in my existing PR that will make things consistent and correctly formatted.

@lagru Thanks for bringing the conversation back to principles. Sorry for joining in the bike-shedding :slight_smile:

2 Likes

To refresh my memory, I was just looking up this old comment by @jni :stuck_out_tongue_winking_eye:

Anyway, my vote will be @jni’s vote :relaxed:

One unfortunate side-effect is that black will make many older PRs unmergeable if applied wholesale to the project

It is possible to work around that: when we introduced black into the code base of xarray, we added a section to the contributing guide that described the process (see e.g. pydata/xarray#3195 and psf/black#967). As a summary:

  • prerequesite: the changes from the autoformatter are contained in a single commit (not sure if that’s actually necessary, but I guess it makes git blame a bit easier to use)
  • old PRs can then be updated by these steps:
    1. merge the commit immediately before that commit
    2. run the autoformatter on the PR (e.g. black . in the project root)
    3. merge the autoformatter commit, resolving with the ours strategy: git merge <commit> -X ours
    4. merge main

(the instructions from that PR also mention applying a set of other changes, but that’s because we made the mistake of adding other changes to that single commit.)

From what I remember, that process was pretty painless, and as maintainers you’re usually able to help contributors by pushing to their PR.

Edit: dask and pandas also have some experience with that transition

@keewis Thanks, that’s helpful! Did you do this manually for all open PRs?

@mkcor FWIW, we can disable quote enforcement with black. See --skip-string-normalization.

Did you do this manually for all open PRs?

well, I was not a maintainer back then so I don’t know if I missed something, but no, I don’t think so. As far as I remember all that was done was to modify the PR template to point towards that guide, and then the contributors did the transition (and I actually don’t remember any PRs where someone needed help, but I might just have forgotten).

In fact, I think we still have some open (and maybe abandoned / superseded, not sure) PRs that didn’t do the transition yet.

Let me link my PR to add some formatting options to Numpy via pre-commit: DEV: Add pre-commit hook to apply pep7 (c/cpp) and pep8 (py) to only changed lines of code by tgross35 · Pull Request #21449 · numpy/numpy · GitHub this includes black, darker, clang-format, flake8, and formatters for other files

Basically it’s been been fairy well received as an optional thing, but kind of moved into limbo when the possibility of making formatting required was brought up. I haven’t poked it in a while, don’t believe triage has come to any final conclusions though

Astropy is still in the planning stages for the transition to using Black, but you can find some discussion at the link below. This includes a plan to run Black on the previous release branches (our LTS branch in particular) along with applying Black for the upcoming v5.2 release at the end of October.

1 Like

Thanks, Tom. Did astropy ever discuss mathematical formatting? Black seems to be open to suggestions on how to make this better.

2 Likes

Related

Ruff has indicated that it will soon be doing auto-formatting.

Two other systems that are essentially black+lightweight configuration are:

I want to be clear that I don’t care which tool we use, but that I am very much in favor of choosing some tool.

2 Likes

Same. And also +1 on using any formatter. How do we get this forward? I don’t really expect a consensus at the community level. But there hasn’t been any push back in this thread from maintainers themselves, so I feel like consensus at among the team is possible.

Our governance doc doesn’t make this entirely clear to me whether we need a SKIP here or can get by with lazy consensus. I’d be willing to helm this effort and create a SKIP for this or work on the implementation itself. I’d probably follow astropy’s APE20 (accepted) on this where it makes sense…

Thoughts?

1 Like