Adding black as linter?

stefanv · October 5, 2022, 10:21pm

I thoroughly enjoy working on projects where we never, ever have to debate style in PRs. A very effective way to achieve that is to use a formatter, such as black.

Now, I know that black is somewhat controversial (because the code it produces isn’t always what we would have written). BUT I think the benefit of not having to worry about formatting ever again is totally worth it.

In PR 6563 I’ve added some innocuous linters, but have left black out on purpose until we’ve had time to discuss.

One unfortunate side-effect is that black will make many older PRs unmergeable if applied wholesale to the project, but we could also just put it in place to check patches as they roll in. That way, you gradually transition the codebase (@jarrodmillman may have an opinion on which route works best?)

Are there any major concerns from the developer community about adding black as a linter?

P.S. For those of you who haven’t come across pre-commit before, it integrates very nicely into the development process:

pip install pre-commit
pre-commit install # sets up the git hook, once per clone
git commit ... # pre-commit runs automatically

EXACTLY the same linter is run during CI, so if your commit passes you don’t have to wait for the CI to tell you that you’ve made a formatting mistake (I’m looking at you, SciPy!)

jni · October 6, 2022, 2:51am

I 100% agree with this.

As I mentioned in the NumPy/SciPy debate, I still prefer yapf’s customisability, and for the love of God if we use black can we please set the line length to 79, but either way, my vote is yapf > black >>> nothing.

jni · October 6, 2022, 2:52am

btw @stefanv you mention applying it only to patches but I thought black couldn’t do that?

stefanv · October 6, 2022, 5:34am

I also like yapf’s customizability, but black is community-developed and seems more active. Also, I think the black authors would be willing to work with us on things like math formatting.

I think with modern terminals, 79 is quite restrictive. I’ve had good success with their default choice of 88, and it’s easy to adapt flake8 to respect that. Their rationale:

Black defaults to 88 characters per line, which happens to be 10% over 80. This number was found to produce significantly shorter files than sticking with 80 (the most popular), or even 79 (used by the standard library).

FWIW, I do not have a wide aspect-ratio monitor on my laptop, but I still run two editors side by side very comfortably at 88.

You’re right that it does not do patches, so once you touch a file you have to fix the whole file. But you can do only the files you touched.
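A minimal sketch of the "only the files you touched" approach, assuming the base branch is called `main` and that `git` and `black` are both on the PATH. The helper names are hypothetical; `git diff --name-only` and `black --line-length` are existing options of those tools.

```python
import subprocess


def python_files(diff_output: str) -> list[str]:
    """Keep only the .py paths from `git diff --name-only` output."""
    return [
        line.strip()
        for line in diff_output.splitlines()
        if line.strip().endswith(".py")
    ]


def black_command(files: list[str], line_length: int = 88) -> list[str]:
    """Build the black invocation for the given files."""
    return ["black", "--line-length", str(line_length), *files]


def format_touched_files(base: str = "main") -> None:
    """Run black on Python files that differ from `base` (assumed branch name)."""
    diff = subprocess.run(
        # --diff-filter=d leaves out deleted files, which black cannot open.
        ["git", "diff", "--name-only", "--diff-filter=d", f"{base}...HEAD"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    files = python_files(diff)
    if files:
        subprocess.run(black_command(files), check=True)
```

Calling `format_touched_files()` from the repository root would reformat every touched file in full, which matches the whole-file granularity described above.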

lagru · October 6, 2022, 8:38am

I’m +1 on using black. I’ve had very good experiences with it on personal (scientific) projects in the past.

yapf’s customisability might actually be a point against it in my book. It can be one more point to argue about and also might indicate more maintenance compared to black.

There is Darker, which can solve that problem as a drop-in replacement. They also have a very good overview of the discussion about adding this feature to black itself (see the project’s README).

+1. If we go this way, I think that using the defaults would have the largest benefit and lowest friction for the community, both in terms of using the formatter and reading its output.

Thank you for kicking off this topic again.

jni · October 6, 2022, 9:03am

I can fit three terminals side by side with 80, but not 88.

You know what else produces shorter files? Not spilling over to 100 tiny lines when you have a long parameter list.

Hey we might as well front-load our arguing about style, no?

Anyway, I like the darker idea, and/or maybe we just use black in skimage2?

endolith · October 6, 2022, 12:00pm

I’m opposed to Black; it forces bad formatting. Readability is more important than consistency.

Previous threads:

github.com/scipy/scipy

STY: Maths formatting

opened 06:59PM - 05 Jul 21 UTC

closed 07:50PM - 06 Jul 21 UTC

tupui

query

This issue is linked to #14330.

To be able to use tools (like, but not limited to Black), we need to define how we, as the scientific community and not just SciPy, want mathematical equations to be rendered. The goal of this issue is to document and establish a strict set of rules to write maths. The rules must be coherent, extensive and opinionated (one way to do something, unambiguous wording) so they can be integrated in a tool (that tool may be Black).

I think such a document is missing from the scientific community and my hope is that we can all agree on something :smiley:

To quickstart things here are some ideas:

# Formatting Mathematical Expressions

To format mathematical expressions, the following rules must be followed. These rules respect and complement the PEP8 (relevant sections include [id20](https://www.python.org/dev/peps/pep-0008/#id20) and [id28](https://www.python.org/dev/peps/pep-0008/#id28)).

* If operators with different priorities are used, add whitespace around the operators with the lowest priority(ies).
* There is no space before and after `**`.
* There is no space before and after operators `*,/`. The only exception is if the expression consists of a single operator linking two groups.
* There is a space before and after `-`, `+`. Except if: (i) the operator is used to define the sign of the number; (ii) the operator is used in a group to mark higher priority.
* When splitting an equation, new lines should start with the operator linking the previous and next logical block. Single digit, brackets on a line are forbidden. Use the available horizontal space as much as possible.

```python
# Correct:
i = i + 1
submitted += 1
x = x*2 - 1
hypot2 = x*x + y*y
c = (a+b) * (a-b)
dfdx = sign*(-2*x + 2*y + 2)
result = 2 * x**2 + 3 * x**(2/3)
y = 4*x**2 + 2*x + 1
c_i1j = (1./n**2.
         * np.prod(0.5*(2.+abs(z_ij[i1, :]) + abs(z_ij) - abs(z_ij[i1, :]-z_ij)), axis=1))
```

```python
# Wrong:
i=i+1
submitted +=1
x = x * 2 - 1
hypot2 = x * x + y * y
c = (a + b) * (a - b)
dfdx = sign * (-2 * x + 2 * y + 2)
result = 2 * x ** 2 + 3 * x ** (2 / 3)
y = 4 * x ** 2 + 2 * x + 1
c_i1j = (1. / n ** 2. * np.prod(0.5 * (2. + abs(z_ij[i1, :]) + abs(z_ij) - abs(z_ij[i1, :] - z_ij)), axis=1))
```

github.com/scipy/scipy

MAINT/STY: use Black formatting

scipy:master ← tupui:black_style

opened 12:14PM - 01 Jul 21 UTC

tupui

+115604 -79903

I propose to apply [Black](https://black.readthedocs.io) on our code base. This was discussed during the last [community meetup](https://hackmd.io/@tupui/scipy-meetup-3).

Using Black would remove all discussions about code style in PRs. It would be the entire responsibility of Black to format code, hence make our code base very consistent and remove the extra load due to styling (both in writing code and reviewing).

**I am labelling this PR as in need of a decision. I will wait for comments before doing more work.**

This PR does:

- [x] Add pre-commit hooks for Black and `flake8`. The goal is to remove any churn on the development side. Just commit and it fixes the code for you, no questions asked.
- [x] Black configuration is in `pyproject.toml` and the `flake8` configuration is in `tox.ini`.
- [x] Make the max line length 88, Black's default.
- [x] Run Black on the whole code base.
- [ ] Add Black and `flake8` in CI (where?).
- [ ] Add isort (still not sure, maybe later as this could break things with circular dependencies in some places).
- [ ] Remove `pycodestyle` from CI and `tox.ini`.
- [ ] Document Black use in the developer guides.

## Concerns

### Blame

We can mark the commit which applied Black to an exception list `.git-blame-ignore-revs`, see [here](https://black.readthedocs.io/en/stable/guides/introducing_black_to_your_project.html#avoiding-ruining-git-blame), so that `git` does not see the commit and does not pollute the output of `git blame`. This only works locally as GitHub does not support (yet) this functionality.

### Maths

There are maths concerns with operators. For instance these examples:

https://github.com/tupui/scipy/blob/black_style/scipy/special/_basic.py#L1154

```python
fac2 = (-1.0) ** (n + 1) * gamma(n + 1.0) * zeta(n + 1, x)
```

https://github.com/tupui/scipy/blob/black_style/scipy/optimize/tests/test_slsqp.py#L67

```python
dfdx = sign * (-2 * x + 2 * y + 2)
# vs
dfdx = sign*(-2*x + 2*y + 2)
```

My personal view on this: our current maths is not consistent on that, thus I prefer to pay this price (having more spacing) in favour of consistency. With Black, we remove all discussions around styling once and for all. Scikit-Learn did it (cc @thomasjpfan, @ogrisel), why not us.

I would also prefer to avoid having to fork Black to adjust such things. For `**`, there is an open issue/PR https://github.com/psf/black/issues/538 , so it could use some discussion to make it happen. Spacing around a single operator is less likely to happen, but who knows... (cc @ambv)

### Skip Black

In case you really want to format something in a specific way, there is still the possibility to mark the code block, see the [doc](https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html#code-style).

### Merge conflicts

There are a few recipes out there to help fix merge conflicts with open PRs. We can link these in the doc.

https://github.com/scikit-learn/scikit-learn/issues/20301
https://github.com/spyder-ide/spyder/pull/15886

## References

Design choice, why the sad face
https://lukasz.langa.pl/1d1a43c4-9c8a-4c5f-a366-7f22ce6a49fc/

Sklearn is now using Black
https://github.com/scikit-learn/scikit-learn/pull/18948

Django proposal for Black
https://github.com/django/deps/blob/main/accepted/0008-black.rst

https://mail.python.org/pipermail/scipy-dev/2021-July/024924.html

lagru · October 6, 2022, 1:41pm

Taking a step back, I’d just like to reflect that these kinds of assessments are more subjective, and more the result of personal workflow, experience and bias, than other arguments. They are therefore very likely to spark emotional responses. E.g. I’d say consistency enhances readability, and some part of me is upset about seeing what I view as “good formatting” described as “bad formatting”. We probably make this harder for ourselves if we focus on these kinds of things.

Therefore, I would really like to focus the discussion on less “personal workflow” aspects to see where we can find common ground. E.g.

Status quo: we are expending energy on conflicts (whether silent or visible) around formatting style that we would like to use somewhere else.
Applying a code formatter will force a style that not everyone will be happy with. I want to highlight that this is already the status quo anyway, e.g. while reviewing PRs we either decide to cope with formatting styles we would not use ourselves or we force our opinion onto others. In some cases someone might not be happy (note that maintainers are used to a position of power here). A formatter will shift this around.
The applied style should create the least possible surprise and friction for the community as a whole.
black/darker has some flexibility but less than yapf. Is this a good or bad thing?
Keeping the diff small is a worthy goal everyone can agree on.
black’s AST checking is useful in increasing trust in the formatted output
Good integration into most common workflows is a must, e.g. a commit hook. Both yapf and black/darker support this.
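To illustrate why an AST check builds trust, here is a minimal sketch of the idea using only the standard library. It is not black's actual implementation, which is more involved; the helper name `same_ast` is hypothetical, and the sample expressions are taken from the SciPy examples quoted earlier in this thread.

```python
import ast


def same_ast(before: str, after: str) -> bool:
    """Return True if both sources parse to the same syntax tree."""
    # ast.dump omits line and column attributes by default, so
    # whitespace-only differences do not affect the comparison.
    return ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))


hand_written = "dfdx = sign*(-2*x + 2*y + 2)\n"
black_style = "dfdx = sign * (-2 * x + 2 * y + 2)\n"
changed = "dfdx = sign * (-2 * x + 2 * (y + 2))\n"

print(same_ast(hand_written, black_style))  # True: only spacing differs
print(same_ast(hand_written, changed))      # False: the grouping changed
```

A formatter that refuses to write output when such a comparison fails can change how code looks, but not what it computes.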

I’d also like to point out Formatting Code with Black - APE20. I’m not sure whom to ping from the astropy community but they might have some useful experience with it already.

Also, I’d be willing to start and work on a SKIP for this.

That’s bound to happen and still a win in my book if we finally find a solution for this argument long-term.

endolith · October 6, 2022, 3:33pm

You have used Black for scientific computing? It doesn’t seem to have been designed for this purpose, forcing arrays and math to be less readable, for instance.

The consensus of previous discussions was that there is no such problem:

Is there evidence that code formatting is a problem? What fraction of scipy, numpy, scikit-xxx PRs have had significant discussions (let alone “controversies”) about Python formatting? How many of those are not resolved by “let’s be sensible and mostly follow PEP8 when we can”?

In my experience, there aren’t many bikeshedding arguments, per se, (“I like it this way!”, “But I like it that way!”). The main review overhead is just getting the style into compliance with the existing guidelines. black’s distinguishing feature is, as you say, more about resolving the former.

AFAICT, we largely do not have endless discussions about styling in the actual code reviews. We only have endless discussions about styling when someone proposes to use black.

and even if there was, Black is not a good solution:

However, what many people quickly found out in the past (including us, at my company, after using it for about 4 months) is that this standard was not written by scientific or number-crunching people.

What black does today for math is bad, really bad. Something like hypot2 = 2 * x + 3 * y ** 2 is code no numerical Python person would write by hand.

stefanv · October 6, 2022, 4:10pm

@endolith Arguments about formatting come up all the time. Worse, many reviewers feel the need to make newcomer PRs conform to their formatting preferences, which wastes time and discourages contribution.

On all the projects I’ve worked on that use a formatter, these issues have simply evaporated.

I understand the unhappiness with black’s math formatting. It’s not good. There is another topic on this forum trying to identify what feedback to give to the black developers, and your input would be very much appreciated there.

Now, the question here is whether we can agree on some (any) system to use for automated formatting. It doesn’t need to be black, so if you happen to know yapf flags or another tool that will get us closer to what you’d like to see, there’s no reason we can’t use that instead.

Finally, I’d say that I am less devoted to the idea of getting a formatter in than I am to the pre-commit checks in my existing PR, which will make things consistent and correctly formatted.

@lagru Thanks for bringing the conversation back to principles. Sorry for joining in the bike-shedding.

mkcor · October 6, 2022, 6:20pm

To refresh my memory, I was just looking up this old comment by @jni

Anyway, my vote will be @jni’s vote

keewis · October 6, 2022, 6:22pm

One unfortunate side-effect is that black will make many older PRs unmergeable if applied wholesale to the project

It is possible to work around that: when we introduced black into the code base of xarray, we added a section to the contributing guide that described the process (see e.g. pydata/xarray#3195 and psf/black#967). As a summary:

prerequisite: the changes from the autoformatter are contained in a single commit (not sure if that’s actually necessary, but I guess it makes git blame a bit easier to use)
old PRs can then be updated by these steps:
1. merge the commit immediately before the autoformatter commit
2. run the autoformatter on the PR (e.g. black . in the project root)
3. merge the autoformatter commit, resolving with the ours strategy: git merge <commit> -X ours
4. merge main

(the instructions from that PR also mention applying a set of other changes, but that’s because we made the mistake of adding other changes to that single commit.)
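The steps above can be sketched as a small helper that prints the commands for a given autoformatter commit. The function name and the example hash are hypothetical, and the `git commit` line is an assumption that is not in the list: the reformatted files have to be committed before the next merge can run on a clean working tree.

```python
def update_pr_commands(format_commit: str, main_branch: str = "main") -> list[str]:
    """Return shell commands that bring an old PR branch past the black commit."""
    return [
        # 1. merge the commit immediately before the autoformatter commit
        f"git merge {format_commit}~1",
        # 2. run the autoformatter on the PR branch
        "black .",
        # Assumption (not in the list above): commit the reformatted files
        "git commit -am 'Apply black'",
        # 3. merge the autoformatter commit, preferring our side on conflicts
        f"git merge {format_commit} -X ours",
        # 4. merge the main branch
        f"git merge {main_branch}",
    ]


for command in update_pr_commands("abc1234"):  # "abc1234" is a placeholder hash
    print(command)
```

Printing the commands instead of running them keeps the sketch safe to try; a contributor would still run each one by hand and check the result.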

From what I remember, that process was pretty painless, and as maintainers you’re usually able to help contributors by pushing to their PR.

Edit: dask and pandas also have some experience with that transition

stefanv · October 6, 2022, 6:45pm

@keewis Thanks, that’s helpful! Did you do this manually for all open PRs?

@mkcor FWIW, we can disable quote enforcement with black. See --skip-string-normalization.

keewis · October 6, 2022, 7:05pm

Did you do this manually for all open PRs?

Well, I was not a maintainer back then so I don’t know if I missed something, but no, I don’t think so. As far as I remember, all that was done was to modify the PR template to point towards that guide, and then the contributors did the transition (and I actually don’t remember any PRs where someone needed help, but I might just have forgotten).

In fact, I think we still have some open (and maybe abandoned / superseded, not sure) PRs that didn’t do the transition yet.

tgross35 · October 7, 2022, 6:35pm

Let me link my PR to add some formatting options to NumPy via pre-commit: DEV: Add pre-commit hook to apply pep7 (c/cpp) and pep8 (py) to only changed lines of code by tgross35 · Pull Request #21449 · numpy/numpy · GitHub. This includes black, darker, clang-format, flake8, and formatters for other files.

Basically it’s been fairly well received as an optional thing, but kind of moved into limbo when the possibility of making formatting required was brought up. I haven’t poked it in a while, but I don’t believe triage has come to any final conclusions.

taldcroft · October 12, 2022, 11:33am

Astropy is still in the planning stages for the transition to using Black, but you can find some discussion at the link below. This includes a plan to run Black on the previous release branches (our LTS branch in particular) along with applying Black for the upcoming v5.2 release at the end of October.

github.com/astropy/astropy

APE 20: Of black and backport

opened 04:26PM - 04 Oct 22 UTC

closed 01:58PM - 01 Nov 22 UTC

pllim

Release needs-discussion

In [APE 20](https://github.com/astropy/astropy-APEs/blob/main/APE20.rst), an implementation plan was laid out and the target was for v5.2. Since we are still on the hook to support v5.0.x as LTS until end of 2023 (see [APE 2](https://github.com/astropy/astropy-APEs/blob/main/APE2.rst) and [Release Calendar](https://github.com/astropy/astropy/wiki/Release-Calendar)), there is the matter of how do we backport to v5.0.x branch in a sane way after this implementation for v5.2.

Some ideas:

* We also apply `isort` and `black` to v5.0.x branch to keep it in sync enough for the auto-backport bot to still work.
* Concern: While the automation part should be relatively painless, any manual edit that has to go in risk actually breaking the code that we have promised would be "long-term stable."
* We keep v5.0.x as-is but only limit manual backport to critical fixes in the event that auto-backport fails for any reason. See existing LTS policy at https://docs.astropy.org/en/latest/lts_policy.html .

This issue was a result of 2022-10-04 dev telecon discussions.

cc @saimn @dhomeier @eerovaher @WilliamJamieson @braingram

Also cc @astropy/astropy-project-release-team

stefanv · October 12, 2022, 5:42pm

Thanks, Tom. Did astropy ever discuss mathematical formatting? Black seems to be open to suggestions on how to make this better.

lagru · March 17, 2023, 9:41am

Comment by lagru to Don't ignore E501 (line length) and E712 with ruff

scikit-image:main ← lagru:stricter-ruff

> wait, why did you give up?

Right now ruff is applied to `--all-files` and if we enforce the line length, we would need to clean up around 170 lines for the more relaxed 88 limit. So I looked to apply it incrementally, which I expected to be less controversial.

Finding the diff is easy enough locally, though in the pipeline it's more difficult. Depending on the trigger (`pull_request`, `push`, `merge_group`), different information is available to reconstruct either the diff of a pull request or. [With the merge queue disabled for now](https://discuss.scientific-python.org/t/660/4), this is a lot easier to do. Anyway, I think I figured out that part.

I gave up when I noticed that we are currently only fetching 1 commit. So the `git diff base_ref...feature_ref` invoked by pre-commit doesn't work. For pull requests we need to either determine this fetch depth dynamically, fetch all commits (expensive?) or choose a number we expect to work in all cases. At this point I just wasn't sure that we want this badly enough.

Furthermore, the granularity of this `git diff --name-only` based approach is on files, so we might raise errors somewhere else in a file that was only briefly touched in a PR. So the incremental nature is diminished somewhat.

I'm happy to enforce this all at once with ruff, and fix the 170 over time by hand. Though, I'd rather put our time into setting up black and be done with these kinds of things. :roll_eyes: :sweat_smile:
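For comparison with the file-level granularity mentioned above, here is a hedged sketch of line-level filtering. It assumes the diff for a single file was produced with `git diff -U0` (no context lines) and that linter findings are available as `(line_number, message)` pairs. Both helper names are hypothetical, and tools such as darker handle this far more robustly.

```python
import re

# Matches unified-diff hunk headers such as "@@ -10,0 +11,2 @@".
HUNK = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")


def changed_lines(diff_text: str) -> set[int]:
    """Collect new-side line numbers touched by a single-file unified diff."""
    lines: set[int] = set()
    for row in diff_text.splitlines():
        match = HUNK.match(row)
        if match:
            start = int(match.group(1))
            # A missing count means the hunk covers exactly one line.
            count = int(match.group(2)) if match.group(2) is not None else 1
            lines.update(range(start, start + count))
    return lines


def only_new_findings(
    findings: list[tuple[int, str]], diff_text: str
) -> list[tuple[int, str]]:
    """Keep only the findings that sit on lines changed in the diff."""
    touched = changed_lines(diff_text)
    return [(number, message) for number, message in findings if number in touched]


diff = "@@ -10,0 +11,2 @@\n+a = 1\n+b = 2\n@@ -40 +42 @@\n-old\n+new\n"
findings = [(5, "E501"), (12, "E501"), (42, "E712")]
print(only_new_findings(findings, diff))  # [(12, 'E501'), (42, 'E712')]
```

The finding on line 5 is dropped because that line was not part of the change, which is the behaviour a per-file check cannot provide.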

stefanv · May 10, 2023, 8:41pm

Ruff has indicated that it will soon be doing auto-formatting.

Two other systems that are essentially black+lightweight configuration are:

I want to be clear that I don’t care which tool we use, but that I am very much in favor of choosing some tool.

lagru · May 11, 2023, 1:22pm

Same. And also +1 on using any formatter. How do we get this forward? I don’t really expect a consensus at the community level. But there hasn’t been any pushback in this thread from maintainers themselves, so I feel like consensus among the team is possible.

Our governance doc doesn’t make it entirely clear to me whether we need a SKIP here or can get by with lazy consensus. I’d be willing to helm this effort and create a SKIP for this or work on the implementation itself. I’d probably follow astropy’s APE20 (accepted) on this where it makes sense…

Thoughts?