How to format mathematical expressions?

I would like to propose a discussion about styling equations.

It is known that PEP 8 and other established style guides lack guidelines about maths. Hence everyone comes up with their own interpretation and style. I believe I do not have to argue for the benefits of a common coding style in general, as these are well established (code is easier to share, familiar and coherent across the ecosystem, etc.).

I think such a document is missing from the scientific community and my hope is that we can all agree on something :smiley:

To be transparent: there are heated discussions on SciPy around this topic and formatting tools. Indeed, having extensive guidelines (like PEP 8) could allow such tools to implement a mathematical style. Opinions vary on the feasibility of, and the need for, styling maths. I still think this is a good idea and has value.

To kick-start things, here are some ideas. DISCLAIMER: I do not claim these are correct; this is just to start the discussion. Feel free to rewrite everything. I am just proposing an idea which I believe could help the community in general. I hope we have productive chats here.

Formatting Mathematical Expressions

To format mathematical expressions, follow the rules below. They respect and complement PEP 8 (relevant sections include id20 and id28).

  • If operators with different priorities are used, add whitespace around the operators with the lowest priority(ies).
  • There is no space before or after **.
  • There is no space before or after the operators * and /. The only exception is when the expression consists of a single operator linking two groups.
  • There is a space before and after - and +, except if: (i) the operator defines the sign of a number; (ii) the operator is used in a group to mark higher priority.
  • When splitting an equation, new lines should start with the operator linking the previous and next logical block. A line holding only a single digit or only brackets is forbidden. Use the available horizontal space as much as possible.
# Correct:
i = i + 1
submitted += 1
x = x*2 - 1
hypot2 = x*x + y*y
c = (a+b) * (a-b)
dfdx = sign*(-2*x + 2*y + 2)
result = 2 * x**2 + 3 * x**(2/3)
y = 4*x**2 + 2*x + 1
c_i1j = (1./n**2.
         * np.prod(0.5*(2.+abs(z_ij[i1, :])
                        + abs(z_ij) - abs(z_ij[i1, :]-z_ij)), axis=1))
# Wrong:
i=i+1
submitted +=1
x = x * 2 - 1
hypot2 = x * x + y * y
c = (a + b) * (a - b)
dfdx = sign * (-2 * x + 2 * y + 2)
result = 2 * x ** 2 + 3 * x ** (2 / 3)
y = 4 * x ** 2 + 2 * x + 1
c_i1j = (1.
         / n ** 2.
         * np.prod(0.5 * (2. + abs(z_ij[i1, :])
                          + abs(z_ij) - abs(z_ij[i1, :] - z_ij)), axis=1))
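
Of the rules above, the line-splitting one is probably the simplest to check mechanically. Here is a minimal sketch that assumes only the standard-library `tokenize` module. The function name `lines_ending_with_operator` and the operator set are made up for illustration, and it only detects an operator left at the end of a line, not the other parts of the rule:

```python
import io
import tokenize

# Arithmetic operators considered by this sketch (an illustrative choice).
ARITHMETIC = {"+", "-", "*", "/", "**"}


def lines_ending_with_operator(source):
    """Return the line numbers where an arithmetic operator is the last token."""
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    flagged = []
    for tok, nxt in zip(tokens, tokens[1:]):
        is_operator = tok.type == tokenize.OP and tok.string in ARITHMETIC
        # Inside brackets, a physical line break shows up as an NL token.
        if is_operator and nxt.type in (tokenize.NL, tokenize.COMMENT):
            flagged.append(tok.start[0])
    return flagged


good = "c = (a\n     + b)\n"   # new line starts with the operator
bad = "c = (a +\n     b)\n"    # operator left behind at the end of the line
print(lines_ending_with_operator(good))  # []
print(lines_ending_with_operator(bad))   # [1]
```

The tokenizer is used rather than a regular expression so that operators inside strings or comments are not mistaken for code.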

Thanks for kicking this off @tupui. It’d be nice to have something a little more detailed than PEP 8 (for example, “no spaces around the ** operator” is missing from PEP 8), and hopefully that will help tool authors improve how they format numerical code.

Most of your Correct/Wrong examples are clear. The exception is c_i1j. I think it’s very difficult to say which version is better, and also hard to create any rules for your “correct” version. For example, in

         1./n**2.
         * np.prod(...

the precedence for / and * is the same, but only one of the two operators has spaces around it.
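
To make the precedence point concrete, here is a small sketch using the standard-library `ast` module. The helper `same_parse` and the placeholder name `p` (standing in for the `np.prod(...)` call) are invented for illustration. Parentheses do not appear in the syntax tree, so two spellings that group the same way produce identical dumps:

```python
import ast


def same_parse(a, b):
    """True if two expression strings produce identical syntax trees."""
    tree_a = ast.dump(ast.parse(a, mode="eval"))
    tree_b = ast.dump(ast.parse(b, mode="eval"))
    return tree_a == tree_b


compact = "1./n**2. * p"

# / and * share one precedence level and group left to right.
print(same_parse(compact, "(1./(n**2.)) * p"))   # True

# The grouping a reader might infer from the spacing alone is a different tree.
print(same_parse(compact, "(1./n)**(2. * p)"))   # False
```

Whatever spacing is chosen, Python evaluates the division first and then the multiplication, so the whitespace is purely a hint for the reader.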

Thanks for raising this issue @tupui!

What Ralf wrote makes me think that there’s some stylistic judgment involved here, making it very hard to format automatically and correctly unless you insert spaces everywhere like Black does:

i = i + 1
submitted += 1
x = x * 2 - 1
hypot2 = x * x + y * y
c = (a + b) * (a - b)
dfdx = sign * (-2 * x + 2 * y + 2)
result = 2 * x ** 2 + 3 * x ** (2 / 3)
y = 4 * x ** 2 + 2 * x + 1
c_i1j = (
    1.0
    / n ** 2.0
    * np.prod(
        0.5 * (2.0 + abs(z_ij[i1, :]) + abs(z_ij) - abs(z_ij[i1, :] - z_ij)), axis=1
    )
)

So, perhaps the ideal checker would do what @rkern mentioned on the SciPy issue: check that the code conforms to PEP 8 (or some superset of it), but make no adjustments where it already does.
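
A rough sketch of what such a check-only pass could look like, using the standard-library `tokenize` module. It reports only one thing PEP 8 does require, namely the same amount of whitespace on both sides of a binary operator, and it accepts both `x*2` and `x * 2`. The names `asymmetric_operators` and `_ends_operand`, the operator set, and the heuristic for telling binary from unary operators are all assumptions made for illustration:

```python
import io
import keyword
import tokenize

# Operators checked here (an illustrative choice, not an exhaustive list).
OPERATORS = {"+", "-", "*", "/", "**", "+=", "-=", "*=", "/="}
_SKIP = {tokenize.NL, tokenize.NEWLINE, tokenize.COMMENT,
         tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def _ends_operand(tok):
    """Heuristic: can this token be the left operand of a binary operator?"""
    if tok.type in (tokenize.NUMBER, tokenize.STRING):
        return True
    if tok.type == tokenize.NAME:
        return not keyword.iskeyword(tok.string)
    return tok.string in (")", "]", "}")


def asymmetric_operators(source):
    """Return (line, column, operator) for operators with unequal spacing."""
    tokens = [t for t in tokenize.generate_tokens(io.StringIO(source).readline)
              if t.type not in _SKIP]
    found = []
    for prev, tok, nxt in zip(tokens, tokens[1:], tokens[2:]):
        if tok.type != tokenize.OP or tok.string not in OPERATORS:
            continue
        if not _ends_operand(prev):
            continue  # unary sign or argument unpacking, not a binary operator
        if prev.end[0] != tok.start[0] or tok.end[0] != nxt.start[0]:
            continue  # operator sits at a line break; nothing to compare
        before = tok.start[1] - prev.end[1]
        after = nxt.start[1] - tok.end[1]
        if before != after:
            found.append((tok.start[0], tok.start[1], tok.string))
    return found


print(asymmetric_operators("x = x*2 - 1\n"))     # []
print(asymmetric_operators("x = x * 2 - 1\n"))   # []
print(asymmetric_operators("submitted +=1\n"))   # [(1, 10, '+=')]
```

Because the function only reports and never rewrites, code that is already consistent in either style is left exactly as the author wrote it.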

Thank you @rgommers and @stefanv.

Indeed, I should have written this:

# Correct:
i = i + 1
submitted += 1
x = x*2 - 1
hypot2 = x*x + y*y
c = (a+b) * (a-b)
dfdx = sign*(-2*x + 2*y + 2)
result = 2*x**2 + 3*x**(2/3)
y = 4*x**2 + 2*x + 1
c_i1j = (1./n**2.
         *np.prod(0.5*(2.+abs(z_ij[i1, :])
                       + abs(z_ij) - abs(z_ij[i1, :]-z_ij)), axis=1))

I agree that it would be difficult not just to code such a system, but also simply to write code following these rules :sweat_smile: For this to work, it has to be simple. I guess I would personally be OK with what Black does, but without spaces around **, / and *, and with spaces otherwise. This would be simpler to use, as you wouldn’t have to count operators and check which one has the highest priority, etc.

i = i + 1
submitted += 1
x = x*2 - 1
hypot2 = x*x + y*y
c = (a + b)*(a - b)
dfdx = sign*(-2*x + 2*y + 2)
result = 2*x**2 + 3*x**(2/3)
y = 4*x**2 + 2*x + 1
c_i1j = (
    1.0
    /n**2.0
    *np.prod(
        0.5*(2.0 + abs(z_ij[i1, :]) + abs(z_ij) - abs(z_ij[i1, :] - z_ij)), axis=1
    )
)
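
For what it is worth, this simplified rule is regular enough to sketch as a tiny formatter on top of the standard-library `ast` module. Everything below is an illustration under stated assumptions: the names `format_expr`, `_fmt` and `_paren` are invented, and only names, plain numbers, unary signs and the operators +, -, *, / and ** are handled (no calls, indexing or line splitting):

```python
import ast

# symbol, precedence, and whether the operator is written without spaces
_BINOPS = {
    ast.Add: ("+", 1, False),
    ast.Sub: ("-", 1, False),
    ast.Mult: ("*", 2, True),
    ast.Div: ("/", 2, True),
    ast.Pow: ("**", 4, True),
}
_UNARY = 3  # a unary sign binds tighter than * and /, but looser than **
_ATOM = 5   # names and numbers never need parentheses


def _paren(text, needed):
    return f"({text})" if needed else text


def _fmt(node):
    """Return (text, precedence) for a small subset of expression nodes."""
    if isinstance(node, ast.Name):
        return node.id, _ATOM
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return repr(node.value), _ATOM
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, (ast.UAdd, ast.USub)):
        sign = "-" if isinstance(node.op, ast.USub) else "+"
        text, prec = _fmt(node.operand)
        return sign + _paren(text, prec < _UNARY), _UNARY
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        symbol, prec, tight = _BINOPS[type(node.op)]
        left, lprec = _fmt(node.left)
        right, rprec = _fmt(node.right)
        if isinstance(node.op, ast.Pow):
            # ** is right-associative and out-binds a unary sign on its left.
            left = _paren(left, lprec < _ATOM)
            right = _paren(right, rprec < _UNARY)
        else:
            # +, -, * and / are left-associative.
            left = _paren(left, lprec < prec)
            right = _paren(right, rprec <= prec)
        glue = symbol if tight else f" {symbol} "
        return left + glue + right, prec
    raise NotImplementedError(ast.dump(node))


def format_expr(source):
    """Re-emit an expression with tight **, *, / and spaced +, -."""
    return _fmt(ast.parse(source, mode="eval").body)[0]


print(format_expr("(a + b) * (a - b)"))              # (a + b)*(a - b)
print(format_expr("sign * (-2 * x + 2 * y + 2)"))    # sign*(-2*x + 2*y + 2)
print(format_expr("2 * x ** 2 + 3 * x ** (2 / 3)"))  # 2*x**2 + 3*x**(2/3)
```

Working from the syntax tree means the spacing never depends on counting priorities by hand; the cost is that the original literals and any manual line breaks are lost, much as with Black.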

Personally, the only operator that really irks me with a space around it is **.

Writing 2*x feels natural, but in the example above I’d expect * np.prod instead of *np.prod. So, it varies even in my own head on a case-by-case basis.