What are some best practices for leveraging NumPy’s built-in functions to minimize looping in Python: ??
I’d suggest the following: insist that embarrasingly parallel operations can be written without loops.
There are exceptions in both directions, and you may find cases where the hoops you need to jump through to write code without loops are not worth it (either due to performance or code complexity). But I think a good starting point is to assume that vectorization is possible and work until you have a solution to test test rather than jumping to the conclusion that it’s not worth the trouble.
One non-trivial example that comes to mind is "Efficiently calculate angle between three points over triplets of rows in a numpy array".
Some things to learn about that help find vectorized solutions:
- broadcasting
- indexing and
take_along_axis
/put_along_axis
- ufunc methods (e.g.
accumulate
) einsum
triu_indices
/tril_indices