Best Practices for Optimizing NumPy Performance in Scientific Python Projects

It’s not very helpful to start looking into details in the abstract before you have a working prototype. There are too many techniques for too many use cases and too many specific problems, and getting bogged down in details prematurely is a great time sink.

From experience optimizing Python and other scientific workflows, I’d recommend roughly this:

  1. Have a prototype, however rough. Don’t worry about performance just yet. If something is blatantly obvious (like exponential vs linear complexity), sure, use that knowledge; otherwise, don’t bother yet.
  2. Make sure it works as expected. Collect a set of validation examples (asymptotics, limiting cases, known values, expected results): something you can check against to confirm your prototype is not entirely incorrect. Depending on the details, you may want to roughly separate these into two buckets: small, quick-to-run examples where you check against accurate results, and longer, heavier runs which need to be looked at by a human eye. (An example from one of my past workflows: I was building a quantum Monte Carlo simulation; checking against an exactly solvable small system was the first kind; checking that the error scales roughly as $1/\sqrt{N}$ with the number of steps was the second.) A minimal sketch of such checks is given after this list.
  3. Turn this collection into something semi-automated if you can. Don’t get bogged down in the fine details of acceptance testing vs unit testing vs whatnot: if you can reasonably make your acceptance suite run with a single command, great; if it’s a collection of scripts you run manually, that’s also OK. You’ll refine the framework as you go.
  4. Having constructed this set of examples, you have a rough idea of what’s bad in your prototype. Now it’s time to turn that rough idea into data: start profiling. This is key. You need data.
  5. If your workflow involves disk, network, large memory, or databases: does it spend its time in I/O or in number crunching? If the latter, does everything fit into memory, or do you start swapping?
  6. At this stage things start depending on the details, but the general idea is: identify a bottleneck, work on it, and ignore the rest.
  7. At any rate, you need to profile. If your application is in Python, just use the standard library cProfile module as a starting point; once you know where the bottleneck is at the function level, it’s sometimes useful to throw in line_profiler, which will point you to specific lines of code (a profiling sketch follows this list). Is there a part which dominates the profile? Great, eliminate the bottleneck: maybe it’s better NumPy vectorization (also sketched below); if that doesn’t work, maybe you’ll need a compiled extension; if you’re running out of memory, you’ll need to think about parallelizing. But the main point stands: only work on a bottleneck, as shown by the profiler.
  8. Once you’ve eliminated the bottleneck (again, the profiler will tell you), rerun the acceptance tests. Once your rewrite is correct, ask: is the current state acceptable? If yes, you’re done; just stop optimizing. If not, go back to step 6.
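
To make steps 2–3 concrete, here is a minimal sketch of a semi-automated validation suite, runnable with pytest or as a plain script. `run_simulation` is a hypothetical stand-in for your prototype’s entry point, and the Monte Carlo mean estimate is just a toy with an exactly known answer; substitute your own problem.

```python
# check_prototype.py -- hypothetical minimal validation suite
import numpy as np

def run_simulation(n_steps, seed=0):
    # Stand-in for your prototype: Monte Carlo estimate of the mean
    # of a standard normal distribution (exact answer: 0).
    rng = np.random.default_rng(seed)
    return rng.standard_normal(n_steps).mean()

def test_known_value():
    # Quick bucket: compare against an exactly known result.
    result = run_simulation(n_steps=100_000)
    assert np.isclose(result, 0.0, atol=0.02), result

def test_error_scaling():
    # Heavier bucket: the statistical error should shrink roughly as
    # 1/sqrt(n_steps), so 100x more steps -> about 10x smaller spread.
    small = np.std([run_simulation(1_000, seed=s) for s in range(20)])
    large = np.std([run_simulation(100_000, seed=s) for s in range(20)])
    assert 5 < small / large < 20, small / large  # loose bounds around 10

if __name__ == "__main__":
    test_known_value()
    test_error_scaling()
    print("all checks passed")
```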
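For step 7, a profiling starter. cProfile and pstats are in the standard library; `main()` below is a placeholder for your actual workload.

```python
# profile_run.py -- minimal cProfile usage; main() is a placeholder
import cProfile
import pstats

def main():
    ...  # call your actual workload here

if __name__ == "__main__":
    cProfile.run("main()", "profile.out")
    stats = pstats.Stats("profile.out")
    # Sort by cumulative time, show the 15 most expensive functions.
    stats.sort_stats("cumulative").print_stats(15)
```

You can also profile without touching the code: `python -m cProfile -s cumulative your_script.py`. Once a single function dominates, decorate it with `@profile` and run `kernprof -l -v your_script.py` to get line_profiler’s per-line timings (line_profiler is a third-party package).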
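And a sketch of what “better NumPy vectorization” typically means: replacing an interpreted per-element loop with whole-array operations. The computation here is made up purely for illustration; measure the actual speedup on your own problem with timeit rather than trusting intuition.

```python
# vectorize_demo.py -- made-up example of removing a Python-level loop
import math
import numpy as np

def loopy(x):
    # One Python-level iteration per element: slow for large arrays.
    out = np.empty_like(x)
    for i in range(len(x)):
        out[i] = math.exp(-x[i] ** 2) if x[i] > 0 else 0.0
    return out

def vectorized(x):
    # The same computation as whole-array operations: the loop now
    # runs in compiled code inside NumPy.
    return np.where(x > 0, np.exp(-x ** 2), 0.0)

x = np.linspace(-5.0, 5.0, 100_000)
assert np.allclose(loopy(x), vectorized(x))  # same numbers, far fewer interpreter trips
```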

I know what I’m saying is kind of vague. That’s because the specifics of what to optimize and how to optimize it are very, very problem-specific, and there’s no point dwelling on solutions to non-problems or to somebody else’s problems.
Once you’re down to a specific bottleneck, we might be able to offer more focused suggestions.

To summarize: have a prototype, have acceptance tests, use a profiler to identify bottlenecks, and iterate until the result is acceptable.
Oh, and do use some form of version control to keep track of your iterations.

HTH,

Evgeni
