In PR #23782, we switch the internal sort used by scipy.stats.quantile from a stable sort to an unstable one (stable=False). A stable sort isn’t required here because the quantile depends only on the multiset of values and their ranks; the relative order of ties does not affect the result. This aligns with NumPy’s use of unstable selection for quantiles (via partition).
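A minimal sketch of why tie order can't matter (the array, seed, and sizes are illustrative; the stable keyword of np.sort requires NumPy >= 2.0): stable and unstable sorts may order tied elements differently, but the sorted value sequences are identical, so any quantile computed from them is identical too.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 10, size=10_000).astype(np.float64)  # many ties

# Tied elements may be permuted differently by the two sorts, but the
# resulting sequences of *values* are the same either way.
s_stable = np.sort(x, stable=True)
s_unstable = np.sort(x, stable=False)
assert np.array_equal(s_stable, s_unstable)
```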
Benefit (performance): on large inputs this change yields substantial speedups. ASV-style timings in the PR comments for random arrays show a ~2× speedup at 10k elements and ~5-6× at 100k-1M elements.
Behavioral risk (sign of zero only): outputs are numerically identical, but when the input contains both -0. and 0., some returned quantiles that are exactly zero may flip sign from -0. to 0. (or vice versa) due to tie ordering differences. This affects only the sign bit of zero, not the value. An explicit example is included in the PR comments.
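For illustration, a hedged sketch of the effect (not the PR's exact example; whether a flip actually occurs depends on the sort implementation and the input):

```python
import numpy as np

# -0.0 == 0.0, so signed zeros are ties for the sort: a stable sort
# keeps their input order, while an unstable sort may reorder them.
x = np.array([0.0, -0.0] * 8)

s_stable = np.sort(x, stable=True)
s_unstable = np.sort(x, stable=False)

# The values are identical...
assert np.array_equal(s_stable, s_unstable)
# ...but the sign bits of the zeros may be arranged differently, so a
# quantile that lands exactly on one of these elements can come back
# as -0.0 instead of 0.0 (or vice versa).
print(np.signbit(s_stable))
print(np.signbit(s_unstable))
```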
If there are no concerns about the sign-of-zero change, the plan (per discussion) is to proceed after a short feedback window and to add a benchmark to track the improvement.
I believe a stable sort tends to be quicker when the data is already sorted, so there's a bit of a trade-off here.
In practice, data is almost never 100% random, so if the benchmark uses artificially generated random numbers, it might not accurately reflect the performance gains we can expect.
There are datasets available that are intended specifically for this purpose, though I'm not sure which would be most appropriate in this case.
Good point. I wasn’t aware of that; indeed, stable sort (timsort) is ~10x faster than unstable sort (introsort) on already-ordered data (it's essentially O(n) in that case).
But I still think unstable sort is the better option: its speed is essentially the same regardless of the data (except for “adversarial” input, which I don’t think we care about), and it’s fast.
IMO, being always fast (~10ms for 1M float64 items) is better than being sometimes super fast (1.5ms) and sometimes slow (~60ms).
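A rough way to check this trade-off locally (a sketch, not the PR's ASV benchmark; the sizes and repeat counts are arbitrary, and exact numbers will vary by machine and NumPy version):

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
random_data = rng.standard_normal(1_000_000)
ordered_data = np.sort(random_data)

# Compare stable vs. unstable sort on random and already-ordered input.
for label, data in [("random", random_data), ("ordered", ordered_data)]:
    for stable in (True, False):
        t = min(timeit.repeat(lambda: np.sort(data, stable=stable),
                              number=5, repeat=3)) / 5
        print(f"{label:>7}, stable={stable}: {t * 1e3:6.1f} ms")
```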
> In practice, data is almost never 100% random
No indeed, but I think it’s still often “quite random” in terms of ordering, and for that, NumPy’s unstable sort will likely be much faster than the stable sort, partly because NumPy’s unstable sort is heavily optimized (SIMD).
Also, note that NumPy’s quantile is based on introselect, so we’re “matching” it if we go with introsort.
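For reference, a sketch of that selection idea: np.partition (introselect by default) places the k-th order statistic at index k without fully sorting, in O(n) average time.

```python
import numpy as np

x = np.random.default_rng(0).standard_normal(1_000_001)

# For an odd-length array, the median is the middle order statistic;
# np.partition finds it without sorting the whole array.
k = len(x) // 2
median_via_partition = np.partition(x, k)[k]
assert median_via_partition == np.quantile(x, 0.5)
```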