Return type change to `obrientransform` to facilitate array API support

`scipy.stats.obrientransform` accepts an arbitrary number of input arrays, performs an independent elementwise transform on each, and returns a single array containing all the transformed data. This is problematic when the input arrays do not have the same length: the result is a ragged object array. Beyond that, returning an array of arrays is less convenient than the alternatives: the result for a single input array of shape (n,) has shape (1, n), and some array API backends cannot unpack the result (e.g. for the primary use case, passing the outputs as separate arguments to `f_oneway`).
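To make the shape issues concrete, here is a minimal sketch that mimics only the packing step with plain NumPy. It does not call SciPy, and the arrays `a` and `b` are arbitrary stand-ins for transformed samples.

```python
import numpy as np

# Stand-ins for two transformed samples; the values are arbitrary.
a = np.arange(5.0)
b = np.arange(3.0)

# Packing a single length-n result into an array adds a leading axis.
single = np.array([a])
print(single.shape)  # (1, 5)

# Packing results of unequal length requires a ragged object array.
ragged = np.array([a, b], dtype=object)
print(ragged.shape, ragged.dtype)  # (2,) object

# A tuple keeps each result as an ordinary array that unpacks naturally.
as_tuple = (a, b)
first, second = as_tuple
```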

As written, gh-24393 proposes to change this behavior by simply returning a tuple of arrays instead of an array of arrays. However, there are some issues to discuss:

  • Currently, the PR would emit a FutureWarning about the coming change, and the only way to silence it (other than filtering) would be to adopt the array API behavior by setting SCIPY_ARRAY_API=1. Since the change is unlikely to affect common use cases, would it be less disruptive to just make the change immediately and document it in “Backward incompatible changes” in the release notes?
  • There is no computational advantage to accepting multiple inputs; each input is transformed independently. Should the function just accept a single array and return a transformed array of the same shape? This would facilitate the addition of an axis argument, which would allow multiple slices of the same length to be processed in a vectorized way.
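As a rough illustration of the second option, a single-array version with an axis argument might look like the sketch below. The name `obrien_single` and its signature are hypothetical, not part of SciPy or the PR; the formula is the usual O'Brien transform, written with NumPy only.

```python
import numpy as np

def obrien_single(x, axis=-1):
    """Hypothetical single-array O'Brien transform along `axis`."""
    x = np.asarray(x, dtype=float)
    n = x.shape[axis]
    if n < 3:
        raise ValueError("need at least 3 observations along `axis`")
    # Squared deviations from the mean of each slice.
    mean = np.mean(x, axis=axis, keepdims=True)
    sq = (x - mean) ** 2
    sumsq = np.sum(sq, axis=axis, keepdims=True)
    # The output has the same shape as the input.
    return ((n - 1.5) * n * sq - 0.5 * sumsq) / ((n - 1) * (n - 2))

# Four slices of equal length, transformed in one vectorized call.
rng = np.random.default_rng(0)
data = rng.normal(size=(4, 10))
t = obrien_single(data, axis=1)
```

A useful sanity check is that the mean of each transformed slice equals the sample variance (`ddof=1`) of the corresponding input slice. Samples of unequal length would simply be passed in separate calls.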

Please join the discussion in gh-24393!

Thanks,
Matt Haberland
