Deprecate one-argument use of `stats.linregress`

mdhaber · May 24, 2024, 11:07pm

Hi team,

`scipy.stats.linregress` is currently very flexible about how the independent and dependent variables are specified. As the docstring puts it:

> **x, y** : array
> Two sets of measurements. Both arrays should have the same length. If only `x` is given (and `y=None`), then it must be a two-dimensional array where one dimension has length 2. The two sets of measurements are then found by splitting the array along the length-2 dimension. In the case where `y=None` and `x` is a 2x2 array, `linregress(x)` is equivalent to `linregress(x[0], x[1])`.
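To make the quoted rule concrete, here is a small sketch that reimplements the documented splitting behaviour in plain NumPy and then calls the two-argument form. `split_measurements` is a hypothetical helper written only for illustration, not part of SciPy. It deliberately avoids the one-argument call so the example does not depend on whether a given SciPy version still accepts it. The 2x2 case shows where the ambiguity bites: an array meant as one point per row is silently read as one variable per row.

```python
import numpy as np
from scipy import stats

def split_measurements(xy):
    """Hypothetical helper mirroring the documented one-argument rule.

    Splits a 2-D array along its length-2 dimension; when both dimensions
    have length 2, the first axis wins, so rows are treated as (x, y).
    """
    xy = np.asarray(xy)
    if xy.ndim != 2:
        raise ValueError("expected a two-dimensional array")
    if xy.shape[0] == 2:          # shape (2, N): x is row 0, y is row 1
        return xy[0], xy[1]
    if xy.shape[1] == 2:          # shape (N, 2): x is column 0, y is column 1
        return xy[:, 0], xy[:, 1]
    raise ValueError("one dimension must have length 2")

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.1, 4.9, 7.2])

# (2, N) and (N, 2) layouts both recover the same x and y
rows_x, rows_y = split_measurements(np.stack([x, y]))
cols_x, cols_y = split_measurements(np.column_stack([x, y]))

# A 2x2 array is always split by rows, even if the caller meant columns
pts = np.array([[0.0, 1.0],    # intended as the point (x=0, y=1)
                [2.0, 5.0]])   # intended as the point (x=2, y=5)
px, py = split_measurements(pts)   # px = [0., 1.], py = [2., 5.]

wrong = stats.linregress(px, py)                     # slope is about 3.0
intended = stats.linregress(pts[:, 0], pts[:, 1])    # slope is about 2.0
```

Nothing in the 2x2 input tells the function which reading was meant, which is the kind of "if-this-then-that" behaviour an explicit two-argument signature avoids.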

If we want to add an `axis` argument and array API support, this would get even more confusing, so I’d propose deprecating the one-argument use of `stats.linregress` and requiring that the user pass `x` and `y` as separate arguments. To implement this, `stats.linregress` would be split from the `stats.mstats.linregress` implementation, and we’d deprecate the use of masked arrays in the `stats` version at the same time. I’ll open a PR for this shortly; in the meantime, thanks for your thoughts!

Matt

stefanv · May 24, 2024, 11:22pm

+1 to any such changes that make the API calls more explicit, and reduce the amount of “if-this-then-that” explanation we have to do in docstrings.

steppi · May 27, 2024, 3:46pm

+1. It doesn’t seem worth the complexity of maintaining two separate APIs within one function just to save users the trouble of having to do `stats.linregress(x[:, 0], x[:, 1])`.
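As a rough sketch of what migrating looks like, the snippet below assumes the paired data live in a single array named `xy` (a hypothetical name) with one `(x, y)` pair per row. It shows the explicit indexing from the post above, plus an equivalent unpacking shorthand that relies only on standard NumPy iteration over the rows of a 2-D array.

```python
import numpy as np
from scipy import stats

# Paired measurements, one (x, y) pair per row: shape (N, 2)
xy = np.array([[0.0, 1.0],
               [1.0, 3.1],
               [2.0, 4.9],
               [3.0, 7.2]])

# Explicit two-argument call: column 0 is x, column 1 is y
res = stats.linregress(xy[:, 0], xy[:, 1])

# Same fit via unpacking: iterating over xy.T yields its two rows,
# i.e. the original columns
res_unpacked = stats.linregress(*xy.T)

# Data already laid out as shape (2, N) can be unpacked directly
x_then_y = xy.T                      # shape (2, N)
res_rows = stats.linregress(*x_then_y)

print(res.slope, res.intercept)      # approximately 2.04 and 0.99
```

Either spelling makes the choice of `x` and `y` visible at the call site, which is the point of requiring two arguments.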