Hi all,
I have an open numpy PR (#31261) that adds maxlag and lags keyword parameters to np.correlate and np.convolve, allowing users to compute correlation/convolution at only a subset of lags rather than the full set. The motivation for this feature is that there are great performance improvements when the set of lags of interest is much smaller than the set of lags corresponding to any of the existing modes, as is frequently the case when analyzing long time series. There is also a new companion function np.correlation_lags() analogous to scipy.signal.correlation_lags() to allow a user to generate an array of the lags that correspond to the cross-correlation/convolution they calculated.
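To make the idea concrete, here is a minimal sketch of computing a cross-correlation at only a few lags, written against released numpy only. The helper name `correlate_at_lags` is hypothetical and is not the API or the implementation in the PR; it assumes 1-D inputs and follows np.correlate's convention c[k] = sum_n a[n + k] * conj(v[n]).

```python
import numpy as np

def correlate_at_lags(a, v, lags):
    """Hypothetical helper (not the PR's API): correlate 1-D arrays at the given lags only.

    Follows np.correlate's convention: c[k] = sum_n a[n + k] * conj(v[n]).
    """
    a = np.asarray(a)
    v = np.asarray(v)
    out = np.zeros(len(lags), dtype=np.result_type(a, v, np.float64))
    for i, k in enumerate(lags):
        # Indices n into v for which both v[n] and a[n + k] exist.
        lo = max(0, -k)
        hi = min(len(v), len(a) - k)
        if hi > lo:
            out[i] = np.dot(a[lo + k:hi + k], np.conj(v[lo:hi]))
    return out

rng = np.random.default_rng(0)
a = rng.standard_normal(2000)
v = rng.standard_normal(1500)

maxlag = 10
lags = np.arange(-maxlag, maxlag + 1)

# Work scales with len(lags) * overlap rather than with the full set of lags.
subset = correlate_at_lags(a, v, lags)

# Reference: compute every lag with mode="full", then pick out the same lags.
# In "full" mode, output index i corresponds to lag i - (len(v) - 1).
full = np.correlate(a, v, mode="full")
assert np.allclose(subset, full[lags + len(v) - 1])
```

The comparison against mode="full" only checks that the lag bookkeeping agrees; the performance claim above concerns avoiding the lags that are thrown away in that reference computation.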
This has been discussed before for both numpy and scipy. In particular, there is an open scipy issue (#4940) discussing the utility of this functionality. At the time of that discussion, scipy did not yet auto-route correlation between the 'direct' and 'fft' methods, as it now does. Because scipy can compute correlation via fft, the performance gains from custom lags are less pressing there than in numpy, where fft is not available as a method of calculation. The improvement would still help the 'direct' method in scipy. In addition, there seemed to be interest in keeping the function signatures compatible between numpy and scipy (notwithstanding the existing discrepancy in default mode between the two libraries).
I wanted to check in with the scipy community on two questions:
1. Would scipy be bothered by numpy adding this?
I don’t think there’s a conflict: scipy.signal.correlate already diverges from np.correlate in meaningful ways (FFT support, mode='full' default as opposed to numpy’s mode='valid' default, n-dimensional inputs). But I’d rather surface any concerns now than after merge.
2. Would scipy want matching parameters on scipy.signal.correlate and scipy.signal.convolve?
For method='direct', accepting the new arguments would be straightforward: that path calls np.correlate directly, so passing maxlag/lags through would immediately give the full performance benefit with minimal code change.
For method='fft', the calculation could be left unchanged, and the result could be sliced according to the custom lag arguments before returning. The performance improvements are less essential for the fft method, as it is already much more efficient for long arrays.
For either method, the changes necessary are only a few lines of code.
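As a rough illustration of the slicing approach for the fft method, the sketch below wraps the existing scipy.signal.correlate and scipy.signal.correlation_lags calls. The wrapper name `correlate_fft_maxlag` and its `maxlag` argument are hypothetical and only stand in for whatever signature scipy would prefer; 1-D inputs are assumed.

```python
import numpy as np
from scipy import signal

def correlate_fft_maxlag(in1, in2, maxlag):
    """Hypothetical wrapper: full FFT correlation, sliced to lags in [-maxlag, maxlag]."""
    # The FFT computation itself is unchanged.
    full = signal.correlate(in1, in2, mode="full", method="fft")
    # Lags matching each element of the full-mode output.
    lags = signal.correlation_lags(len(in1), len(in2), mode="full")
    keep = np.abs(lags) <= maxlag
    return lags[keep], full[keep]

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
y = rng.standard_normal(4096)

lags, corr = correlate_fft_maxlag(x, y, maxlag=8)

# Cross-check against the direct method at the same lags.
direct = signal.correlate(x, y, mode="full", method="direct")
all_lags = signal.correlation_lags(len(x), len(y), mode="full")
assert np.allclose(corr, direct[np.abs(all_lags) <= 8])
```

Slicing after the fact saves no computation in the fft path; it only keeps the return value consistent with what the direct path would produce for the same lag arguments.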
I’m happy to put up a small scipy PR that updates the function signatures as described above if that’s useful. I just don’t want to submit unsolicited work that conflicts with scipy’s plans or preferred API direction.
In any case, let me know any thoughts you have on including this functionality in numpy and how that would affect scipy.
Thanks,
Honi