Hi! We are interpolating through large datasets and are looking for a way to use Dask to decrease computation time. In the simplest case, we would like to call, in parallel, a function that reads in a time slice of the data, builds a SciPy regular grid interpolator from that slice, and then interpolates through it. Assume the dataset is too large to read in all at once, even for a single variable.
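A minimal sketch of what I have in mind, using `dask.delayed` so each time slice is read and interpolated in its own task. The function `read_time_slice` and the grid/point names below are placeholders for the real I/O, not actual code:

```python
import numpy as np
import dask
from scipy.interpolate import RegularGridInterpolator

def read_time_slice(t):
    # Placeholder for the real reader: here we fabricate one slice's grid
    # and values instead of reading from disk.
    lat = np.linspace(-90, 90, 181)
    lon = np.linspace(0, 359, 360)
    values = np.random.default_rng(t).random((181, 360))
    return lat, lon, values

@dask.delayed
def interp_one_slice(t, points):
    # Read one time slice, build an interpolator for it, evaluate all points.
    lat, lon, values = read_time_slice(t)
    f = RegularGridInterpolator((lat, lon), values, bounds_error=False)
    return f(points)

points = np.array([[10.5, 200.3], [-45.2, 80.7]])  # (lat, lon) query pairs
tasks = [interp_one_slice(t, points) for t in range(4)]  # one task per slice
results = dask.compute(*tasks)  # tuple of arrays, one per time slice
```

Each task only ever holds one slice in memory, so the full dataset is never loaded at once.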

My previous attempt used MPI and got stuck trying to serialize a function, so I switched to an Xarray+Dask approach that wound up slower than my current (non-parallelized) solution. My intuition is that this should be simple with Dask, but I need to avoid Xarray's interpolation call: it has an embedded numpy.meshgrid somewhere that makes it too memory-intensive, and iterative point-by-point calls are prohibitively slow.

Does anyone have some ideas or know of resources to tackle this? I would really like to burst into the cloud with this.

Related links: https://github.com/nasa/Kamodo

Hi @rringuette, and sorry that this never got a response!

SciPy has some relatively new functionality (the `neighbors` argument of `scipy.interpolate.RBFInterpolator`) to use only the N nearest data points when doing radial basis function interpolation. Perhaps that would be helpful?
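A short sketch of that local-RBF idea: with `neighbors` set, each evaluation point uses only its nearest observations, which keeps memory and compute bounded for large scattered datasets. The sample function here is just for illustration:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
obs = rng.uniform(-1, 1, (1000, 2))           # scattered sample locations
vals = np.sin(obs[:, 0]) * np.cos(obs[:, 1])  # values at those locations

# neighbors=50: each query point uses only its 50 nearest data points,
# instead of solving one dense system over all 1000 observations.
f = RBFInterpolator(obs, vals, neighbors=50)

query = rng.uniform(-0.9, 0.9, (5, 2))
estimates = f(query)  # interpolated values at the query points
```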
