Hi! We are interpolating through large datasets and are looking for a way to use Dask to decrease computation time. In the simplest case, we would like to call, in parallel, a function that reads in a time slice of the data, builds a SciPy regular grid interpolator from that slice, and then interpolates through it. Assume the dataset is too large to read in all at once, even for a single variable.
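A minimal sketch of what I have in mind, using `dask.delayed` so each time slice is read and interpolated in its own task. The function `read_time_slice` and the grid/point names below are placeholders for the real I/O, not actual code:

```python
import numpy as np
import dask
from scipy.interpolate import RegularGridInterpolator

def read_time_slice(t):
    # Placeholder for the real reader: here we fabricate one slice's grid
    # and values instead of reading from disk.
    lat = np.linspace(-90, 90, 181)
    lon = np.linspace(0, 359, 360)
    values = np.random.default_rng(t).random((181, 360))
    return lat, lon, values

@dask.delayed
def interp_one_slice(t, points):
    # Read one time slice, build an interpolator for it, evaluate all points.
    lat, lon, values = read_time_slice(t)
    f = RegularGridInterpolator((lat, lon), values, bounds_error=False)
    return f(points)

points = np.array([[10.5, 200.3], [-45.2, 80.7]])  # (lat, lon) query pairs
tasks = [interp_one_slice(t, points) for t in range(4)]  # one task per slice
results = dask.compute(*tasks)  # tuple of arrays, one per time slice
```

Each task only ever holds one slice in memory, so the full dataset is never loaded at once.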

My previous attempt used MPI and got stuck trying to serialize a function, so I switched to an Xarray+Dask approach that wound up slower than my current (non-parallelized) solution. My intuition is that this should be simple with Dask, but I need to avoid Xarray's interpolation call: it has an embedded numpy.meshgrid somewhere that makes it too memory-intensive, and iterative point-by-point calls are prohibitively slow.

Does anyone have some ideas or know of resources to tackle this? I would really like to burst into the cloud with this.

Related links: https://github.com/nasa/Kamodo

Hi @rringuette, and sorry that this never got a response!

SciPy has some relatively new functionality (the `neighbors` argument of `scipy.interpolate.RBFInterpolator`) to use only the N nearest data points when doing radial basis function interpolation. Perhaps that would be helpful?
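A short sketch of that local-RBF idea: with `neighbors` set, each evaluation point uses only its nearest observations, which keeps memory and compute bounded for large scattered datasets. The sample function here is just for illustration:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
obs = rng.uniform(-1, 1, (1000, 2))           # scattered sample locations
vals = np.sin(obs[:, 0]) * np.cos(obs[:, 1])  # values at those locations

# neighbors=50: each query point uses only its 50 nearest data points,
# instead of solving one dense system over all 1000 observations.
f = RBFInterpolator(obs, vals, neighbors=50)

query = rng.uniform(-0.9, 0.9, (5, 2))
estimates = f(query)  # interpolated values at the query points
```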
