RFC: Deprecating `scipy.odr`

For some time now, the functionality of scipy.odr (orthogonal distance regression, nonlinear regression models accounting for error in both input and output variables) has been available in a more modern package outside of scipy thanks to Hugo Vale. I’ve used it in anger, and it really is better than what we have in scipy.odr. This external packaging solves a lot of problems for us maintaining scipy.

  • The underlying library has been updated to ODRPACK95, which incorporates bound constraints, an occasional request
  • The interface is more modern like the other optimizers in scipy.optimize
  • The binding to the Fortran code is not ancient, buggy, idiosyncratic hand-wrapping
  • We don’t need to maintain the Fortran code, which we are trying to eliminate from scipy

Some history: What became scipy.odr was one of my first open source contributions, about 25 years ago, in the dark ages before scipy existed, or f2py, for that matter. The FORTRAN code that it wrapped was about 20 years old at the time, as well. It’s not the most ancient FORTRAN code that we have, but it’s close. I made a lot of silly choices in the wrapping that were barely justifiable at the time, and the related parts of scipy made different, better choices as it evolved. scipy.odr was never used that much to make it worth while to keep up with those choices.

We are currently in the middle of an effort to eliminate Fortran libraries from our codebase, mostly through some heroic effort in translating the code to other languages that we can support better. Given the relatively minor importance of scipy.odr and the much better wrapping now available outside of scipy, instead of adding this library to the pile for translation, several of us are proposing to deprecate scipy.odr entirely and point users to Hugo’s package instead. I myself don’t have a firm proposed timeline, but we can probably visibly deprecate the package as a whole on the next feature release and leave the final removal undefined for now. I expect when the rest of the Fortran purge is complete, and scipy.odr is one of the few remaining would be a good time to pull that trigger.

One occasional request/proposal that comes up with scipy.odr is taking the core algorithmic ideas and reimplementing them in pure Python using the components that we have in scipy.optimize and scipy.linalg. The core idea in ODRPACK is relatively simple (the linear least-squares problem at the heart of the iterations of most outer nonlinear algorithms can take advantage of the special structure of the ODR problem), and this idea can mix-and-match with several different components that we have available. Such a reimplementation would open up the opportunities of the more modern optimizers that we have rather than the one ossified in ODRPACK. I do think this idea does have some merit still, but no one so far really seems to have the bandwidth to work on it. If someone were to take a good stab at that, aimed at getting the implementation into scipy.optimize, following modern conventions, I think we’d be accept such a contribution. I don’t think we’d resurrect scipy.odr and its oddball API, though. The FORTRAN implementation is not the only reason it’s on the chopping block.

Let us know what you think.

6 Likes

That’s an interesting piece of history—thanks for sharing, Robert.

I’m curious about the proposal in the last paragraph: it sounds useful. Is this essentially a single function / class that needs to be implemented? I’m curious what level of expertise would be required, and if this is the type of project we could shop around to an applied math / statistics student, e.g.

It’s probably not quite as trivial as that. I said “mix-and-match”, but that’s a bit glib; conceptually mix-and-match, maybe, but not quite mix-and-matching concrete APIs that we currently have. You would need to verify that the linearized LS sub-problem that ODRPACK utilizes (QR decomposition of the Jacobian of the curve function, not the least-squares loss function!) is the same one that the various nonlinear optimizers use (or else derive the appropriate decomposition for the linearized sub-problem that the outer algorithm you’re targeting uses). I don’t think any of our current concrete APIs expose swapping out this step in any way; this is probably the one time you’d ever want to!

I’d probably say that for the expertise required, it’s below adding a new minimizer algorithm to scipy.optimize, but it’s in that ballpark. I’m not in a hurry to solicit effort (I’d probably rather have another minimizer), just throwing it out there in case it does appeal to anyone naturally.

1 Like

Sounds reasonable, thanks for proposing. Do you think we should follow our usual 2 releases (1 year) and then remove or do think it should be a longer depreciation period?

I don’t think it needs to be any longer than usual, but I also don’t think there is a rush. Our usual minimums apply, but final removal could wait until it’s really causing a problem/blocking an opportunity.

We can deprecate now and use that extended time for giving more slack to folks. Currently the ODEPACK is about to be done, just pushed the second PR that would conclude it. But FITPACK looks like ants walking on my screen so not sure when I will be able to address that one. The list in META: FORTRAN Code inventory · Issue #18566 · scipy/scipy · GitHub has only BLAS/LAPACK wrappers left other than FITPACK which is a bit convoluted but can be done with some automation.

All this is to say, I don’t know when we are going to finalize the translations. So we would have to pull the plug after finishing all and then wait another two versions at a finished state. However if we deprecate now, we can pull the plug when that work is concluded.

But FITPACK looks like ants walking on my screen

FITPACK conversion is actually pretty far along. The 1D conversion is done for scipy 1.15, and Gagandeep Singh has just sent a PR with his translation of RectBivariateSpline (ENH: Python based bivariate spline fitter (``regrid_python``) — Fortran-free alternative to ``RectBivariateSpline`` by czgdp1807 · Pull Request #23962 · scipy/scipy · GitHub). The largest routine that’s left is surfit.f.

One thing we did not converge on for FITPACK is how to pull the plug (that was always for “some far future“).

So far, instead of rewriting old routines with the identical API, we provided functional analogs and declared the FITPACK routines as “legacy”. E.g. splrep and UnivariateSpline are legacy, and the recommended replacement is `make_splrep`.

Now that we can actually plan the removal, the options are:

  • make old APIs thin wrappers for over the new routines
  • hard remove the old routines and add ample docs on how to port from old to new.

The latter will be quite disruptive indeed. However, if we remove ODR, we could probably remove both ODR and old FITPACK wrappers in the same release (scipy 2.0, likely). Thoughts?

That seems way too disruptive. We need compat shims. ODR and FITPACK aren’t comparable at all, the usage of scipy.interpolateis at least two orders of magnitude higher and more important than that of scipy.odr.

I am +1 on deprecating.

If the API of the new package is different we should offer some migration guidance. Could someone post a minimal comparison for the usage of both versions?

import numpy as np
import scipy.odr
import odrpack

# Classic "Pearson data" that motivates ODR.
# Errors are in both variables, and if you don't account for this,
# doing a linear fit of X vs. Y or Y vs. X will give you quit
# different results.
p_x = np.array([0.,.9,1.8,2.6,3.3,4.4,5.2,6.1,6.5,7.4])
p_y = np.array([5.9,5.4,4.4,4.6,3.5,3.7,2.8,2.8,2.4,1.5])
p_sx = np.array([.03,.03,.04,.035,.07,.11,.13,.22,.74,1.])
p_sy = np.array([1.,.74,.5,.35,.22,.22,.12,.12,.1,.04])

# Old-style
# The RealData class takes care of details like turning
# standard-deviation error bars into weights.
p_dat = scipy.odr.RealData(p_x, p_y, sx=p_sx, sy=p_sy)
# Note, parameters come before `x` in scipy.odr
p_mod = scipy.odr.Model(lambda beta, x: beta[0] + beta[1] * x)
p_odr = scipy.odr.ODR(p_dat, p_mod, beta0=[1., 1.])
old_out = p_odr.run()

# New-style
# Parameters come after data, in the new API.
# We must convert the error bars into weights ourselves.
new_out = odrpack.odr_fit(lambda x, beta: beta[0] + beta[1] * x,
    p_x, p_y, beta0=np.array([1.0, 1.0]),
    weight_x=p_sx**-2, weight_y=p_sy**-2)

assert np.isclose(old_out.beta, new_out.beta).all()

Regarding the timeline, I was daydreaming about BLAS/LAPACK wrappers in my comment above and Ralf slapped some sense into me (thanks again). So we are now only looking at FITPACK as the final boss.

As much as I want to push FITPACK over the line into 1.17.0 , I also have some concerns about the next morning after release. Typically we only hear about my translation mistakes after the release hit the public. I am getting better at it but it still happens. By the way, my sincere gratitude to the downstream library maintainers who put up with my silly mistakes discovered in the nightlies.

In this release, we already have high profile libraries like PROPACK, ARPACK, LSODA and all the integration routines translated. So I’m not sure if I can keep up with the bug reports, should we receive 6-7 bug reports immediately. I am aware that it is an artifical pressure but when it happens, it’s like doing 10 more pushups when you come home from the gym. It somehow feels harder than the entire translation. Hence, FITPACK might spill over to 1.18 release. Moreover, I really had enough of this “work” :slight_smile:, it’s getting harder to look at the JurassicPACK code. Also I am traveling less for ${actual work} which was my focus time that I don’t have much recently.

So I would propose to deprecate ODRPACK the upcoming release 1.17.0 right away. That means we are looking at fortran-free SciPy 1.19.0 or 1.20.0 release. Would that be a sufficient enough time for deprecation scipy.odr, what say you?

The GitHub search returns a number of not-so-small repos using scipy.odr Code search results · GitHub so probably we should reach out to them within our capacity.

1 Like

That’s probably a reasonable schedule. When we have a deprecation PR, I can start reaching out to those repos. There aren’t too many that appear to be going concerns.

1 Like

Seems like a reasonable schedule to me too.

1 Like

We have roughly two to four weeks left for 1.17. Any pointers in how to deprecate a whole module? Similar to the process with `scipy.misc` ?

That looks like a good example, yes.

PR open DEP: deprecate `scipy.odr` by j-bowhay · Pull Request #24033 · scipy/scipy · GitHub

3 Likes

Now that the deprecation has happened, do we have any interest in deprecating linalg.interpolative so that we can switch to a more consistent and thorough randomized linear algebra routines?

This has been discussed briefly before in Add LSRN solver and the references therein. But I am not sure if there is any appetite for it. The reasoning is similar to what @rkern gave above. But in this particular case, it seems like the Fortran code forced our hand to have the suboptimal API in that shape.

Also in hindsight, randomized linalg moved on to become even more powerful and established since then. So I just wanted to entertain the idea since this might be the good time to do it if we want to do it, instead of piecemeal deprecations and creating an impression that every version we are removing some stuff randomly.

It’d be better for this to be its own thread, or continue the old one. I’d be open to deprecating linalg.interpolative, the question is what we replace it with - we need the new thing first.

Yes, we need the new thing and we do have a few options the most obvious one is GitHub - BallisticLA/RandBLAS: A header-only C++ library for sketching in randomized linear algebra and RandLAPACK both in C++ but this was more about whether we should squeeze the deprecation now or do the new thing and then do another deprecation later.

Module level deprecations are a big deal so I was wondering whether we should decide now to do both together or not. Hows and whats can be done in a separate thread.

The latter. The deprecation should say “use this instead”, which we can only do once a replacement is ready.

1 Like