Iterative proportional fitting (IPF)

harisbal · December 14, 2024, 10:57am

Hello,
I am trying to understand whether it makes sense to develop the Iterative Proportional Fitting (IPF) algorithm in scipy.

Reading from here:
“Iterative Proportional Fitting IPF is a technique to find a matrix X that is closest to another matrix Z subject to the constraint that the row and column marginals of X be (nearly) identical to a target matrix Y”.
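
For readers new to the method, the two-dimensional case can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `ipf_2d`, its parameters, and the example numbers are hypothetical and do not come from scipy or from any of the libraries mentioned in this thread. It alternately rescales rows and columns of a strictly positive seed matrix until both sets of sums match the targets.

```python
import numpy as np


def ipf_2d(seed, row_targets, col_targets, max_iter=1000, tol=1e-10):
    """Rescale `seed` so its row and column sums match the targets.

    Hypothetical helper for illustration; assumes a strictly positive seed
    and targets that share the same grand total.
    """
    x = np.asarray(seed, dtype=float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)
    for _ in range(max_iter):
        # Row step: scale every row so that it hits its target sum.
        x *= (row_targets / x.sum(axis=1))[:, None]
        # Column step: scale every column so that it hits its target sum.
        x *= (col_targets / x.sum(axis=0))[None, :]
        # Column sums are exact after the column step, so only rows are checked.
        if np.max(np.abs(x.sum(axis=1) - row_targets)) < tol:
            break
    return x


seed = np.array([[1.0, 2.0, 1.0],
                 [3.0, 5.0, 5.0],
                 [6.0, 2.0, 2.0]])
fitted = ipf_2d(seed, row_targets=[11.0, 16.0, 13.0],
                col_targets=[10.0, 20.0, 10.0])
print(fitted.sum(axis=1))  # row sums of the fitted matrix
print(fitted.sum(axis=0))  # column sums of the fitted matrix
```
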
IPF can also be used for matrices of higher dimensions; a use case is presented here.
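
To make the higher-dimensional case concrete, here is a rough sketch that fits a three-dimensional array to one target marginal per axis. The helper name `ipf_nd`, the fixed iteration count, and the random seed array are assumptions made for illustration only; real implementations typically also support multi-axis marginals and a convergence test.

```python
import numpy as np


def ipf_nd(seed, targets, n_iter=200):
    """Fit an N-D array to one-dimensional target marginals, one per axis.

    Hypothetical helper for illustration; assumes a strictly positive seed
    and targets that all share the same grand total.
    """
    x = np.asarray(seed, dtype=float).copy()
    for _ in range(n_iter):
        for axis, target in enumerate(targets):
            # Sum over every axis except `axis` to get the current marginal.
            other_axes = tuple(a for a in range(x.ndim) if a != axis)
            current = x.sum(axis=other_axes)
            # Reshape the scaling factors so they broadcast along `axis`.
            shape = [1] * x.ndim
            shape[axis] = -1
            x *= (np.asarray(target, dtype=float) / current).reshape(shape)
    return x


rng = np.random.default_rng(0)
seed = rng.uniform(0.5, 1.5, size=(2, 3, 4))
targets = [
    [40.0, 60.0],
    [20.0, 30.0, 50.0],
    [10.0, 20.0, 30.0, 40.0],
]
fitted = ipf_nd(seed, targets)
for axis in range(fitted.ndim):
    other_axes = tuple(a for a in range(fitted.ndim) if a != axis)
    print(axis, fitted.sum(axis=other_axes))  # fitted marginal along `axis`
```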

There are multiple Python libraries that solve the IPF problem, but most of them are not actively maintained. I was thinking that including IPF in scipy would deal with this fragmentation and make the algorithm accessible to a wider audience.

My question is whether such an algorithm could become part of scipy or whether it is deemed too niche for integration.

Thanks a lot

harisbal · January 6, 2025, 11:24am

Just an attempt to revive the discussion.
Cheers

tscoleman · February 5, 2025, 11:36pm

I would strongly advocate for IPF in scipy. New work (Geenens, Gery. 2020. “Copula Modeling for Discrete Random Vectors.” Dependence Modeling 8 (1): 417–40) places IPF at the center of calculating discrete copulas, that is, modeling correlation and dependence for discrete distributions. I think his work is an important innovation which can lead to a variety of new statistical measures, all based on IPF to calculate the discrete copula. (I have been working in this area for the past few months. A set of notes is at https://papers.ssrn.com/abstract=5057549; this is an old version and I will update it in a few days. To date I have used the R package ipfr, so having IPF in scipy would be a big plus.)

tupui · February 5, 2025, 11:49pm

Thanks for the proposal. Copula was rejected a few years back (I had proposed it) and the work was done instead in Statsmodels. Based on that, I would think that it might be a better fit there.

rkern · February 6, 2025, 1:50am

FWIW, I was the major objector in that case, and my objections then don’t apply here. I’d be fine with having an IPF implementation in scipy.stats if it isn’t already conveniently implemented in the Python ecosystem somewhere. If statsmodels actively wants to take it on instead, also great, but I do think it’s somewhat more of a lower-level building block that would fit comfortably in the scope of scipy.stats just as well. Implementing discrete copulas is one application, but not the only one.

tscoleman · February 6, 2025, 3:49am

Yes, I agree with rkern. IPF is a low-level building block necessary for building discrete copulas but not tied to copulas alone and much more widely applicable. Also, I should note that anything related to copulas a few years back was probably for continuous variables, and continuous copulas (based on Sklar’s theorem) are a very different kettle of fish from the new work for discrete variables (Geenens is the only work I’ve seen on this, and only recently). As for scipy vs statsmodels, it seems like IPF is a little more linear algebra than stats-specific so maybe more appropriate for scipy, but I don’t think it matters very much.
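
One property that makes IPF a natural building block for dependence modeling is that rescaling rows and columns leaves the odds ratios of a table unchanged, so the fitted table keeps the dependence structure of the original while its margins change. The snippet below is a minimal sketch of that property, assuming a 2x2 joint probability table with made-up values; `to_uniform_margins` and `odds_ratio` are hypothetical helpers written for this illustration and are not taken from Geenens' paper or from any library.

```python
import numpy as np


def to_uniform_margins(p, n_iter=500):
    """Alternately rescale a positive 2-D pmf toward uniform row/column margins.

    Hypothetical helper for illustration; uses a fixed number of sweeps.
    """
    q = np.asarray(p, dtype=float).copy()
    n_rows, n_cols = q.shape
    for _ in range(n_iter):
        # After this step every row sums to 1 / n_rows.
        q /= q.sum(axis=1, keepdims=True) * n_rows
        # After this step every column sums to 1 / n_cols.
        q /= q.sum(axis=0, keepdims=True) * n_cols
    return q


def odds_ratio(t):
    """Cross-product ratio of a 2x2 table."""
    return (t[0, 0] * t[1, 1]) / (t[0, 1] * t[1, 0])


p = np.array([[0.40, 0.10],
              [0.20, 0.30]])
q = to_uniform_margins(p)
print(odds_ratio(p), odds_ratio(q))   # the two odds ratios agree
print(q.sum(axis=1), q.sum(axis=0))   # margins of the rescaled table
```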

harisbal · February 6, 2025, 9:01am

I am very glad there is interest in implementing IPF. I will start working on a pull request and will capture your feedback there. Thanks!

tscoleman · February 6, 2025, 1:11pm

Thank you all for your attention to the issue.

mdhaber · February 9, 2025, 6:47am

@harisbal please remember that you’ll need to find a reviewer. Also, I see you have contributed to Dirguis/ipfn, which has a fair number of stars, but has not been updated recently. Have you considered reviving that project?

harisbal · February 9, 2025, 1:43pm

Thank you for reminding me of this, @mdhaber. Is there a formal way to ask for a review before the pull request?
The Dirguis/ipfn repo is stale and no new pull requests seem to be considered. I could try to revive the project, but I still find value in making IPF available in scipy.

mdhaber · February 9, 2025, 4:49pm

Is there a formal way to ask for reviewing before the pull request?

This is the right place, but there is no formal way. You could also open an enhancement request issue.

josef-pkt · February 10, 2025, 2:08pm

I have not looked at this or related topics in some time, and don’t know how it fits in scipy.

statsmodels has an open issue, “ENH: tools: closest matrix with given marginals” (statsmodels/statsmodels issue #7444), and similar helper functions for the nearest covariance or correlation matrix, which started out mainly to impose positive definiteness or positive semi-definiteness (see the stats section of the statsmodels 0.15.0 (+605) documentation).

As part of copulas we merged a helper function for matrix with uniform marginals.

github.com/statsmodels/statsmodels/blob/main/statsmodels/distributions/tools.py#L155

        if coords is not None:
            dx = np.array(1)
            for d in range(k_dim):
                dx = dx[..., None] * np.diff(coords[d])

            p = p * dx

        return p


    def nearest_matrix_margins(mat, maxiter=100, tol=1e-8):
        """nearest matrix with uniform margins

        Parameters
        ----------
        mat : array_like, 2-D
            Matrix that will be converted to have uniform margins.
            Currently, `mat` has to be two dimensional.
        maxiter : int
            Maximum number of iterations.
        tol : float

So, it would fit into statsmodels, but I would have to read up again to remember the context.

Josef

tscoleman · April 12, 2025, 3:08pm

I am the one who advocated (on Feb 5) for an implementation of IPF for scipy, but I think I missed an existing NumPy-based implementation in the project ‘ipfn’, which is available on PyPI (see also the GitHub repository Dirguis/ipfn, “Iterative Proportional Fitting for Python with N dimensions”).

I have not tested this, but from reading the web pages it appears the developers have tested it against an R implementation in the package ‘ipfp’.

Update: I have briefly tested it against the Ipfp function of the R package mipfp, and the results are nearly identical (matching to roughly 7 decimals for a 4x4 probability matrix with row and column targets all 0.25).

Bottom line: the request for IPF in scipy may not be necessary.