Feature Addition: Adding matrix t-distribution to `scipy.stats`

Greetings!

I wrote a scipy-compatible implementation of the matrix t-distribution for a recent project, where we needed to sample predictions from a Bayesian multivariate linear regression model. My implementation has methods analogous to the ones currently offered by stats.matrix_normal.

I’m looking to gauge the community’s interest before performing further developmental tasks (local build, unit tests, benchmarking, …) per the contributor guidelines.

Some background:

  • The matrix t-distribution is one of the fundamental matrix variate distributions, and is treated in detail in the monograph on matrix variate distributions by Gupta and Nagar (2000).

  • Currently there is no matrix t-distribution offered in SciPy. Besides filling in an omission, this feature would be useful for Bayesian inference in particular: The matrix t-distribution is the posterior predictive distribution of a multivariate linear regression model with a matrix normal likelihood function and a normal-inverse-Wishart conjugate prior.

  • Unlike the matrix normal distribution, the matrix t-distribution can’t be sampled by vectorizing a matrix and sampling the corresponding multivariate t-distribution, because of the way that the among-row covariance matrix and among-column covariance matrix interact in the PDF. This leads to a subtlety when generating random variates, making it possible to accidentally draw from a “mock matrix t-distribution” that has the same mean and covariance as the matrix t-distribution but is not equivalent to the matrix t-distribution.

  • There are matrix t-distribution implementations in R, Mathematica, and Julia.

Implementation notes:

  • The interface of my matrix t-distribution class is similar to the one for matrix_normal. It includes public methods logpdf, pdf, rvs, entropy, __call__, and internal methods _process_parameters, _process_quantiles, _logpdf, _entropy that behave analogously to the ones for matrix_normal.

  • My code naturally fits in stats._multivariate, without the need to change any of the existing code. The ingredients for my code already exist in SciPy, so this contribution would consist of the class defining the matrix t-distribution itself.

  • In terms of mathematical correctness, the outputs of my rvs, pdf, logpdf methods agree with the outputs of Mathematica’s RandomVariate and PDF of a MatrixTDistribution.

  • From my own testing, the rvs method runs slower than scipy.stats.matrix_normal.rvs but comparable to Mathematica’s RandomVariate[MatrixTDistribution[...]]. For example, to generate one million 2x3 random matrices:

    • Proposed matrix_t.rvs takes 13.711 sec.
    • Mathematica’s RandomVariate[MatrixTDistribution[...]] takes about 13.7329 sec using AbsoluteTiming
    • SciPy’s stats.matrix_normal.rvs takes 0.019 sec
    • Mathematica’s RandomVariate[MatrixNormalDistribution[...]] takes about 2.11419 sec using AbsoluteTiming

Besides making the stats module more complete, adding this feature will enable users to construct Bayesian multilinear regression models that support sampling directly from the posterior predictive distribution. My implementation has relatively few lines of code, but it is a substantial contribution, especially in light of the potential pitfalls of generating random variates.

I’d appreciate any feedback or guidance from the community, and am happy to provide further clarification or background info.

Thanks for your time!