In statistics, the multivariate ''t''-distribution (or multivariate Student distribution) is a multivariate probability distribution. It is a generalization to random vectors of the Student's ''t''-distribution, which is a distribution applicable to univariate random variables. While the case of a random matrix could be treated within this structure, the matrix ''t''-distribution is distinct and makes particular use of the matrix structure.


Definition

One common method of construction of a multivariate ''t''-distribution, for the case of p dimensions, is based on the observation that if \mathbf y and u are independent and distributed as N(\mathbf{0},\boldsymbol\Sigma) and \chi^2_\nu (i.e. multivariate normal and chi-squared distributions) respectively, where \boldsymbol\Sigma is a ''p'' × ''p'' matrix and \boldsymbol\mu is a constant vector, then the random variable \mathbf{x} = \mathbf{y}/\sqrt{u/\nu} + \boldsymbol\mu has the density

: f(\mathbf{x}) = \frac{\Gamma[(\nu+p)/2]}{\Gamma(\nu/2)\,\nu^{p/2}\,\pi^{p/2}\,|\boldsymbol\Sigma|^{1/2}} \left[1+\frac{1}{\nu}(\mathbf{x}-\boldsymbol\mu)^T \boldsymbol\Sigma^{-1} (\mathbf{x}-\boldsymbol\mu)\right]^{-(\nu+p)/2}

and is said to be distributed as a multivariate ''t''-distribution with parameters \boldsymbol\Sigma, \boldsymbol\mu, \nu. Note that \boldsymbol\Sigma is not the covariance matrix, since the covariance is given by \nu/(\nu-2)\,\boldsymbol\Sigma (for \nu>2).

The constructive definition of a multivariate ''t''-distribution simultaneously serves as a sampling algorithm:
# Generate u \sim \chi^2_\nu and \mathbf{y} \sim N(\mathbf{0}, \boldsymbol\Sigma), independently.
# Compute \mathbf{x} \gets \mathbf{y}\sqrt{\nu/u} + \boldsymbol\mu.

This formulation gives rise to the hierarchical representation of a multivariate ''t''-distribution as a scale-mixture of normals: u \sim \mathrm{Ga}(\nu/2,\nu/2), where \mathrm{Ga}(a,b) indicates a gamma distribution with density proportional to x^{a-1}e^{-bx}, and \mathbf{x}\mid u conditionally follows N(\boldsymbol\mu, u^{-1}\boldsymbol\Sigma). In the special case \nu=1, the distribution is a multivariate Cauchy distribution.
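As a concrete illustration of the two-step construction above, here is a minimal Python sketch (the function name sample_multivariate_t is hypothetical; assumes NumPy):

    import numpy as np

    def sample_multivariate_t(mu, Sigma, nu, n, seed=None):
        """Draw n samples of x = y*sqrt(nu/u) + mu with y ~ N(0, Sigma)
        and u ~ chi^2_nu independent (the construction above)."""
        rng = np.random.default_rng(seed)
        p = len(mu)
        y = rng.multivariate_normal(np.zeros(p), Sigma, size=n)  # y ~ N(0, Sigma)
        u = rng.chisquare(nu, size=n)                            # u ~ chi^2_nu
        return mu + y * np.sqrt(nu / u)[:, None]

Drawing u once per sample and rescaling the whole normal vector by the same \sqrt{\nu/u} is what makes the components dependent even when \boldsymbol\Sigma is diagonal, a point taken up in the Derivation section below.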


Derivation

There are in fact many candidates for the multivariate generalization of Student's ''t''-distribution. An extensive survey of the field has been given by Kotz and Nadarajah (2004). The essential issue is to define a probability density function of several variables that is the appropriate generalization of the formula for the univariate case. In one dimension (p=1), with t=x-\mu and \Sigma=1, we have the probability density function

: f(t) = \frac{\Gamma[(\nu+1)/2]}{\sqrt{\nu\pi}\,\Gamma(\nu/2)} (1+t^2/\nu)^{-(\nu+1)/2}

and one approach is to use a corresponding function of several variables. This is the basic idea of elliptical distribution theory, where one writes down a corresponding function of p variables t_i that replaces t^2 by a quadratic function of all the t_i. It is clear that this only makes sense when all the marginal distributions have the same degrees of freedom \nu. With \mathbf{A} = \boldsymbol\Sigma^{-1}, one has a simple choice of multivariate density function

: f(\mathbf t) = \frac{\Gamma[(\nu+p)/2]\,|\mathbf{A}|^{1/2}}{(\nu\pi)^{p/2}\,\Gamma(\nu/2)} \left(1+\sum_{i,j=1}^{p} A_{ij} t_i t_j/\nu\right)^{-(\nu+p)/2}

which is the standard but not the only choice. An important special case is the standard bivariate ''t''-distribution, ''p'' = 2:

: f(t_1,t_2) = \frac{\Gamma[(\nu+2)/2]\,|\mathbf{A}|^{1/2}}{\nu\pi\,\Gamma(\nu/2)} \left(1+\sum_{i,j=1}^{2} A_{ij} t_i t_j/\nu\right)^{-(\nu+2)/2}

Note that \frac{\Gamma[(\nu+2)/2]}{\Gamma(\nu/2)} = \frac{\nu}{2}, since \Gamma(z+1)=z\,\Gamma(z). Now, if \mathbf{A} is the identity matrix, the density is

: f(t_1,t_2) = \frac{1}{2\pi} \left(1+(t_1^2 + t_2^2)/\nu\right)^{-(\nu+2)/2}.

The difficulty with the standard representation is revealed by this formula, which does not factorize into the product of the marginal one-dimensional distributions. When \Sigma is diagonal the standard representation can be shown to have zero correlation, but the marginal distributions are not statistically independent. A notable spontaneous occurrence of the elliptical multivariate distribution is its formal mathematical appearance when least squares methods are applied to multivariate normal data, as in the classical Markowitz minimum-variance econometric solution for asset portfolios.
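A short sketch of this density in Python, cross-checked against SciPy's implementation (scipy.stats.multivariate_t, available since SciPy 1.6; the helper name mvt_logpdf is hypothetical):

    import numpy as np
    from scipy.special import gammaln
    from scipy.stats import multivariate_t

    def mvt_logpdf(x, mu, Sigma, nu):
        """Log of the standard multivariate t density given above."""
        p = len(mu)
        dev = x - mu
        maha = dev @ np.linalg.solve(Sigma, dev)   # (x-mu)^T Sigma^{-1} (x-mu)
        sign, logdet = np.linalg.slogdet(Sigma)
        return (gammaln((nu + p) / 2) - gammaln(nu / 2)
                - 0.5 * (p * np.log(nu * np.pi) + logdet)
                - 0.5 * (nu + p) * np.log1p(maha / nu))

    mu, Sigma, nu = np.zeros(2), np.array([[2.0, 0.5], [0.5, 1.0]]), 5.0
    x = np.array([0.3, -1.2])
    print(mvt_logpdf(x, mu, Sigma, nu))
    print(multivariate_t(mu, Sigma, df=nu).logpdf(x))  # should agree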


Cumulative distribution function

The definition of the cumulative distribution function (cdf) in one dimension can be extended to multiple dimensions by defining the following probability (here \mathbf{x} is a real vector):

: F(\mathbf{x}) = \mathbb{P}(\mathbf{X}\leq \mathbf{x}), \quad \textrm{where}\;\; \mathbf{X}\sim t_\nu(\boldsymbol\mu,\boldsymbol\Sigma).

There is no simple formula for F(\mathbf{x}), but it can be approximated numerically via Monte Carlo integration.
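A minimal Monte Carlo sketch (the helper name mvt_cdf_mc is hypothetical; it reuses the chi-squared scale-mixture sampler from the Definition section):

    import numpy as np

    def mvt_cdf_mc(x, mu, Sigma, nu, n=200_000, seed=None):
        """Estimate F(x) = P(X <= x componentwise) for X ~ t_nu(mu, Sigma)."""
        rng = np.random.default_rng(seed)
        p = len(mu)
        y = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
        u = rng.chisquare(nu, size=n)
        samples = mu + y * np.sqrt(nu / u)[:, None]
        return np.mean(np.all(samples <= x, axis=1))  # fraction of draws below x

The standard error of such an estimate shrinks at the usual Monte Carlo rate, O(n^{-1/2}).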


Conditional Distribution

This was developed by Muirhead and Cornish, and later derived using the simpler chi-squared ratio representation above by Roth and Ding. Let vector X follow a multivariate ''t''-distribution and partition it into two subvectors of p_1, p_2 elements:

: X_p = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} \sim t_p \left( \mu_p, \Sigma_{pp}, \nu \right)

where p_1 + p_2 = p, the known mean vectors are \mu_p = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} and the scale matrix is \Sigma_{pp} = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}.

Roth and Ding find the conditional distribution p(X_1 \mid X_2) to be a new ''t''-distribution with modified parameters:

: X_1 \mid X_2 \sim t_{p_1}\left( \mu_{1|2}, \frac{\nu + d_2}{\nu + p_2} \Sigma_{11|2}, \nu + p_2 \right)

An equivalent expression in Kotz et al. is somewhat less concise. Thus the conditional distribution is most easily represented as a two-step procedure. Form first the intermediate distribution X_1 \mid X_2 \sim t_{p_1}\left( \mu_{1|2}, \Psi, \tilde\nu \right) above; then, using the parameters below, the explicit conditional density becomes

: f(X_1 \mid X_2) = \frac{\Gamma[(\tilde\nu + p_1)/2]}{\Gamma(\tilde\nu/2)\,(\pi\tilde\nu)^{p_1/2}\,|\Psi|^{1/2}} \left[1+\frac{1}{\tilde\nu}(X_1 - \mu_{1|2})^T \Psi^{-1} (X_1 - \mu_{1|2})\right]^{-(\tilde\nu + p_1)/2}

where

: \tilde\nu = \nu + p_2 is the effective degrees of freedom: \nu is augmented by the number of disused variables p_2.
: \mu_{1|2} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} \left(X_2 - \mu_2 \right) is the conditional mean of X_1.
: \Sigma_{11|2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} is the Schur complement of \Sigma_{22} \text{ in } \Sigma.
: d_2 = (X_2 - \mu_2)^T \Sigma_{22}^{-1} (X_2 - \mu_2) is the squared Mahalanobis distance of X_2 from \mu_2 with scale matrix \Sigma_{22}.
: \Psi = \frac{\nu + d_2}{\nu + p_2} \Sigma_{11|2} is the conditional scale matrix, for \tilde\nu > 2.
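Sketched in Python (the helper name mvt_conditional is hypothetical; NumPy only), the parameter updates above are a few lines:

    import numpy as np

    def mvt_conditional(mu, Sigma, nu, x2, p1):
        """Parameters (mean, scale, dof) of X1 | X2 = x2 for
        X = (X1, X2) ~ t_p(mu, Sigma, nu), per the formulas above."""
        mu1, mu2 = mu[:p1], mu[p1:]
        S11, S12 = Sigma[:p1, :p1], Sigma[:p1, p1:]
        S21, S22 = Sigma[p1:, :p1], Sigma[p1:, p1:]
        p2 = len(mu2)
        dev2 = x2 - mu2
        d2 = dev2 @ np.linalg.solve(S22, dev2)              # Mahalanobis^2
        mu_cond = mu1 + S12 @ np.linalg.solve(S22, dev2)    # conditional mean
        schur = S11 - S12 @ np.linalg.solve(S22, S21)       # Schur complement
        psi = (nu + d2) / (nu + p2) * schur                 # conditional scale
        return mu_cond, psi, nu + p2                        # dof = nu + p2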


Copulas based on the multivariate ''t''

The use of such distributions is enjoying renewed interest due to applications in mathematical finance, especially through the use of the Student's ''t'' copula.


Elliptical representation

Constructed as an elliptical distribution, take the simplest centralised case with spherical symmetry and no scaling, \Sigma = \operatorname{I}; then the multivariate ''t''-PDF takes the form

: f_X(X) = g(X^T X) = \frac{\Gamma[(\nu+p)/2]}{(\nu\pi)^{p/2}\,\Gamma(\nu/2)} \left( 1 + \nu^{-1} X^T X \right)^{-(\nu+p)/2}

where X = (x_1, \cdots, x_p)^T is a p-vector and \nu = degrees of freedom as defined in Muirhead section 1.5. The covariance of X is

: \operatorname{E}\left( XX^T \right) = \int_{-\infty}^\infty \cdots \int_{-\infty}^\infty f_X(x_1,\dots, x_p)\, XX^T \, dx_1 \dots dx_p = \frac{\nu}{\nu-2} \operatorname{I}

The aim is to convert the Cartesian PDF to a radial one. Kibria and Joarder define the radial measure r_2 = R^2 = \frac{X^T X}{p} and, noting that the density is dependent only on r_2, we get

: \operatorname{E}(r_2) = \int_{-\infty}^\infty \cdots \int_{-\infty}^\infty f_X(x_1,\dots, x_p) \,\frac{X^T X}{p} \, dx_1 \dots dx_p = \frac{\nu}{\nu-2}

which is equivalent to the variance of the p-element vector X treated as a univariate heavy-tail zero-mean random sequence with uncorrelated, yet statistically dependent, elements.
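A quick empirical check of \operatorname{E}(r_2) = \nu/(\nu-2) under the spherical assumption (a throwaway sketch; all parameter values are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    p, nu, n = 4, 7.0, 500_000
    y = rng.standard_normal((n, p))                 # N(0, I) draws
    u = rng.chisquare(nu, size=n)
    X = y * np.sqrt(nu / u)[:, None]                # spherical t_nu samples
    print(np.mean(np.sum(X**2, axis=1) / p))        # ~ nu/(nu-2) = 1.4
    print(nu / (nu - 2))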


Radial Distribution

r_2 = \frac{X^T X}{p} follows the Fisher-Snedecor or F distribution:

: r_2 \sim f_{F}(p,\nu) = B\left( \frac{p}{2}, \frac{\nu}{2} \right)^{-1} \left(\frac{p}{\nu}\right)^{p/2} r_2^{p/2-1} \left( 1 + \frac{p}{\nu} r_2 \right)^{-(p+\nu)/2}

having mean value \operatorname{E}(r_2) = \frac{\nu}{\nu-2}. F-distributions arise naturally in tests of sums of squares of sampled data after normalization by the sample standard deviation.

By a change of random variable to y = \frac{p}{\nu} r_2 = \frac{X^T X}{\nu} in the equation above, retaining the p-vector X, we have

: \operatorname{E}(y) = \int_{-\infty}^\infty \cdots \int_{-\infty}^\infty f_X(X) \frac{X^T X}{\nu} \, dx_1 \dots dx_p = \frac{p}{\nu-2}

and probability distribution

: \begin{aligned} f_Y(y \mid p,\nu) & = \left|\frac{dr_2}{dy}\right| f_F(r_2) = \frac{\nu}{p}\, B\left( \frac{p}{2}, \frac{\nu}{2} \right)^{-1} \left(\frac{p}{\nu}\right)^{p/2} \left(\frac{\nu}{p} y\right)^{p/2-1} \left( 1 + y \right)^{-(p+\nu)/2} \\ & = B\left( \frac{p}{2}, \frac{\nu}{2} \right)^{-1} y^{p/2-1}\,(1+y)^{-(\nu+p)/2} \end{aligned}

which is a regular Beta-prime distribution y \sim \beta'\left(y; \frac{p}{2}, \frac{\nu}{2}\right) having mean value \frac{p/2}{\nu/2-1} = \frac{p}{\nu-2}.
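This radial law is easy to verify by simulation (a sketch assuming SciPy; a large Kolmogorov-Smirnov p-value indicates agreement with F(p, \nu)):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    p, nu, n = 3, 6.0, 200_000
    y = rng.standard_normal((n, p))
    u = rng.chisquare(nu, size=n)
    r2 = np.sum((y * np.sqrt(nu / u)[:, None])**2, axis=1) / p  # X^T X / p
    print(stats.kstest(r2, stats.f(p, nu).cdf))  # r_2 ~ F(p, nu)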


Cumulative Radial Distribution

Given the Beta-prime distribution, the radial cumulative distribution function of y is known:

: F_Y(y) = I\left(\frac{y}{1+y}; \, \frac{p}{2}, \frac{\nu}{2} \right) B\left( \frac{p}{2}, \frac{\nu}{2} \right)^{-1}

where I is the incomplete Beta function; this applies with a spherical \Sigma assumption. In the scalar case, p = 1, the distribution is equivalent to Student-''t'' with the equivalence t^2 = y^2\sigma^{-1}, the variable ''t'' having double-sided tails for CDF purposes, i.e. the "two-tail ''t''-test".

The radial distribution can also be derived via a straightforward coordinate transformation from Cartesian to spherical. A constant-radius surface at R = (X^TX)^{1/2} with PDF

: p_X(X) \propto \left( 1 + \nu^{-1} R^2 \right)^{-(\nu+p)/2}

is an iso-density surface. Given this density value, the quantum of probability on a shell of surface area A_R and thickness \delta R at R is \delta P = p_X(R) \, A_R \,\delta R. The enclosed p-sphere of radius R has surface area A_R = \frac{2\pi^{p/2}R^{p-1}}{\Gamma(p/2)}. Substitution into \delta P shows that the shell has element of probability \delta P = p_X(R) \frac{2\pi^{p/2}R^{p-1}}{\Gamma(p/2)} \,\delta R, which is equivalent to the radial density function

: f_R(R) = \frac{\Gamma[(\nu+p)/2]}{\nu^{p/2}\pi^{p/2}\,\Gamma(\nu/2)} \frac{2\pi^{p/2}R^{p-1}}{\Gamma(p/2)} \left( 1 + \frac{R^2}{\nu} \right)^{-(\nu+p)/2}

which further simplifies to

: f_R(R) = \frac{2}{\nu^{1/2}\,B\left(\frac{p}{2}, \frac{\nu}{2}\right)} \left( \frac{R^2}{\nu} \right)^{(p-1)/2} \left( 1 + \frac{R^2}{\nu} \right)^{-(\nu+p)/2}

where B(\cdot,\cdot) is the Beta function. Changing the radial variable to y = R^2/\nu returns the previous Beta-prime distribution

: f_Y(y) = \frac{1}{B\left(\frac{p}{2}, \frac{\nu}{2}\right)} \, y^{p/2-1} \left( 1 + y \right)^{-(\nu+p)/2}

To scale the radial variables without changing the radial shape function, define the scale matrix \Sigma = \alpha \operatorname{I}, yielding a 3-parameter Cartesian density function, i.e. the probability \Delta_P in the volume element dx_1 \dots dx_p is

: \Delta_P \big(f_X(X \mid \alpha, p, \nu) \big) = \frac{\Gamma[(\nu+p)/2]}{(\nu\alpha)^{p/2}\pi^{p/2}\,\Gamma(\nu/2)} \left( 1 + \frac{X^T X}{\alpha\nu} \right)^{-(\nu+p)/2} \, dx_1 \dots dx_p

or, in terms of the scalar radial variable R,

: f_R(R \mid \alpha, p, \nu) = \frac{2}{(\alpha\nu)^{1/2}\,B\left(\frac{p}{2}, \frac{\nu}{2}\right)} \left( \frac{R^2}{\alpha\nu} \right)^{(p-1)/2} \left( 1 + \frac{R^2}{\alpha\nu} \right)^{-(\nu+p)/2}
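Numerically, the radial CDF is one call to the regularized incomplete beta function (scipy.special.betainc computes the regularized form, so the division by B(p/2, \nu/2) is already built in); scipy.stats.betaprime gives an independent cross-check:

    import numpy as np
    from scipy.special import betainc
    from scipy.stats import betaprime

    p, nu, y = 3, 6.0, 1.5
    print(betainc(p / 2, nu / 2, y / (1 + y)))  # F_Y(y) via the formula above
    print(betaprime(p / 2, nu / 2).cdf(y))      # SciPy cross-check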


Radial Moments

The moments of all the radial variables, with the spherical distribution assumption, can be derived from the Beta-prime distribution. If Z \sim \beta'(a,b) then \operatorname{E}(Z^m) = \frac{\Gamma(a+m)\,\Gamma(b-m)}{\Gamma(a)\,\Gamma(b)}, a known result. Thus, for variable y = \frac{R^2}{\nu} we have

: \operatorname{E}(y^m) = \frac{\Gamma\left(\frac{p}{2}+m\right)\Gamma\left(\frac{\nu}{2}-m\right)}{\Gamma\left(\frac{p}{2}\right)\Gamma\left(\frac{\nu}{2}\right)}, \; \nu/2 > m

The moments of r_2 = \nu \, y are

: \operatorname{E}(r_2^m) = \nu^m \operatorname{E}(y^m)

while introducing the scale matrix \alpha \operatorname{I} yields

: \operatorname{E}(r_2^m \mid \alpha) = \alpha^m \nu^m \operatorname{E}(y^m)

Moments relating to the radial variable R are found by setting R = (\alpha\nu y)^{1/2} and M = 2m, whereupon

: \operatorname{E}(R^M) = \operatorname{E}\big((\alpha \nu y)^{M/2}\big) = (\alpha \nu)^{M/2} \operatorname{E}(y^{M/2}) = (\alpha \nu)^{M/2} \frac{\Gamma\left(\frac{p}{2}+\frac{M}{2}\right)\Gamma\left(\frac{\nu}{2}-\frac{M}{2}\right)}{\Gamma\left(\frac{p}{2}\right)\Gamma\left(\frac{\nu}{2}\right)}
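The gamma-ratio moment formula is convenient to evaluate in log-space (a sketch; betaprime_moment is a hypothetical helper):

    import numpy as np
    from scipy.special import gammaln

    def betaprime_moment(a, b, m):
        """E[Z^m] for Z ~ beta'(a, b), valid for b > m."""
        return np.exp(gammaln(a + m) + gammaln(b - m) - gammaln(a) - gammaln(b))

    p, nu, m = 3, 9.0, 2
    print(betaprime_moment(p / 2, nu / 2, m))            # E[y^m]
    print(nu**m * betaprime_moment(p / 2, nu / 2, m))    # E[r_2^m] = nu^m E[y^m]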


Linear Combinations and Affine Transformation


Full Rank Transform

This closely relates to the multivariate normal method and is described in Kotz and Nadarajah, Kibria and Joarder, Roth, and Cornish. Starting from a somewhat simplified version of the central MV-''t'' pdf

: f_X(X) = \frac{\Kappa}{\left|\Sigma\right|^{1/2}} \left( 1+ \nu^{-1} X^T \Sigma^{-1} X \right)^{-(\nu+p)/2},

where \Kappa is a constant and \nu is arbitrary but fixed, let \Theta \in \mathbb{R}^{p \times p} be a full-rank matrix and form the vector Y = \Theta X. Then, by straightforward change of variables

: f_Y(Y) = \frac{\Kappa}{\left|\Sigma\right|^{1/2}} \left( 1+ \nu^{-1} Y^T \Theta^{-T} \Sigma^{-1} \Theta^{-1} Y \right)^{-(\nu+p)/2} \left|\frac{\partial Y}{\partial X}\right|^{-1}

The matrix of partial derivatives is \frac{\partial Y_i}{\partial X_j} = \Theta_{ij} and the Jacobian becomes \left|\frac{\partial Y}{\partial X}\right| = \left|\Theta\right|. Thus

: f_Y(Y) = \frac{\Kappa}{\left|\Sigma\right|^{1/2} \left|\Theta\right|} \left( 1 + \nu^{-1} Y^T \Theta^{-T} \Sigma^{-1} \Theta^{-1} Y \right)^{-(\nu+p)/2}

The denominator reduces to

: \left|\Sigma\right|^{1/2} \left|\Theta\right| = \left|\Sigma\right|^{1/2} \left|\Theta\right|^{1/2} \left|\Theta^T\right|^{1/2} = \left|\Theta \Sigma \Theta^T\right|^{1/2}

In full:

: f_Y(Y) = \frac{\Kappa}{\left|\Theta \Sigma \Theta^T\right|^{1/2}} \left( 1 + \nu^{-1} Y^T \left( \Theta \Sigma \Theta^T \right)^{-1} Y \right)^{-(\nu+p)/2}

which is a regular MV-''t'' distribution. In general, if X \sim t_p(\mu, \Sigma, \nu) and \Theta_{p \times p} has full rank p, then

: \Theta X + c \sim t_p( \Theta \mu + c, \Theta \Sigma \Theta^T, \nu )
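An empirical sanity check of this closure property, using the covariance identity \operatorname{Cov} = \frac{\nu}{\nu-2}\Sigma (a sketch; all parameter values are arbitrary):

    import numpy as np

    rng = np.random.default_rng(2)
    p, nu, n = 3, 8.0, 400_000
    mu = np.array([1.0, -1.0, 0.5])
    Sigma = np.array([[2.0, 0.3, 0.0], [0.3, 1.0, 0.2], [0.0, 0.2, 1.5]])
    Theta = np.array([[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.5, 1.0]])
    c = np.ones(p)
    y = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    u = rng.chisquare(nu, size=n)
    Y = (mu + y * np.sqrt(nu / u)[:, None]) @ Theta.T + c   # Y = Theta X + c
    print(np.cov(Y.T) * (nu - 2) / nu)   # ~ Theta Sigma Theta^T
    print(Theta @ Sigma @ Theta.T)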


Marginal Distributions

This is a special case of the rank-reducing linear transform below. Kotz defines marginal distributions as follows. Partition X \sim t(p, \mu, \Sigma, \nu) into two subvectors of p_1, p_2 elements:

: X_p = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} \sim t \left( p_1 + p_2, \mu_p, \Sigma_{pp}, \nu \right)

with p_1 + p_2 = p, means \mu_p = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, and scale matrix \Sigma_{pp} = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}; then X_1 \sim t \left( p_1, \mu_1, \Sigma_{11}, \nu \right), X_2 \sim t \left( p_2, \mu_2, \Sigma_{22}, \nu \right), such that

: f(X_1) = \frac{\Gamma[(\nu+p_1)/2]}{\Gamma(\nu/2)\,(\nu\pi)^{p_1/2}\,\left|\Sigma_{11}\right|^{1/2}} \left[1+\frac{1}{\nu}(X_1-\mu_1)^T \Sigma_{11}^{-1}(X_1-\mu_1)\right]^{-(\nu+p_1)/2}

: f(X_2) = \frac{\Gamma[(\nu+p_2)/2]}{\Gamma(\nu/2)\,(\nu\pi)^{p_2/2}\,\left|\Sigma_{22}\right|^{1/2}} \left[1+\frac{1}{\nu}(X_2-\mu_2)^T \Sigma_{22}^{-1}(X_2-\mu_2)\right]^{-(\nu+p_2)/2}

If a transformation is constructed in the form

: \Theta_{p_1 \times p} = \begin{bmatrix} 1 & \cdots & 0 & \cdots & 0 \\ 0 & \ddots & 0 & \cdots & 0 \\ 0 & \cdots & 1 & \cdots & 0 \end{bmatrix}

then the vector Y = \Theta X, as discussed below, has the same distribution as the marginal distribution of X_1.
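In code, marginalization is just sub-block selection with \nu carried over unchanged (a sketch; mvt_marginal is a hypothetical helper):

    import numpy as np

    def mvt_marginal(mu, Sigma, nu, idx):
        """Marginal of the components in idx: t(len(idx), mu[idx],
        Sigma[idx, idx], nu); the degrees of freedom do not change."""
        idx = np.asarray(idx)
        return mu[idx], Sigma[np.ix_(idx, idx)], nu

    mu, Sigma, nu = np.zeros(3), np.eye(3), 5.0
    print(mvt_marginal(mu, Sigma, nu, [0, 2]))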


Rank-Reducing Linear Transform

In the linear transform case, if \Theta is a rectangular matrix \Theta \in \mathbb{R}^{m \times p}, m < p, of rank m, the result is dimensionality reduction. Here, the Jacobian \left|\Theta\right| is seemingly rectangular but the value \left|\Theta \Sigma \Theta^T\right|^{1/2} in the denominator pdf is nevertheless correct. There is a discussion of rectangular matrix product determinants in Aitken. In general, if X \sim t(p, \mu, \Sigma, \nu) and \Theta_{m \times p} has full rank m, then

: Y = \Theta X + c \sim t( m, \Theta \mu + c, \Theta \Sigma \Theta^T, \nu )

: f_Y(Y) = \frac{\Gamma[(\nu+m)/2]}{\Gamma(\nu/2)\,(\nu\pi)^{m/2}\,\left|\Theta \Sigma \Theta^T\right|^{1/2}} \left[1+\frac{1}{\nu}( Y - c_1 )^T ( \Theta \Sigma \Theta^T )^{-1} (Y-c_1) \right]^{-(\nu+m)/2}, \; c_1 = \Theta \mu + c

''In extremis'', if m = 1 and \Theta becomes a row vector, then scalar Y follows a univariate double-sided Student-''t'' distribution defined by t^2 = Y^2 / \sigma^2 with the same \nu degrees of freedom (see the sketch after this list). Kibria et al. use the affine transformation to find the marginal distributions, which are also MV-''t''.

* During affine transformations of variables with elliptical distributions all vectors must ultimately derive from one initial isotropic spherical vector Z whose elements remain 'entangled' and are not statistically independent.
* A vector of independent Student-''t'' samples is not consistent with the multivariate ''t''-distribution.
* Adding two sample multivariate ''t'' vectors generated with independent chi-squared samples and different \nu values, \mathbf{y}_1/\sqrt{u_1/\nu_1} and \mathbf{y}_2/\sqrt{u_2/\nu_2}, will not produce internally consistent distributions, though they will yield a Behrens-Fisher problem.
* Taleb compares many examples of fat-tail elliptical ''vs'' non-elliptical multivariate distributions.
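For the m = 1 case flagged above, the projected scalar is an exact scaled Student-''t'', which a simulation confirms (a sketch assuming SciPy; parameter values are arbitrary):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    p, nu, n = 3, 5.0, 300_000
    Sigma = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 1.5]])
    theta = np.array([0.4, -1.0, 2.0])              # 1 x p row vector
    y = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    u = rng.chisquare(nu, size=n)
    s = (y @ theta) * np.sqrt(nu / u)               # scalar Y = theta X
    scale = np.sqrt(theta @ Sigma @ theta)          # sigma^2 = theta Sigma theta^T
    print(stats.kstest(s / scale, stats.t(nu).cdf)) # large p-value expected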


Related concepts

* In univariate statistics, the Student's ''t''-test makes use of Student's ''t''-distribution.
* The elliptical multivariate ''t''-distribution arises spontaneously in linearly constrained least squares solutions involving multivariate normal source data, for example the Markowitz global minimum variance solution in financial portfolio analysis, which addresses an ensemble of normal random vectors or a random matrix. It does not arise in ordinary least squares (OLS) or multiple regression with fixed dependent and independent variables, problems which tend to produce well-behaved normal error probabilities.
* Hotelling's ''T''-squared distribution is a distribution that arises in multivariate statistics.
* The matrix ''t''-distribution is a distribution for random variables arranged in a matrix structure.


See also

* Multivariate normal distribution, which is the limiting case of the multivariate Student's ''t''-distribution as \nu\uparrow\infty.
* Chi distribution, the pdf of the scaling factor in the construction of the Student's ''t''-distribution and also the 2-norm (or Euclidean norm) of a multivariate normally distributed vector (centered at zero).
** Rayleigh distribution#Student's t, random vector length of multivariate ''t''-distribution
* Mahalanobis distance


References




External links


Copula Methods vs Canonical Multivariate Distributions: the multivariate Student T distribution with general degrees of freedom