In
statistics, particularly in
hypothesis testing
A statistical hypothesis test is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis.
Hypothesis testing allows us to make probabilistic statements about population parameters.
...
, the Hotelling's ''T''-squared distribution (''T''
2), proposed by
Harold Hotelling
Harold Hotelling (; September 29, 1895 – December 26, 1973) was an American mathematical statistician and an influential economic theorist, known for Hotelling's law, Hotelling's lemma, and Hotelling's rule in economics, as well as Hotelling's ...
,
is a
multivariate probability distribution that is tightly related to the
''F''-distribution and is most notable for arising as the distribution of a set of
sample statistics
A statistic (singular) or sample statistic is any quantity computed from values in a sample which is considered for a statistical purpose. Statistical purposes include estimating a population parameter, describing a sample, or evaluating a hyp ...
that are natural generalizations of the statistics underlying the
Student's ''t''-distribution.
The Hotelling's ''t''-squared statistic (''t''
2) is a generalization of
Student's ''t''-statistic that is used in
multivariate
Multivariate may refer to:
In mathematics
* Multivariable calculus
* Multivariate function
* Multivariate polynomial
In computing
* Multivariate cryptography
* Multivariate division algorithm
* Multivariate interpolation
* Multivariate optical c ...
hypothesis testing
A statistical hypothesis test is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis.
Hypothesis testing allows us to make probabilistic statements about population parameters.
...
.
Motivation
The distribution arises in
multivariate statistics
Multivariate statistics is a subdivision of statistics encompassing the simultaneous observation and analysis of more than one outcome variable.
Multivariate statistics concerns understanding the different aims and background of each of the dif ...
in undertaking
tests
Test(s), testing, or TEST may refer to:
* Test (assessment), an educational assessment intended to measure the respondents' knowledge or other abilities
Arts and entertainment
* ''Test'' (2013 film), an American film
* ''Test'' (2014 film), ...
of the differences between the (multivariate) means of different populations, where tests for univariate problems would make use of a
''t''-test.
The distribution is named for
Harold Hotelling
Harold Hotelling (; September 29, 1895 – December 26, 1973) was an American mathematical statistician and an influential economic theorist, known for Hotelling's law, Hotelling's lemma, and Hotelling's rule in economics, as well as Hotelling's ...
, who developed it as a generalization of Student's ''t''-distribution.
Definition
If the vector
is
Gaussian multivariate-distributed with zero mean and unit
covariance matrix
In probability theory and statistics, a covariance matrix (also known as auto-covariance matrix, dispersion matrix, variance matrix, or variance–covariance matrix) is a square matrix giving the covariance between each pair of elements o ...
and
is a
matrix with unit
scale matrix and ''m''
degrees of freedom
Degrees of freedom (often abbreviated df or DOF) refers to the number of independent variables or parameters of a thermodynamic system. In various scientific fields, the word "freedom" is used to describe the limits to which physical movement or ...
with a
Wishart distribution
In statistics, the Wishart distribution is a generalization to multiple dimensions of the gamma distribution. It is named in honor of John Wishart, who first formulated the distribution in 1928.
It is a family of probability distributions defin ...
, then the
quadratic form
In mathematics, a quadratic form is a polynomial with terms all of degree two ("form" is another name for a homogeneous polynomial). For example,
:4x^2 + 2xy - 3y^2
is a quadratic form in the variables and . The coefficients usually belong to ...
has a Hotelling distribution (with parameters
and
):
:
Furthermore, if a random variable ''X'' has Hotelling's ''T''-squared distribution,
, then:
[
:
where is the ''F''-distribution with parameters ''p'' and ''m−p+1''.
]
Hotelling ''t''-squared statistic
Let be the sample covariance
The sample mean (or "empirical mean") and the sample covariance are statistics computed from a sample of data on one or more random variables.
The sample mean is the average value (or mean value) of a sample of numbers taken from a larger po ...
:
:
where we denote transpose
In linear algebra, the transpose of a matrix is an operator which flips a matrix over its diagonal;
that is, it switches the row and column indices of the matrix by producing another matrix, often denoted by (among other notations).
The tr ...
by an apostrophe
The apostrophe ( or ) is a punctuation mark, and sometimes a diacritical mark, in languages that use the Latin alphabet and some other alphabets. In English, the apostrophe is used for two basic purposes:
* The marking of the omission of one ...
. It can be shown that is a positive (semi) definite matrix and follows a ''p''-variate Wishart distribution
In statistics, the Wishart distribution is a generalization to multiple dimensions of the gamma distribution. It is named in honor of John Wishart, who first formulated the distribution in 1928.
It is a family of probability distributions defin ...
with ''n''−1 degrees of freedom.
The sample covariance matrix of the mean reads .
The Hotelling's ''t''-squared statistic is then defined as:
:
which is proportional to the distance
Distance is a numerical or occasionally qualitative measurement of how far apart objects or points are. In physics or everyday usage, distance may refer to a physical length or an estimation based on other criteria (e.g. "two counties over"). ...
between the sample mean and . Because of this, one should expect the statistic to assume low values if , and high values if they are different.
From the distribution Distribution may refer to:
Mathematics
*Distribution (mathematics), generalized functions used to formulate solutions of partial differential equations
*Probability distribution, the probability of a particular value or value range of a varia ...
,
:
where is the ''F''-distribution with parameters ''p'' and ''n'' − ''p''.
In order to calculate a ''p''-value (unrelated to ''p'' variable here), note that the distribution of equivalently implies that
:
Then, use the quantity on the left hand side to evaluate the ''p''-value corresponding to the sample, which comes from the ''F''-distribution. A confidence region In statistics, a confidence region is a multi-dimensional generalization of a confidence interval. It is a set of points in an ''n''-dimensional space, often represented as an ellipsoid around a point which is an estimated solution to a problem, al ...
may also be determined using similar logic.
Motivation
Let denote a ''p''-variate normal distribution with location
In geography, location or place are used to denote a region (point, line, or area) on Earth's surface or elsewhere. The term ''location'' generally implies a higher degree of certainty than ''place'', the latter often indicating an entity with an ...
and known covariance
In probability theory and statistics, covariance is a measure of the joint variability of two random variables. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the le ...
. Let
:
be ''n'' independent identically distributed (iid) random variable
A random variable (also called random quantity, aleatory variable, or stochastic variable) is a mathematical formalization of a quantity or object which depends on random events. It is a mapping or a function from possible outcomes (e.g., the p ...
s, which may be represented as column vectors of real numbers. Define
:
to be the sample mean
The sample mean (or "empirical mean") and the sample covariance are statistics computed from a sample of data on one or more random variables.
The sample mean is the average value (or mean value) of a sample of numbers taken from a larger po ...
with covariance . It can be shown that
:
where is the chi-squared distribution
In probability theory and statistics, the chi-squared distribution (also chi-square or \chi^2-distribution) with k degrees of freedom is the distribution of a sum of the squares of k independent standard normal random variables. The chi-squar ...
with ''p'' degrees of freedom.
Two-sample statistic
If and , with the samples independently drawn from two independent
Independent or Independents may refer to:
Arts, entertainment, and media Artist groups
* Independents (artist group), a group of modernist painters based in the New Hope, Pennsylvania, area of the United States during the early 1930s
* Independe ...
multivariate normal distribution
In probability theory and statistics, the multivariate normal distribution, multivariate Gaussian distribution, or joint normal distribution is a generalization of the one-dimensional ( univariate) normal distribution to higher dimensions. One ...
s with the same mean and covariance, and we define
:
as the sample means, and
:
:
as the respective sample covariance matrices. Then
:
is the unbiased pooled covariance matrix estimate (an extension of pooled variance
In statistics, pooled variance (also known as combined variance, composite variance, or overall variance, and written \sigma^2) is a method for estimating variance of several different populations when the mean of each population may be differen ...
).
Finally, the Hotelling's two-sample ''t''-squared statistic is
:
Related concepts
It can be related to the F-distribution by
:
The non-null distribution of this statistic is the noncentral F-distribution
In probability theory and statistics, the noncentral ''F''-distribution is a continuous probability distribution that is a noncentral generalization of the (ordinary) ''F''-distribution. It describes the distribution of the quotient (''X''/''n' ...
(the ratio of a non-central Chi-squared random variable and an independent central Chi-squared random variable)
:
with
:
where is the difference vector between the population means.
In the two-variable case, the formula simplifies nicely allowing appreciation of how the correlation, ,
between the variables affects . If we define
:
and
:
then
:
Thus, if the differences in the two rows of the vector are of the same sign, in general, becomes smaller as becomes more positive. If the differences are of opposite sign becomes larger as becomes more positive.
A univariate special case can be found in Welch's t-test
In statistics, Welch's ''t''-test, or unequal variances ''t''-test, is a two-sample location test which is used to test the hypothesis that two populations have equal means. It is named for its creator, Bernard Lewis Welch, is an adaptation of ...
.
More robust and powerful tests than Hotelling's two-sample test have been proposed in the literature, see for example the interpoint distance based tests which can be applied also when the number of variables is comparable with, or even larger than, the number of subjects.
See also
* Student's ''t''-test in univariate statistics
* Student's ''t''-distribution in univariate probability theory
* Multivariate Student distribution
In statistics, the multivariate ''t''-distribution (or multivariate Student distribution) is a multivariate probability distribution. It is a generalization to random vectors of the Student's ''t''-distribution, which is a distribution applic ...
* ''F''-distribution (commonly tabulated or available in software libraries, and hence used for testing the ''T''-squared statistic using the relationship given above)
* Wilks's lambda distribution (in multivariate statistics
Multivariate statistics is a subdivision of statistics encompassing the simultaneous observation and analysis of more than one outcome variable.
Multivariate statistics concerns understanding the different aims and background of each of the dif ...
, Wilks's ''Λ'' is to Hotelling's ''T''2 as Snedecor's ''F'' is to Student's ''t'' in univariate statistics)
References
External links
*
{{DEFAULTSORT:Hotelling's T-Squared Distribution
Continuous distributions