In
probability
Probability is the branch of mathematics concerning numerical descriptions of how likely an event is to occur, or how likely it is that a proposition is true. The probability of an event is a number between 0 and 1, where, roughly speaking, ...
and
statistics
Statistics (from German: '' Statistik'', "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, indust ...
, Student's ''t''-distribution (or simply the ''t''-distribution) is any member of a family of continuous
probability distributions that arise when estimating the
mean
There are several kinds of mean in mathematics, especially in statistics. Each mean serves to summarize a given group of data, often to better understand the overall value ( magnitude and sign) of a given data set.
For a data set, the '' ar ...
of a
normally distributed population
Population typically refers to the number of people in a single area, whether it be a city or town, region, country, continent, or the world. Governments typically quantify the size of the resident population within their jurisdiction usi ...
in situations where the
sample size
Sample size determination is the act of choosing the number of observations or replicates to include in a statistical sample. The sample size is an important feature of any empirical study in which the goal is to make inferences about a populati ...
is small and the population's
standard deviation is unknown. It was developed by English statistician
William Sealy Gosset
William Sealy Gosset (13 June 1876 – 16 October 1937) was an English statistician, chemist and brewer who served as Head Brewer of Guinness and Head Experimental Brewer of Guinness and was a pioneer of modern statistics. He pioneered small sa ...
under the pseudonym "Student".
The ''t''-distribution plays a role in a number of widely used statistical analyses, including
Student's ''t''-test for assessing the
statistical significance of the difference between two sample means, the construction of
confidence intervals for the difference between two population means, and in linear
regression analysis. Student's ''t''-distribution also arises in the
Bayesian analysis of data from a normal family.
If we take a sample of
observations from a normal distribution, then the ''t''-distribution with
degrees of freedom can be defined as the distribution of the location of the sample mean relative to the true mean, divided by the sample standard deviation, after multiplying by the standardizing term
. In this way, the ''t''-distribution can be used to construct a
confidence interval for the true mean.
The ''t''-distribution is symmetric and bell-shaped, like the normal distribution. However, the ''t''-distribution has heavier tails, meaning that it is more prone to producing values that fall far from its mean. This makes it useful for understanding the statistical behavior of certain types of ratios of random quantities, in which variation in the denominator is amplified and may produce outlying values when the denominator of the ratio falls close to zero. The Student's ''t''-distribution is a special case of the
generalized hyperbolic distribution.
History and etymology
In statistics, the ''t''-distribution was first derived as a
posterior distribution in 1876 by
Helmert
Friedrich Robert Helmert (31 July 1843 – 15 June 1917) was a German geodesist and statistician with important contributions to the theory of errors.
Career
Helmert was born in Freiberg, Kingdom of Saxony. After schooling in Freiberg an ...
and
Lüroth.
The ''t''-distribution also appeared in a more general form as
Pearson Type IV distribution in
Karl Pearson's 1895 paper.
In the English-language literature, the distribution takes its name from William Sealy Gosset's 1908 paper in ''
Biometrika'' under the pseudonym "Student". One version of the origin of the pseudonym is that Gosset's employer preferred staff to use pen names when publishing scientific papers instead of their real name, so he used the name "Student" to hide his identity. Another version is that Guinness did not want their competitors to know that they were using the ''t''-test to determine the quality of raw material.
Gosset worked at the
Guinness Brewery in
Dublin, Ireland, and was interested in the problems of small samples – for example, the chemical properties of barley where sample sizes might be as few as 3. Gosset's paper refers to the distribution as the "frequency distribution of standard deviations of samples drawn from a normal population". It became well known through the work of
Ronald Fisher, who called the distribution "Student's distribution" and represented the test value with the letter ''t''.
How Student's distribution arises from sampling
Let
be independently and identically drawn from the distribution
, i.e. this is a sample of size
from a normally distributed population with expected mean value
and variance
.
Let
:
be the sample mean and let
:
be the (
Bessel-corrected) sample variance. Then the random variable
:
has a standard normal distribution (i.e. normal with expected mean 0 and variance 1), and the random variable
:
''i.e'' where
has been substituted for
, has a Student's ''t''-distribution with
degrees of freedom. Since
has replaced
the only unobservable quantity in this expression is
so this can be used to derive confidence intervals for
The numerator and the denominator in the preceding expression are
statistically independent random variables despite being based on the same sample
. This can be seen by observing that
and recalling that
and
are both linear combinations of the same set of i.i.d. normally distributed random variables.
Definition
Probability density function
Student's ''t''-distribution has the
probability density function (PDF) given by
:
where
is the number of ''
degrees of freedom'' and
is the
gamma function
In mathematics, the gamma function (represented by , the capital letter gamma from the Greek alphabet) is one commonly used extension of the factorial function to complex numbers. The gamma function is defined for all complex numbers excep ...
. This may also be written as
:
where B is the
Beta function. In particular for integer valued degrees of freedom
we have:
For
even,
:
For
odd,
:
The probability density function is
symmetric, and its overall shape resembles the bell shape of a normally distributed variable with mean 0 and variance 1, except that it is a bit lower and wider. As the number of degrees of freedom grows, the ''t''-distribution approaches the normal distribution with mean 0 and variance 1. For this reason
is also known as the normality parameter.
The following images show the density of the ''t''-distribution for increasing values of
. The normal distribution is shown as a blue line for comparison. Note that the ''t''-distribution (red line) becomes closer to the normal distribution as
increases.
Cumulative distribution function
The
cumulative distribution function (CDF) can be written in terms of ''I'', the regularized
incomplete beta function
In mathematics, the beta function, also called the Euler integral of the first kind, is a special function that is closely related to the gamma function and to binomial coefficients. It is defined by the integral
: \Beta(z_1,z_2) = \int_0^1 t^( ...
. For ''t'' > 0,
[
:
where
:
Other values would be obtained by symmetry. An alternative formula, valid for , is][
:
where 2''F''1 is a particular case of the hypergeometric function.
For information on its inverse cumulative distribution function, see .
]
Special cases
Certain values of give a simple form for Student's t-distribution.
How the ''t''-distribution arises
Sampling distribution
Let be the numbers observed in a sample from a continuously distributed population with expected value . The sample mean and sample variance
In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its population mean or sample mean. Variance is a measure of dispersion, meaning it is a measure of how far a set of numbe ...
are given by:
:
The resulting ''t-value'' is
:
The ''t''-distribution with degrees of freedom is the sampling distribution of the ''t''-value when the samples consist of independent identically distributed
In probability theory and statistics, a collection of random variables is independent and identically distributed if each random variable has the same probability distribution as the others and all are mutually independent. This property is us ...
observations from a normally distributed population. Thus for inference purposes ''t'' is a useful " pivotal quantity" in the case when the mean and variance are unknown population parameters, in the sense that the ''t''-value has then a probability distribution that depends on neither nor .
Bayesian inference
In Bayesian statistics, a (scaled, shifted) ''t''-distribution arises as the marginal distribution
In probability theory and statistics, the marginal distribution of a subset of a collection of random variables is the probability distribution of the variables contained in the subset. It gives the probabilities of various values of the variab ...
of the unknown mean of a normal distribution, when the dependence on an unknown variance has been marginalized out:
:
where stands for the data , and represents any other information that may have been used to create the model. The distribution is thus the compounding
In the field of pharmacy, compounding (performed in compounding pharmacies) is preparation of a custom formulation of a medication to fit a unique need of a patient that cannot be met with commercially available products. This may be done for me ...
of the conditional distribution of given the data and with the marginal distribution of given the data.
With data points, if uninformative, or flat, the location prior can be taken for ''μ'', and the scale prior can be taken for ''σ''2, then Bayes' theorem gives
:
a normal distribution and a scaled inverse chi-squared distribution respectively, where and
:
The marginalization integral thus becomes
:
This can be evaluated by substituting , where , giving
:
so
:
But the ''z'' integral is now a standard Gamma integral, which evaluates to a constant, leaving
:
This is a form of the ''t''-distribution with an explicit scaling and shifting that will be explored in more detail in a further section below. It can be related to the standardized ''t''-distribution by the substitution
:
The derivation above has been presented for the case of uninformative priors for and ; but it will be apparent that any priors that lead to a normal distribution being compounded with a scaled inverse chi-squared distribution will lead to a ''t''-distribution with scaling and shifting for , although the scaling parameter corresponding to above will then be influenced both by the prior information and the data, rather than just by the data as above.
Characterization
As the distribution of a test statistic
Student's ''t''-distribution with degrees of freedom can be defined as the distribution of the random variable ''T'' with
:
where
* ''Z'' is a standard normal with expected value 0 and variance 1;
* ''V'' has a chi-squared distribution () with degrees of freedom;
* ''Z'' and ''V'' are independent
Independent or Independents may refer to:
Arts, entertainment, and media Artist groups
* Independents (artist group), a group of modernist painters based in the New Hope, Pennsylvania, area of the United States during the early 1930s
* Independe ...
;
A different distribution is defined as that of the random variable defined, for a given constant ''μ'', by
:
This random variable has a noncentral ''t''-distribution with noncentrality parameter Noncentral distributions are families of probability distributions that are related to other "central" families of distributions by means of a noncentrality parameter. Whereas the central distribution describes how a test statistic is distributed w ...
''μ''. This distribution is important in studies of the power of Student's ''t''-test.
Derivation
Suppose ''X''1, ..., ''X''''n'' are independent
Independent or Independents may refer to:
Arts, entertainment, and media Artist groups
* Independents (artist group), a group of modernist painters based in the New Hope, Pennsylvania, area of the United States during the early 1930s
* Independe ...
realizations of the normally-distributed, random variable ''X'', which has an expected value ''μ'' and variance ''σ''2. Let
:
be the sample mean, and
:
be an unbiased estimate of the variance from the sample. It can be shown that the random variable
:
has a chi-squared distribution with degrees of freedom (by Cochran's theorem). It is readily shown that the quantity
:
is normally distributed with mean 0 and variance 1, since the sample mean is normally distributed with mean ''μ'' and variance ''σ''2/''n''. Moreover, it is possible to show that these two random variables (the normally distributed one ''Z'' and the chi-squared-distributed one ''V'') are independent. Consequently the pivotal quantity
:
which differs from ''Z'' in that the exact standard deviation ''σ'' is replaced by the random variable ''S''''n'', has a Student's ''t''-distribution as defined above. Notice that the unknown population variance ''σ''2 does not appear in ''T'', since it was in both the numerator and the denominator, so it canceled. Gosset intuitively obtained the probability density function stated above, with equal to ''n'' − 1, and Fisher proved it in 1925.
The distribution of the test statistic ''T'' depends on , but not ''μ'' or ''σ''; the lack of dependence on ''μ'' and ''σ'' is what makes the ''t''-distribution important in both theory and practice.
As a maximum entropy distribution
Student's ''t''-distribution is the maximum entropy probability distribution
In statistics and information theory, a maximum entropy probability distribution has entropy that is at least as great as that of all other members of a specified class of probability distributions. According to the principle of maximum entrop ...
for a random variate ''X'' for which is fixed.
Properties
Moments
For , the raw moments of the ''t''-distribution are
: