statistics Statistics (from German language, German: ', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a s ...

, the ''t''-statistic is the ratio of the difference in a number’s estimated value from its assumed value to its standard error. It is used in hypothesis testing via Student's ''t''-test. The ''t''-statistic is used in a ''t''-test to determine whether to support or reject the null hypothesis. It is very similar to the z-score but with the difference that ''t''-statistic is used when the sample size is small or the population standard deviation is unknown. For example, the ''t''-statistic is used in estimating the

population mean In statistics, a population is a set of similar items or events which is of interest for some question or experiment. A statistical population can be a group of existing objects (e.g. the set of all stars within the Milky Way galaxy) or a hyp ...

from a sampling distribution of sample means if the population

standard deviation In statistics, the standard deviation is a measure of the amount of variation of the values of a variable about its Expected value, mean. A low standard Deviation (statistics), deviation indicates that the values tend to be close to the mean ( ...

is unknown. It is also used along with p-value when running hypothesis tests where the p-value tells us what the odds are of the results to have happened.

Definition and features

Let

\hat\beta

be an estimator of parameter ''β'' in some

statistical model A statistical model is a mathematical model that embodies a set of statistical assumptions concerning the generation of Sample (statistics), sample data (and similar data from a larger Statistical population, population). A statistical model repre ...

. Then a ''t''-statistic for this parameter is any quantity of the form :

t_ = \frac,

where ''β''₀ is a non-random, known constant, which may or may not match the actual unknown parameter value ''β'', and

\operatorname(\hat\beta)

is the standard error of the estimator

\hat\beta

for ''β''. By default, statistical packages report ''t''-statistic with (these ''t''-statistics are used to test the significance of corresponding regressor). However, when ''t''-statistic is needed to test the hypothesis of the form , then a non-zero ''β''₀ may be used. If

\hat\beta

is an ordinary least squares estimator in the classical linear regression model (that is, with normally distributed and homoscedastic error terms), and if the true value of the parameter ''β'' is equal to ''β''₀, then the sampling distribution of the ''t''-statistic is the Student's ''t''-distribution with degrees of freedom, where ''n'' is the number of observations, and ''k'' is the number of regressors (including the intercept). In the majority of models, the estimator

\hat\beta

consistent In deductive logic, a consistent theory is one that does not lead to a logical contradiction. A theory T is consistent if there is no formula \varphi such that both \varphi and its negation \lnot\varphi are elements of the set of consequences ...

for ''β'' and is distributed asymptotically normally. If the true value of the parameter ''β'' is equal to ''β''₀, and the quantity

\operatorname(\hat\beta)

correctly estimates the asymptotic variance of this estimator, then the ''t''-statistic will asymptotically have the standard normal distribution. In some models the distribution of the ''t''-statistic is different from the normal distribution, even asymptotically. For example, when a

time series In mathematics, a time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. ...

with a

unit root In probability theory and statistics, a unit root is a feature of some stochastic processes (such as random walks) that can cause problems in statistical inference involving time series models. A linear stochastic process has a unit root if ...

is regressed in the augmented Dickey–Fuller test, the test ''t''-statistic will asymptotically have one of the Dickey–Fuller distributions (depending on the test setting).

Use

Most frequently, ''t'' statistics are used in Student's ''t''-tests, a form of

statistical hypothesis testing A statistical hypothesis test is a method of statistical inference used to decide whether the data provide sufficient evidence to reject a particular hypothesis. A statistical hypothesis test typically involves a calculation of a test statistic. T ...

, and in the computation of certain confidence intervals. The key property of the ''t'' statistic is that it is a pivotal quantity – while defined in terms of the sample mean, its sampling distribution does not depend on the population parameters, and thus it can be used regardless of what these may be. One can also divide a residual by the sample

: :

g(x,X) = \frac

to compute an estimate for the number of standard deviations a given sample is from the mean, as a sample version of a z-score, the z-score requiring the population parameters.

Prediction

Given a normal distribution

N(\mu, \sigma^2)

with unknown mean and variance, the ''t''-statistic of a future observation

X_,

after one has made ''n'' observations, is an ancillary statistic – a pivotal quantity (does not depend on the values of ''μ'' and ''σ''²) that is a statistic (computed from observations). This allows one to compute a frequentist prediction interval (a predictive confidence interval), via the following t-distribution: :

\frac \sim T^.

Solving for

X_

yields the prediction distribution :

\overline_n + s_n \sqrt \cdot T^,

from which one may compute predictive confidence intervals – given a probability ''p'', one may compute intervals such that 100''p''% of the time, the next observation

X_

will fall in that interval.

History

The term "''t''-statistic" is abbreviated from "hypothesis test statistic". In statistics, the t-distribution was first derived as a posterior distribution in 1876 by Helmert and Lüroth. The t-distribution also appeared in a more general form as Pearson Type IV distribution in

Karl Pearson Karl Pearson (; born Carl Pearson; 27 March 1857 – 27 April 1936) was an English biostatistician and mathematician. He has been credited with establishing the discipline of mathematical statistics. He founded the world's first university ...

's 1895 paper. However, the T-Distribution, also known as Student's T Distribution gets its name from William Sealy Gosset who was first to publish the result in English in his 1908 paper titled "The Probable Error of a Mean" (in Biometrika) using his pseudonym "Student" because his employer preferred their staff to use pen names when publishing scientific papers instead of their real name, so he used the name "Student" to hide his identity. Gosset worked at the Guinness Brewery in

Dublin Dublin is the capital and largest city of Republic of Ireland, Ireland. Situated on Dublin Bay at the mouth of the River Liffey, it is in the Provinces of Ireland, province of Leinster, and is bordered on the south by the Dublin Mountains, pa ...

Ireland Ireland (, ; ; Ulster Scots dialect, Ulster-Scots: ) is an island in the North Atlantic Ocean, in Northwestern Europe. Geopolitically, the island is divided between the Republic of Ireland (officially Names of the Irish state, named Irelan ...

, and was interested in the problems of small samples – for example, the chemical properties of barley where sample sizes might be as few as 3. Hence a second version of the etymology of the term Student is that Guinness did not want their competitors to know that they were using the t-test to determine the quality of raw material. Although it was William Gosset after whom the term "Student" is penned, it was actually through the work of

Ronald Fisher Sir Ronald Aylmer Fisher (17 February 1890 – 29 July 1962) was a British polymath who was active as a mathematician, statistician, biologist, geneticist, and academic. For his work in statistics, he has been described as "a genius who a ...

that the distribution became well known as "Student's distribution" and "

Student's t-test Student's ''t''-test is a statistical test used to test whether the difference between the response of two groups is statistically significant or not. It is any statistical hypothesis test in which the test statistic follows a Student's ''t''- ...

Related concepts

* ''z''-score (standardization): If the population parameters are known, then rather than computing the t-statistic, one can compute the z-score; analogously, rather than using a ''t''-test, one uses a ''z''-test. This is rare outside of

standardized testing A standardized test is a test that is administered and scored in a consistent or standard manner. Standardized tests are designed in such a way that the questions and interpretations are consistent and are administered and scored in a predetermine ...

. * Studentized residual: In regression analysis, the standard errors of the estimators at different data points vary (compare the middle versus endpoints of a

simple linear regression In statistics, simple linear regression (SLR) is a linear regression model with a single explanatory variable. That is, it concerns two-dimensional sample points with one independent variable and one dependent variable (conventionally, the ''x ...

), and thus one must divide the different residuals by different estimates for the error, yielding what are called studentized residuals.

References

{{DEFAULTSORT:T-statistic Statistical ratios Parametric statistics Normal distribution

Definition and features

Use

Prediction

History

Related concepts

See also

References