In statistics, Grubbs's test or the Grubbs test (named after Frank E. Grubbs, who published the test in 1950), also known as the maximum normalized residual test or extreme studentized deviate test, is a test used to detect outliers in a univariate data set assumed to come from a normally distributed population.
Definition
Grubbs's test is based on the assumption of normality. That is, one should first verify that the data can be reasonably approximated by a normal distribution before applying the Grubbs test.
Grubbs's test detects one outlier at a time. This outlier is expunged from the dataset and the test is iterated until no outliers are detected. However, multiple iterations change the probabilities of detection, and the test should not be used for sample sizes of six or fewer since it frequently tags most of the points as outliers.
Grubbs's test is defined for the following hypotheses:
:H<sub>0</sub>: There are no outliers in the data set
:H<sub>a</sub>: There is exactly one outlier in the data set
The Grubbs test statistic is defined as:
:<math>G = \frac{\max_{i=1,\ldots,N} \left| Y_i - \bar{Y} \right|}{s}</math>
with <math>\bar{Y}</math> and <math>s</math> denoting the sample mean and standard deviation, respectively. The Grubbs test statistic is the largest absolute deviation from the sample mean in units of the sample standard deviation.
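The statistic described above (the largest absolute deviation from the sample mean, divided by the sample standard deviation) can be computed directly. The following is a minimal sketch using only the Python standard library; the function name <code>grubbs_statistic</code> and the sample values are hypothetical and chosen for illustration only.

```python
import statistics

def grubbs_statistic(values):
    """Return (G, index) for the two-sided Grubbs statistic.

    G is the largest absolute deviation from the sample mean,
    divided by the sample standard deviation (n - 1 denominator).
    index is the position of the most extreme observation.
    """
    if len(values) < 3:
        raise ValueError("need at least three observations")
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation
    if s == 0:
        raise ValueError("all values are identical; G is undefined")
    deviations = [abs(y - mean) for y in values]
    index = max(range(len(values)), key=deviations.__getitem__)
    return deviations[index] / s, index

# Made-up measurements, for illustration only
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 12.5]
g, idx = grubbs_statistic(sample)
print(f"G = {g:.3f}, most extreme value = {sample[idx]}")
```

The returned index identifies which observation would be the candidate outlier; whether it is actually flagged depends on comparing ''G'' with the critical value.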
This is the two-sided test, for which the hypothesis of no outliers is rejected at significance level α if
:<math>G > \frac{N-1}{\sqrt{N}} \sqrt{\frac{t_{\alpha/(2N),\,N-2}^2}{N - 2 + t_{\alpha/(2N),\,N-2}^2}}</math>
with ''t''<sub>α/(2''N''),''N''−2</sub> denoting the upper critical value of the t-distribution with ''N'' − 2 degrees of freedom and a significance level of α/(2''N'').
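The rejection rule for the two-sided test can be sketched in code as follows. This is an illustrative sketch, not a reference implementation: the function names are hypothetical, and the ''t'' critical value is taken as an input (from a table or a statistics library) rather than computed, so that the example depends only on the standard library.

```python
import math

def grubbs_critical_value(n, t_crit):
    """Two-sided Grubbs critical value for sample size n.

    t_crit is assumed to be the upper critical value of the
    t-distribution with n - 2 degrees of freedom at significance
    level alpha / (2 * n). If SciPy is available, it could be
    obtained as scipy.stats.t.ppf(1 - alpha / (2 * n), n - 2).
    """
    if n < 3:
        raise ValueError("need at least three observations")
    t2 = t_crit ** 2
    return (n - 1) / math.sqrt(n) * math.sqrt(t2 / (n - 2 + t2))

def rejects_no_outlier_hypothesis(g, n, t_crit):
    """Return True if G exceeds the critical value, i.e. the
    hypothesis of no outliers is rejected."""
    return g > grubbs_critical_value(n, t_crit)
```

As the ''t'' critical value grows, the critical value approaches (''N'' − 1)/√''N'', which is the largest value the statistic ''G'' can take for a sample of size ''N''.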
One-sided case
The Grubbs test can also be defined as a one-sided test, replacing α/(2''N'') with α/''N''. To test whether the minimum value is an outlier, the test statistic is
:<math>G = \frac{\bar{Y} - Y_{\min}}{s}</math>
with ''Y''<sub>min</sub> denoting the minimum value. To test whether the maximum value is an outlier, the test statistic is
:<math>G = \frac{Y_{\max} - \bar{Y}}{s}</math>
with ''Y''<sub>max</sub> denoting the maximum value.
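The two one-sided statistics can be computed together. The sketch below uses only the Python standard library; the function name <code>grubbs_one_sided</code> and the sample values are hypothetical.

```python
import statistics

def grubbs_one_sided(values):
    """Return (G_min, G_max), the one-sided Grubbs statistics.

    G_min = (mean - minimum) / s tests the smallest value.
    G_max = (maximum - mean) / s tests the largest value.
    s is the sample standard deviation (n - 1 denominator).
    """
    if len(values) < 3:
        raise ValueError("need at least three observations")
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    if s == 0:
        raise ValueError("all values are identical; G is undefined")
    g_min = (mean - min(values)) / s
    g_max = (max(values) - mean) / s
    return g_min, g_max

# Made-up measurements, for illustration only
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 12.5]
g_min, g_max = grubbs_one_sided(sample)
print(f"G_min = {g_min:.3f}, G_max = {g_max:.3f}")
```

The larger of the two values equals the two-sided statistic, since the largest absolute deviation from the mean always occurs at either the minimum or the maximum. Each one-sided statistic is compared against a critical value based on α/''N'' rather than α/(2''N'').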
Related techniques
Several graphical techniques can be used to detect outliers. A simple run sequence plot, a box plot, or a histogram should show any obviously outlying points. A normal probability plot may also be useful.
See also
* Chauvenet's criterion
* Peirce's criterion
* Q test
* Studentized residual
* Tau distribution
{{NIST-PD}}