In statistics, Grubbs's test or the Grubbs test (named after Frank E. Grubbs, who published the test in 1950), also known as the maximum normalized residual test or extreme studentized deviate test, is a

test Test(s), testing, or TEST may refer to: * Test (assessment), an educational assessment intended to measure the respondents' knowledge or other abilities Arts and entertainment * ''Test'' (2013 film), an American film * ''Test'' (2014 film), ...

used to detect

outlier In statistics, an outlier is a data point that differs significantly from other observations. An outlier may be due to a variability in the measurement, an indication of novel data, or it may be the result of experimental error; the latter are ...

s in a

univariate In mathematics, a univariate object is an expression, equation, function or polynomial involving only one variable. Objects involving more than one variable are multivariate. In some cases the distinction between the univariate and multivariat ...

data set assumed to come from a

normally distributed In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is : f(x) = \frac e^ The parameter \mu is ...

population.

Definition

Grubbs's test is based on the assumption of normality. That is, one should first verify that the data can be reasonably approximated by a normal distribution before applying the Grubbs test. Grubbs's test detects one outlier at a time. This outlier is expunged from the dataset and the test is iterated until no outliers are detected. However, multiple iterations change the probabilities of detection, and the test should not be used for sample sizes of six or fewer since it frequently tags most of the points as outliers. Grubbs's test is defined for the

hypothesis A hypothesis (plural hypotheses) is a proposed explanation for a phenomenon. For a hypothesis to be a scientific hypothesis, the scientific method requires that one can testable, test it. Scientists generally base scientific hypotheses on prev ...

: :H₀: There are no outliers in the data set :H_a: There is exactly one outlier in the data set The Grubbs test statistic is defined as: :

G =  \frac

with

\overline

and

s

denoting the

sample mean The sample mean (or "empirical mean") and the sample covariance are statistics computed from a sample of data on one or more random variables. The sample mean is the average value (or mean value) of a sample of numbers taken from a larger po ...

and standard deviation, respectively. The Grubbs test statistic is the largest absolute deviation from the sample mean in units of the sample standard deviation. This is the

two-sided test In statistical significance testing, a one-tailed test and a two-tailed test are alternative ways of computing the statistical significance of a parameter inferred from a data set, in terms of a test statistic. A two-tailed test is appropriate if ...

, for which the hypothesis of no outliers is rejected at

significance level In statistical hypothesis testing, a result has statistical significance when it is very unlikely to have occurred given the null hypothesis (simply by chance alone). More precisely, a study's defined significance level, denoted by \alpha, is the p ...

α if :

G > \frac \sqrt

with ''t''_{α/(2''N''),''N''−2} denoting the upper

critical value Critical value may refer to: *In differential topology, a critical value of a differentiable function between differentiable manifolds is the image (value of) ƒ(''x'') in ''N'' of a critical point ''x'' in ''M''. *In statistical hypothesis ...

of the t-distribution with ''N'' − 2

degrees of freedom Degrees of freedom (often abbreviated df or DOF) refers to the number of independent variables or parameters of a thermodynamic system. In various scientific fields, the word "freedom" is used to describe the limits to which physical movement or ...

and a significance level of α/(2''N'').

One-sided case

The Grubbs test can also be defined as a one-sided test, replacing α/(2''N'') with α/''N''. To test whether the minimum value is an outlier, the test statistic is :

G = \frac

with ''Y''_min denoting the minimum value. To test whether the maximum value is an outlier, the test statistic is :

G = \frac

with ''Y''_max denoting the maximum value.

Related techniques

Several

graphical technique Statistical graphics, also known as statistical graphical techniques, are graphics used in the field of statistics for data visualization. Overview Whereas statistics and data analysis procedures generally yield their output in numeric or tab ...

s can be used to detect outliers. A simple

run sequence plot A run chart, also known as a run-sequence plot is a graph that displays observed data in a time sequence. Often, the data displayed represent some aspect of the output or performance of a manufacturing or other business process. It is therefore ...

, a

box plot In descriptive statistics, a box plot or boxplot is a method for graphically demonstrating the locality, spread and skewness groups of numerical data through their quartiles. In addition to the box on a box plot, there can be lines (which are cal ...

, or a

histogram A histogram is an approximate representation of the distribution of numerical data. The term was first introduced by Karl Pearson. To construct a histogram, the first step is to " bin" (or " bucket") the range of values—that is, divide the ent ...

should show any obviously outlying points. A

normal probability plot The normal probability plot is a graphical technique to identify substantive departures from normality. This includes identifying outliers, skewness, kurtosis, a need for transformations, and mixtures. Normal probability plots are made of raw ...

may also be useful.

Definition

One-sided case

Related techniques

See also

References

Further reading