Normality test

In statistics, normality tests are used to determine if a data set is well-modeled by a normal distribution and to compute how likely it is for a random variable underlying the data set to be normally distributed.

More precisely, the tests are a form of model selection, and can be interpreted several ways, depending on one's interpretation of probability:

* In descriptive statistics terms, one measures a goodness of fit of a normal model to the data – if the fit is poor then the data are not well modeled in that respect by a normal distribution, without making a judgment on any underlying variable.
* In frequentist statistical hypothesis testing, data are tested against the null hypothesis that they are normally distributed.
* In Bayesian statistics, one does not "test normality" per se, but rather computes the likelihood that the data come from a normal distribution with given parameters ''μ'', ''σ'' (for all ''μ'', ''σ''), and compares that with the likelihood that the data come from other distributions under consideration, most simply using a Bayes factor (giving the relative likelihood of seeing the data given different models), or more finely taking a prior distribution on possible models and parameters and computing a posterior distribution given the computed likelihoods.

A normality test is used to determine whether sample data have been drawn from a normally distributed population (within some tolerance). A number of statistical tests, such as the Student's t-test and the one-way and two-way ANOVA, require a normally distributed sample population.
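As a concrete illustration of the frequentist framing, the following sketch (in Python, assuming NumPy and SciPy are available) tests two samples against the null hypothesis of normality using the Shapiro–Wilk test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# A sample actually drawn from a normal distribution...
normal_sample = rng.normal(loc=5.0, scale=2.0, size=200)
# ...and one drawn from a skewed (non-normal) distribution.
skewed_sample = rng.exponential(scale=2.0, size=200)

for name, sample in [("normal", normal_sample), ("exponential", skewed_sample)]:
    stat, p_value = stats.shapiro(sample)
    # Under the null hypothesis of normality, a small p-value
    # (e.g. below 0.05) is evidence against normality.
    print(f"{name}: W = {stat:.3f}, p = {p_value:.4f}")
```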


Graphical methods

An informal approach to testing normality is to compare a histogram of the sample data to a normal probability curve. The empirical distribution of the data (the histogram) should be bell-shaped and resemble the normal distribution. This might be difficult to see if the sample is small. In this case one might proceed by regressing the data against the quantiles of a normal distribution with the same mean and variance as the sample. Lack of fit to the regression line suggests a departure from normality (see the Anderson–Darling test and Minitab).

A graphical tool for assessing normality is the normal probability plot, a quantile–quantile plot (QQ plot) of the standardized data against the standard normal distribution. Here the correlation between the sample data and normal quantiles (a measure of the goodness of fit) measures how well the data are modeled by a normal distribution. For normal data the points plotted in the QQ plot should fall approximately on a straight line, indicating high positive correlation. These plots are easy to interpret and also have the benefit that outliers are easily identified.
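A minimal sketch of such a plot and its quantile correlation (assuming SciPy and Matplotlib are installed; `probplot` uses the standard normal distribution by default):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(size=100)

# probplot pairs the ordered sample values with theoretical normal
# quantiles and returns a least-squares fit, including r, the
# correlation between the two (close to 1 for normal data).
(quantiles, ordered), (slope, intercept, r) = stats.probplot(sample, dist="norm")
print(f"correlation between data and normal quantiles: r = {r:.4f}")

# Draw the QQ plot; points close to the line suggest normality.
stats.probplot(sample, dist="norm", plot=plt)
plt.title("Normal QQ plot")
plt.show()
```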


Back-of-the-envelope test

A simple back-of-the-envelope test takes the sample maximum and minimum and computes their z-score, or more properly t-statistic (number of sample standard deviations that a sample is above or below the sample mean), and compares it to the 68–95–99.7 rule: if one has a 3''σ'' event (properly, a 3''s'' event) and substantially fewer than 300 samples, or a 4''s'' event and substantially fewer than 15,000 samples, then a normal distribution will understate the maximum magnitude of deviations in the sample data.

This test is useful in cases where one faces kurtosis risk – where large deviations matter – and has the benefits that it is very easy to compute and to communicate: non-statisticians can easily grasp that "6''σ'' events are very rare in normal distributions".
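A minimal sketch of this check in Python (NumPy assumed available; the sample-size thresholds follow the figures quoted above, and the helper name is illustrative):

```python
import numpy as np

def back_of_envelope_check(sample):
    """Flag sample extremes that a normal model would make
    implausibly rare given the sample size."""
    sample = np.asarray(sample, dtype=float)
    n = len(sample)
    mean, s = sample.mean(), sample.std(ddof=1)

    # t-statistics of the sample extremes, in sample s.d. units.
    t_max = (sample.max() - mean) / s
    t_min = (sample.min() - mean) / s
    extreme = max(abs(t_max), abs(t_min))

    # Per the rule of thumb above: a 3s event with far fewer than
    # 300 samples, or a 4s event with far fewer than 15,000 samples,
    # suggests the normal model understates large deviations.
    if extreme > 4 and n < 15000:
        return f"{extreme:.1f}s event in n={n}: normality is suspect"
    if extreme > 3 and n < 300:
        return f"{extreme:.1f}s event in n={n}: normality is suspect"
    return f"largest deviation {extreme:.1f}s in n={n}: no red flag"

rng = np.random.default_rng(2)
print(back_of_envelope_check(rng.standard_t(df=2, size=100)))  # heavy tails
print(back_of_envelope_check(rng.normal(size=100)))
```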


Frequentist tests

Tests of univariate normality include the following:

* D'Agostino's K-squared test,
* Jarque–Bera test,
* Anderson–Darling test,
* Cramér–von Mises criterion,
* Kolmogorov–Smirnov test: this test only works if the mean and the variance of the normal distribution are assumed known under the null hypothesis,
* Lilliefors test: based on the Kolmogorov–Smirnov test, adjusted for when also estimating the mean and variance from the data,
* Shapiro–Wilk test, and
* Pearson's chi-squared test.

A 2011 study concludes that Shapiro–Wilk has the best power for a given significance, followed closely by Anderson–Darling, when comparing the Shapiro–Wilk, Kolmogorov–Smirnov, Lilliefors, and Anderson–Darling tests. Some published works recommend the Jarque–Bera test, but the test has weaknesses. In particular, it has low power for distributions with short tails, especially for bimodal distributions. Some authors have declined to include its results in their studies because of its poor overall performance.

Historically, the third and fourth standardized moments (skewness and kurtosis) were some of the earliest tests for normality. The Lin–Mudholkar test specifically targets asymmetric alternatives. The Jarque–Bera test is itself derived from skewness and kurtosis estimates. Mardia's multivariate skewness and kurtosis tests generalize the moment tests to the multivariate case. Other early test statistics include the ratio of the mean absolute deviation to the standard deviation and of the range to the standard deviation.

More recent tests of normality include the energy test (Székely and Rizzo) and the tests based on the empirical characteristic function (ECF) (e.g. Epps and Pulley, Henze–Zirkler, BHEP test). The energy and the ECF tests are powerful tests that apply for testing univariate or multivariate normality and are statistically consistent against general alternatives.

The normal distribution has the highest entropy of any distribution for a given standard deviation. There are a number of normality tests based on this property, the first attributable to Vasicek.
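Several of these tests are available in SciPy; the sketch below (assuming SciPy is installed) runs a few of them on the same sample. Note that `scipy.stats.normaltest` implements D'Agostino's K-squared test, and that, as noted above, a Kolmogorov–Smirnov test against a fitted normal is only valid when the parameters are known a priori:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
sample = rng.lognormal(mean=0.0, sigma=0.5, size=150)  # mildly non-normal

# Shapiro–Wilk: generally the most powerful of the classical tests.
print("Shapiro-Wilk:    ", stats.shapiro(sample))
# D'Agostino's K-squared: combines skewness and kurtosis statistics.
print("D'Agostino K2:   ", stats.normaltest(sample))
# Jarque–Bera: also based on sample skewness and kurtosis.
print("Jarque-Bera:     ", stats.jarque_bera(sample))
# Anderson–Darling: returns a statistic and critical values rather
# than a p-value; reject normality if statistic > critical value.
result = stats.anderson(sample, dist="norm")
print("Anderson-Darling:", result.statistic, result.critical_values)
```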


Bayesian tests

Kullback–Leibler divergences between the whole posterior distributions of the slope and variance do not indicate non-normality. However, the ratio of expectations of these posteriors and the expectation of the ratios give similar results to the Shapiro–Wilk statistic except for very small samples, when non-informative priors are used.

Spiegelhalter suggests using a Bayes factor to compare normality with a different class of distributional alternatives. This approach has been extended by Farrell and Rogers-Stewart.
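A minimal sketch of a Bayes-factor comparison in this spirit (Python with SciPy assumed; the uniform grid prior, the Laplace alternative, and the crude quadrature are illustrative choices, not Spiegelhalter's actual construction):

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

def log_marginal(data, dist, locs, scales):
    """Approximate log p(data | model) by averaging the likelihood
    over a uniform grid prior on (loc, scale) -- illustration only."""
    lls = np.array([dist.logpdf(data, loc=m, scale=s).sum()
                    for m in locs for s in scales])
    return logsumexp(lls) - np.log(len(lls))

rng = np.random.default_rng(4)
data = rng.laplace(loc=0.0, scale=1.0, size=100)  # non-normal truth

locs = np.linspace(-2.0, 2.0, 41)
scales = np.linspace(0.2, 3.0, 41)

log_bf = (log_marginal(data, stats.norm, locs, scales)
          - log_marginal(data, stats.laplace, locs, scales))
# A negative log Bayes factor favours the Laplace alternative
# over the normal model for these data.
print(f"log Bayes factor (normal vs Laplace): {log_bf:.2f}")
```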


Applications

One application of normality tests is to the residuals from a linear regression model. If they are not normally distributed, the residuals should not be used in Z tests or in any other tests derived from the normal distribution, such as t tests, F tests and chi-squared tests. If the residuals are not normally distributed, then the dependent variable or at least one explanatory variable may have the wrong functional form, or important variables may be missing, etc. Correcting one or more of these systematic errors may produce residuals that are normally distributed; in other words, non-normality of residuals is often a model deficiency rather than a data problem.
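A short sketch of this workflow in Python (NumPy and SciPy assumed available): fit an ordinary least-squares line, then test the residuals rather than the raw response:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = np.linspace(0, 10, 200)
# The true relationship is quadratic; fitting a straight line gives
# the model the "wrong functional form", which surfaces as
# non-normal residuals even though the noise itself is normal.
y = 1.0 + 0.5 * x**2 + rng.normal(scale=1.0, size=x.size)

slope, intercept, *_ = stats.linregress(x, y)
residuals = y - (intercept + slope * x)

stat, p = stats.shapiro(residuals)
# A small p-value here points to a model deficiency (the missing
# x**2 term), not necessarily to non-normal noise in the data.
print(f"Shapiro-Wilk on residuals: W = {stat:.3f}, p = {p:.2g}")
```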


See also

* Randomness test
* Seven-number summary

