Goodness-of-fit

The goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing, e.g. to test for normality of residuals, to test whether two samples are drawn from identical distributions (see Kolmogorov–Smirnov test), or whether outcome frequencies follow a specified distribution (see Pearson's chi-square test). In the analysis of variance, one of the components into which the variance is partitioned may be a lack-of-fit sum of squares.


Fit of distributions

In assessing whether a given distribution is suited to a data set, the following tests and their underlying measures of fit can be used:
* Bayesian information criterion
* Kolmogorov–Smirnov test
* Cramér–von Mises criterion
* Anderson–Darling test
* Berk–Jones tests
* Shapiro–Wilk test
* Chi-squared test
* Akaike information criterion
* Hosmer–Lemeshow test
* Kuiper's test
* Kernelized Stein discrepancy
* Zhang's ZK, ZC and ZA tests
* Moran test
* Density-based empirical likelihood ratio tests
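Several of the tests above compare the empirical distribution function of the data with the hypothesized cumulative distribution function. As an illustrative sketch (self-contained, not tied to any particular library), the one-sample Kolmogorov–Smirnov statistic can be computed directly from its definition:

```python
def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic: the largest vertical
    distance between the empirical CDF of the sample and the
    hypothesized CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        fx = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x,
        # so check the deviation on both sides of the jump.
        d = max(d, (i + 1) / n - fx, fx - i / n)
    return d

# Illustrative data: three points tested against the uniform(0, 1)
# CDF, F(x) = x.
d = ks_statistic([0.25, 0.5, 0.75], lambda x: x)
```

Large values of the statistic indicate poor fit; in practice the value is compared with tabulated Kolmogorov critical values for the given sample size.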


Regression analysis

In regression analysis, more specifically regression validation, the following topics relate to goodness of fit:
* Coefficient of determination (the R-squared measure of goodness of fit)
* Lack-of-fit sum of squares
* Mallows's Cp criterion
* Prediction error
* Reduced chi-square
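Of these, the coefficient of determination is the most commonly reported. A minimal sketch of its definition, ''R''² = 1 − SS<sub>res</sub>/SS<sub>tot</sub>, with purely illustrative data:

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 minus the ratio of the residual
    sum of squares to the total sum of squares about the mean of y."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1 - ss_res / ss_tot

# Illustrative values: observed responses and fitted values from some model.
r2 = r_squared([2.0, 4.0, 6.0], [2.1, 3.9, 6.0])
```

A value near 1 means the model accounts for nearly all of the variation in the response; a perfect fit gives exactly 1.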


Categorical data

The following are examples that arise in the context of categorical data.


Pearson's chi-square test

Pearson's chi-square test uses a measure of goodness of fit which is the sum of differences between observed and expected outcome frequencies (that is, counts of observations), each squared and divided by the expectation:

:\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}

where:
*''O''<sub>''i''</sub> = an observed count for bin ''i''
*''E''<sub>''i''</sub> = an expected count for bin ''i'', asserted by the null hypothesis.

The expected frequency is calculated by:

:E_i \, = \, \bigl( F(Y_u) \, - \, F(Y_l) \bigr) \, N

where:
*''F'' = the cumulative distribution function for the probability distribution being tested,
*''Y''<sub>''u''</sub> = the upper limit for bin ''i'',
*''Y''<sub>''l''</sub> = the lower limit for bin ''i'', and
*''N'' = the sample size.

The resulting value can be compared with a chi-square distribution to determine the goodness of fit. The chi-square distribution has (''k'' − ''c'') degrees of freedom, where ''k'' is the number of non-empty bins and ''c'' is the number of estimated parameters (including location, scale and shape parameters) for the distribution, plus one. For example, for a 3-parameter Weibull distribution, ''c'' = 4.
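The binning procedure above can be sketched directly. In this illustration the hypothesized distribution is taken to be uniform(0, 1), purely so that its CDF is the simple F(x) = x; the bin edges and counts are made-up example values:

```python
def expected_counts(edges, cdf, n):
    """E_i = (F(Y_u) - F(Y_l)) * N for each bin [Y_l, Y_u)."""
    return [(cdf(u) - cdf(l)) * n for l, u in zip(edges[:-1], edges[1:])]

def chi_square_statistic(observed, expected):
    """Pearson chi-square: sum over bins of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Four equal-width bins on [0, 1]; 100 observations, hypothesized uniform.
edges = [0.0, 0.25, 0.5, 0.75, 1.0]
expected = expected_counts(edges, lambda x: x, 100)  # 25 per bin
chi2 = chi_square_statistic([30, 20, 25, 25], expected)
```

The resulting value would then be referred to a chi-square distribution whose degrees of freedom depend on the number of bins and estimated parameters, as described above.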


Binomial case

A binomial experiment is a sequence of independent trials that can each result in one of two outcomes, success or failure. There are ''n'' trials, each with probability of success ''p''. Provided that ''np''<sub>''i''</sub> ≫ 1 for every ''i'' (where ''i'' = 1, 2, ..., ''k''), then

:\chi^2 \, = \, \sum_{i=1}^{k} \frac{(N_i - np_i)^2}{np_i} \, = \, \sum_{\text{all cells}} \frac{(O - E)^2}{E}.

This has approximately a chi-square distribution with ''k'' − 1 degrees of freedom. The fact that there are ''k'' − 1 degrees of freedom is a consequence of the restriction \sum N_i = n: although there are ''k'' observed bin counts, once any ''k'' − 1 of them are known, the remaining one is uniquely determined. In other words, only ''k'' − 1 bin counts are freely determined, hence ''k'' − 1 degrees of freedom.
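A sketch of the two-outcome case (''k'' = 2, hence one degree of freedom), with illustrative counts:

```python
def binomial_chi_square(counts, probs):
    """Chi-square statistic for observed bin counts against hypothesized
    cell probabilities: sum over cells of (N_i - n*p_i)^2 / (n*p_i)."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))

# Illustrative data: 100 trials with 30 successes and 70 failures,
# tested against a fair p = 0.5.
chi2 = binomial_chi_square([30, 70], [0.5, 0.5])
# With k = 2 bins there is k - 1 = 1 degree of freedom; the 5% critical
# value of the chi-square distribution with 1 degree of freedom is
# about 3.841.
```

Here the statistic is 16.0, well above 3.841, so the hypothesis ''p'' = 0.5 would be rejected at the 5% level for these illustrative counts.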


''G''-test

''G''-tests are likelihood-ratio tests of statistical significance that are increasingly being used in situations where Pearson's chi-square tests were previously recommended. The general formula for ''G'' is

:G = 2\sum_{i} O_i \ln\!\left(\frac{O_i}{E_i}\right),

where O_i and E_i are the same as for the chi-square test, \ln denotes the natural logarithm, and the sum is taken over all non-empty bins. Furthermore, the total observed count should equal the total expected count:

:\sum_i O_i = \sum_i E_i = N,

where N is the total number of observations. ''G''-tests have been recommended at least since the 1981 edition of the popular statistics textbook by Robert R. Sokal and F. James Rohlf.
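The formula translates directly into code; the 30/70 versus 50/50 counts below are purely illustrative (note that the observed and expected totals match, as required):

```python
import math

def g_statistic(observed, expected):
    """G = 2 * sum over non-empty bins of O_i * ln(O_i / E_i)."""
    return 2 * sum(o * math.log(o / e)
                   for o, e in zip(observed, expected) if o > 0)

# Illustrative data: 30/70 observed against 50/50 expected (N = 100).
g = g_statistic([30, 70], [50, 50])
```

For these counts ''G'' ≈ 16.46, close to the Pearson chi-square value of 16.0 on the same data; under the null hypothesis the two statistics have the same asymptotic chi-square distribution, and ''G'' is referred to a chi-square distribution in the same way.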


See also

* All models are wrong
* Deviance (statistics) (related to GLM)
* Overfitting
* Statistical model validation
* Theil–Sen estimator




Further reading

* Vexler, Albert; Gurevich, Gregory (2010). "Empirical likelihood ratios applied to goodness-of-fit tests based on sample entropy". ''Computational Statistics & Data Analysis''. 54 (2): 531–545. doi:10.1016/j.csda.2009.09.025.