In statistics, the multinomial test is the test of the null hypothesis that the parameters of a multinomial distribution equal specified values; it is used for categorical data.

We begin with a sample of ~ N ~ items, each of which has been observed to fall into one of ~ k ~ categories. Define ~ \mathbf{x} = (x_1, x_2, \dots, x_k) ~ as the observed numbers of items in each cell, so that ~ \sum_{i=1}^k x_i = N ~. Next, define a vector of parameters ~ H_0 \colon \boldsymbol{\pi} = (\pi_1, \pi_2, \ldots, \pi_k) ~, where ~ \sum_{i=1}^k \pi_i = 1 ~. These are the parameter values under the null hypothesis.
The exact probability of the observed configuration ~ \mathbf{x} ~ under the null hypothesis is given by

:~ \operatorname{\mathbb{P}}\left(\mathbf{x}\right)_0 = N! \, \prod_{i=1}^k \frac{\pi_i^{x_i}}{x_i!} ~.

The significance probability for the test is the probability of occurrence of the data set observed, or of a data set less likely than that observed, if the null hypothesis is true. Using an exact test, this is calculated as

:~ p_\mathrm{sig} = \sum_{\mathbf{y}\,:\,\operatorname{\mathbb{P}}(\mathbf{y}) \le \operatorname{\mathbb{P}}(\mathbf{x})} \operatorname{\mathbb{P}}\left(\mathbf{y}\right) ~

where the sum ranges over all outcomes as likely as, or less likely than, that observed. In practice this becomes computationally onerous as ~ k ~ and ~ N ~ increase, so it is probably only worth using exact tests for small samples.
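As an illustration, the exact test can be carried out by brute-force enumeration of every possible outcome. The following Python sketch assumes SciPy is available; the function name and the example data (twelve rolls of a die suspected of favouring six) are hypothetical, chosen here only for illustration.

from itertools import combinations
from scipy.stats import multinomial

def exact_multinomial_test(x, pi):
    """Exact multinomial test: sum the null probabilities of every
    outcome no more likely than the observed configuration x."""
    n, k = sum(x), len(x)
    p_obs = multinomial.pmf(x, n, pi)
    p_sig = 0.0
    # Enumerate all compositions of n into k non-negative parts via
    # "stars and bars": each choice of k - 1 bar positions among
    # n + k - 1 slots yields one outcome (y_1, ..., y_k).
    for bars in combinations(range(n + k - 1), k - 1):
        y, prev = [], -1
        for b in bars:
            y.append(b - prev - 1)
            prev = b
        y.append(n + k - 2 - prev)
        p_y = multinomial.pmf(y, n, pi)
        if p_y <= p_obs + 1e-12:  # small tolerance for floating-point ties
            p_sig += p_y
    return p_sig

# Hypothetical example: 12 rolls of a die suspected of favouring six.
print(exact_multinomial_test([1, 1, 2, 1, 2, 5], [1/6] * 6))

The enumeration visits ~ \binom{N+k-1}{k-1} ~ outcomes, which is exactly why the exact test is only practical for small ~ N ~ and ~ k ~.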
For larger samples, asymptotic approximations are accurate enough and easier to calculate. One of these approximations is the likelihood ratio. An alternative hypothesis can be defined under which each value ~ \pi_i ~ is replaced by its maximum likelihood estimate ~ p_i = \frac{x_i}{N} ~. The exact probability of the observed configuration ~ \mathbf{x} ~ under the alternative hypothesis is given by

:~ \operatorname{\mathbb{P}}\left(\mathbf{x}\right)_A = N! \; \prod_{i=1}^k \frac{p_i^{x_i}}{x_i!} ~.

The natural logarithm of the likelihood ratio, ~ \mathcal{LR} ~, between these two probabilities, multiplied by ~ -2 ~, is then the statistic for the likelihood ratio test
:~ -2\ln(\mathcal{LR}) = -2 \sum_{i=1}^k x_i \ln\left(\frac{\pi_i}{p_i}\right) ~.

(The factor ~ -2 ~ is chosen to make the statistic asymptotically chi-squared distributed, for convenient comparison to a familiar statistic commonly used for the same application.)

If the null hypothesis is true, then as ~ N ~ increases, the distribution of ~ -2\ln(\mathcal{LR}) ~ converges to that of chi-squared with ~ k - 1 ~ degrees of freedom. However, it has long been known (e.g. Lawley 1956) that for finite sample sizes the moments of ~ -2\ln(\mathcal{LR}) ~ are greater than those of chi-squared, thus inflating the probability of type I errors (false positives). The difference between the moments of chi-squared and those of the test statistic is a function of ~ N^{-1} ~. Williams (1976) showed that the first moment can be matched as far as ~ N^{-2} ~ if the test statistic is divided by a factor given by

:~ q_1 = 1 + \frac{\sum_{i=1}^k \pi_i^{-1} - 1}{6N(k-1)} ~.

In the special case where the null hypothesis is that all the values ~ \pi_i ~ are equal to ~ 1/k ~ (i.e. it stipulates a uniform distribution), this simplifies to

:~ q_1 = 1 + \frac{k+1}{6N} ~.

Subsequently, Smith ''et al''. (1981) derived a dividing factor which matches the first moment as far as ~ N^{-3} ~. For the case of equal values of ~ \pi_i ~, this factor is

:~ q_2 = 1 + \frac{k+1}{6N} + \frac{k^2}{6N^2} ~.
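The corrected likelihood-ratio statistic is straightforward to compute. A minimal sketch, again assuming NumPy and SciPy and reusing the hypothetical die-rolling data (the function name is illustrative):

import numpy as np
from scipy.stats import chi2

def lr_multinomial_test(x, pi, williams=True):
    """-2 ln(LR) for counts x against null probabilities pi, optionally
    divided by the Williams factor q1, with a chi-squared p-value."""
    x = np.asarray(x, dtype=float)
    pi = np.asarray(pi, dtype=float)
    n, k = x.sum(), len(x)
    p = x / n                    # maximum likelihood estimates p_i = x_i / N
    mask = x > 0                 # empty cells contribute 0 to the sum
    g = -2.0 * np.sum(x[mask] * np.log(pi[mask] / p[mask]))
    if williams:
        # General Williams factor; reduces to 1 + (k + 1)/(6N) when
        # all pi_i equal 1/k.
        g /= 1.0 + (np.sum(1.0 / pi) - 1.0) / (6.0 * n * (k - 1))
    return g, chi2.sf(g, df=k - 1)

# Hypothetical example: the same 12 die rolls as above.
g, p_value = lr_multinomial_test([1, 1, 2, 1, 2, 5], [1/6] * 6)
print(g, p_value)

In the equiprobable case one could divide by ~ q_2 ~ instead, matching the first moment to a higher order per Smith ''et al''.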
The null hypothesis can also be tested by using Pearson's chi-squared test

:~ \chi^2 = \sum_{i=1}^k \frac{(x_i - E_i)^2}{E_i} ~

where ~ E_i = N\pi_i ~ is the expected number of cases in category ~ i ~ under the null hypothesis. This statistic also converges to a chi-squared distribution with ~ k - 1 ~ degrees of freedom when the null hypothesis is true, but it does so from below, as it were, rather than from above as ~ -2\ln(\mathcal{LR}) ~ does, so it may be preferable to the uncorrected version of ~ -2\ln(\mathcal{LR}) ~ for small samples.
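Pearson's statistic is available directly in SciPy as scipy.stats.chisquare; a minimal sketch with the same hypothetical data:

from scipy.stats import chisquare

x = [1, 1, 2, 1, 2, 5]           # observed counts (hypothetical)
pi = [1/6] * 6                   # probabilities under the null hypothesis
n = sum(x)

# f_exp supplies E_i = N * pi_i; degrees of freedom default to k - 1.
stat, p_value = chisquare(f_obs=x, f_exp=[n * p for p in pi])
print(stat, p_value)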


References

* Lawley, D. N. (1956). "A general method of approximating to the distribution of likelihood ratio criteria". ''Biometrika''. 43: 295–303. doi:10.1093/biomet/43.3-4.295.
* Read, T. R. C.; Cressie, N. A. C. (1988). ''Goodness-of-Fit Statistics for Discrete Multivariate Data''. New York, NY: Springer-Verlag. ISBN 0-387-96682-X.
* Smith, P. J.; Rae, D. S.; Manderscheid, R. W.; Silbergeld, S. (1981). "Approximating the moments and distribution of the likelihood ratio statistic for multinomial goodness of fit". ''Journal of the American Statistical Association''. 76 (375): 737–740. doi:10.2307/2287541. JSTOR 2287541.
* Williams, D. A. (1976). "Improved Likelihood Ratio Tests for Complete Contingency Tables". ''Biometrika''. 63 (1): 33–37. doi:10.1093/biomet/63.1.33.