In probability theory and statistics, the Hermite distribution, named after Charles Hermite, is a discrete probability distribution used to model ''count data'' with more than one parameter. The distribution is flexible in that it allows a moderate over-dispersion in the data. Kemp and Kemp called it the "Hermite distribution" because its probability function and moment generating function can be expressed in terms of the coefficients of (modified) Hermite polynomials.


History

The distribution first appeared in the paper ''Applications of Mathematics to Medical Problems'' by Anderson Gray McKendrick in 1926. In this work the author explains several mathematical methods that can be applied to medical research. In one of these methods he considered the bivariate Poisson distribution and showed that the sum of two correlated Poisson variables follows a distribution that would later be known as the Hermite distribution. As a practical application, McKendrick considered the distribution of counts of bacteria in leucocytes. Using the method of moments he fitted the data with the Hermite distribution and found the model more satisfactory than a fit with a Poisson distribution.

The distribution was formally introduced and published by C. D. Kemp and Adrienne W. Kemp in 1965 in their work ''Some Properties of 'Hermite' Distribution''. The work focuses on the properties of the distribution, for instance a necessary condition on the parameters and their maximum likelihood estimators (MLE), the analysis of the probability generating function (PGF), and how it can be expressed in terms of the coefficients of (modified) Hermite polynomials. One example in this publication is the distribution of counts of bacteria in leucocytes that McKendrick had used, except that Kemp and Kemp estimate the model with the maximum likelihood method. The Hermite distribution is a special case of the discrete compound Poisson distribution with only two parameters (Johnson, N.L., Kemp, A.W., and Kotz, S. (2005) ''Univariate Discrete Distributions'', 3rd Edition, Wiley).

The same authors published in 1966 the paper ''An Alternative Derivation of the Hermite Distribution'', where they established that the Hermite distribution can be obtained formally by combining a Poisson distribution with a normal distribution.

In 1971, Y. C. Patel made a comparative study of various estimation procedures for the Hermite distribution in his doctoral thesis, including maximum likelihood, moment estimators, mean and zero-frequency estimators, and the method of even points. In 1974, Gupta and Jain investigated a generalized form of the Hermite distribution.


Definition


Probability mass function

Let ''X''1 and ''X''2 be two independent Poisson variables with parameters ''a''1 and ''a''2. The probability distribution of the random variable ''Y'' = ''X''1 + 2''X''2 is the Hermite distribution with parameters ''a''1 and ''a''2, and its probability mass function is given by

: p_n = P(Y=n) = e^{-(a_1+a_2)} \sum_{j=0}^{\lfloor n/2\rfloor} \frac{a_1^{n-2j} a_2^j}{(n-2j)!\,j!}

where
* ''n'' = 0, 1, 2, ...
* ''a''1, ''a''2 ≥ 0.
* (''n'' − 2''j'')! and ''j''! are the factorials of (''n'' − 2''j'') and ''j'', respectively.
* \lfloor n/2\rfloor is the integer part of ''n''/2.

The probability generating function of the probability mass is

: G_Y(s) = \sum_{n=0}^\infty p_n s^n = \exp(a_1(s-1)+a_2(s^2-1))
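Since the sum over ''j'' is finite, the probability mass function can be evaluated directly. A minimal Python sketch (the function name is ours, for illustration only):

```python
from math import exp, factorial

def hermite_pmf(n, a1, a2):
    """P(Y = n) for the Hermite distribution: finite sum over j."""
    return exp(-(a1 + a2)) * sum(
        a1 ** (n - 2 * j) * a2 ** j / (factorial(n - 2 * j) * factorial(j))
        for j in range(n // 2 + 1)
    )

# Sanity check: probabilities sum to 1 (up to truncation of a negligible tail).
a1, a2 = 1.5, 0.7
total = sum(hermite_pmf(n, a1, a2) for n in range(80))
print(round(total, 10))  # -> 1.0
```

The same truncated sum also reproduces the mean a1 + 2*a2, which is a quick consistency check against the generating function.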


Notation

When a random variable ''Y'' = ''X''1 + 2''X''2 follows a Hermite distribution, where ''X''1 and ''X''2 are two independent Poisson variables with parameters ''a''1 and ''a''2, we write

:Y\ \sim \operatorname{Herm}(a_1,a_2)\,


Properties


Moment and cumulant generating functions

The moment generating function of a random variable ''X'' is defined as the expected value of ''e''''tX'', as a function of the real parameter ''t''. For a Hermite distribution with parameters ''a''1 and ''a''2, the moment generating function exists and is equal to

: M(t) = G(e^t) = \exp(a_1(e^t-1)+a_2(e^{2t}-1))

The cumulant generating function is the logarithm of the moment generating function and is equal to

: K(t) = \log(M(t)) = a_1(e^t-1)+a_2(e^{2t}-1)

Taking the coefficient of ''t''''n''/''n''! in the expansion of ''K''(''t''), we obtain the ''n''-th cumulant

: k_n = a_1 + 2^n a_2

Hence the mean and the succeeding three moments about it are

* mean: \mu = k_1 = a_1 + 2a_2
* variance: \mu_2 = k_2 = a_1 + 4a_2
* third central moment: \mu_3 = k_3 = a_1 + 8a_2
* fourth central moment: \mu_4 = k_4 + 3k_2^2 = a_1 + 16a_2 + 3(a_1+4a_2)^2


Skewness

The skewness is the third central moment divided by the 3/2 power of the variance; for the Hermite distribution it is

:\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} = \frac{a_1+8a_2}{(a_1+4a_2)^{3/2}}

* Always \gamma_1 > 0, so the distribution is positively skewed: its mass is concentrated to the left of the mean, with a longer right tail.


Kurtosis

The kurtosis is the fourth central moment divided by the square of the variance; for the Hermite distribution it is

:\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{a_1+16a_2+3(a_1+4a_2)^2}{(a_1+4a_2)^2} = \frac{a_1+16a_2}{(a_1+4a_2)^2} + 3

The excess kurtosis is simply a correction that makes the kurtosis of the normal distribution equal to zero:

:\gamma_2 = \beta_2 - 3 = \frac{a_1+16a_2}{(a_1+4a_2)^2}

* Always \beta_2 > 3 (equivalently \gamma_2 > 0): the distribution has a sharp peak around the mean and fatter tails than the normal.


Characteristic function

The characteristic function of any real-valued random variable is defined as the expected value of e^{itX}, where ''i'' is the imaginary unit and ''t'' ∈ '''R'''. For a discrete distribution,

:\phi(t) = E\left[e^{itX}\right] = \sum_{j=0}^\infty e^{itj} P[X=j]

This function is related to the moment generating function via \phi_X(t) = M_X(it). Hence for this distribution the characteristic function is

:\phi_X(t) = \exp(a_1(e^{it}-1)+a_2(e^{2it}-1))


Cumulative distribution function

The cumulative distribution function is

: \begin{align} F(x;a_1,a_2) & = P(X \leq x)\\ & = \exp(-(a_1+a_2)) \sum_{i=0}^{\lfloor x\rfloor} \sum_{j=0}^{\lfloor i/2\rfloor} \frac{a_1^{i-2j} a_2^j}{(i-2j)!\,j!} \end{align}
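The CDF is a finite double sum and can be computed directly. A short Python sketch under the same conventions as the pmf (function name is ours):

```python
from math import exp, factorial

def hermite_cdf(x, a1, a2):
    """F(x) = P(X <= x): double sum of the pmf terms up to floor(x)."""
    c = exp(-(a1 + a2))
    return sum(
        c * a1 ** (i - 2 * j) * a2 ** j / (factorial(i - 2 * j) * factorial(j))
        for i in range(int(x) + 1)
        for j in range(i // 2 + 1)
    )

# F(0) reduces to p_0 = exp(-(a1 + a2)); far in the tail the CDF approaches 1.
print(hermite_cdf(0, 1.5, 0.7))
```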


Other properties

* This distribution can have any number of modes. As an example, the fitted distribution for McKendrick's data has estimated parameters \hat{a}_1 = 0.0135, \hat{a}_2 = 0.0932, and the first five estimated probabilities are 0.899, 0.012, 0.084, 0.001, 0.004.
* This distribution is closed under addition, i.e. closed under convolution, like the Poisson distribution. Given two Hermite-distributed random variables X_1 \sim \operatorname{Herm}(a_1,a_2) and X_2 \sim \operatorname{Herm}(b_1,b_2), then ''Y'' = ''X''1 + ''X''2 follows a Hermite distribution, Y \sim \operatorname{Herm}(a_1+b_1,a_2+b_2).
* This distribution allows a moderate overdispersion, so it can be used when data has this property. A random variable is overdispersed with respect to the Poisson distribution when its variance is greater than its expected value. The Hermite distribution allows only a moderate overdispersion because the coefficient of dispersion is always between 1 and 2:

:: d = \frac{\sigma^2}{\mu} = \frac{a_1+4a_2}{a_1+2a_2} = 1 + \frac{2a_2}{a_1+2a_2}
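The closure under convolution can be checked numerically: convolving two Hermite pmfs should reproduce, up to floating-point error, the pmf with the summed parameters. A sketch (parameter values are arbitrary illustrations):

```python
from math import exp, factorial

def hermite_pmf(n, a1, a2):
    return exp(-(a1 + a2)) * sum(
        a1 ** (n - 2 * j) * a2 ** j / (factorial(n - 2 * j) * factorial(j))
        for j in range(n // 2 + 1)
    )

a = (1.0, 0.5)   # parameters of X1
b = (0.8, 0.3)   # parameters of X2
N = 40
px = [hermite_pmf(n, *a) for n in range(N)]
py = [hermite_pmf(n, *b) for n in range(N)]
# Discrete convolution of the two pmfs vs. Herm(a1+b1, a2+b2) directly.
conv = [sum(px[k] * py[n - k] for k in range(n + 1)) for n in range(N)]
direct = [hermite_pmf(n, a[0] + b[0], a[1] + b[1]) for n in range(N)]
err = max(abs(c - d) for c, d in zip(conv, direct))
print(err)  # ~ 0, floating-point error only
```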


Parameter estimation


Method of moments

The mean and the variance of the Hermite distribution are \mu = a_1+2a_2 and \sigma^2 = a_1+4a_2, respectively. Equating them to the sample mean and variance gives the two equations

: \begin{cases} \bar{x} = a_1 + 2a_2 \\ \sigma^2 = a_1 + 4a_2 \end{cases}

Solving these two equations we get the moment estimators \hat{a}_1 and \hat{a}_2 of ''a''1 and ''a''2:

:\hat{a}_1 = 2\bar{x} - \sigma^2
:\hat{a}_2 = \frac{\sigma^2 - \bar{x}}{2}

Since ''a''1 and ''a''2 are both positive, the estimators \hat{a}_1 and \hat{a}_2 are admissible (≥ 0) only if \bar{x} < \sigma^2 < 2\bar{x}.
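The moment estimators and their admissibility condition translate directly into code. A sketch (the sample below is hypothetical, chosen only to be mildly overdispersed):

```python
def hermite_moment_estimates(xs):
    """Moment estimators: a1 = 2*xbar - s2, a2 = (s2 - xbar)/2.

    Returns None when the admissibility condition xbar < s2 < 2*xbar fails.
    """
    m = len(xs)
    xbar = sum(xs) / m
    s2 = sum((x - xbar) ** 2 for x in xs) / m   # biased sample variance
    if not (xbar < s2 < 2 * xbar):
        return None
    return 2 * xbar - s2, (s2 - xbar) / 2

# Hypothetical count sample: xbar = 1.0, s2 = 1.5, so d = 1.5 is admissible.
sample = [0, 0, 1, 2, 0, 2, 0, 4, 0, 1, 2, 0]
print(hermite_moment_estimates(sample))  # -> (0.5, 0.25)
```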


Maximum likelihood

Given a sample ''X''1, ..., ''X''''m'' of independent random variables, each having a Hermite distribution, we wish to estimate the parameters ''a''1 and ''a''2. The mean and the variance of the distribution are \mu = a_1+2a_2 and \sigma^2 = a_1+4a_2, respectively. Writing d = \sigma^2/\mu for the dispersion index and inverting these two equations gives

: \begin{cases} a_1 = \mu(2-d) \\[4pt] a_2 = \dfrac{\mu(d-1)}{2} \end{cases}

so the probability function can be parameterized by μ and ''d'':

:P(X=x) = \exp\left(-\left(\mu(2-d)+\frac{\mu(d-1)}{2}\right)\right) \sum_{j=0}^{\lfloor x/2 \rfloor} \frac{(\mu(2-d))^{x-2j}\left(\frac{\mu(d-1)}{2}\right)^j}{(x-2j)!\,j!}

Hence the log-likelihood function is

: \begin{align} \mathcal{L}(x_1,\ldots,x_m;\mu,d) & = \log(L(x_1,\ldots,x_m;\mu,d))\\ & = m\mu\left(-1+\frac{d-1}{2}\right) + \log(\mu(2-d))\sum_{i=1}^m x_i + \sum_{i=1}^m \log(q_i(\theta)) \end{align}

where
* q_i(\theta) = \sum_{j=0}^{\lfloor x_i/2 \rfloor} \frac{\theta^j}{(x_i-2j)!\,j!}
* \theta = \dfrac{d-1}{2\mu(2-d)^2}

From the log-likelihood function, the likelihood equations are

:\frac{\partial \mathcal{L}}{\partial \mu} = m\left(-1+\frac{d-1}{2}\right) + \frac{1}{\mu}\sum_{i=1}^m x_i - \frac{\theta}{\mu}\sum_{i=1}^m \frac{q_i'(\theta)}{q_i(\theta)}

:\frac{\partial \mathcal{L}}{\partial d} = \frac{m\mu}{2} - \frac{1}{2-d}\sum_{i=1}^m x_i + \frac{d}{2\mu(2-d)^3}\sum_{i=1}^m \frac{q_i'(\theta)}{q_i(\theta)}

Straightforward calculations show that
* \mu = \bar{x}
* ''d'' can be found by solving

:: \sum_{i=1}^m \frac{q_i'(\tilde{\theta})}{q_i(\tilde{\theta})} = m\left(\bar{x}(2-d)\right)^2, \qquad \text{where } \tilde{\theta} = \frac{d-1}{2\bar{x}(2-d)^2}

* It can be shown that the log-likelihood function is strictly concave in the domain of the parameters. Consequently, the MLE is unique.

The likelihood equation does not always have a solution, as the following proposition shows.

Proposition: Let ''X''1, ..., ''X''''m'' come from a generalized Hermite distribution with fixed ''n''. Then the MLEs of the parameters are \hat{\mu} = \bar{x} and \tilde{d} if and only if m^{(2)}/\bar{x}^2 > 1, where m^{(2)} = \sum_{i=1}^m x_i(x_i-1)/m denotes the empirical factorial moment of order 2.
* Remark 1: The condition m^{(2)}/\bar{x}^2 > 1 is equivalent to \tilde{d} > 1, where \tilde{d} = \sigma^2/\bar{x} is the empirical dispersion index.
* Remark 2: If the condition is not satisfied, then the MLEs of the parameters are \hat{\mu} = \bar{x} and \tilde{d} = 1, that is, the data are fitted using the Poisson distribution.
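A crude numerical sketch of the MLE: rather than solving the likelihood equations, it fixes \hat\mu = \bar x and profiles the log-likelihood over a grid of ''d'' values in (1, 2), which is easier to verify than Newton iterations on the score equations (function names and the sample are ours, for illustration):

```python
from math import exp, factorial, log

def loglik(xs, mu, d):
    """Hermite log-likelihood in the (mu, d) parametrization:
    a1 = mu*(2-d), a2 = mu*(d-1)/2, valid for 1 < d < 2."""
    a1, a2 = mu * (2 - d), mu * (d - 1) / 2
    ll = 0.0
    for x in xs:
        p = exp(-(a1 + a2)) * sum(
            a1 ** (x - 2 * j) * a2 ** j / (factorial(x - 2 * j) * factorial(j))
            for j in range(x // 2 + 1)
        )
        ll += log(p)
    return ll

def hermite_mle(xs, grid=2000):
    """mu_hat = xbar; profile d over an interior grid of (1, 2)."""
    mu = sum(xs) / len(xs)
    best = max((loglik(xs, mu, 1 + k / grid), 1 + k / grid)
               for k in range(1, grid))
    return mu, best[1]

sample = [0, 0, 1, 2, 0, 2, 0, 4, 0, 1, 2, 0]   # hypothetical counts
mu_hat, d_hat = hermite_mle(sample)
print(round(mu_hat, 3), round(d_hat, 3))
```

In practice a bounded one-dimensional optimizer would replace the grid, but the grid keeps the sketch dependency-free.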


Zero frequency and the mean estimators

A usual choice for discrete distributions is to equate the zero relative frequency of the data set to the probability of zero under the assumed distribution. Observing that f_0 = \exp(-(a_1+a_2)) and \mu = a_1+2a_2, and following the example of Y. C. Patel (1976), the resulting system of equations is

: \begin{cases} \bar{x} = a_1+2a_2 \\ f_0 = \exp(-(a_1+a_2)) \end{cases}

Solving it, we obtain the zero-frequency and mean estimators \hat{a}_1 of ''a''1 and \hat{a}_2 of ''a''2:

:\hat{a}_1 = -(\bar{x}+2\log(f_0))
:\hat{a}_2 = \bar{x}+\log(f_0)

where f_0 = \frac{n_0}{m} is the zero relative frequency, with n_0 > 0 the number of zero observations in a sample of size ''m''. It can be seen that for distributions with a high probability at 0, the efficiency of these estimators is high.

* For admissible values of \hat{a}_1 and \hat{a}_2, we must have

:: -\log\left(\frac{n_0}{m}\right) < \bar{x} < -2\log\left(\frac{n_0}{m}\right)
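The zero-frequency and mean estimators, with the admissibility check, can be sketched as follows (the sample is hypothetical):

```python
from math import log

def zero_freq_estimates(xs):
    """Zero-frequency/mean estimators:
    a1 = -(xbar + 2*log(f0)), a2 = xbar + log(f0).
    Requires at least one zero in the sample; returns None if inadmissible.
    """
    m = len(xs)
    f0 = xs.count(0) / m        # zero relative frequency
    xbar = sum(xs) / m
    a1 = -(xbar + 2 * log(f0))
    a2 = xbar + log(f0)
    if a1 < 0 or a2 < 0:
        return None             # outside the admissible region
    return a1, a2

sample = [0, 0, 1, 2, 0, 2, 0, 4, 0, 1, 2, 0]   # f0 = 0.5, xbar = 1.0
print(zero_freq_estimates(sample))
```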


Testing Poisson assumption

When the Hermite distribution is used to model a data sample, it is important to check whether the Poisson distribution would be enough to fit the data. Using the probability mass function parameterized by μ and ''d'', as in the maximum likelihood section, the hypotheses to test are

: \begin{cases} H_0: d=1 \\ H_1: d>1 \end{cases}


Likelihood-ratio test

The likelihood-ratio test statistic for the Hermite distribution is

:W = 2\left(\mathcal{L}(X;\hat{\mu},\hat{d}) - \mathcal{L}(X;\hat{\mu},1)\right)

where \mathcal{L}(\cdot) is the log-likelihood function. Since ''d'' = 1 belongs to the boundary of the parameter domain, under the null hypothesis ''W'' does not have the usual asymptotic \chi_1^2 distribution. It can be established that the asymptotic distribution of ''W'' is a 50:50 mixture of the constant 0 and the \chi_1^2 distribution. The α upper-tail percentage points of this mixture are the same as the 2α upper-tail percentage points of a \chi_1^2; for instance, for α = 0.01, 0.05, and 0.10 they are 5.41189, 2.70554 and 1.64237, respectively.


The "score" or Lagrange multiplier test

The score statistic is

: S_2 = 2m \left[\frac{m^{(2)} - \bar{x}^2}{2\bar{x}}\right]^2 = \frac{m\left(m^{(2)} - \bar{x}^2\right)^2}{2\bar{x}^2}

where ''m'' is the number of observations and m^{(2)} = \sum_{i=1}^m x_i(x_i-1)/m is the empirical factorial moment of order 2. The asymptotic distribution of the score test statistic under the null hypothesis is a \chi_1^2 distribution. It may be convenient to use a signed version of the score test,

:\operatorname{sgn}\left(m^{(2)} - \bar{x}^2\right)\sqrt{S_2}

which asymptotically follows a standard normal distribution under the null.
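The score statistic and its signed version can be computed directly from the sample (a hypothetical sample for illustration):

```python
from math import sqrt

def score_statistic(xs):
    """S2 = m*(m2 - xbar^2)^2 / (2*xbar^2), with m2 the empirical second
    factorial moment; the signed square root is asymptotically N(0, 1)."""
    m = len(xs)
    xbar = sum(xs) / m
    m2 = sum(x * (x - 1) for x in xs) / m
    s2 = m * (m2 - xbar ** 2) ** 2 / (2 * xbar ** 2)
    signed = (1 if m2 > xbar ** 2 else -1) * sqrt(s2)
    return s2, signed

sample = [0, 0, 1, 2, 0, 2, 0, 4, 0, 1, 2, 0]   # xbar = 1.0, m2 = 1.5
s2, z = score_statistic(sample)
print(round(s2, 3), round(z, 3))  # -> 1.5 1.225
```

Here z is positive, pointing toward overdispersion, but well below the usual normal critical values, so this small sample would not reject the Poisson null.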


See also

* Compound Poisson distribution
* Poisson distribution

