Natural Exponential Family

In probability and statistics, a natural exponential family (NEF) is a class of probability distributions that is a special case of an exponential family (EF).


Definition


Univariate case

The natural exponential families (NEF) are a subset of the exponential families. A NEF is an exponential family in which the natural parameter ''η'' and the natural statistic ''T''(''x'') are both the identity. A distribution in an exponential family with parameter ''θ'' can be written with probability density function (PDF)

$$f_X(x \mid \theta) = h(x) \exp\Big(\eta(\theta) T(x) - A(\theta)\Big),$$

where $h(x)$ and $A(\theta)$ are known functions. A distribution in a natural exponential family with parameter ''θ'' can thus be written with PDF

$$f_X(x \mid \theta) = h(x) \exp\Big(\theta x - A(\theta)\Big).$$

Note that slightly different notation is used by the originator of the NEF, Carl Morris, who uses ''ω'' instead of ''η'' and ''ψ'' instead of ''A'' (Morris C. (2006) "Natural exponential families", ''Encyclopedia of Statistical Sciences'').
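As a short worked illustration of the definition, the normal distribution with known variance $\sigma^2$ can be put in the NEF form by expanding the square in its density. The choice of this example and the symbol $\mu$ for the mean are assumptions made here for illustration; the expansion itself is elementary algebra.

```latex
\begin{align*}
f_X(x \mid \mu)
  &= \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \\
  &= \underbrace{\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{x^2}{2\sigma^2}\right)}_{h(x)}
     \exp\left(\theta x - \frac{\sigma^2\theta^2}{2}\right),
  \qquad \theta = \frac{\mu}{\sigma^2}.
\end{align*}
```

Under this reading, the natural parameter is $\theta = \mu/\sigma^2$ and $A(\theta) = \sigma^2\theta^2/2$.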


General multivariate case

Suppose that $\mathbf{x} \in \mathcal{X} \subseteq \mathbb{R}^p$. Then a natural exponential family of order ''p'' has density or mass function of the form

$$f_X(\mathbf{x} \mid \boldsymbol\theta) = h(\mathbf{x}) \exp\Big(\boldsymbol\theta^{\mathsf T} \mathbf{x} - A(\boldsymbol\theta)\Big),$$

where in this case the parameter $\boldsymbol\theta \in \mathbb{R}^p$.


Moment and cumulant generating functions

A member of a natural exponential family has moment generating function (MGF) of the form

$$M_X(\mathbf{t}) = \exp\Big(A(\boldsymbol\theta + \mathbf{t}) - A(\boldsymbol\theta)\Big).$$

The cumulant generating function is by definition the logarithm of the MGF, so it is

$$K_X(\mathbf{t}) = A(\boldsymbol\theta + \mathbf{t}) - A(\boldsymbol\theta).$$
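Differentiating the cumulant generating function at zero yields the cumulants. In the univariate case, the first two follow in one step each (a standard calculus consequence of the formula above, assuming $A$ is twice differentiable at $\theta$):

```latex
\begin{align*}
K_X(t) &= A(\theta + t) - A(\theta), \\
\operatorname{E}[X] &= K_X'(0) = A'(\theta), \\
\operatorname{Var}(X) &= K_X''(0) = A''(\theta).
\end{align*}
```

The same pattern continues for higher orders: the $n$-th cumulant is the $n$-th derivative of $A$ evaluated at $\theta$.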


Kullback-Leibler divergence

The Kullback–Leibler divergence between two distributions in the same natural exponential family, with parameters $\theta$ and $\lambda$, is

$$D_{\mathrm{KL}}(\theta \parallel \lambda) = (\theta - \lambda) A'(\theta) - \big(A(\theta) - A(\lambda)\big) = \int_\theta^\lambda (\lambda - t) A''(t) \, dt.$$
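The closed form above can be checked numerically. The following Python sketch does so for the Poisson family, for which $A(\theta) = \exp(\theta)$ in the natural parameterization. The function names `kl_nef` and `kl_poisson_direct` are illustrative only, and the direct sum is truncated at an arbitrarily chosen 200 terms.

```python
import math

def kl_nef(theta, lam, A, dA):
    """Closed form (theta - lam) * A'(theta) - (A(theta) - A(lam))."""
    return (theta - lam) * dA(theta) - (A(theta) - A(lam))

def kl_poisson_direct(theta, lam, terms=200):
    """Truncated sum of p_theta(k) * log(p_theta(k) / p_lam(k)) over k >= 0."""
    total = 0.0
    for k in range(terms):
        # Log-pmf in natural form: theta*k - exp(theta) - log(k!)
        log_p = theta * k - math.exp(theta) - math.lgamma(k + 1)
        log_q = lam * k - math.exp(lam) - math.lgamma(k + 1)
        total += math.exp(log_p) * (log_p - log_q)
    return total

# Natural parameters for Poisson means 3 and 5 (theta = log(mean)).
theta, lam = math.log(3.0), math.log(5.0)
closed = kl_nef(theta, lam, math.exp, math.exp)  # A = A' = exp for Poisson
direct = kl_poisson_direct(theta, lam)
print(closed, direct)
```

Both values should agree to within floating-point error, and both match the familiar Poisson expression 3 log(3/5) - 3 + 5.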


Examples

The five most important univariate cases are:
* normal distribution with known variance
* Poisson distribution
* gamma distribution with known shape parameter ''α'' (or ''k'', depending on the notation used)
* binomial distribution with known number of trials, ''n''
* negative binomial distribution with known ''r''

These five examples (Poisson, binomial, negative binomial, normal, and gamma) are a special subset of NEF, called NEF with quadratic variance function (NEF-QVF) because the variance can be written as a quadratic function of the mean. NEF-QVF are discussed below.

Distributions such as the exponential, Bernoulli, and geometric distributions are special cases of the above five distributions. For example, the Bernoulli distribution is a binomial distribution with ''n'' = 1 trial, the exponential distribution is a gamma distribution with shape parameter ''α'' = 1 (or ''k'' = 1), and the geometric distribution is a special case of the negative binomial distribution.

Some exponential family distributions are not NEF. The
lognormal and beta distributions are in the exponential family, but not the natural exponential family. The gamma distribution with two parameters is an exponential family but not a NEF, and the chi-squared distribution is a special case of the gamma distribution with fixed scale parameter, and thus is also an exponential family but not a NEF (note that only a gamma distribution with fixed shape parameter is a NEF). The inverse Gaussian distribution is a NEF with a cubic variance function.

The parameterization of most of the above distributions differs from the parameterization commonly used in textbooks and in the linked articles. In the Poisson case, for example, the two parameterizations are related by $\theta = \log(\lambda)$, where ''λ'' is the mean parameter, so that the density may be written as

$$f(k; \theta) = \frac{1}{k!} \exp\Big(\theta k - \exp(\theta)\Big)$$

for $\theta \in \mathbb{R}$, so

$$h(k) = \frac{1}{k!} \quad \text{and} \quad A(\theta) = \exp(\theta).$$

This alternative parameterization can greatly simplify calculations in
mathematical statistics. For example, in Bayesian inference, a posterior probability distribution is proportional to the product of the prior distribution and the likelihood. Normally this calculation requires writing out the probability density functions (PDFs) and integrating; with the above parameterization, however, that calculation can be avoided. Instead, relationships between distributions can be abstracted using the properties of the NEF described below.

An example of the multivariate case is the multinomial distribution with known number of trials.
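The Poisson reparameterization described above can be checked directly. This Python sketch compares the textbook mass function in terms of the mean ''λ'' with the NEF form in terms of ''θ'' = log(''λ''). The function names and the test value ''λ'' = 2.5 are arbitrary choices made for illustration.

```python
import math

def poisson_pmf_mean(k, lam):
    """Textbook form: lam**k * exp(-lam) / k!."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def poisson_pmf_natural(k, theta):
    """NEF form h(k) * exp(theta*k - A(theta)), with h(k) = 1/k!, A(theta) = exp(theta)."""
    h = 1.0 / math.factorial(k)
    return h * math.exp(theta * k - math.exp(theta))

lam = 2.5
theta = math.log(lam)  # natural parameter corresponding to mean lam

# Largest pointwise gap between the two forms over k = 0..29.
max_diff = max(
    abs(poisson_pmf_mean(k, lam) - poisson_pmf_natural(k, theta))
    for k in range(30)
)

# The NEF form should also sum to 1 (truncated here at 100 terms).
total = sum(poisson_pmf_natural(k, theta) for k in range(100))
print(max_diff, total)
```

The two forms are the same function written in different coordinates, so the gap should be at the level of rounding error.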


Properties

The properties of the natural exponential family can be used to simplify calculations involving these distributions.


Univariate case


Multivariate case

In the multivariate case, the mean vector and covariance matrix are

$$\operatorname{E}[X] = \nabla A(\boldsymbol\theta) \quad \text{and} \quad \operatorname{Cov}[X] = \nabla \nabla^{\mathsf T} A(\boldsymbol\theta),$$

where $\nabla$ is the gradient and $\nabla \nabla^{\mathsf T}$ is the Hessian matrix.
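The gradient and Hessian identities can be illustrated with the multinomial distribution. The sketch below assumes the reduced parameterization with ''k'' − 1 natural parameters $\theta_i = \log(p_i / p_k)$ and log-partition function $A(\boldsymbol\theta) = n \log(1 + \sum_i e^{\theta_i})$; this form is not given in the text above and is introduced here as an assumption. Derivatives are approximated by central finite differences, so agreement is only approximate.

```python
import math

def A(theta, n):
    """Assumed log-partition of the multinomial in reduced (k-1)-parameter form."""
    return n * math.log(1.0 + sum(math.exp(t) for t in theta))

def gradient(f, x, eps=1e-5):
    """Central finite-difference gradient of f at x."""
    g = []
    for i in range(len(x)):
        up, dn = list(x), list(x)
        up[i] += eps
        dn[i] -= eps
        g.append((f(up) - f(dn)) / (2 * eps))
    return g

def hessian(f, x, eps=1e-4):
    """Central finite-difference Hessian of f at x."""
    d = len(x)
    H = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            def shifted(si, sj):
                y = list(x)
                y[i] += si * eps
                y[j] += sj * eps
                return f(y)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * eps ** 2)
    return H

n = 10
p = [0.2, 0.3, 0.5]                          # last category is the reference
theta = [math.log(pi / p[-1]) for pi in p[:-1]]
f = lambda th: A(th, n)

mean = gradient(f, theta)                    # compare with n * p_i
cov = hessian(f, theta)                      # compare with n * (p_i * [i == j] - p_i * p_j)
print(mean, cov)
```

For these values the gradient should be close to the multinomial means (2 and 3), and the Hessian close to the usual multinomial covariance of the first two counts.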


Natural exponential families with quadratic variance functions (NEF-QVF)

Natural exponential families with quadratic variance functions are a special case of the natural exponential families. Six NEFs have quadratic variance functions (QVF), in which the variance of the distribution can be written as a quadratic function of the mean:

$$\operatorname{Var}(X) = V(\mu) = \nu_0 + \nu_1 \mu + \nu_2 \mu^2.$$

These are called NEF-QVF. The properties of these distributions were first described by Carl Morris.


The six NEF-QVFs

The six NEF-QVF are listed here in order of increasing complexity of the relationship between variance and mean.
# The normal distribution with fixed variance, $X \sim N(\mu, \sigma^2)$, is NEF-QVF because the variance is constant. The variance can be written $\operatorname{Var}(X) = V(\mu) = \sigma^2$, so the variance is a degree-0 function of the mean.
# The Poisson distribution $X \sim \operatorname{Poisson}(\mu)$ is NEF-QVF because all Poisson distributions have variance equal to the mean, $\operatorname{Var}(X) = V(\mu) = \mu$, so the variance is a linear function of the mean.
# The gamma distribution $X \sim \operatorname{Gamma}(r, \lambda)$ is NEF-QVF because its mean is $\mu = r\lambda$ and its variance is $\operatorname{Var}(X) = V(\mu) = \mu^2/r$, so the variance is a quadratic function of the mean.
# The binomial distribution $X \sim \operatorname{Binomial}(n, p)$ is NEF-QVF because the mean is $\mu = np$ and the variance is $\operatorname{Var}(X) = np(1-p)$, which can be written in terms of the mean as $V(\mu) = -np^2 + np = -\mu^2/n + \mu$.
# The negative binomial distribution $X \sim \operatorname{NegBin}(n, p)$ is NEF-QVF because the mean is $\mu = np/(1-p)$ and the variance is $V(\mu) = \mu^2/n + \mu$.
# The less well-known distribution generated by the generalized hyperbolic secant distribution (NEF-GHS) has $V(\mu) = \mu^2/n + n$ and $\mu > 0$.
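The list above can be summarized as a table of coefficients $(\nu_0, \nu_1, \nu_2)$. The following Python sketch encodes the six variance functions and spot-checks three of them against the usual variance formulas. The helper `qvf`, the dictionary keys, and the numeric values of `sigma2`, `r`, `n`, `p`, and `lam` are arbitrary illustrative choices.

```python
def qvf(nu0, nu1, nu2):
    """Return the variance function V(mu) = nu0 + nu1*mu + nu2*mu**2."""
    return lambda mu: nu0 + nu1 * mu + nu2 * mu ** 2

sigma2, r, n = 4.0, 3.0, 10

variance_functions = {
    "normal (known variance)": qvf(sigma2, 0.0, 0.0),
    "Poisson": qvf(0.0, 1.0, 0.0),
    "gamma (known shape r)": qvf(0.0, 0.0, 1.0 / r),
    "binomial (n trials)": qvf(0.0, 1.0, -1.0 / n),
    "negative binomial (known n)": qvf(0.0, 1.0, 1.0 / n),
    "NEF-GHS": qvf(n, 0.0, 1.0 / n),
}

# Spot checks: evaluate V at the mean and compare with the direct variance formula.
p = 0.3
binom_var = variance_functions["binomial (n trials)"](n * p)                      # n*p*(1-p)
negbin_var = variance_functions["negative binomial (known n)"](n * p / (1 - p))   # n*p/(1-p)**2

lam = 2.0  # gamma scale, so the mean is r * lam
gamma_var = variance_functions["gamma (known shape r)"](r * lam)                  # r*lam**2

print(binom_var, negbin_var, gamma_var)
```

Representing each family by three coefficients reflects the point of the classification: within the NEF-QVF class, the variance function is determined by $(\nu_0, \nu_1, \nu_2)$.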


Properties of NEF-QVF

The properties of NEF-QVF can simplify calculations that use these distributions.


See also

* Generalized linear model
* Pearson distribution
* Sheffer sequence
* Orthogonal polynomials


References

* Morris C. (1982) ''Natural exponential families with quadratic variance functions: statistical theory''. Dept of mathematics, Institute of Statistics, University of Texas, Austin.