In
Bayesian statistics
Bayesian statistics is a theory in the field of statistics based on the Bayesian interpretation of probability where probability expresses a ''degree of belief'' in an event. The degree of belief may be based on prior knowledge about the event, ...
, a maximum a posteriori probability (MAP) estimate is an estimate of an unknown quantity, that equals the
mode
Mode ( la, modus meaning "manner, tune, measure, due measure, rhythm, melody") may refer to:
Arts and entertainment
* '' MO''D''E (magazine)'', a defunct U.S. women's fashion magazine
* ''Mode'' magazine, a fictional fashion magazine which is ...
of the
posterior distribution. The MAP can be used to obtain a
point estimate of an unobserved quantity on the basis of empirical data. It is closely related to the method of
maximum likelihood
In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed stat ...
(ML) estimation, but employs an augmented
optimization objective which incorporates a
prior distribution
In Bayesian statistical inference, a prior probability distribution, often simply called the prior, of an uncertain quantity is the probability distribution that would express one's beliefs about this quantity before some evidence is taken into ...
(that quantifies the additional information available through prior knowledge of a related event) over the quantity one wants to estimate. MAP estimation can therefore be seen as a
regularization
Regularization may refer to:
* Regularization (linguistics)
* Regularization (mathematics)
* Regularization (physics)
* Regularization (solid modeling)
* Regularization Law, an Israeli law intended to retroactively legalize settlements
See also ...
of maximum likelihood estimation.
Description
Assume that we want to estimate an unobserved population parameter
on the basis of observations
. Let
be the
sampling distribution of
, so that
is the probability of
when the underlying population parameter is
. Then the function:
:
is known as the
likelihood function
The likelihood function (often simply called the likelihood) represents the probability of random variable realizations conditional on particular values of the statistical parameters. Thus, when evaluated on a given sample, the likelihood functi ...
and the estimate:
:
is the maximum likelihood estimate of
.
Now assume that a
prior distribution
In Bayesian statistical inference, a prior probability distribution, often simply called the prior, of an uncertain quantity is the probability distribution that would express one's beliefs about this quantity before some evidence is taken into ...
over
exists. This allows us to treat
as a
random variable
A random variable (also called random quantity, aleatory variable, or stochastic variable) is a mathematical formalization of a quantity or object which depends on random events. It is a mapping or a function from possible outcomes (e.g., the po ...
as in
Bayesian statistics
Bayesian statistics is a theory in the field of statistics based on the Bayesian interpretation of probability where probability expresses a ''degree of belief'' in an event. The degree of belief may be based on prior knowledge about the event, ...
. We can calculate the
posterior distribution of
using
Bayes' theorem
In probability theory and statistics, Bayes' theorem (alternatively Bayes' law or Bayes' rule), named after Thomas Bayes, describes the probability of an event, based on prior knowledge of conditions that might be related to the event. For examp ...
:
:
where
is density function of
,
is the domain of
.
The method of maximum a posteriori estimation then estimates
as the
mode
Mode ( la, modus meaning "manner, tune, measure, due measure, rhythm, melody") may refer to:
Arts and entertainment
* '' MO''D''E (magazine)'', a defunct U.S. women's fashion magazine
* ''Mode'' magazine, a fictional fashion magazine which is ...
of the posterior distribution of this random variable:
:
The denominator of the posterior distribution (so-called
marginal likelihood) is always positive and does not depend on
and therefore plays no role in the optimization. Observe that the MAP estimate of
coincides with the ML estimate when the prior
is uniform (i.e.,
is a
constant function
In mathematics, a constant function is a function whose (output) value is the same for every input value. For example, the function is a constant function because the value of is 4 regardless of the input value (see image).
Basic propertie ...
).
When the
loss function
In mathematical optimization and decision theory, a loss function or cost function (sometimes also called an error function) is a function that maps an event or values of one or more variables onto a real number intuitively representing some "cos ...
is of the form
: