An estimation procedure that is often claimed to be part of Bayesian statistics is the maximum a posteriori (MAP) estimate of an unknown quantity, which equals the mode of the posterior density with respect to some reference measure, typically the Lebesgue measure. The MAP can be used to obtain a point estimate of an unobserved quantity on the basis of empirical data. It is closely related to the method of maximum likelihood (ML) estimation, but employs an augmented optimization objective which incorporates a prior density over the quantity one wants to estimate. MAP estimation is therefore a regularization of maximum likelihood estimation; because the mode depends on the choice of reference measure, it is not a well-defined statistic of the Bayesian posterior distribution.
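As a rough illustration of the regularization view, the sketch below estimates the mean of normally distributed data with known variance under a normal prior. This setting, the function names, and all numeric values are assumptions chosen for illustration. Up to additive constants, the negative log-likelihood is a sum of squared errors and the negative log-prior is a quadratic penalty that pulls the estimate toward the prior mean.

```python
def mle_mean(xs):
    """Maximum likelihood estimate of a normal mean: the sample average."""
    return sum(xs) / len(xs)


def map_mean(xs, sigma2, mu0, tau2):
    """Minimiser of sum((x - mu)^2) / (2*sigma2) + (mu - mu0)^2 / (2*tau2).

    sigma2 is the known data variance; mu0 and tau2 are the mean and
    variance of the assumed normal prior on mu.
    """
    n = len(xs)
    return (sum(xs) / sigma2 + mu0 / tau2) / (n / sigma2 + 1.0 / tau2)


xs = [2.1, 1.9, 2.4, 2.0]  # made-up observations
mle = mle_mean(xs)
weak = map_mean(xs, sigma2=1.0, mu0=0.0, tau2=1e6)    # nearly flat prior
strong = map_mean(xs, sigma2=1.0, mu0=0.0, tau2=0.1)  # tight prior around 0
print(mle, weak, strong)
```

With a nearly flat prior the penalty is negligible and the MAP estimate is close to the sample average; with a tight prior the estimate is shrunk toward the prior mean.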
Description
Assume that we want to estimate an unobserved population parameter <math>\theta</math> on the basis of observations <math>x</math>. Let <math>f</math> be the sampling distribution of <math>x</math>, so that <math>f(x\mid\theta)</math> is the probability (or probability density) of <math>x</math> when the underlying population parameter is <math>\theta</math>. Then the function:
:<math>\theta \mapsto f(x\mid\theta)</math>
is known as the likelihood function and the estimate:
:<math>\hat{\theta}_{\mathrm{MLE}}(x) = \underset{\theta}{\operatorname{arg\,max}}\ f(x\mid\theta)</math>
is the maximum likelihood estimate of <math>\theta</math>.
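A minimal sketch of maximum likelihood estimation follows, assuming a Bernoulli model for coin-flip data. The model, the data, and the helper name `log_likelihood` are illustrative assumptions. The parameter is found by a plain grid search over the log-likelihood, which has the same maximiser as the likelihood itself.

```python
import math


def log_likelihood(theta, xs):
    """Log-likelihood of i.i.d. Bernoulli(theta) observations xs (0s and 1s)."""
    k = sum(xs)
    n = len(xs)
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)


xs = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]        # made-up data: 7 successes in 10
grid = [i / 1000 for i in range(1, 1000)]  # candidate values in (0, 1)

# The maximum likelihood estimate is the grid point with the largest value.
theta_mle = max(grid, key=lambda t: log_likelihood(t, xs))
print(theta_mle)
```

For this model the maximiser is known in closed form (the fraction of successes), so the grid search is only a check on the definition.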
Now assume that a prior distribution <math>g</math> over <math>\theta</math> exists. This allows us to treat <math>\theta</math> as a random variable, as in Bayesian statistics. We can calculate the posterior density of <math>\theta</math> using Bayes' theorem:
:<math>\theta \mapsto f(\theta\mid x) = \frac{f(x\mid\theta)\,g(\theta)}{\displaystyle\int_{\Theta} f(x\mid\vartheta)\,g(\vartheta)\,d\vartheta}</math>
where <math>g</math> is the density function of <math>\theta</math> and <math>\Theta</math> is the domain of <math>g</math>.
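The method of maximum a posteriori estimation then estimates <math>\theta</math> as the mode of the posterior density of this random variable:
:<math>\hat{\theta}_{\mathrm{MAP}}(x) = \underset{\theta}{\operatorname{arg\,max}}\ f(\theta\mid x) = \underset{\theta}{\operatorname{arg\,max}}\ \frac{f(x\mid\theta)\,g(\theta)}{\displaystyle\int_{\Theta} f(x\mid\vartheta)\,g(\vartheta)\,d\vartheta} = \underset{\theta}{\operatorname{arg\,max}}\ f(x\mid\theta)\,g(\theta).</math>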
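The definition can be tried out on a small case. The sketch below assumes a Bernoulli likelihood with a Beta(2, 2) prior, a conjugate pair chosen here only for illustration, and maximises the sum of log-likelihood and log-prior on a grid. The result is compared with the known mode of the resulting Beta posterior.

```python
import math


def log_likelihood(theta, k, n):
    """Bernoulli log-likelihood for k successes in n trials."""
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)


def log_prior(theta, a, b):
    """Log of an unnormalised Beta(a, b) prior density."""
    return (a - 1.0) * math.log(theta) + (b - 1.0) * math.log(1.0 - theta)


k, n = 7, 10      # made-up data summary
a, b = 2.0, 2.0   # assumed prior hyperparameters
grid = [i / 1000 for i in range(1, 1000)]

# MAP estimate: maximise log-likelihood plus log-prior over the grid.
theta_map = max(grid, key=lambda t: log_likelihood(t, k, n) + log_prior(t, a, b))

# Mode of the Beta(k + a, n - k + b) posterior, for comparison.
theta_closed = (k + a - 1.0) / (n + a + b - 2.0)
print(theta_map, theta_closed)
```

The grid estimate agrees with the closed-form mode up to the grid spacing, and it sits between the prior mode (0.5) and the fraction of successes (0.7).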
The denominator of the posterior density (the marginal likelihood of the model) is always positive and does not depend on <math>\theta</math>, and therefore plays no role in the optimization. Observe that the MAP estimate of <math>\theta</math> coincides with the ML estimate when the prior <math>g</math> is uniform (i.e., <math>g</math> is a constant function), which occurs whenever the prior distribution is taken as the reference measure, as is typical in function-space applications.
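Both observations can be checked numerically. The sketch below assumes a Bernoulli likelihood evaluated on a grid, with a Riemann sum standing in for the marginal likelihood integral. It first confirms that dividing by the normaliser leaves the maximiser unchanged, then that a constant prior reproduces the ML estimate. All names and values are illustrative.

```python
def likelihood(theta, k, n):
    """Bernoulli likelihood for k successes in n trials."""
    return theta ** k * (1.0 - theta) ** (n - k)


def argmax_index(values):
    """Index of the largest entry in a list."""
    return max(range(len(values)), key=values.__getitem__)


k, n = 7, 10
step = 0.001
grid = [i * step for i in range(1, 1000)]
lik = [likelihood(t, k, n) for t in grid]

# (1) Normalising by the marginal likelihood does not move the maximiser.
prior = [6.0 * t * (1.0 - t) for t in grid]    # Beta(2, 2) density
unnormalised = [l * g for l, g in zip(lik, prior)]
evidence = sum(unnormalised) * step            # Riemann-sum approximation
posterior = [u / evidence for u in unnormalised]
same_argmax = argmax_index(unnormalised) == argmax_index(posterior)

# (2) With a constant prior, the MAP estimate equals the ML estimate.
flat = [l * 1.0 for l in lik]
theta_map_flat = grid[argmax_index(flat)]
theta_mle = grid[argmax_index(lik)]
print(same_argmax, theta_map_flat, theta_mle)
```

In practice this is why MAP estimation usually works with the unnormalised product of likelihood and prior: the integral in the denominator never needs to be computed.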
When the
loss function is of the form
: