A likelihood function (often simply called the likelihood) measures how well a
statistical model explains
observed data by calculating the probability of seeing that data under different
parameter values of the model. It is constructed from the
joint probability distribution
of the
random variable that (presumably) generated the observations. When evaluated on the actual data points, it becomes a function solely of the model parameters.
In
maximum likelihood estimation, the
argument that maximizes the likelihood function serves as a
point estimate for the unknown parameter, while the
Fisher information (often approximated by the likelihood's
Hessian matrix at the maximum) gives an indication of the estimate's
precision.
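As a minimal numerical sketch of this procedure (assuming Python with NumPy and SciPy, a hypothetical i.i.d. normal sample, and the BFGS optimizer's built-in inverse-Hessian approximation standing in for the exact observed information):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=200)  # hypothetical i.i.d. sample

def neg_log_likelihood(theta, x):
    """Negative log-likelihood of a normal model, up to an additive constant."""
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)  # log parameterization keeps sigma positive
    return 0.5 * np.sum(((x - mu) / sigma) ** 2) + x.size * np.log(sigma)

# Maximizing the likelihood = minimizing its negative logarithm.
result = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]), args=(data,))

# The inverse Hessian of the negative log-likelihood at the maximum
# approximates the covariance of (mu, log sigma) -- the observed-information
# route to the estimate's precision described above.
std_errors = np.sqrt(np.diag(result.hess_inv))
print("MLE (mu, log sigma):", result.x)
print("approx. standard errors:", std_errors)
```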
In contrast, in
Bayesian statistics, the estimate of interest is the ''converse'' of the likelihood, the so-called
posterior probability of the parameter given the observed data, which is calculated via
Bayes' rule.
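A small grid sketch of that calculation (assuming Python/NumPy, with hypothetical binomial data and a flat prior):

```python
import numpy as np

theta = np.linspace(0.0, 1.0, 1001)  # grid over the parameter
heads, flips = 7, 10                 # hypothetical observed data

likelihood = theta**heads * (1.0 - theta)**(flips - heads)
prior = np.ones_like(theta)          # flat prior on [0, 1]

# Bayes' rule: posterior is proportional to likelihood times prior,
# normalized so that it integrates to one over the parameter.
unnormalized = likelihood * prior
posterior = unnormalized / np.trapz(unnormalized, theta)

print("posterior mean:", np.trapz(theta * posterior, theta))  # ~ 8/12 = 0.667
```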
Definition
The likelihood function, parameterized by a (possibly multivariate) parameter $\theta$, is usually defined differently for discrete and continuous probability distributions (a more general definition is discussed below). Given a probability density or mass function $x \mapsto f(x \mid \theta)$, where $x$ is a realization of the random variable $X$, the likelihood function is $\theta \mapsto f(x \mid \theta)$, often written $\mathcal{L}(\theta \mid x)$.
In other words, when $f(x \mid \theta)$ is viewed as a function of $x$ with $\theta$ fixed, it is a probability density function, and when viewed as a function of $\theta$ with $x$ fixed, it is a likelihood function. In the frequentist paradigm, the notation $f(x \mid \theta)$ is often avoided and instead $f(x; \theta)$ or $f(x, \theta)$ are used to indicate that $\theta$ is regarded as a fixed unknown quantity rather than as a random variable being conditioned on.
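The two readings of the same function can be made concrete in a short sketch (assuming Python/NumPy, with the exponential density as an illustrative model): integrating over $x$ with $\theta$ fixed gives 1, while integrating over $\theta$ with $x$ fixed need not.

```python
import numpy as np

# One function f(x | theta): the exponential density with rate theta.
def f(x, theta):
    return theta * np.exp(-theta * x)

# Density view: theta fixed, x varies -- integrates to 1 over x.
x_grid = np.linspace(0.0, 50.0, 20001)
print(np.trapz(f(x_grid, 1.0), x_grid))        # ~ 1.0

# Likelihood view: x fixed at the observation, theta varies.
theta_grid = np.linspace(0.0, 50.0, 20001)
print(np.trapz(f(1.5, theta_grid), theta_grid))  # ~ 0.444, need not be 1
```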
The likelihood function does ''not'' specify the probability that $\theta$ is the truth, given the observed sample $X = x$. Such an interpretation is a common error, with potentially disastrous consequences (see prosecutor's fallacy).
Discrete probability distribution
Let $X$ be a discrete random variable with probability mass function $p$ depending on a parameter $\theta$. Then the function
$$\mathcal{L}(\theta \mid x) = p_\theta(x) = P_\theta(X = x),$$
considered as a function of $\theta$, is the ''likelihood function'', given the outcome $x$ of the random variable $X$. Sometimes the probability of "the value $x$ of $X$ for the parameter value $\theta$" is written as $P(X = x \mid \theta)$ or $P(X = x; \theta)$. The likelihood is the probability that a particular outcome $x$ is observed when the true value of the parameter is $\theta$, equivalent to the probability mass on $x$; it is ''not'' a probability density over the parameter $\theta$. The likelihood, $\mathcal{L}(\theta \mid x)$, should not be confused with $P(\theta \mid x)$, which is the posterior probability of $\theta$ given the data $x$.
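A brief sketch of such a discrete likelihood (assuming Python, with a Poisson probability mass function as a hypothetical model):

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return lam**x * exp(-lam) / factorial(x)

x_observed = 4  # hypothetical observed count

# The likelihood is the pmf at the fixed observation, as a function of the parameter:
for lam in [1.0, 2.0, 4.0, 8.0]:
    print(f"L(lam={lam} | x={x_observed}) = {poisson_pmf(x_observed, lam):.4f}")
# It peaks at lam = 4.0 = x_observed, the maximum likelihood estimate.
```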
Example
Consider a simple statistical model of a coin flip: a single parameter $p_\text{H}$ that expresses the "fairness" of the coin. The parameter is the probability that a coin lands heads up ("H") when tossed. $p_\text{H}$ can take on any value within the range 0.0 to 1.0. For a perfectly fair coin, $p_\text{H} = 0.5$.
Imagine flipping a fair coin twice, and observing two heads in two tosses ("HH"). Assuming that each successive coin flip is i.i.d., then the probability of observing HH is
$$P(\text{HH} \mid p_\text{H} = 0.5) = 0.5^2 = 0.25.$$
Equivalently, the likelihood of observing "HH" assuming $p_\text{H} = 0.5$ is
$$\mathcal{L}(p_\text{H} = 0.5 \mid \text{HH}) = 0.25.$$
This is not the same as saying that $P(p_\text{H} = 0.5 \mid \text{HH}) = 0.25$, a conclusion which could only be reached via Bayes' theorem given knowledge about the marginal probabilities $P(p_\text{H} = 0.5)$ and $P(\text{HH})$.
Now suppose that the coin is not a fair coin, but instead that $p_\text{H} = 0.3$. Then the probability of two heads on two flips is
$$P(\text{HH} \mid p_\text{H} = 0.3) = 0.3^2 = 0.09.$$
Hence
$$\mathcal{L}(p_\text{H} = 0.3 \mid \text{HH}) = 0.09.$$
More generally, for each value of $p_\text{H}$, we can calculate the corresponding likelihood. The result of such calculations is displayed in Figure 1. The integral of $\mathcal{L}$ over $[0, 1]$ is 1/3; likelihoods need not integrate or sum to one over the parameter space.
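These numbers, and the 1/3 integral, can be reproduced directly (a sketch assuming Python/NumPy):

```python
import numpy as np

def likelihood_hh(p_h):
    """L(p_H | HH): the probability of two heads in two independent tosses."""
    return p_h**2

print(likelihood_hh(0.5))  # 0.25
print(likelihood_hh(0.3))  # 0.09 (up to floating-point rounding)

# Likelihoods need not integrate to one over the parameter space:
p_grid = np.linspace(0.0, 1.0, 100001)
print(np.trapz(likelihood_hh(p_grid), p_grid))  # ~ 1/3
```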
Continuous probability distribution
Let $X$ be a random variable following an absolutely continuous probability distribution with density function $f$ (a function of $x$) which depends on a parameter $\theta$. Then the function
$$\mathcal{L}(\theta \mid x) = f_\theta(x),$$
considered as a function of $\theta$, is the ''likelihood function'' (of $\theta$, given the outcome $X = x$). Again, $\mathcal{L}$ is not a probability density or mass function over $\theta$, despite being a function of $\theta$ given the observation $X = x$.
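A short sketch (assuming Python with SciPy, and a hypothetical single observation) of such a continuous likelihood, reading the normal density as a function of its mean:

```python
import numpy as np
from scipy.stats import norm

x_observed = 1.2  # hypothetical single observation

def likelihood(mu):
    """L(mu | x): the normal density at the fixed observation, as a function of mu."""
    return norm.pdf(x_observed, loc=mu, scale=1.0)

for mu in [0.0, 1.0, 1.2, 2.0]:
    print(f"L(mu={mu} | x={x_observed}) = {likelihood(mu):.4f}")
# Maximized at mu = x_observed; it assigns no probability mass to any mu.
```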
Relationship between the likelihood and probability density functions
The use of the probability density in specifying the likelihood function above is justified as follows. Given an observation $x_j$, the likelihood for the interval