The likelihood function (often simply called the likelihood) represents the probability of random variable realizations conditional on particular values of the statistical parameters. Thus, when evaluated on a given sample, the likelihood function indicates which parameter values are more ''likely'' than others, in the sense that they would have made the observed data more probable. Consequently, the likelihood is often written as <math>\mathcal{L}(\theta \mid x)</math> instead of <math>P(x \mid \theta)</math>, to emphasize that it is to be understood as a function of the parameters <math>\theta</math> instead of the random variable <math>x</math>.
In maximum likelihood estimation, the arg max of the likelihood function serves as a point estimate for <math>\theta</math>, while local curvature (approximated by the likelihood's Hessian matrix) indicates the estimate's precision. Meanwhile, in Bayesian statistics, parameter estimates are derived from the converse of the likelihood, the so-called posterior probability, which is calculated via Bayes' rule.
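The arg-max idea can be sketched in a few lines. This is an illustrative grid search over a hypothetical Bernoulli model, not a production estimator; the data (7 heads in 10 flips) and the grid resolution are made-up assumptions:

```python
# Sketch: maximum likelihood estimation by grid search (illustrative only).
# Model: independent coin flips with unknown heads-probability theta.

def likelihood(theta, heads, tails):
    """L(theta | data) for Bernoulli flips: theta^heads * (1 - theta)^tails."""
    return theta**heads * (1 - theta)**tails

heads, tails = 7, 3  # hypothetical data: 7 heads in 10 flips
grid = [i / 1000 for i in range(1001)]
mle = max(grid, key=lambda t: likelihood(t, heads, tails))
# The grid arg max sits at the analytic MLE heads / (heads + tails) = 0.7.
```

In practice the maximization is done by numerical optimization of the log-likelihood rather than by exhaustive search, but the grid makes the "arg max of a function of <math>\theta</math>" interpretation concrete.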
Definition
The likelihood function, parameterized by a (possibly multivariate) parameter <math>\theta</math>, is usually defined differently for discrete and continuous probability distributions (a more general definition is discussed below). Given a probability density or mass function
: <math>x \mapsto f(x \mid \theta),</math>
where <math>x</math> is a realization of the random variable <math>X</math>, the likelihood function is
: <math>\theta \mapsto f(x \mid \theta),</math>
often written
: <math>\mathcal{L}(\theta \mid x).</math>
In other words, when <math>f(x \mid \theta)</math> is viewed as a function of <math>x</math> with <math>\theta</math> fixed, it is a probability density function, and when viewed as a function of <math>\theta</math> with <math>x</math> fixed, it is a likelihood function. The likelihood function does ''not'' specify the probability that <math>\theta</math> is the truth, given the observed sample <math>X = x</math>. Such an interpretation is a common error, with potentially disastrous consequences (see prosecutor's fallacy).
Discrete probability distribution
Let <math>X</math> be a discrete random variable with probability mass function <math>p</math> depending on a parameter <math>\theta</math>. Then the function
: <math>\mathcal{L}(\theta \mid x) = p_\theta(x) = P_\theta(X = x),</math>
considered as a function of <math>\theta</math>, is the ''likelihood function'', given the outcome <math>x</math> of the random variable <math>X</math>. Sometimes the probability of "the value <math>x</math> of <math>X</math> for the parameter value <math>\theta</math>" is written as <math>P(X = x \mid \theta)</math> or <math>P(X = x; \theta)</math>. The likelihood is the probability that a particular outcome <math>x</math> is observed when the true value of the parameter is <math>\theta</math>, equivalent to the probability mass on <math>x</math>; it is ''not'' a probability density over the parameter <math>\theta</math>. The likelihood, <math>\mathcal{L}(\theta \mid x)</math>, should not be confused with <math>P(\theta \mid x)</math>, which is the posterior probability of <math>\theta</math> given the data <math>x</math>.
Given no event (no data), the likelihood is 1; any non-trivial event will have a lower likelihood.
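The distinction between a probability mass function (which sums to one over outcomes) and a likelihood (which need not sum to anything in particular over parameters) can be illustrated with a short sketch. The binomial model and the particular parameter grid here are illustrative assumptions, not part of the definition:

```python
from math import comb

def binom_pmf(k, n, theta):
    """P(X = k) for X ~ Binomial(n, theta)."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

n, theta = 3, 0.4
# Viewed as a function of the outcome k (theta fixed): a genuine PMF,
# so the probabilities over all outcomes sum to exactly 1.
pmf_total = sum(binom_pmf(k, n, theta) for k in range(n + 1))

# Viewed as a function of theta (outcome k fixed): a likelihood,
# whose values over a set of parameter values need not sum to 1.
k = 2
lik_total = sum(binom_pmf(k, n, t) for t in (0.1, 0.3, 0.5, 0.7, 0.9))
```

The same function `binom_pmf` plays both roles; only which argument is held fixed changes.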
Example
Consider a simple statistical model of a coin flip: a single parameter <math>p_\text{H}</math> that expresses the "fairness" of the coin. The parameter is the probability that a coin lands heads up ("H") when tossed. <math>p_\text{H}</math> can take on any value within the range 0.0 to 1.0. For a perfectly fair coin, <math>p_\text{H} = 0.5</math>.
Imagine flipping a fair coin twice, and observing two heads in two tosses ("HH"). Assuming that each successive coin flip is i.i.d., then the probability of observing HH is
: <math>P(\text{HH} \mid p_\text{H} = 0.5) = 0.5^2 = 0.25.</math>
Equivalently, the likelihood at <math>p_\text{H} = 0.5</math> given that "HH" was observed is 0.25:
: <math>\mathcal{L}(p_\text{H} = 0.5 \mid \text{HH}) = 0.25.</math>
This is not the same as saying that <math>P(p_\text{H} = 0.5 \mid \text{HH}) = 0.25</math>, a conclusion which could only be reached via Bayes' theorem given knowledge about the marginal probabilities <math>P(p_\text{H} = 0.5)</math> and <math>P(\text{HH})</math>.
Now suppose that the coin is not a fair coin, but instead that <math>p_\text{H} = 0.3</math>. Then the probability of two heads on two flips is
: <math>P(\text{HH} \mid p_\text{H} = 0.3) = 0.3^2 = 0.09.</math>
Hence
: <math>\mathcal{L}(p_\text{H} = 0.3 \mid \text{HH}) = 0.09.</math>
More generally, for each value of <math>p_\text{H}</math>, we can calculate the corresponding likelihood. The result of such calculations is displayed in Figure 1. Note that the integral of <math>\mathcal{L}</math> over [0, 1] is 1/3; likelihoods need not integrate or sum to one over the parameter space.
Continuous probability distribution
Let <math>X</math> be a random variable following an absolutely continuous probability distribution with density function <math>f</math> (a function of <math>x</math>) which depends on a parameter <math>\theta</math>. Then the function
: <math>\mathcal{L}(\theta \mid x) = f_\theta(x),</math>
considered as a function of <math>\theta</math>, is the ''likelihood function'' (of <math>\theta</math>, given the outcome <math>X = x</math>). Again, note that <math>\mathcal{L}</math> is not a probability density or mass function over <math>\theta</math>, despite being a function of <math>\theta</math> given the observation <math>X = x</math>.
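For the continuous case, a short sketch makes the role reversal visible. The normal model with known unit variance and the single observed value are illustrative assumptions:

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

x_obs = 1.3  # hypothetical single observation

# Likelihood of the location parameter mu, given the fixed observation:
lik = {mu: normal_pdf(x_obs, mu) for mu in (0.0, 1.0, 1.3, 2.0)}
# The likelihood peaks at mu = x_obs, even though normal_pdf is,
# by construction, a density in x rather than in mu.
```

The same density formula is evaluated in both roles; fixing `x_obs` and varying `mu` turns a density into a likelihood.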
Relationship between the likelihood and probability density functions
The use of the probability density in specifying the likelihood function above is justified as follows. Given an observation <math>x_j</math>, the likelihood for the interval