The likelihood function (often simply called the likelihood) represents the probability of random variable realizations conditional on particular values of the statistical parameters. Thus, when evaluated on a given sample, the likelihood function indicates which parameter values are more ''likely'' than others, in the sense that they would have made the observed data more probable. Consequently, the likelihood is often written as $\mathcal{L}(\theta \mid x)$ instead of $P(x \mid \theta)$, to emphasize that it is to be understood as a function of the parameters $\theta$ instead of the random variable $x$.
In maximum likelihood estimation, the arg max of the likelihood function serves as a point estimate for $\theta$, while local curvature (approximated by the likelihood's Hessian matrix) indicates the estimate's precision. Meanwhile in Bayesian statistics, parameter estimates are derived from the converse of the likelihood, the so-called posterior probability, which is calculated via Bayes' rule.
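To make these two uses concrete, here is a minimal Python sketch (not part of the article; the data and the grid-search approach are assumptions chosen for illustration) that locates the arg max of a Bernoulli log-likelihood and reads off a precision estimate from the curvature at the maximum:

```python
# A minimal sketch, assuming i.i.d. Bernoulli data (7 heads in 10 tosses).
import numpy as np

data = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])

def log_likelihood(theta):
    # log L(theta | data) for i.i.d. Bernoulli observations
    return np.sum(data * np.log(theta) + (1 - data) * np.log(1 - theta))

grid = np.linspace(0.001, 0.999, 999)
values = np.array([log_likelihood(t) for t in grid])
theta_hat = grid[np.argmax(values)]   # point estimate (here 0.7)

# Curvature via a finite-difference second derivative; its negation is
# the observed Fisher information, and 1/sqrt(information) is the usual
# standard-error approximation.
h = 1e-4
curvature = (log_likelihood(theta_hat + h)
             - 2 * log_likelihood(theta_hat)
             + log_likelihood(theta_hat - h)) / h**2
std_error = 1 / np.sqrt(-curvature)
print(theta_hat, std_error)   # ~0.7 and ~0.145
```

The scalar second derivative here stands in for the Hessian matrix mentioned above; with a multivariate parameter one would invert the full Hessian at the maximum.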
Definition
The likelihood function, parameterized by a (possibly multivariate) parameter $\theta$, is usually defined differently for discrete and continuous probability distributions (a more general definition is discussed below). Given a probability density or mass function
: $x \mapsto f(x \mid \theta),$
where $x$ is a realization of the random variable $X$, the likelihood function is
: $\theta \mapsto f(x \mid \theta),$
often written
: $\mathcal{L}(\theta \mid x).$
In other words, when $f(x \mid \theta)$ is viewed as a function of $x$ with $\theta$ fixed, it is a probability density function, and when viewed as a function of $\theta$ with $x$ fixed, it is a likelihood function. The likelihood function does ''not'' specify the probability that $\theta$ is the truth, given the observed sample $X = x$. Such an interpretation is a common error, with potentially disastrous consequences (see prosecutor's fallacy).
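The "two readings" of $f(x \mid \theta)$ just described can be sketched numerically; the normal model below is an assumption made purely for illustration, not taken from the article:

```python
from scipy.stats import norm

# As a function of x with theta fixed: a probability density function.
theta_fixed = 0.0
def density_at(x):
    # f(x | theta): varies over observations x
    return norm.pdf(x, loc=theta_fixed, scale=1.0)

# As a function of theta with x fixed: a likelihood function.
x_observed = 0.5
def likelihood_at(theta):
    # L(theta | x): the same formula, read as a function of theta
    return norm.pdf(x_observed, loc=theta, scale=1.0)

print(density_at(0.5))                         # density of x = 0.5 when theta = 0
print(likelihood_at(0.0), likelihood_at(0.5))  # likelihood is larger at theta = 0.5
```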
Discrete probability distribution
Let $X$ be a discrete random variable with probability mass function $p$ depending on a parameter $\theta$. Then the function
: $\mathcal{L}(\theta \mid x) = p_\theta(x) = P_\theta(X = x),$
considered as a function of $\theta$, is the ''likelihood function'', given the outcome $x$ of the random variable $X$. Sometimes the probability of "the value $x$ of $X$ for the parameter value $\theta$" is written as $P(X = x \mid \theta)$ or $P(X = x; \theta)$. The likelihood is the probability that a particular outcome $x$ is observed when the true value of the parameter is $\theta$, equivalent to the probability mass on $x$; it is ''not'' a probability density over the parameter $\theta$. The likelihood, $\mathcal{L}(\theta \mid x)$, should not be confused with $P(\theta \mid x)$, which is the posterior probability of $\theta$ given the data $x$.
Given no event (no data), the likelihood is 1; any non-trivial event will have a lower likelihood.
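As a numerical illustration of the discrete case, the following hedged sketch evaluates a probability mass function at a fixed outcome across several parameter values; the Poisson model and the observed count are invented for illustration:

```python
from scipy.stats import poisson

x = 4  # the observed outcome of the random variable X

for theta in (2.0, 4.0, 6.0):
    # P(X = x | theta): the probability mass at x when the rate is theta,
    # read as the likelihood L(theta | x)
    print(theta, poisson.pmf(x, mu=theta))
# The likelihood is largest at theta = 4.0, the rate that makes the
# observed count most probable; it is not a density over theta.
```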
Example
Consider a simple statistical model of a coin flip: a single parameter $p_\text{H}$ that expresses the "fairness" of the coin. The parameter is the probability that a coin lands heads up ("H") when tossed. $p_\text{H}$ can take on any value within the range 0.0 to 1.0. For a perfectly fair coin, $p_\text{H} = 0.5$.
Imagine flipping a fair coin twice, and observing two heads in two tosses ("HH"). Assuming that each successive coin flip is i.i.d., the probability of observing HH is
: $P(\text{HH} \mid p_\text{H} = 0.5) = 0.5^2 = 0.25.$
Equivalently, the likelihood at $p_\text{H} = 0.5$ given that "HH" was observed is 0.25:
: $\mathcal{L}(p_\text{H} = 0.5 \mid \text{HH}) = 0.25.$
This is not the same as saying that $P(p_\text{H} = 0.5 \mid \text{HH}) = 0.25$, a conclusion which could only be reached via Bayes' theorem given knowledge about the marginal probabilities $P(p_\text{H} = 0.5)$ and $P(\text{HH})$.
Now suppose that the coin is not a fair coin, but instead that $p_\text{H} = 0.3$. Then the probability of two heads on two flips is
: $P(\text{HH} \mid p_\text{H} = 0.3) = 0.3^2 = 0.09.$
Hence
: $\mathcal{L}(p_\text{H} = 0.3 \mid \text{HH}) = 0.09.$
More generally, for each value of $p_\text{H}$, we can calculate the corresponding likelihood. The result of such calculations is displayed in Figure 1. Note that the integral of $\mathcal{L}$ over $[0, 1]$ is 1/3; likelihoods need not integrate or sum to one over the parameter space.
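The numbers in this example are easy to verify in code; the short sketch below (helper name invented for illustration) reproduces the two likelihood values and the 1/3 integral:

```python
import numpy as np

def likelihood_hh(p_h):
    # L(p_H | HH) = P(HH | p_H) = p_H * p_H for i.i.d. flips
    return p_h ** 2

print(likelihood_hh(0.5))   # 0.25, as computed above
print(likelihood_hh(0.3))   # 0.09, as computed above

# Likelihoods need not integrate to one over the parameter space:
# the mean of L over a fine uniform grid approximates its integral on [0, 1].
grid = np.linspace(0.0, 1.0, 100001)
print(likelihood_hh(grid).mean())   # ~0.3333, i.e. 1/3
```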
Continuous probability distribution
Let $X$ be a random variable following an absolutely continuous probability distribution with density function $f$ (a function of $x$) which depends on a parameter $\theta$. Then the function
: $\mathcal{L}(\theta \mid x) = f_\theta(x),$
considered as a function of $\theta$, is the ''likelihood function'' (of $\theta$, given the outcome $X = x$). Again, note that $\mathcal{L}$ is not a probability density or mass function over $\theta$, despite being a function of $\theta$ given the observation $X = x$.
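A brief sketch of the continuous case, assuming an exponential model $f(x \mid \theta) = \theta e^{-\theta x}$ chosen only for illustration: evaluated at a fixed observation $x$, the density becomes a function of $\theta$ alone.

```python
import numpy as np

x = 2.0  # a single observed realization (invented for illustration)

def likelihood(theta):
    # L(theta | x) = f(x | theta), the exponential density evaluated at x
    return theta * np.exp(-theta * x)

for theta in (0.25, 0.5, 1.0, 2.0):
    print(theta, likelihood(theta))
# L peaks at theta = 1/x = 0.5. Note that L(theta | x) can exceed 1 for
# small x, since it is a density in x, not a probability over theta.
```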
Relationship between the likelihood and probability density functions
The use of the probability density in specifying the likelihood function above is justified as follows. Given an observation $x$, the likelihood for the interval