Marginal Likelihood

A marginal likelihood is a likelihood function that has been integrated over the parameter space. In Bayesian statistics, it represents the probability of generating the observed sample for all possible values of the parameters; it can be understood as the probability of the model itself and is therefore often referred to as model evidence or simply evidence. Due to the integration over the parameter space, the marginal likelihood does not directly depend upon the parameters. If the focus is not on model comparison, the marginal likelihood is simply the normalizing constant that ensures that the posterior is a proper probability. It is related to the partition function in statistical mechanics.
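
Concretely, the marginal likelihood is the denominator of Bayes' theorem: writing \mathbf{X} for the data, \theta for the parameters and \alpha for the prior hyperparameters (the notation of the next section),

:p(\theta\mid\mathbf{X},\alpha) = \frac{p(\mathbf{X}\mid\theta)\,p(\theta\mid\alpha)}{p(\mathbf{X}\mid\alpha)},

so p(\mathbf{X}\mid\alpha) is exactly the constant that makes the posterior density integrate to one.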


Concept

Given a set of independent identically distributed data points \mathbf{X}=(x_1,\ldots,x_n), where x_i \sim p(x\mid\theta) according to some probability distribution parameterized by \theta, where \theta itself is a random variable described by a distribution, i.e. \theta \sim p(\theta\mid\alpha), the marginal likelihood in general asks what the probability p(\mathbf{X}\mid\alpha) is, where \theta has been marginalized out (integrated out):

:p(\mathbf{X}\mid\alpha) = \int_\theta p(\mathbf{X}\mid\theta) \, p(\theta\mid\alpha)\ \operatorname{d}\!\theta

The above definition is phrased in the context of Bayesian statistics, in which case p(\theta\mid\alpha) is called the prior density and p(\mathbf{X}\mid\theta) is the likelihood. Recognizing that the marginal likelihood is the normalizing constant of the Bayesian posterior density p(\theta\mid\mathbf{X},\alpha), one also has the alternative expression

:p(\mathbf{X}\mid\alpha) = \frac{p(\mathbf{X}\mid\theta,\alpha) \, p(\theta\mid\alpha)}{p(\theta\mid\mathbf{X},\alpha)},

which is an identity in \theta. The marginal likelihood quantifies the agreement between data and prior in a geometric sense made precise in de Carvalho et al. (2019).

In classical (frequentist) statistics, the concept of marginal likelihood occurs instead in the context of a joint parameter \theta = (\psi,\lambda), where \psi is the actual parameter of interest and \lambda is a non-interesting nuisance parameter. If there exists a probability distribution for \lambda, it is often desirable to consider the likelihood function only in terms of \psi, by marginalizing out \lambda:

:\mathcal{L}(\psi;\mathbf{X}) = p(\mathbf{X}\mid\psi) = \int_\lambda p(\mathbf{X}\mid\lambda,\psi) \, p(\lambda\mid\psi) \ \operatorname{d}\!\lambda

Unfortunately, marginal likelihoods are generally difficult to compute. Exact solutions are known for a small class of distributions, particularly when the marginalized-out parameter is the conjugate prior of the distribution of the data. In other cases, some kind of numerical integration method is needed, either a general method such as Gaussian integration or a Monte Carlo method, or a method specialized to statistical problems such as the Laplace approximation, Gibbs/Metropolis sampling, or the EM algorithm.

It is also possible to apply the above considerations to a single random variable (data point) x, rather than a set of observations. In a Bayesian context, this is equivalent to the prior predictive distribution of a data point.
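
For illustration, consider the conjugate Bernoulli–Beta model, where the integral has a closed form. The following minimal sketch (in Python; the Beta(2, 2) prior, the synthetic data, and all variable names are illustrative assumptions, not taken from this article) computes the exact marginal likelihood and a naive Monte Carlo estimate obtained by averaging the likelihood over draws from the prior:

 # A minimal sketch (illustrative, not from this article): the Bernoulli
 # likelihood with a conjugate Beta(a, b) prior has the closed-form marginal
 # likelihood B(a + k, b + n - k) / B(a, b) for k successes in n trials,
 # which we compare against a naive Monte Carlo average over prior draws.
 import numpy as np
 from scipy.special import betaln
 
 rng = np.random.default_rng(0)
 a, b = 2.0, 2.0                        # assumed Beta prior hyperparameters
 data = rng.binomial(1, 0.7, size=20)   # synthetic Bernoulli observations
 k, n = int(data.sum()), data.size
 
 # Exact: log p(x) = log B(a + k, b + n - k) - log B(a, b)
 log_exact = betaln(a + k, b + n - k) - betaln(a, b)
 
 # Naive Monte Carlo: p(x) ~= (1/S) * sum_s p(x | theta_s), theta_s ~ prior
 S = 100_000
 theta = rng.beta(a, b, size=S)
 log_lik = k * np.log(theta) + (n - k) * np.log1p(-theta)
 log_mc = np.logaddexp.reduce(log_lik) - np.log(S)   # stable log-mean-exp
 
 print(f"exact log p(x) = {log_exact:.4f}")
 print(f"MC    log p(x) = {log_mc:.4f}")

The Monte Carlo average converges to the exact value as S grows, but its variance can be large when the prior places little mass where the likelihood is concentrated, which is one reason the specialized methods listed above are used in practice.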


Applications


Bayesian model comparison

In Bayesian model comparison, the marginalized variables \theta are parameters for a particular type of model, and the remaining variable M is the identity of the model itself. In this case, the marginalized likelihood is the probability of the data given the model type, not assuming any particular model parameters. Writing \theta for the model parameters, the marginal likelihood for the model ''M'' is

: p(\mathbf{X}\mid M) = \int p(\mathbf{X}\mid\theta, M) \, p(\theta\mid M) \, \operatorname{d}\!\theta

It is in this context that the term ''model evidence'' is normally used. This quantity is important because the posterior odds ratio for a model ''M''1 against another model ''M''2 involves a ratio of marginal likelihoods, called the Bayes factor:

: \frac{p(M_1\mid\mathbf{X})}{p(M_2\mid\mathbf{X})} = \frac{p(M_1)}{p(M_2)} \, \frac{p(\mathbf{X}\mid M_1)}{p(\mathbf{X}\mid M_2)}

which can be stated schematically as

:posterior odds = prior odds × Bayes factor
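
For illustration, the following minimal sketch (in Python; the coin-flip data and the two candidate models are hypothetical assumptions, not taken from this article) computes the Bayes factor and posterior odds for a uniform-prior Bernoulli model against a fixed fair-coin model, both of whose marginal likelihoods are available in closed form:

 # A minimal sketch (hypothetical models and data): posterior odds for two
 # models of n coin flips with k heads.
 #   M1: theta unknown, uniform Beta(1, 1) prior
 #       -> p(x|M1) = B(1 + k, 1 + n - k) / B(1, 1)
 #   M2: fair coin, theta fixed at 1/2
 #       -> p(x|M2) = (1/2)**n
 import math
 from scipy.special import betaln
 
 k, n = 14, 20                               # hypothetical data: 14 heads in 20 flips
 log_m1 = betaln(1 + k, 1 + n - k) - betaln(1, 1)
 log_m2 = n * math.log(0.5)
 
 bayes_factor = math.exp(log_m1 - log_m2)    # evidence ratio: support for M1 over M2
 prior_odds = 1.0                            # equal prior model probabilities
 posterior_odds = prior_odds * bayes_factor  # posterior odds = prior odds x Bayes factor
 
 print(f"Bayes factor (M1 vs M2): {bayes_factor:.3f}")
 print(f"posterior odds:          {posterior_odds:.3f}")

With equal prior model probabilities the posterior odds equal the Bayes factor, so values above one favor ''M''1.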


See also

* Empirical Bayes methods
* Lindley's paradox
* Marginal probability
* Bayesian information criterion


Further reading

* Charles S. Bos. "A comparison of marginal likelihood computation methods". In W. Härdle and B. Rönz, editors, ''COMPSTAT 2002: Proceedings in Computational Statistics'', pp. 111–117. 2002. ''(Available as a preprint on )''
* de Carvalho, Miguel; Page, Garritt; Barney, Bradley (2019). "On the geometry of Bayesian inference". ''Bayesian Analysis''. 14 (4): 1013–1036. ''(Available as a preprint on the web)''
* Lambert, Ben (2018). "The devil is in the denominator". ''A Student's Guide to Bayesian Statistics''. Sage. pp. 109–120. ISBN 978-1-4739-1636-4.
* David J.C. MacKay. ''Information Theory, Inference, and Learning Algorithms'' (on-line textbook).