Integrated nested Laplace approximations (INLA) is a method for approximate Bayesian inference based on Laplace's method. It is designed for a class of models called latent Gaussian models (LGMs), for which it can be a fast and accurate alternative to Markov chain Monte Carlo methods for computing posterior marginal distributions. Due to its relative speed even with large data sets for certain problems and models, INLA has been a popular inference method in applied statistics, in particular spatial statistics, ecology, and epidemiology. It is also possible to combine INLA with a finite element method solution of a stochastic partial differential equation to study, e.g., spatial point processes and species distribution models. The INLA method is implemented in the R-INLA R package.


Latent Gaussian models

Let \boldsymbol{y}=(y_1,\dots,y_n) denote the response variable (that is, the observations) which belongs to an exponential family, with the mean \mu_i (of y_i) being linked to a linear predictor \eta_i via an appropriate link function. The linear predictor can take the form of a (Bayesian) additive model. All latent effects (the linear predictor, the intercept, coefficients of possible covariates, and so on) are collectively denoted by the vector \boldsymbol{x}. The hyperparameters of the model are denoted by \boldsymbol{\theta}. As per Bayesian statistics, \boldsymbol{x} and \boldsymbol{\theta} are random variables with prior distributions. The observations are assumed to be conditionally independent given \boldsymbol{x} and \boldsymbol{\theta}:
\pi(\boldsymbol{y} \mid \boldsymbol{x}, \boldsymbol{\theta}) = \prod_{i \in \mathcal{I}} \pi(y_i \mid \eta_i, \boldsymbol{\theta}),
where \mathcal{I} is the set of indices for observed elements of \boldsymbol{y} (some elements may be unobserved, and for these INLA computes a posterior predictive distribution). Note that the linear predictor \boldsymbol{\eta} is part of \boldsymbol{x}. For the model to be a latent Gaussian model, it is assumed that \boldsymbol{x} \mid \boldsymbol{\theta} is a Gaussian Markov random field (GMRF) (that is, a multivariate Gaussian with additional conditional independence properties) with probability density
\pi(\boldsymbol{x} \mid \boldsymbol{\theta}) \propto \left| \boldsymbol{Q}_{\boldsymbol{\theta}} \right|^{1/2} \exp\left( -\frac{1}{2} \boldsymbol{x}^T \boldsymbol{Q}_{\boldsymbol{\theta}} \boldsymbol{x} \right),
where \boldsymbol{Q}_{\boldsymbol{\theta}} is a \boldsymbol{\theta}-dependent sparse precision matrix and \left| \boldsymbol{Q}_{\boldsymbol{\theta}} \right| is its determinant. The precision matrix is sparse due to the GMRF assumption. The prior distribution \pi(\boldsymbol{\theta}) for the hyperparameters need not be Gaussian. However, the number of hyperparameters, m=\dim(\boldsymbol{\theta}), is assumed to be small (say, less than 15).
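To make the ingredients of an LGM concrete, the following Python sketch simulates a small latent Gaussian model under illustrative assumptions: a first-order random-walk latent field with a single precision hyperparameter, a small ridge term so the GMRF is proper, and Poisson observations with a log link. None of these specific choices or names are prescribed by INLA itself.

<syntaxhighlight lang="python">
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

# Illustrative latent Gaussian model:
#   x | theta ~ N(0, Q_theta^{-1}) with a sparse, theta-dependent precision matrix,
#   y_i | eta_i, theta ~ Poisson(exp(eta_i)) with eta_i = x_i (log link).
n = 50
theta = 2.0                                       # single hyperparameter: random-walk precision

# Sparse structure matrix of a first-order random walk (tridiagonal precision)
D = sparse.diags([np.ones(n - 1), -np.ones(n - 1)], [0, 1], shape=(n - 1, n))
R = D.T @ D
Q_theta = theta * R + 1e-5 * sparse.identity(n)   # small ridge so the GMRF is proper

# Draw the latent field x | theta using the Cholesky factor of the precision matrix
L = np.linalg.cholesky(Q_theta.toarray())
x = np.linalg.solve(L.T, rng.standard_normal(n))  # x ~ N(0, Q_theta^{-1})

# Conditionally independent observations given the linear predictor eta = x
y = rng.poisson(np.exp(x))
</syntaxhighlight>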


Approximate Bayesian inference with INLA

In Bayesian inference, one wants to solve for the posterior distribution of the latent variables \boldsymbol{x} and \boldsymbol{\theta}. Applying Bayes' theorem,
\pi(\boldsymbol{x}, \boldsymbol{\theta} \mid \boldsymbol{y}) = \frac{\pi(\boldsymbol{y} \mid \boldsymbol{x}, \boldsymbol{\theta}) \, \pi(\boldsymbol{x}, \boldsymbol{\theta})}{\pi(\boldsymbol{y})},
the joint posterior distribution of \boldsymbol{x} and \boldsymbol{\theta} is given by
\begin{align}
\pi(\boldsymbol{x}, \boldsymbol{\theta} \mid \boldsymbol{y}) & \propto \pi(\boldsymbol{\theta}) \, \pi(\boldsymbol{x} \mid \boldsymbol{\theta}) \prod_i \pi(y_i \mid \eta_i, \boldsymbol{\theta}) \\
& \propto \pi(\boldsymbol{\theta}) \left| \boldsymbol{Q}_{\boldsymbol{\theta}} \right|^{1/2} \exp\left( -\frac{1}{2} \boldsymbol{x}^T \boldsymbol{Q}_{\boldsymbol{\theta}} \boldsymbol{x} + \sum_i \log \pi(y_i \mid \eta_i, \boldsymbol{\theta}) \right).
\end{align}
Obtaining the exact posterior is generally a very difficult problem. In INLA, the main aim is to approximate the posterior marginals
\begin{align}
\pi(x_i \mid \boldsymbol{y}) &= \int \pi(x_i \mid \boldsymbol{\theta}, \boldsymbol{y}) \, \pi(\boldsymbol{\theta} \mid \boldsymbol{y}) \, d\boldsymbol{\theta}, \\
\pi(\theta_j \mid \boldsymbol{y}) &= \int \pi(\boldsymbol{\theta} \mid \boldsymbol{y}) \, d\boldsymbol{\theta}_{-j},
\end{align}
where \boldsymbol{\theta}_{-j} = \left(\theta_1, \dots, \theta_{j-1}, \theta_{j+1}, \dots, \theta_m \right).

A key idea of INLA is to construct nested approximations given by
\begin{align}
\widetilde{\pi}(x_i \mid \boldsymbol{y}) &= \int \widetilde{\pi}(x_i \mid \boldsymbol{\theta}, \boldsymbol{y}) \, \widetilde{\pi}(\boldsymbol{\theta} \mid \boldsymbol{y}) \, d\boldsymbol{\theta}, \\
\widetilde{\pi}(\theta_j \mid \boldsymbol{y}) &= \int \widetilde{\pi}(\boldsymbol{\theta} \mid \boldsymbol{y}) \, d\boldsymbol{\theta}_{-j},
\end{align}
where \widetilde{\pi}(\cdot \mid \cdot) is an approximated posterior density. The approximation to the marginal density \pi(x_i \mid \boldsymbol{y}) is obtained in a nested fashion by first approximating \pi(\boldsymbol{\theta} \mid \boldsymbol{y}) and \pi(x_i \mid \boldsymbol{\theta}, \boldsymbol{y}), and then numerically integrating out \boldsymbol{\theta} as
\widetilde{\pi}(x_i \mid \boldsymbol{y}) = \sum_k \widetilde{\pi}\left( x_i \mid \boldsymbol{\theta}_k, \boldsymbol{y} \right) \times \widetilde{\pi}( \boldsymbol{\theta}_k \mid \boldsymbol{y}) \times \Delta_k,
where the summation is over a set of values \boldsymbol{\theta}_k of the hyperparameters, with integration weights given by \Delta_k. The approximation of \pi(\theta_j \mid \boldsymbol{y}) is computed by numerically integrating \boldsymbol{\theta}_{-j} out from \widetilde{\pi}(\boldsymbol{\theta} \mid \boldsymbol{y}).

To get the approximate distribution \widetilde{\pi}(\boldsymbol{\theta} \mid \boldsymbol{y}), one can use the relation
\pi(\boldsymbol{\theta} \mid \boldsymbol{y}) = \frac{\pi(\boldsymbol{x}, \boldsymbol{\theta} \mid \boldsymbol{y})}{\pi(\boldsymbol{x} \mid \boldsymbol{\theta}, \boldsymbol{y})}
as the starting point. Then \widetilde{\pi}(\boldsymbol{\theta} \mid \boldsymbol{y}) is obtained at a specific value of the hyperparameters \boldsymbol{\theta} = \boldsymbol{\theta}_k with the Laplace approximation
\begin{align}
\widetilde{\pi}(\boldsymbol{\theta}_k \mid \boldsymbol{y}) &\propto \left. \frac{\pi(\boldsymbol{x}, \boldsymbol{\theta}_k, \boldsymbol{y})}{\widetilde{\pi}_G(\boldsymbol{x} \mid \boldsymbol{\theta}_k, \boldsymbol{y})} \right\vert_{\boldsymbol{x} = \boldsymbol{x}^{*}(\boldsymbol{\theta}_k)} \\
&\propto \left. \frac{\pi(\boldsymbol{y} \mid \boldsymbol{x}, \boldsymbol{\theta}_k) \, \pi(\boldsymbol{x} \mid \boldsymbol{\theta}_k) \, \pi(\boldsymbol{\theta}_k)}{\widetilde{\pi}_G(\boldsymbol{x} \mid \boldsymbol{\theta}_k, \boldsymbol{y})} \right\vert_{\boldsymbol{x} = \boldsymbol{x}^{*}(\boldsymbol{\theta}_k)},
\end{align}
where \widetilde{\pi}_G(\boldsymbol{x} \mid \boldsymbol{\theta}_k, \boldsymbol{y}) is the Gaussian approximation to \pi(\boldsymbol{x} \mid \boldsymbol{\theta}_k, \boldsymbol{y}) whose mode at a given \boldsymbol{\theta}_k is \boldsymbol{x}^{*}(\boldsymbol{\theta}_k). The mode can be found numerically, for example with the Newton–Raphson method. The trick in the Laplace approximation above is that the Gaussian approximation is applied to the full conditional of \boldsymbol{x} in the denominator, since it is usually close to a Gaussian due to the GMRF property of \boldsymbol{x}. Applying the approximation here improves the accuracy of the method, since the posterior \pi(\boldsymbol{\theta} \mid \boldsymbol{y}) itself need not be close to a Gaussian, and so the Gaussian approximation is not directly applied to \pi(\boldsymbol{\theta} \mid \boldsymbol{y}). The second important property of a GMRF, the sparsity of the precision matrix \boldsymbol{Q}_{\boldsymbol{\theta}_k}, is required for efficient computation of \widetilde{\pi}(\boldsymbol{\theta}_k \mid \boldsymbol{y}) for each value \boldsymbol{\theta}_k.
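This step can be sketched in Python as follows. It is a minimal illustration, not the R-INLA implementation: it assumes Poisson observations with a log link (so that the full-conditional gradient and Hessian have closed forms), treats the precision matrix as dense for brevity, and the names laplace_theta, build_Q and log_prior_theta are placeholders.

<syntaxhighlight lang="python">
import numpy as np

def laplace_theta(theta, y, build_Q, log_prior_theta, n_newton=20):
    """Unnormalized log of the Laplace approximation
        pi~(theta | y)  propto  pi(y | x, theta) pi(x | theta) pi(theta) / pi~_G(x | theta, y),
    with the ratio evaluated at the mode x*(theta) of the full conditional of x.
    Assumes Poisson observations with a log link (eta_i = x_i); build_Q and
    log_prior_theta are user-supplied placeholders. Returns the log value, the
    mode x*(theta) and the precision of the Gaussian approximation."""
    Q = build_Q(theta)
    Q = Q.toarray() if hasattr(Q, "toarray") else np.asarray(Q)  # dense here for brevity
    n = len(y)

    # Newton-Raphson iterations for the mode of log pi(x | theta, y)
    x = np.zeros(n)
    for _ in range(n_newton):
        mu = np.exp(x)                        # Poisson mean under the log link
        grad = -Q @ x + (y - mu)              # gradient of the full-conditional log-density
        prec = Q + np.diag(mu)                # negative Hessian = precision of pi~_G(x | theta, y)
        x = x + np.linalg.solve(prec, grad)

    mu = np.exp(x)
    Q_star = Q + np.diag(mu)
    _, logdet_Q = np.linalg.slogdet(Q)
    _, logdet_Qs = np.linalg.slogdet(Q_star)
    log_unnorm = (log_prior_theta(theta)                   # log pi(theta)
                  + np.sum(y * x - mu)                     # log pi(y | x*, theta), up to constants
                  + 0.5 * logdet_Q - 0.5 * x @ Q @ x       # log pi(x* | theta), up to constants
                  - 0.5 * logdet_Qs)                       # minus log pi~_G(x* | theta, y) at its mode
    return log_unnorm, x, Q_star
</syntaxhighlight>

Because the ratio is evaluated at the mode \boldsymbol{x}^{*}(\boldsymbol{\theta}_k), the quadratic term of the Gaussian approximation vanishes and only its normalizing constant, proportional to \left| \boldsymbol{Q}^{*} \right|^{1/2}, enters the expression.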
Obtaining the approximate distribution \widetilde{\pi}\left( x_i \mid \boldsymbol{\theta}_k, \boldsymbol{y} \right) is more involved, and the INLA method provides three options for this: Gaussian approximation, Laplace approximation, or the simplified Laplace approximation. For the numerical integration to obtain \widetilde{\pi}(x_i \mid \boldsymbol{y}), three options are also available: grid search, central composite design, or empirical Bayes.
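Under the same illustrative assumptions as above, the grid-search integration and the plain Gaussian option for \widetilde{\pi}(x_i \mid \boldsymbol{\theta}_k, \boldsymbol{y}) could be combined as in the sketch below, which reuses laplace_theta from the previous sketch; the grid of \boldsymbol{\theta}_k values, the weights \Delta_k and the function name are hypothetical inputs chosen by the user.

<syntaxhighlight lang="python">
import numpy as np

def inla_marginal_x_i(i, x_grid, theta_grid, delta, y, build_Q, log_prior_theta):
    """Nested integration pi~(x_i | y) = sum_k pi~(x_i | theta_k, y) pi~(theta_k | y) Delta_k
    over a user-chosen grid of hyperparameter values ('grid search'), using the plain
    Gaussian approximation for the inner density. Reuses laplace_theta from the sketch above."""
    x_grid = np.asarray(x_grid, dtype=float)
    results = [laplace_theta(t, y, build_Q, log_prior_theta) for t in theta_grid]
    log_post = np.array([r[0] for r in results])
    w = np.exp(log_post - log_post.max()) * np.asarray(delta)   # pi~(theta_k | y) * Delta_k ...
    w = w / w.sum()                                             # ... normalized to sum to one

    density = np.zeros_like(x_grid)
    for w_k, (_, x_star, Q_star) in zip(w, results):
        sd_i = np.sqrt(np.linalg.inv(Q_star)[i, i])             # marginal sd of x_i under pi~_G
        density += w_k * np.exp(-0.5 * ((x_grid - x_star[i]) / sd_i) ** 2) / (sd_i * np.sqrt(2 * np.pi))
    return density
</syntaxhighlight>

With the random-walk example above, build_Q could for instance be a function returning theta * R + 1e-5 * I and log_prior_theta the log-density of a Gamma prior on theta; a denser grid of hyperparameter values (or a central composite design) trades computing time for integration accuracy.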




Further reading

* {{cite book |first=Virgilio |last=Gomez-Rubio |title=Bayesian inference with INLA |publisher=Chapman and Hall/CRC |year=2021 |isbn=978-1-03-217453-2}}