Integrated nested Laplace approximations (INLA) is a method for approximate Bayesian inference based on Laplace's method. It is designed for a class of models called latent Gaussian models (LGMs), for which it can be a fast and accurate alternative to Markov chain Monte Carlo methods for computing posterior marginal distributions. Due to its relative speed even with large data sets for certain problems and models, INLA has been a popular inference method in applied statistics, in particular spatial statistics, ecology, and epidemiology. It is also possible to combine INLA with a finite element method solution of a stochastic partial differential equation to study, e.g., spatial point processes and species distribution models. The INLA method is implemented in the R-INLA R package.


Latent Gaussian models

Let \boldsymbol{y}=(y_1,\dots,y_n) denote the response variable (that is, the observations) which belongs to an exponential family, with the mean \mu_i (of y_i) being linked to a linear predictor \eta_i via an appropriate link function. The linear predictor can take the form of a (Bayesian) additive model. All latent effects (the linear predictor, the intercept, coefficients of possible covariates, and so on) are collectively denoted by the vector \boldsymbol{x}. The hyperparameters of the model are denoted by \boldsymbol{\theta}. As per Bayesian statistics, \boldsymbol{x} and \boldsymbol{\theta} are random variables with prior distributions. The observations are assumed to be conditionally independent given \boldsymbol{x} and \boldsymbol{\theta}:
\pi(\boldsymbol{y} \mid \boldsymbol{x}, \boldsymbol{\theta}) = \prod_{i \in \mathcal{I}} \pi(y_i \mid \eta_i, \boldsymbol{\theta}),
where \mathcal{I} is the set of indices for observed elements of \boldsymbol{y} (some elements may be unobserved, and for these INLA computes a posterior predictive distribution). Note that the linear predictor \boldsymbol{\eta} is part of \boldsymbol{x}. For the model to be a latent Gaussian model, it is assumed that \boldsymbol{x} \mid \boldsymbol{\theta} is a Gaussian Markov random field (GMRF) (that is, a multivariate Gaussian with additional conditional independence properties) with probability density
\pi(\boldsymbol{x} \mid \boldsymbol{\theta}) \propto \left| \boldsymbol{Q}_{\boldsymbol{\theta}} \right|^{1/2} \exp\left( -\frac{1}{2} \boldsymbol{x}^T \boldsymbol{Q}_{\boldsymbol{\theta}} \boldsymbol{x} \right),
where \boldsymbol{Q}_{\boldsymbol{\theta}} is a \boldsymbol{\theta}-dependent sparse precision matrix and \left| \boldsymbol{Q}_{\boldsymbol{\theta}} \right| is its determinant. The precision matrix is sparse due to the GMRF assumption. The prior distribution \pi(\boldsymbol{\theta}) for the hyperparameters need not be Gaussian. However, the number of hyperparameters, m=\dim(\boldsymbol{\theta}), is assumed to be small (say, less than 15).
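To make the ingredients of an LGM concrete, the following Python sketch simulates a small latent Gaussian model under illustrative assumptions: a first-order random-walk latent field with a single precision hyperparameter, a small ridge term so the GMRF is proper, and Poisson observations with a log link. None of these specific choices or names are prescribed by INLA itself.

<syntaxhighlight lang="python">
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

# Illustrative latent Gaussian model:
#   x | theta ~ N(0, Q_theta^{-1}) with a sparse, theta-dependent precision matrix,
#   y_i | eta_i, theta ~ Poisson(exp(eta_i)) with eta_i = x_i (log link).
n = 50
theta = 2.0                                       # single hyperparameter: random-walk precision

# Sparse structure matrix of a first-order random walk (tridiagonal precision)
D = sparse.diags([np.ones(n - 1), -np.ones(n - 1)], [0, 1], shape=(n - 1, n))
R = D.T @ D
Q_theta = theta * R + 1e-5 * sparse.identity(n)   # small ridge so the GMRF is proper

# Draw the latent field x | theta using the Cholesky factor of the precision matrix
L = np.linalg.cholesky(Q_theta.toarray())
x = np.linalg.solve(L.T, rng.standard_normal(n))  # x ~ N(0, Q_theta^{-1})

# Conditionally independent observations given the linear predictor eta = x
y = rng.poisson(np.exp(x))
</syntaxhighlight>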


Approximate Bayesian inference with INLA

In Bayesian inference, one wants to solve for the posterior distribution of the latent variables \boldsymbol{x} and \boldsymbol{\theta}. Applying Bayes' theorem,
\pi(\boldsymbol{x}, \boldsymbol{\theta} \mid \boldsymbol{y}) = \frac{\pi(\boldsymbol{y} \mid \boldsymbol{x}, \boldsymbol{\theta}) \, \pi(\boldsymbol{x}, \boldsymbol{\theta})}{\pi(\boldsymbol{y})},
the joint posterior distribution of \boldsymbol{x} and \boldsymbol{\theta} is given by
\begin{align}
\pi(\boldsymbol{x}, \boldsymbol{\theta} \mid \boldsymbol{y}) & \propto \pi(\boldsymbol{\theta}) \, \pi(\boldsymbol{x} \mid \boldsymbol{\theta}) \prod_i \pi(y_i \mid \eta_i, \boldsymbol{\theta}) \\
& \propto \pi(\boldsymbol{\theta}) \left| \boldsymbol{Q}_{\boldsymbol{\theta}} \right|^{1/2} \exp\left( -\frac{1}{2} \boldsymbol{x}^T \boldsymbol{Q}_{\boldsymbol{\theta}} \boldsymbol{x} + \sum_i \log \pi(y_i \mid \eta_i, \boldsymbol{\theta}) \right).
\end{align}
Obtaining the exact posterior is generally a very difficult problem. In INLA, the main aim is to approximate the posterior marginals
\begin{align}
\pi(x_i \mid \boldsymbol{y}) &= \int \pi(x_i \mid \boldsymbol{\theta}, \boldsymbol{y}) \, \pi(\boldsymbol{\theta} \mid \boldsymbol{y}) \, d\boldsymbol{\theta}, \\
\pi(\theta_j \mid \boldsymbol{y}) &= \int \pi(\boldsymbol{\theta} \mid \boldsymbol{y}) \, d\boldsymbol{\theta}_{-j},
\end{align}
where \boldsymbol{\theta}_{-j} = \left(\theta_1, \dots, \theta_{j-1}, \theta_{j+1}, \dots, \theta_m \right).

A key idea of INLA is to construct nested approximations given by
\begin{align}
\widetilde{\pi}(x_i \mid \boldsymbol{y}) &= \int \widetilde{\pi}(x_i \mid \boldsymbol{\theta}, \boldsymbol{y}) \, \widetilde{\pi}(\boldsymbol{\theta} \mid \boldsymbol{y}) \, d\boldsymbol{\theta}, \\
\widetilde{\pi}(\theta_j \mid \boldsymbol{y}) &= \int \widetilde{\pi}(\boldsymbol{\theta} \mid \boldsymbol{y}) \, d\boldsymbol{\theta}_{-j},
\end{align}
where \widetilde{\pi}(\cdot \mid \cdot) is an approximated posterior density. The approximation to the marginal density \pi(x_i \mid \boldsymbol{y}) is obtained in a nested fashion by first approximating \pi(\boldsymbol{\theta} \mid \boldsymbol{y}) and \pi(x_i \mid \boldsymbol{\theta}, \boldsymbol{y}), and then numerically integrating out \boldsymbol{\theta} as
\widetilde{\pi}(x_i \mid \boldsymbol{y}) = \sum_k \widetilde{\pi}\left( x_i \mid \boldsymbol{\theta}_k, \boldsymbol{y} \right) \times \widetilde{\pi}( \boldsymbol{\theta}_k \mid \boldsymbol{y}) \times \Delta_k,
where the summation is over a set of values \boldsymbol{\theta}_k of the hyperparameters, with integration weights given by \Delta_k. The approximation of \pi(\theta_j \mid \boldsymbol{y}) is computed by numerically integrating \boldsymbol{\theta}_{-j} out from \widetilde{\pi}(\boldsymbol{\theta} \mid \boldsymbol{y}).

To get the approximate distribution \widetilde{\pi}(\boldsymbol{\theta} \mid \boldsymbol{y}), one can use the relation
\pi(\boldsymbol{\theta} \mid \boldsymbol{y}) = \frac{\pi(\boldsymbol{x}, \boldsymbol{\theta} \mid \boldsymbol{y})}{\pi(\boldsymbol{x} \mid \boldsymbol{\theta}, \boldsymbol{y})}
as the starting point. Then \widetilde{\pi}(\boldsymbol{\theta} \mid \boldsymbol{y}) is obtained at a specific value of the hyperparameters \boldsymbol{\theta} = \boldsymbol{\theta}_k with the Laplace approximation
\begin{align}
\widetilde{\pi}(\boldsymbol{\theta}_k \mid \boldsymbol{y}) &\propto \left. \frac{\pi(\boldsymbol{x}, \boldsymbol{\theta}_k, \boldsymbol{y})}{\widetilde{\pi}_G(\boldsymbol{x} \mid \boldsymbol{\theta}_k, \boldsymbol{y})} \right\vert_{\boldsymbol{x} = \boldsymbol{x}^{*}(\boldsymbol{\theta}_k)} \\
&\propto \left. \frac{\pi(\boldsymbol{y} \mid \boldsymbol{x}, \boldsymbol{\theta}_k) \, \pi(\boldsymbol{x} \mid \boldsymbol{\theta}_k) \, \pi(\boldsymbol{\theta}_k)}{\widetilde{\pi}_G(\boldsymbol{x} \mid \boldsymbol{\theta}_k, \boldsymbol{y})} \right\vert_{\boldsymbol{x} = \boldsymbol{x}^{*}(\boldsymbol{\theta}_k)},
\end{align}
where \widetilde{\pi}_G(\boldsymbol{x} \mid \boldsymbol{\theta}_k, \boldsymbol{y}) is the Gaussian approximation to \pi(\boldsymbol{x} \mid \boldsymbol{\theta}_k, \boldsymbol{y}) whose mode at a given \boldsymbol{\theta}_k is \boldsymbol{x}^{*}(\boldsymbol{\theta}_k). The mode can be found numerically, for example with the Newton–Raphson method. The trick in the Laplace approximation above is that the Gaussian approximation is applied to the full conditional of \boldsymbol{x} in the denominator, since it is usually close to a Gaussian due to the GMRF property of \boldsymbol{x}. Applying the approximation here improves the accuracy of the method, since the posterior \pi(\boldsymbol{\theta} \mid \boldsymbol{y}) itself need not be close to a Gaussian, and so the Gaussian approximation is not directly applied to \pi(\boldsymbol{\theta} \mid \boldsymbol{y}). The second important property of a GMRF, the sparsity of the precision matrix \boldsymbol{Q}_{\boldsymbol{\theta}_k}, is required for efficient computation of \widetilde{\pi}(\boldsymbol{\theta}_k \mid \boldsymbol{y}) for each value \boldsymbol{\theta}_k.
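This step can be sketched in Python as follows. It is a minimal illustration, not the R-INLA implementation: it assumes Poisson observations with a log link (so that the full-conditional gradient and Hessian have closed forms), treats the precision matrix as dense for brevity, and the names laplace_theta, build_Q and log_prior_theta are placeholders.

<syntaxhighlight lang="python">
import numpy as np

def laplace_theta(theta, y, build_Q, log_prior_theta, n_newton=20):
    """Unnormalized log of the Laplace approximation
        pi~(theta | y)  propto  pi(y | x, theta) pi(x | theta) pi(theta) / pi~_G(x | theta, y),
    with the ratio evaluated at the mode x*(theta) of the full conditional of x.
    Assumes Poisson observations with a log link (eta_i = x_i); build_Q and
    log_prior_theta are user-supplied placeholders. Returns the log value, the
    mode x*(theta) and the precision of the Gaussian approximation."""
    Q = build_Q(theta)
    Q = Q.toarray() if hasattr(Q, "toarray") else np.asarray(Q)  # dense here for brevity
    n = len(y)

    # Newton-Raphson iterations for the mode of log pi(x | theta, y)
    x = np.zeros(n)
    for _ in range(n_newton):
        mu = np.exp(x)                        # Poisson mean under the log link
        grad = -Q @ x + (y - mu)              # gradient of the full-conditional log-density
        prec = Q + np.diag(mu)                # negative Hessian = precision of pi~_G(x | theta, y)
        x = x + np.linalg.solve(prec, grad)

    mu = np.exp(x)
    Q_star = Q + np.diag(mu)
    _, logdet_Q = np.linalg.slogdet(Q)
    _, logdet_Qs = np.linalg.slogdet(Q_star)
    log_unnorm = (log_prior_theta(theta)                   # log pi(theta)
                  + np.sum(y * x - mu)                     # log pi(y | x*, theta), up to constants
                  + 0.5 * logdet_Q - 0.5 * x @ Q @ x       # log pi(x* | theta), up to constants
                  - 0.5 * logdet_Qs)                       # minus log pi~_G(x* | theta, y) at its mode
    return log_unnorm, x, Q_star
</syntaxhighlight>

Because the ratio is evaluated at the mode \boldsymbol{x}^{*}(\boldsymbol{\theta}_k), the quadratic term of the Gaussian approximation vanishes and only its normalizing constant, proportional to \left| \boldsymbol{Q}^{*} \right|^{1/2}, enters the expression.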
Obtaining the approximate distribution \widetilde{\pi}\left( x_i \mid \boldsymbol{\theta}_k, \boldsymbol{y} \right) is more involved, and the INLA method provides three options for this: Gaussian approximation, Laplace approximation, or the simplified Laplace approximation. For the numerical integration to obtain \widetilde{\pi}(x_i \mid \boldsymbol{y}), three options are also available: grid search, central composite design, or empirical Bayes.
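Under the same illustrative assumptions as above, the grid-search integration and the plain Gaussian option for \widetilde{\pi}(x_i \mid \boldsymbol{\theta}_k, \boldsymbol{y}) could be combined as in the sketch below, which reuses laplace_theta from the previous sketch; the grid of \boldsymbol{\theta}_k values, the weights \Delta_k and the function name are hypothetical inputs chosen by the user.

<syntaxhighlight lang="python">
import numpy as np

def inla_marginal_x_i(i, x_grid, theta_grid, delta, y, build_Q, log_prior_theta):
    """Nested integration pi~(x_i | y) = sum_k pi~(x_i | theta_k, y) pi~(theta_k | y) Delta_k
    over a user-chosen grid of hyperparameter values ('grid search'), using the plain
    Gaussian approximation for the inner density. Reuses laplace_theta from the sketch above."""
    x_grid = np.asarray(x_grid, dtype=float)
    results = [laplace_theta(t, y, build_Q, log_prior_theta) for t in theta_grid]
    log_post = np.array([r[0] for r in results])
    w = np.exp(log_post - log_post.max()) * np.asarray(delta)   # pi~(theta_k | y) * Delta_k ...
    w = w / w.sum()                                             # ... normalized to sum to one

    density = np.zeros_like(x_grid)
    for w_k, (_, x_star, Q_star) in zip(w, results):
        sd_i = np.sqrt(np.linalg.inv(Q_star)[i, i])             # marginal sd of x_i under pi~_G
        density += w_k * np.exp(-0.5 * ((x_grid - x_star[i]) / sd_i) ** 2) / (sd_i * np.sqrt(2 * np.pi))
    return density
</syntaxhighlight>

With the random-walk example above, build_Q could for instance be a function returning theta * R + 1e-5 * I and log_prior_theta the log-density of a Gamma prior on theta; a denser grid of hyperparameter values (or a central composite design) trades computing time for integration accuracy.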




Further reading

* {{cite book |first=Virgilio |last=Gomez-Rubio |title=Bayesian inference with INLA |publisher=Chapman and Hall/CRC |year=2021 |isbn=978-1-03-217453-2}}