In mathematics, Laplace's method, named after Pierre-Simon Laplace, is a technique used to approximate integrals of the form
:\int_a^b e^{M f(x)} \, dx,
where f(x) is a twice-differentiable function, ''M'' is a large number, and the endpoints ''a'' and ''b'' could possibly be infinite. This technique was originally presented in . In Bayesian statistics, Laplace's approximation can refer to either approximating the posterior normalizing constant with Laplace's method or approximating the posterior distribution with a Gaussian centered at the maximum a posteriori estimate. Laplace approximations play a central role in the integrated nested Laplace approximations method for fast approximate Bayesian inference.


The idea of Laplace's method

Suppose the function f(x) has a unique global maximum at ''x''0. Let M>0 be a constant and consider the following two functions:
:\begin{align} g(x) &= M f(x) \\ h(x) &= e^{M f(x)} \end{align}
Note that ''x''0 will be the global maximum of g and h as well. Now observe:
:\begin{align} \frac{g(x_0)}{g(x)} &= \frac{M f(x_0)}{M f(x)} = \frac{f(x_0)}{f(x)} \\[4pt] \frac{h(x_0)}{h(x)} &= \frac{e^{M f(x_0)}}{e^{M f(x)}} = e^{M (f(x_0) - f(x))} \end{align}
As ''M'' increases, the ratio for h will grow exponentially, while the ratio for g does not change. Thus, significant contributions to the integral of this function will come only from points ''x'' in a neighbourhood of ''x''0, which can then be estimated.
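The concentration effect can be seen numerically. The sketch below is an illustration under our own assumptions: it uses the test function f(x) = sin x on [0, π] (maximum at x_0 = π/2), a plain midpoint rule, and a hypothetical helper `mass_near_max`, none of which come from the text. It reports the fraction of \int_a^b e^{M f(x)}\,dx contributed by the window |x − x_0| < 0.2.

```python
import math


def midpoint_integral(func, a, b, n=20_000):
    """Composite midpoint rule for the integral of func over [a, b]."""
    step = (b - a) / n
    return step * sum(func(a + (k + 0.5) * step) for k in range(n))


def mass_near_max(M, f, a, b, x0, width):
    """Fraction of the integral of exp(M f(x)) over [a, b] coming from |x - x0| < width.

    The common factor exp(M f(x0)) cancels in the ratio, so it is divided out
    to keep the exponentials from overflowing for large M.
    """
    def weight(x):
        return math.exp(M * (f(x) - f(x0)))

    total = midpoint_integral(weight, a, b)
    near = midpoint_integral(weight, max(a, x0 - width), min(b, x0 + width))
    return near / total


if __name__ == "__main__":
    for M in (1, 10, 100, 1000):
        frac = mass_near_max(M, math.sin, 0.0, math.pi, math.pi / 2, 0.2)
        print(f"M = {M:5d}: fraction of the integral near x0 = {frac:.6f}")
```

As ''M'' grows, the printed fraction climbs toward 1, which is the sense in which only a neighbourhood of ''x''0 matters.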


General theory of Laplace's method

To state and motivate the method, we need several assumptions. We will assume that ''x''0 is not an endpoint of the interval of integration, that the values f(x) cannot be very close to f(x_0) unless ''x'' is close to ''x''0, and that f''(x_0)<0. We can expand f(x) around ''x''0 by Taylor's theorem,
:f(x) = f(x_0) + f'(x_0)(x-x_0) + \frac{1}{2} f''(x_0)(x-x_0)^2 + R
where R = O\left((x-x_0)^3\right) (see: big O notation). Since f has a global maximum at ''x''0, and since ''x''0 is not an endpoint, it is a stationary point, so the derivative of f vanishes at ''x''0. Therefore, the function f(x) may be approximated to quadratic order
:f(x) \approx f(x_0) - \frac{1}{2} \left|f''(x_0)\right| (x-x_0)^2
for ''x'' close to ''x''0 (recall f''(x_0)<0). The assumptions ensure the accuracy of the approximation
:\int_a^b e^{M f(x)}\, dx \approx e^{M f(x_0)} \int_a^b e^{-\frac{1}{2} M \left|f''(x_0)\right| (x-x_0)^2} \, dx
(see the picture on the right). This latter integral is a Gaussian integral if the limits of integration go from −∞ to +∞ (which can be assumed because the exponential decays very fast away from ''x''0), and thus it can be calculated. We find
:\int_a^b e^{M f(x)}\, dx \approx \sqrt{\frac{2\pi}{M\left|f''(x_0)\right|}}\, e^{M f(x_0)} \quad \text{as } M\to\infty.
A generalization of this method and extension to arbitrary precision is provided by .
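A quick numerical check of the leading-order formula, written as a sketch under our own assumptions: the test function f(x) = sin x on [0, π] (so x_0 = π/2, f(x_0) = 1 and f''(x_0) = −1) and the helper names are illustrative choices, not taken from the text. Both the quadrature and the Laplace estimate are divided by e^{M f(x_0)} so that large ''M'' does not overflow; the printed ratio should approach 1.

```python
import math


def scaled_integral(M, f, a, b, x0, n=20_000):
    """Midpoint-rule value of exp(-M f(x0)) * integral_a^b exp(M f(x)) dx."""
    step = (b - a) / n
    f0 = f(x0)
    return step * sum(math.exp(M * (f(a + (k + 0.5) * step) - f0)) for k in range(n))


def scaled_laplace(M, f2):
    """Laplace estimate sqrt(2*pi / (M*|f''(x0)|)), also divided by exp(M f(x0))."""
    return math.sqrt(2 * math.pi / (M * abs(f2)))


def laplace_ratio(M):
    # Test case: f = sin on [0, pi], maximum at x0 = pi/2 with f''(x0) = -1.
    exact = scaled_integral(M, math.sin, 0.0, math.pi, math.pi / 2)
    return exact / scaled_laplace(M, -1.0)


if __name__ == "__main__":
    for M in (10, 100, 1000):
        print(f"M = {M:5d}: integral / Laplace estimate = {laplace_ratio(M):.6f}")
```

For this particular f the ratio behaves roughly like 1 + 1/(8M), so the relative error of the leading-order formula shrinks in proportion to 1/M.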


Formal statement and proof

Suppose f(x) is a twice continuously differentiable function on [a,b], and there exists a unique point x_0 \in (a,b) such that:
:f(x_0) = \max_{x \in [a,b]} f(x) \quad \text{and} \quad f''(x_0)<0.
Then:
:\lim_{n \to \infty} \frac{\int_a^b e^{n f(x)} \, dx}{e^{n f(x_0)} \sqrt{\frac{2\pi}{n\left(-f''(x_0)\right)}}} = 1.
Lower bound: Let \varepsilon > 0. Since f'' is continuous there exists \delta > 0 such that if |x_0 - c| < \delta then f''(c) \ge f''(x_0) - \varepsilon; shrinking \delta if necessary, we may also assume (x_0 - \delta, x_0 + \delta) \subset (a,b). By Taylor's Theorem, for any x \in (x_0 - \delta, x_0 + \delta),
:f(x) \ge f(x_0) + \frac{1}{2}(f''(x_0) - \varepsilon)(x-x_0)^2.
Then we have the following lower bound:
:\begin{align} \int_a^b e^{n f(x)} \, dx &\ge \int_{x_0-\delta}^{x_0+\delta} e^{n f(x)} \, dx \\ &\ge e^{n f(x_0)} \int_{x_0-\delta}^{x_0+\delta} e^{\frac{n}{2}(f''(x_0) - \varepsilon)(x-x_0)^2} \, dx \\ &= e^{n f(x_0)} \sqrt{\frac{1}{n(-f''(x_0) + \varepsilon)}} \int_{-\delta\sqrt{n(-f''(x_0)+\varepsilon)}}^{\delta\sqrt{n(-f''(x_0)+\varepsilon)}} e^{-\frac{1}{2}y^2} \, dy \end{align}
where the last equality was obtained by a change of variables
:y = \sqrt{n(-f''(x_0)+\varepsilon)}\,(x-x_0).
Remember f''(x_0)<0 so we can take the square root of its negation. If we divide both sides of the above inequality by
:e^{n f(x_0)} \sqrt{\frac{2\pi}{n(-f''(x_0))}}
and take the limit we get:
:\lim_{n \to \infty} \frac{\int_a^b e^{n f(x)} \, dx}{e^{n f(x_0)} \sqrt{\frac{2\pi}{n(-f''(x_0))}}} \ge \lim_{n \to \infty} \frac{1}{\sqrt{2\pi}} \int_{-\delta\sqrt{n(-f''(x_0)+\varepsilon)}}^{\delta\sqrt{n(-f''(x_0)+\varepsilon)}} e^{-\frac{1}{2}y^2} \, dy \cdot \sqrt{\frac{-f''(x_0)}{-f''(x_0)+\varepsilon}} = \sqrt{\frac{-f''(x_0)}{-f''(x_0)+\varepsilon}}
Since this is true for arbitrary \varepsilon, we get the lower bound:
:\lim_{n \to \infty} \frac{\int_a^b e^{n f(x)} \, dx}{e^{n f(x_0)} \sqrt{\frac{2\pi}{n(-f''(x_0))}}} \ge 1
Note that this proof works also when a = -\infty or b = \infty (or both). Upper bound: The proof is similar to that of the lower bound but there are a few inconveniences. Again we start by picking an \varepsilon > 0, but in order for the proof to work we need \varepsilon small enough so that f''(x_0) + \varepsilon < 0. Then, as above, by continuity of f'' and
Taylor's Theorem we can find \delta > 0 so that if |x - x_0| < \delta, then
:f(x) \le f(x_0) + \frac{1}{2}(f''(x_0) + \varepsilon)(x-x_0)^2.
Lastly, by our assumptions (assuming a, b are finite) there exists an \eta > 0 such that if |x - x_0| \ge \delta, then f(x) \le f(x_0) - \eta. Then we can calculate the following upper bound:
:\begin{align} \int_a^b e^{n f(x)} \, dx &\le \int_a^{x_0-\delta} e^{n f(x)} \, dx + \int_{x_0-\delta}^{x_0+\delta} e^{n f(x)} \, dx + \int_{x_0+\delta}^b e^{n f(x)} \, dx \\ &\le (b-a)e^{n(f(x_0)-\eta)} + \int_{x_0-\delta}^{x_0+\delta} e^{n f(x)} \, dx \\ &\le (b-a)e^{n(f(x_0)-\eta)} + e^{n f(x_0)} \int_{x_0-\delta}^{x_0+\delta} e^{\frac{n}{2}(f''(x_0)+\varepsilon)(x-x_0)^2} \, dx \\ &\le (b-a)e^{n(f(x_0)-\eta)} + e^{n f(x_0)} \int_{-\infty}^{+\infty} e^{\frac{n}{2}(f''(x_0)+\varepsilon)(x-x_0)^2} \, dx \\ &\le (b-a)e^{n(f(x_0)-\eta)} + e^{n f(x_0)} \sqrt{\frac{2\pi}{n(-f''(x_0)-\varepsilon)}} \end{align}
If we divide both sides of the above inequality by
:e^{n f(x_0)} \sqrt{\frac{2\pi}{n(-f''(x_0))}}
and take the limit we get:
:\lim_{n \to \infty} \frac{\int_a^b e^{n f(x)} \, dx}{e^{n f(x_0)} \sqrt{\frac{2\pi}{n(-f''(x_0))}}} \le \lim_{n \to \infty} (b-a) e^{-\eta n} \sqrt{\frac{n(-f''(x_0))}{2\pi}} + \sqrt{\frac{-f''(x_0)}{-f''(x_0)-\varepsilon}} = \sqrt{\frac{-f''(x_0)}{-f''(x_0)-\varepsilon}}
Since \varepsilon is arbitrary, we get the upper bound:
:\lim_{n \to \infty} \frac{\int_a^b e^{n f(x)} \, dx}{e^{n f(x_0)} \sqrt{\frac{2\pi}{n(-f''(x_0))}}} \le 1
Combining this with the lower bound gives the result. Note that the above proof obviously fails when a = -\infty or b = \infty (or both). To deal with these cases, we need some extra assumptions. A sufficient (not necessary) assumption is that for n = 1,
:\int_a^b e^{n f(x)} \, dx < \infty,
and that the number \eta as above exists (note that this must be an assumption in the case when the interval [a,b] is infinite). The proof proceeds otherwise as above, but with a slightly different approximation of integrals:
:\int_a^{x_0-\delta} e^{n f(x)} \, dx + \int_{x_0+\delta}^b e^{n f(x)} \, dx \le \int_a^b e^{f(x)} e^{(n-1)(f(x_0)-\eta)} \, dx = e^{(n-1)(f(x_0)-\eta)} \int_a^b e^{f(x)} \, dx.
When we divide by
:e^{n f(x_0)} \sqrt{\frac{2\pi}{n(-f''(x_0))}},
we get for this term
:\frac{e^{(n-1)(f(x_0)-\eta)} \int_a^b e^{f(x)} \, dx}{e^{n f(x_0)} \sqrt{\frac{2\pi}{n(-f''(x_0))}}} = e^{-(n-1)\eta} \sqrt{n}\, e^{-f(x_0)} \int_a^b e^{f(x)} \, dx \, \sqrt{\frac{-f''(x_0)}{2\pi}}
whose limit as n \to \infty is 0. The rest of the proof (the analysis of the interesting term) proceeds as above. The given condition in the infinite interval case is, as said above, sufficient but not necessary. However, the condition is fulfilled in many, if not most, applications: the condition simply says that the integral we are studying must be well-defined (not infinite) and that the maximum of the function at x_0 must be a "true" maximum (the number \eta > 0 must exist).
There is no need to demand that the integral is finite for n = 1; it is enough to demand that the integral is finite for some n = N. This method relies on four basic concepts:
:1. Relative error
The “approximation” in this method is related to the relative error and not the absolute error. Therefore, if we set
:s = \sqrt{\frac{2\pi}{M\left|f''(x_0)\right|}},
the integral can be written as
:\begin{align} \int_a^b e^{M f(x)} \, dx &= s e^{M f(x_0)} \frac{1}{s} \int_a^b e^{M(f(x)-f(x_0))} \, dx \\ &= s e^{M f(x_0)} \int_{(a-x_0)/s}^{(b-x_0)/s} e^{M(f(sy+x_0)-f(x_0))} \, dy \end{align}
where s is small when ''M'' is large, and the relative error will be
:\left| \int_{(a-x_0)/s}^{(b-x_0)/s} e^{M(f(sy+x_0)-f(x_0))} \, dy - 1 \right|.
Now, let us separate this integral into two parts: the region y \in [-D_y, D_y] and the rest.
:2. e^{M(f(x)-f(x_0))} \to e^{-\pi y^2} around the stationary point when ''M'' is large enough
Let us look at the Taylor expansion of M(f(x)-f(x_0)) around ''x''0 and change variables from ''x'' to ''y'', because the comparison is done in y-space:
:M(f(x)-f(x_0)) = \frac{M f''(x_0)}{2} s^2 y^2 + \frac{M f'''(x_0)}{6} s^3 y^3 + \cdots = -\pi y^2 + O\left(\frac{1}{\sqrt{M}}\right).
Note that f'(x_0) = 0 because x_0 is a stationary point. This equation shows that the terms beyond the second derivative in the Taylor expansion are suppressed by a factor of order \tfrac{1}{\sqrt{M}}, so that \exp(M(f(x)-f(x_0))) gets closer to the Gaussian function as shown in the figure. Besides,
:\int_{-\infty}^{\infty} e^{-\pi y^2} \, dy = 1.
:3. The larger ''M'' is, the smaller the relevant range of ''x''
Because we do the comparison in y-space, y is fixed in y \in [-D_y, D_y], which corresponds to x - x_0 \in [-sD_y, sD_y]. Since s is inversely proportional to \sqrt{M}, the corresponding region of ''x'' shrinks as ''M'' increases.
:4. If the integral in Laplace's method converges, the contribution to the relative error from the region away from the stationary point tends to zero as ''M'' grows
Relying on the third concept, even if we choose a very large D_y, sD_y will eventually be very small when ''M'' is increased to a huge number. Then, how can we guarantee that the integral over the rest tends to 0 when ''M'' is large enough? The basic idea is to find a function m(x) such that m(x) \ge f(x) - f(x_0) in the rest region and the integral of e^{M m(x)} tends to zero as ''M'' grows. Because the exponential function is positive and increasing, e^{M(f(x)-f(x_0))} \le e^{M m(x)}, so the integral of e^{M(f(x)-f(x_0))} over the rest region also tends to zero. For simplicity, choose m(x) as the tangent through the point x = sD_y, as shown in the figure. If the interval of integration is finite, we find that, whether or not f(x) is continuous in the rest region, it will always be smaller than m(x) when ''M'' is large enough. It will be proved later that the integral of e^{M m(x)} tends to zero when ''M'' is large enough. If the interval of integration is infinite, m(x) and f(x) might keep crossing each other. If so, we cannot guarantee that the integral of e^{M f(x)} will eventually tend to zero. For example, in the case of f(x) = \tfrac{\sin x}{x}, \int_0^{\infty} e^{M f(x)} \, dx always diverges. Therefore, we need to require that \int_d^{\infty} e^{M f(x)} \, dx converges for the infinite interval case. If so, this integral tends to zero when d is large enough, and we can choose this d as the crossing point of m(x) and f(x). You might ask why we do not require \int_d^{\infty} e^{f(x)} \, dx to converge instead. An example shows the reason.
Suppose the rest part of f(x) is -\ln x; then e^{f(x)} = \tfrac{1}{x} and its integral diverges, but when M = 2, the integral of e^{M f(x)} = \tfrac{1}{x^2} converges. So the integral of some functions diverges when ''M'' is not large, but converges when ''M'' is large enough. Based on these four concepts, we can derive the relative error of Laplace's method.
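The second concept can be checked directly. The sketch below is illustrative only: it takes f(x) = ln x − x (the exponent that reappears in the Stirling example, with x_0 = 1, f''(x_0) = −1 and f'''(x_0) = 2), a fixed y = 0.5, and a hypothetical helper `deviation`, and measures how far M(f(x_0 + sy) − f(x_0)) is from −πy². By the expansion above, the gap should shrink roughly like 1/\sqrt{M}, i.e. by about a factor of 10 each time ''M'' grows by a factor of 100.

```python
import math


def deviation(M, y):
    """Return M*(f(x0 + s*y) - f(x0)) + pi*y**2 for f(x) = ln(x) - x, x0 = 1.

    Here f''(x0) = -1, so s = sqrt(2*pi / M). With u = s*y,
    f(1 + u) - f(1) = log(1 + u) - u, computed with log1p for accuracy.
    """
    s = math.sqrt(2 * math.pi / M)
    u = s * y
    return M * (math.log1p(u) - u) + math.pi * y * y


if __name__ == "__main__":
    for M in (1e2, 1e4, 1e6):
        print(f"M = {M:9.0f}: deviation at y = 0.5 is {deviation(M, 0.5):.3e}")
```

The leading term of the gap is M f'''(x_0) s^3 y^3 / 6, which is exactly the O(1/\sqrt{M}) remainder in the expansion.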


Other formulations

Laplace's approximation is sometimes written as
:\int_a^b h(x) e^{M g(x)}\, dx \approx \sqrt{\frac{2\pi}{M\left|g''(x_0)\right|}}\, h(x_0) e^{M g(x_0)} \quad \text{as } M\to\infty
where h is positive. Importantly, the accuracy of the approximation depends on the variable of integration, that is, on what stays in g(x) and what goes into h(x). First, use x_0 = 0 to denote the global maximum, which will simplify this derivation. We are interested in the relative error, written as |R|,
:\int_a^b h(x) e^{M g(x)}\, dx = h(0) e^{M g(0)} s \underbrace{\int_{a/s}^{b/s} \frac{h(sy)}{h(0)} e^{M\left[g(sy)-g(0)\right]} \, dy}_{1+R},
where
:s \equiv \sqrt{\frac{2\pi}{M\left|g''(0)\right|}}.
So, if we let
:A \equiv \frac{h(sy)}{h(0)} e^{M\left[g(sy)-g(0)\right]} and A_0 \equiv e^{-\pi y^2},
we get
:\left|R\right| = \left| \int_{a/s}^{b/s} A \, dy - \int_{-\infty}^{\infty} A_0 \, dy \right|
since \int_{-\infty}^{\infty} A_0 \, dy = 1. For the upper bound, note that |A+B| \le |A| + |B|, so we can separate this integral into 5 parts of 3 different types, (a), (b) and (c). Therefore,
:|R| < \underbrace{\left| \int_{D_y}^{\infty} A_0 \, dy \right|}_{(a_1)} + \underbrace{\left| \int_{D_y}^{b/s} A \, dy \right|}_{(b_1)} + \underbrace{\left| \int_{-D_y}^{D_y} \left(A - A_0\right) dy \right|}_{(c)} + \underbrace{\left| \int_{a/s}^{-D_y} A \, dy \right|}_{(b_2)} + \underbrace{\left| \int_{-\infty}^{-D_y} A_0 \, dy \right|}_{(a_2)}
where (a_1) and (a_2) are similar, so we only calculate (a_1); likewise (b_1) and (b_2) are similar, so we only calculate (b_1). For (a_1), after the substitution z \equiv \pi y^2, we get
:(a_1) = \left| \frac{1}{2\sqrt{\pi}} \int_{\pi D_y^2}^{\infty} e^{-z} z^{-1/2} \, dz \right| < \frac{e^{-\pi D_y^2}}{2\pi D_y}.
This means that as long as D_y is large enough, it will tend to zero. For (b_1), we get
:(b_1) \le \left| \int_{D_y}^{\infty} \left[\frac{h(sy)}{h(0)}\right]_{\max} e^{M m(sy)} \, dy \right|
where
:m(x) \ge g(x) - g(0) \quad \text{for } x \in [sD_y, b]
and h(x) should have the same sign as h(0) in this region. Let us choose m(x) as the tangent through the point x = sD_y, i.e. m(sy) = g(sD_y) - g(0) + g'(sD_y)\left(sy - sD_y\right), which is shown in the figure. From this figure one can see that when s or D_y gets smaller, the region satisfying the above inequality gets larger. Therefore, if we want to find a suitable m(x) that lies above g(x) - g(0) over the whole interval of (b_1), D_y will have an upper limit. Besides, because the integral of e^{M m(x)} is simple, let us use it to estimate the relative error contributed by (b_1). Based on Taylor expansion, we get
:\begin{align} M\left[g(sD_y)-g(0)\right] &= M\left[\frac{g''(0)}{2}s^2D_y^2 + \frac{g'''(\xi)}{6}s^3D_y^3\right] && \text{for some } \xi \in [0, sD_y] \\ &= -\pi D_y^2 + \frac{(2\pi)^{3/2} g'''(\xi) D_y^3}{6\sqrt{M}\left|g''(0)\right|^{3/2}}, \end{align}
and
:\begin{align} M s g'(sD_y) &= M s\left(g''(0) s D_y + \frac{g'''(\zeta)}{2} s^2 D_y^2\right) && \text{for some } \zeta \in [0, sD_y] \\ &= -2\pi D_y + \sqrt{\frac{2}{M}} \left(\frac{\pi}{\left|g''(0)\right|}\right)^{3/2} g'''(\zeta) D_y^2, \end{align}
and then substitute them back into the calculation of (b_1). The remainders of these two expansions are both inversely proportional to the square root of ''M'', so we drop them to simplify the calculation; keeping them is more precise but makes the formula messier.
:\begin{align} (b_1) &\le \left| \left[\frac{h(sy)}{h(0)}\right]_{\max} e^{-\pi D_y^2} \int_0^{\infty} e^{-2\pi D_y y} \, dy \right| \\ &\le \left| \left[\frac{h(sy)}{h(0)}\right]_{\max} e^{-\pi D_y^2} \frac{1}{2\pi D_y} \right|. \end{align}
Therefore, it tends to zero as D_y gets larger, but remember that the upper bound of D_y must be taken into account in this calculation. About the integration near x = 0, we can also use
Taylor's Theorem to calculate it. When h'(0) \ne 0,
:\begin{align} (c) &\le \int_{-D_y}^{D_y} e^{-\pi y^2} \left| \frac{s h'(\xi)}{h(0)} y \right| \, dy \\ &< \sqrt{\frac{2}{\pi M \left|g''(0)\right|}} \left| \frac{h'(\xi)}{h(0)} \right|_{\max} \left(1 - e^{-\pi D_y^2}\right) \end{align}
and this is inversely proportional to the square root of ''M''. In fact, (c) has the same behavior when h(x) is a constant. In conclusion, the integral near the stationary point gets smaller as \sqrt{M} gets larger, and the remaining parts tend to zero as long as D_y is large enough; however, D_y has an upper limit, determined by whether the function m(x) stays larger than g(x) - g(0) in the rest region. As long as we can find one m(x) satisfying this condition, the upper bound of D_y can be chosen directly proportional to \sqrt{M}, since m(x) is the tangent to g(x) - g(0) at x = sD_y. So, the bigger ''M'' is, the bigger D_y can be. In the multivariate case, where \mathbf{x} is a d-dimensional vector and f(\mathbf{x}) is a scalar function of \mathbf{x}, Laplace's approximation is usually written as:
:\int h(\mathbf{x}) e^{M f(\mathbf{x})}\, d\mathbf{x} \approx \left(\frac{2\pi}{M}\right)^{d/2} \frac{h(\mathbf{x}_0) e^{M f(\mathbf{x}_0)}}{\left|-H(f)(\mathbf{x}_0)\right|^{1/2}} \quad \text{as } M\to\infty
where H(f)(\mathbf{x}_0) is the Hessian matrix of f evaluated at \mathbf{x}_0 and where |\cdot| denotes the matrix determinant. Analogously to the univariate case, the Hessian is required to be negative definite. Although \mathbf{x} denotes a d-dimensional vector, the term d\mathbf{x} here denotes an infinitesimal volume, i.e. d\mathbf{x} := dx_1 dx_2 \cdots dx_d.
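A rough numerical check of the multivariate formula in d = 2, as a sketch under assumptions of our own: the test functions f(x, y) = cos x + cos y + ½ cos(x − y) and h(x, y) = 2 + sin x on [−π, π]² are illustrative choices, not from the text. The unique maximum is at the origin, where f = 5/2 and the Hessian is [[−3/2, 1/2], [1/2, −3/2]], so |−H| = 2. Both sides are divided by e^{M f(\mathbf{x}_0)} to avoid overflow.

```python
import math


def f(x, y):
    # Illustrative test exponent with a unique maximum f(0, 0) = 2.5 on [-pi, pi]^2.
    return math.cos(x) + math.cos(y) + 0.5 * math.cos(x - y)


def h(x, y):
    # Illustrative positive prefactor with h(0, 0) = 2.
    return 2.0 + math.sin(x)


def scaled_integral_2d(M, n=400):
    """Midpoint rule on [-pi, pi]^2 for exp(-M f(0,0)) * integral of h * exp(M f)."""
    step = 2 * math.pi / n
    pts = [-math.pi + (k + 0.5) * step for k in range(n)]
    f0 = f(0.0, 0.0)
    total = 0.0
    for x in pts:
        for y in pts:
            total += h(x, y) * math.exp(M * (f(x, y) - f0))
    return total * step * step


def scaled_laplace_2d(M):
    """(2*pi/M)^(d/2) * h(x0) / sqrt(det(-H)), divided by exp(M f(x0))."""
    d = 2
    # -H at the origin is [[1.5, -0.5], [-0.5, 1.5]], whose determinant is 2.
    det_neg_hessian = 1.5 * 1.5 - (-0.5) * (-0.5)
    return (2 * math.pi / M) ** (d / 2) * h(0.0, 0.0) / math.sqrt(det_neg_hessian)


if __name__ == "__main__":
    for M in (10, 100):
        ratio = scaled_integral_2d(M) / scaled_laplace_2d(M)
        print(f"M = {M:4d}: integral / Laplace estimate = {ratio:.5f}")
```

The ratio moves toward 1 as ''M'' grows; the sin x part of h contributes nothing to leading order here because f is symmetric under \mathbf{x} \to -\mathbf{x}.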


Laplace's method extension: Steepest descent

In extensions of Laplace's method, complex analysis, and in particular Cauchy's integral formula, is used to find a contour ''of steepest descent'' for an (asymptotically with large ''M'') equivalent integral, expressed as a line integral. In particular, if no point ''x''0 where the derivative of f vanishes exists on the real line, it may be necessary to deform the integration contour to an optimal one, where the above analysis will be possible. Again, the main idea is to reduce, at least asymptotically, the calculation of the given integral to that of a simpler integral that can be explicitly evaluated. See the book of Erdelyi (1956) for a simple discussion (where the method is termed ''steepest descents''). The appropriate formulation for the complex ''z''-plane is
:\int_a^b e^{M f(z)}\, dz \approx \sqrt{\frac{2\pi}{-M f''(z_0)}}\, e^{M f(z_0)} \quad \text{as } M\to\infty
for a path passing through the saddle point at ''z''0. Note the explicit appearance of a minus sign to indicate the direction of the second derivative: one must ''not'' take the modulus. Also note that if the integrand is meromorphic, one may have to add residues corresponding to poles traversed while deforming the contour (see for example section 3 of Okounkov's paper ''Symmetric functions and random partitions'').
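A small illustration of a saddle point lying off the real line, under assumptions of our own (the exponent f(z) = iz − z²/2 and the helper names are illustrative choices): on the real axis f'(x) = i − x never vanishes, but f'(z) = 0 at the saddle z_0 = i, where f(z_0) = −1/2 and f''(z_0) = −1. The sketch compares a direct quadrature along the real line with the complex formula above. Because this f is quadratic, the formula happens to be exact here, so the check isolates the contour-deformation idea rather than the asymptotics.

```python
import cmath
import math


def f(z):
    # Illustrative exponent: f(z) = i z - z^2 / 2, saddle point at z0 = i.
    return 1j * z - z * z / 2


def real_line_integral(M, half_width=12.0, n=40_000):
    """Midpoint rule for the integral of exp(M f(x)) over the real line.

    The integrand's modulus is exp(-M x^2 / 2), so truncating at
    |x| = half_width loses a negligible amount for the M used here.
    """
    step = 2 * half_width / n
    return step * sum(
        cmath.exp(M * f(-half_width + (k + 0.5) * step)) for k in range(n)
    )


def saddle_point_estimate(M):
    """sqrt(2*pi / (-M f''(z0))) * exp(M f(z0)) with z0 = i and f''(z0) = -1."""
    z0, f2 = 1j, -1.0
    return cmath.sqrt(2 * math.pi / (-M * f2)) * cmath.exp(M * f(z0))


if __name__ == "__main__":
    for M in (5, 10, 20):
        direct = real_line_integral(M)
        saddle = saddle_point_estimate(M)
        print(f"M = {M:3d}: direct = {direct:.6e}, saddle point = {saddle:.6e}")
```

Along the real line the integrand oscillates and largely cancels; the saddle point at z_0 = i captures the small surviving value e^{-M/2}\sqrt{2\pi/M} without any cancellation.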


Further generalizations

An extension of the steepest descent method is the so-called ''nonlinear stationary phase/steepest descent method''. Here, instead of integrals, one needs to evaluate asymptotically solutions of Riemann–Hilbert factorization problems. Given a contour ''C'' in the complex sphere, a function f defined on that contour and a special point, say infinity, one seeks a function ''M'' holomorphic away from the contour ''C'', with prescribed jump across ''C'', and with a given normalization at infinity. If f and hence ''M'' are matrices rather than scalars this is a problem that in general does not admit an explicit solution. An asymptotic evaluation is then possible along the lines of the linear stationary phase/steepest descent method. The idea is to reduce asymptotically the solution of the given Riemann–Hilbert problem to that of a simpler, explicitly solvable, Riemann–Hilbert problem. Cauchy's theorem is used to justify deformations of the jump contour. The nonlinear stationary phase was introduced by Deift and Zhou in 1993, based on earlier work of Its. A (properly speaking) nonlinear steepest descent method was introduced by Kamvissis, K. McLaughlin and P. Miller in 2003, based on previous work of Lax, Levermore, Deift, Venakides and Zhou. As in the linear case, "steepest descent contours" solve a min-max problem. In the nonlinear case they turn out to be "S-curves" (defined in a different context in the 1980s by Stahl, Gonchar and Rakhmanov). The nonlinear stationary phase/steepest descent method has applications to the theory of soliton equations and integrable models, random matrices and combinatorics.


Laplace's method generalization: Median-point approximation

In the generalization, evaluation of the integral is considered equivalent to finding the norm of the distribution with density
:e^{M f(x)}.
Denoting the cumulative distribution F(x), if there is a diffeomorphic Gaussian distribution with density
:e^{-g-\frac{\gamma^2 y^2}{2}}
the norm is given by
:\sqrt{\frac{2\pi}{\gamma^2}}\, e^{-g}
and the corresponding diffeomorphism is
:y(x) = \frac{1}{\gamma}\Phi^{-1}\left(\frac{F(x)}{F(\infty)}\right),
where \Phi denotes the cumulative standard normal distribution function. In general, any distribution diffeomorphic to the Gaussian distribution has density
:e^{-g-\frac{\gamma^2}{2}y^2(x)}\, y'(x)
and the median point is mapped to the median of the Gaussian distribution. Matching the logarithm of the density functions and their derivatives at the median point up to a given order yields a system of equations that determine the approximate values of \gamma and g. The approximation was introduced in 2019 by D. Makogon and C. Morais Smith primarily in the context of partition function evaluation for a system of interacting fermions.


Complex integrals

For complex integrals in the form:
:\frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} g(s)e^{st} \,ds
with t \gg 1, we make the substitution ''t'' = ''iu'' and the change of variable s = c + ix to get the bilateral Laplace transform:
:\frac{1}{2\pi}\int_{-\infty}^\infty g(c+ix)e^{-ux}e^{icu} \, dx.
We then split ''g''(''c'' + ''ix'') into its real and imaginary parts, after which we recover ''u'' = ''t''/''i''. This is useful for inverse Laplace transforms, the Perron formula and complex integration.


Example: Stirling's approximation

Laplace's method can be used to derive Stirling's approximation
:N! \approx \sqrt{2\pi N}\, N^N e^{-N}
for a large integer ''N''. From the definition of the Gamma function, we have
:N! = \Gamma(N+1) = \int_0^\infty e^{-x} x^N \, dx.
Now we change variables, letting x = Nz so that dx = N\,dz. Plug these values back in to obtain
:\begin{align} N! &= \int_0^\infty e^{-Nz} (Nz)^N N \, dz \\ &= N^{N+1} \int_0^\infty e^{-Nz} z^N \, dz \\ &= N^{N+1} \int_0^\infty e^{-Nz} e^{N\ln z} \, dz \\ &= N^{N+1} \int_0^\infty e^{N(\ln z - z)} \, dz. \end{align}
This integral has the form necessary for Laplace's method with
:f(z) = \ln z - z
which is twice-differentiable:
:f'(z) = \frac{1}{z} - 1,
:f''(z) = -\frac{1}{z^2}.
The maximum of f(z) lies at ''z''0 = 1, and the second derivative of f(z) has the value −1 at this point. Therefore, we obtain
:N! \approx N^{N+1}\sqrt{\frac{2\pi}{N}}\, e^{-N} = \sqrt{2\pi N}\, N^N e^{-N}.
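The quality of the resulting approximation is easy to check numerically. The sketch below (the helper name `stirling_ratio` is our own) compares N! with \sqrt{2\pi N} N^N e^{-N}, working with logarithms through Python's `math.lgamma` so that large ''N'' does not overflow. The ratio approaches 1; its known leading correction is a factor of about 1 + 1/(12N), which lies beyond the leading-order Laplace approximation used above.

```python
import math


def stirling_ratio(N):
    """Return N! / (sqrt(2*pi*N) * N**N * exp(-N)), computed via logarithms."""
    log_factorial = math.lgamma(N + 1)  # ln N! = ln Gamma(N + 1)
    log_stirling = 0.5 * math.log(2 * math.pi * N) + N * math.log(N) - N
    return math.exp(log_factorial - log_stirling)


if __name__ == "__main__":
    for N in (1, 10, 100, 1000):
        print(f"N = {N:5d}: N! / Stirling = {stirling_ratio(N):.8f}")
```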


See also

* Method of stationary phase
* Method of steepest descent
* Large deviations theory
* Laplace principle (large deviations theory)
* Laplace's approximation

