Quasi-Newton method

Quasi-Newton methods are methods used to either find zeroes or local maxima and minima of functions, as an alternative to Newton's method. They can be used if the Jacobian or Hessian is unavailable or is too expensive to compute at every iteration. The "full" Newton's method requires the Jacobian in order to search for zeros, or the Hessian for finding extrema.


Search for zeros: root finding

Newton's method to find zeroes of a function g of multiple variables is given by x_{n+1} = x_n - [J_g(x_n)]^{-1} g(x_n), where [J_g(x_n)]^{-1} is the left inverse of the Jacobian matrix J_g(x_n) of g evaluated for x_n. Strictly speaking, any method that replaces the exact Jacobian J_g(x_n) with an approximation is a quasi-Newton method. For instance, the chord method (where J_g(x_n) is replaced by J_g(x_0) for all iterations) is a simple example. The methods given below for optimization refer to an important subclass of quasi-Newton methods, secant methods.

Using methods developed to find extrema in order to find zeroes is not always a good idea, as the majority of the methods used to find extrema require that the matrix that is used is symmetric. While this holds in the context of the search for extrema, it rarely holds when searching for zeroes. Broyden's "good" and "bad" methods are two methods commonly used to find extrema that can also be applied to find zeroes. Other methods that can be used are the column-updating method, the inverse column-updating method, the quasi-Newton least squares method and the quasi-Newton inverse least squares method.

More recently, quasi-Newton methods have been applied to find the solution of multiple coupled systems of equations (e.g. fluid–structure interaction problems or interaction problems in physics). They allow the solution to be found by solving each constituent system separately (which is simpler than the global system) in a cyclic, iterative fashion until the solution of the global system is found.
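As an illustration, the following is a minimal sketch of Broyden's "good" method for solving g(x) = 0 with NumPy. The function name, the identity initial Jacobian, and the test problem are illustrative choices, not part of any standard library, and a production implementation would add safeguards (line search, rank checks).

```python
import numpy as np

def broyden_good(g, x0, J0=None, tol=1e-10, max_iter=50):
    """Minimal sketch of Broyden's "good" method for g(x) = 0: the exact
    Jacobian is never formed; an approximation J is refined by a
    rank-one update after every step."""
    x = np.asarray(x0, dtype=float)
    J = np.eye(x.size) if J0 is None else np.asarray(J0, dtype=float)
    gx = g(x)
    for _ in range(max_iter):
        # Quasi-Newton step: solve J dx = -g(x) with the approximate Jacobian.
        dx = np.linalg.solve(J, -gx)
        x_new = x + dx
        g_new = g(x_new)
        if np.linalg.norm(g_new) < tol:
            return x_new
        # Broyden's "good" (rank-one) update of the Jacobian approximation.
        dg = g_new - gx
        J += np.outer(dg - J @ dx, dx) / (dx @ dx)
        x, gx = x_new, g_new
    return x

# Example: intersection of the unit circle with the line x = y.
g = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
print(broyden_good(g, np.array([1.0, 0.5])))   # approximately [0.7071 0.7071]
```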


Search for extrema: optimization

The search for a minimum or maximum of a scalar-valued function is nothing else than the search for the zeroes of the gradient of that function. Therefore, quasi-Newton methods can be readily applied to find extrema of a function. In other words, if g is the gradient of f, then searching for the zeroes of the vector-valued function g corresponds to the search for the extrema of the scalar-valued function f; the Jacobian of g now becomes the Hessian of f. The main difference is that the Hessian matrix is a symmetric matrix, unlike the Jacobian when searching for zeroes. Most quasi-Newton methods used in optimization exploit this property.

In optimization, quasi-Newton methods (a special case of variable-metric methods) are algorithms for finding local maxima and minima of functions. Quasi-Newton methods are based on Newton's method to find the stationary point of a function, where the gradient is 0. Newton's method assumes that the function can be locally approximated as a quadratic in the region around the optimum, and uses the first and second derivatives to find the stationary point. In higher dimensions, Newton's method uses the gradient and the Hessian matrix of second derivatives of the function to be minimized. In quasi-Newton methods the Hessian matrix does not need to be computed; it is updated by analyzing successive gradient vectors instead. Quasi-Newton methods are a generalization of the secant method to find the root of the first derivative for multidimensional problems. In multiple dimensions the secant equation is under-determined, and quasi-Newton methods differ in how they constrain the solution, typically by adding a simple low-rank update to the current estimate of the Hessian.

The first quasi-Newton algorithm was proposed by William C. Davidon, a physicist working at Argonne National Laboratory.
He developed the first quasi-Newton algorithm in 1959: the DFP updating formula, which was later popularized by Fletcher and Powell in 1963, but is rarely used today. The most common quasi-Newton algorithms are currently the SR1 formula (for "symmetric rank-one"), the BHHH method, the widespread BFGS method (suggested independently by Broyden, Fletcher, Goldfarb, and Shanno, in 1970), and its low-memory extension L-BFGS. The Broyden class is a linear combination of the DFP and BFGS methods. The SR1 formula does not guarantee the update matrix to maintain positive-definiteness and can be used for indefinite problems. Broyden's method does not require the update matrix to be symmetric and is used to find the root of a general system of equations (rather than the gradient) by updating the Jacobian (rather than the Hessian).

One of the chief advantages of quasi-Newton methods over Newton's method is that the Hessian matrix (or, in the case of quasi-Newton methods, its approximation) B does not need to be inverted. Newton's method, and its derivatives such as interior-point methods, require the Hessian to be inverted, which is typically implemented by solving a system of linear equations and is often quite costly. In contrast, quasi-Newton methods usually generate an estimate of B^{-1} directly.
As in Newton's method, one uses a second-order approximation to find the minimum of a function f(x). The Taylor series of f(x) around an iterate is

:f(x_k + \Delta x) \approx f(x_k) + \nabla f(x_k)^{\mathrm T} \,\Delta x + \frac{1}{2} \Delta x^{\mathrm T} B \,\Delta x,

where \nabla f is the gradient, and B an approximation to the Hessian matrix. The gradient of this approximation (with respect to \Delta x) is

:\nabla f(x_k + \Delta x) \approx \nabla f(x_k) + B \,\Delta x,

and setting this gradient to zero (which is the goal of optimization) provides the Newton step:

:\Delta x = -B^{-1} \nabla f(x_k).

The Hessian approximation B is chosen to satisfy

:\nabla f(x_k + \Delta x) = \nabla f(x_k) + B \,\Delta x,

which is called the ''secant equation'' (the Taylor series of the gradient itself). In more than one dimension B is underdetermined. In one dimension, solving for B and applying the Newton step with the updated value is equivalent to the secant method.
The various quasi-Newton methods differ in their choice of the solution to the secant equation (in one dimension, all the variants are equivalent). Most methods (but with exceptions, such as Broyden's method) seek a symmetric solution (B^{\mathrm T} = B); furthermore, the variants listed below can be motivated by finding an update B_{k+1} that is as close as possible to B_k in some norm; that is, B_{k+1} = \operatorname{argmin}_B \|B - B_k\|_V, where V is some positive-definite matrix that defines the norm. An approximate initial value B_0 = \beta I is often sufficient to achieve rapid convergence, although there is no general strategy to choose \beta. Note that B_0 should be positive-definite. The unknown x_k is updated applying the Newton step calculated using the current approximate Hessian matrix B_k:

* \Delta x_k = -\alpha_k B_k^{-1} \nabla f(x_k), with \alpha_k chosen to satisfy the Wolfe conditions;
* x_{k+1} = x_k + \Delta x_k;
* the gradient computed at the new point, \nabla f(x_{k+1}), gives
::y_k = \nabla f(x_{k+1}) - \nabla f(x_k),
:which is used to update the approximate Hessian B_{k+1}, or directly its inverse H_{k+1} = B_{k+1}^{-1} using the Sherman–Morrison formula.
* A key property of the BFGS and DFP updates is that if B_k is positive-definite and \alpha_k is chosen to satisfy the Wolfe conditions, then B_{k+1} is also positive-definite.

Writing \Delta x_k for the step and y_k for the gradient difference defined above, the most popular update formulas for B_{k+1} are:
* BFGS: B_{k+1} = B_k + \frac{y_k y_k^{\mathrm T}}{y_k^{\mathrm T} \Delta x_k} - \frac{B_k \Delta x_k \Delta x_k^{\mathrm T} B_k}{\Delta x_k^{\mathrm T} B_k \Delta x_k}
* DFP: B_{k+1} = \left(I - \frac{y_k \Delta x_k^{\mathrm T}}{y_k^{\mathrm T} \Delta x_k}\right) B_k \left(I - \frac{\Delta x_k y_k^{\mathrm T}}{y_k^{\mathrm T} \Delta x_k}\right) + \frac{y_k y_k^{\mathrm T}}{y_k^{\mathrm T} \Delta x_k}
* SR1: B_{k+1} = B_k + \frac{(y_k - B_k \Delta x_k)(y_k - B_k \Delta x_k)^{\mathrm T}}{(y_k - B_k \Delta x_k)^{\mathrm T} \Delta x_k}
* Broyden: B_{k+1} = B_k + \frac{(y_k - B_k \Delta x_k) \Delta x_k^{\mathrm T}}{\Delta x_k^{\mathrm T} \Delta x_k}

Other methods are Pearson's method, McCormick's method, the Powell symmetric Broyden (PSB) method and Greenstadt's method.
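The iteration above can be condensed into a short program. Below is a minimal sketch in Python/NumPy of a BFGS loop that maintains the inverse approximation H_k directly, so no linear system is solved; for brevity the Wolfe-condition line search is replaced by simple Armijo backtracking, and the function and variable names are illustrative rather than taken from any library.

```python
import numpy as np

def bfgs_minimize(f, grad, x0, tol=1e-8, max_iter=200):
    """Minimal BFGS sketch: H approximates the inverse Hessian."""
    x = np.asarray(x0, dtype=float)
    H = np.eye(x.size)                       # B_0 = beta * I with beta = 1
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = -H @ g                           # quasi-Newton direction -H_k grad f(x_k)
        alpha = 1.0
        while f(x + alpha * p) > f(x) + 1e-4 * alpha * (g @ p):
            alpha *= 0.5                     # Armijo backtracking stands in for a Wolfe search
        s = alpha * p                        # Delta x_k
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g                        # y_k = grad f(x_{k+1}) - grad f(x_k)
        ys = y @ s
        if ys > 1e-12:                       # curvature condition keeps H positive-definite
            rho = 1.0 / ys
            I = np.eye(x.size)
            # BFGS update of the inverse Hessian approximation
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

# Example: minimize the Rosenbrock function, whose minimum is at (1, 1).
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                           200 * (x[1] - x[0]**2)])
print(bfgs_minimize(f, grad, np.array([-1.2, 1.0])))   # approximately [1. 1.]
```

L-BFGS follows the same scheme but avoids storing the dense matrix H, reconstructing the product H @ g from a short history of (s, y) pairs.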


Relationship to matrix inversion

When f is a convex quadratic function with positive-definite Hessian B, one would expect the matrices H_k generated by a quasi-Newton method to converge to the inverse Hessian H = B^{-1}. This is indeed the case for the class of quasi-Newton methods based on least-change updates.
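The following is a small numerical illustration of this statement (a sketch, not a proof): it applies the BFGS inverse update with exact line searches to a randomly generated convex quadratic, after which the approximation H should match the inverse Hessian up to round-off. The variable names and the random test matrix are arbitrary choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4))
A = M @ M.T + 4.0 * np.eye(4)            # positive-definite Hessian of the quadratic
b = rng.normal(size=4)                   # f(x) = 0.5 x^T A x - b^T x

x = np.zeros(4)
H = np.eye(4)                            # initial inverse-Hessian approximation H_0
g = A @ x - b                            # gradient of the quadratic at x
for _ in range(4):                       # n = 4 quasi-Newton steps
    p = -H @ g
    alpha = -(g @ p) / (p @ A @ p)       # exact line search, available for a quadratic
    s = alpha * p
    x = x + s
    g_new = A @ x - b
    y = g_new - g
    rho = 1.0 / (y @ s)
    I = np.eye(4)
    # BFGS update of the inverse Hessian approximation
    H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) + rho * np.outer(s, s)
    g = g_new

print(np.allclose(H, np.linalg.inv(A)))  # expected: True (up to round-off)
```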


Notable implementations

Implementations of quasi-Newton methods are available in many programming languages.

Notable open source implementations include:
* GNU Octave uses a form of BFGS in its fsolve function, with trust region extensions.
* The GNU Scientific Library implements the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm.
* ALGLIB implements (L-)BFGS in C++ and C#.
* R's optim general-purpose optimizer routine uses the BFGS method by using method="BFGS".
* SciPy's scipy.optimize module provides fmin_bfgs, and the scipy.optimize.minimize function includes, among other methods, a BFGS implementation.

Notable proprietary implementations include:
* Mathematica includes quasi-Newton solvers.
* The NAG Library contains several routines for minimizing or maximizing a function which use quasi-Newton algorithms.
* In MATLAB's Optimization Toolbox, the fminunc function uses (among other methods) the BFGS quasi-Newton method. Many of the constrained methods of the Optimization Toolbox use BFGS and the variant L-BFGS.
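For example, the SciPy interface mentioned above can be called as in the brief usage sketch below; the Rosenbrock helpers rosen and rosen_der ship with scipy.optimize.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

# Minimize the Rosenbrock function with SciPy's BFGS implementation.
# If jac is omitted, the gradient is approximated by finite differences.
result = minimize(rosen, x0=np.array([-1.2, 1.0]), jac=rosen_der, method="BFGS")
print(result.x)    # approximately [1. 1.]
print(result.nit)  # number of iterations taken
```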


See also

* BFGS method
** L-BFGS
** OWL-QN
* Broyden's method
* DFP updating formula
* Newton's method
* Newton's method in optimization
* SR1 formula

