Quasi-Newton methods are methods used to find either zeroes or local maxima and minima of functions, as an alternative to Newton's method. They can be used if the Jacobian or Hessian is unavailable or is too expensive to compute at every iteration. The "full" Newton's method requires the Jacobian in order to search for zeros, or the Hessian for finding extrema.
Search for zeros: root finding
Newton's method to find zeroes of a function <math>g</math> of multiple variables is given by
:<math>x_{n+1} = x_n - [J_g(x_n)]^{-1} g(x_n),</math>
where <math>[J_g(x_n)]^{-1}</math> is the left inverse of the Jacobian matrix <math>J_g(x_n)</math> of <math>g</math> evaluated for <math>x_n</math>.
Strictly speaking, any method that replaces the exact Jacobian <math>J_g(x_n)</math> with an approximation is a quasi-Newton method. For instance, the chord method (where <math>J_g(x_n)</math> is replaced by <math>J_g(x_0)</math> for all iterations) is a simple example. The methods given below for optimization refer to an important subclass of quasi-Newton methods, secant methods.
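The chord method is easy to state in code. The following is a minimal sketch, not a reference implementation: the Jacobian is evaluated once at the starting point and reused at every iteration; the test system and tolerances are hypothetical.
<syntaxhighlight lang="python">
import numpy as np

def chord_method(g, jac, x0, tol=1e-10, max_iter=100):
    """Find a zero of g, reusing the Jacobian evaluated once at x0 (chord method)."""
    x = np.asarray(x0, dtype=float)
    J0 = jac(x)                               # Jacobian fixed for all iterations
    for _ in range(max_iter):
        x = x - np.linalg.solve(J0, g(x))     # solve J0 * step = g(x)
        if np.linalg.norm(g(x)) < tol:
            break
    return x

# Hypothetical example: intersect the unit circle with the line x0 = x1
g = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
jac = lambda x: np.array([[2 * x[0], 2 * x[1]], [1.0, -1.0]])
print(chord_method(g, jac, np.array([1.0, 0.5])))   # approx. [0.7071, 0.7071]
</syntaxhighlight>
Because the Jacobian is never refreshed, the chord method trades Newton's quadratic convergence for cheaper iterations with only linear convergence.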
Using methods developed to find extrema in order to find zeroes is not always a good idea, as the majority of the methods used to find extrema require that the approximation matrix be symmetric. While this holds in the context of the search for extrema, it rarely holds when searching for zeroes.
Broyden's "good" and "bad" methods are two methods commonly used to find extrema that can also be applied to find zeroes. Other methods that can be used are the
column-updating method, the
inverse column-updating method
Inverse or invert may refer to:
Science and mathematics
* Inverse (logic), a type of conditional sentence which is an immediate inference made from another conditional sentence
* Additive inverse (negation), the inverse of a number that, when a ...
, the
quasi-Newton least squares method Quasi-Newton methods are methods used to either find zeroes or local maxima and minima of functions, as an alternative to Newton's method. They can be used if the Jacobian or Hessian is unavailable or is too expensive to compute at every iteration ...
and the
quasi-Newton inverse least squares method Quasi-Newton methods are methods used to either find zeroes or local maxima and minima of functions, as an alternative to Newton's method. They can be used if the Jacobian or Hessian is unavailable or is too expensive to compute at every iteration. ...
.
More recently, quasi-Newton methods have been applied to find the solution of multiple coupled systems of equations (e.g. fluid–structure interaction problems or interaction problems in physics). They allow the solution to be found by solving each constituent system separately (which is simpler than the global system) in a cyclic, iterative fashion until the solution of the global system is found.
Search for extrema: optimization
The search for a minimum or maximum of a scalar-valued function is nothing other than the search for the zeroes of the gradient of that function. Therefore, quasi-Newton methods can be readily applied to find extrema of a function. In other words, if <math>g</math> is the gradient of <math>f</math>, then searching for the zeroes of the vector-valued function <math>g</math> corresponds to the search for the extrema of the scalar-valued function <math>f</math>; the Jacobian of <math>g</math> now becomes the Hessian of <math>f</math>. The main difference is that the Hessian matrix is a symmetric matrix, unlike the Jacobian when searching for zeroes. Most quasi-Newton methods used in optimization exploit this property.
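This correspondence can be seen directly: handing the gradient of a scalar function to a quasi-Newton root finder locates a stationary point. The sketch below assumes SciPy is available and uses a hypothetical quadratic objective; it is an illustration, not an example taken from the article.
<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import broyden1

# Hypothetical objective: f(x) = (x0 - 1)^2 + 10 * (x1 + 2)^2, minimum at (1, -2)
def grad_f(x):
    return np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)])

# A zero of the gradient is a stationary point of f
x_star = broyden1(grad_f, np.zeros(2), f_tol=1e-10)
print(x_star)   # approximately [1, -2]
</syntaxhighlight>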
In optimization, quasi-Newton methods (a special case of variable-metric methods) are algorithms for finding local maxima and minima of functions. Quasi-Newton methods are based on Newton's method to find the stationary point of a function, where the gradient is 0. Newton's method assumes that the function can be locally approximated as a quadratic in the region around the optimum, and uses the first and second derivatives to find the stationary point. In higher dimensions, Newton's method uses the gradient and the Hessian matrix of second derivatives of the function to be minimized.
In quasi-Newton methods the Hessian matrix does not need to be computed. The Hessian is updated by analyzing successive gradient vectors instead. Quasi-Newton methods are a generalization of the secant method to find the root of the first derivative for multidimensional problems. In multiple dimensions the secant equation is under-determined, and quasi-Newton methods differ in how they constrain the solution, typically by adding a simple low-rank update to the current estimate of the Hessian.
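To make the low-rank idea concrete, the following is a small sketch (with hypothetical data, and notation chosen here rather than taken from the article) of the symmetric rank-one (SR1) correction: given a step <math>\Delta x_k</math> and a gradient difference <math>y_k</math>, a rank-one term is added to <math>B_k</math> so that the updated matrix satisfies the secant equation <math>B_{k+1} \Delta x_k = y_k</math>.
<syntaxhighlight lang="python">
import numpy as np

def sr1_update(B, dx, y, eps=1e-8):
    """Symmetric rank-one (SR1) update of a Hessian approximation B."""
    r = y - B @ dx                          # residual of the secant equation
    denom = r @ dx
    if abs(denom) < eps * np.linalg.norm(r) * np.linalg.norm(dx):
        return B                            # skip the update when it is ill-defined
    return B + np.outer(r, r) / denom       # rank-one correction

# Check that the secant equation B_new @ dx == y holds (hypothetical data)
B = np.eye(3)
dx = np.array([1.0, 0.5, -2.0])
y = np.array([2.0, -1.0, 0.3])
B_new = sr1_update(B, dx, y)
print(np.allclose(B_new @ dx, y))           # True
</syntaxhighlight>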
The first quasi-Newton algorithm was proposed by William C. Davidon, a physicist working at Argonne National Laboratory. He developed the first quasi-Newton algorithm in 1959: the DFP updating formula, which was later popularized by Fletcher and Powell in 1963, but is rarely used today. The most common quasi-Newton algorithms are currently the SR1 formula (for "symmetric rank-one"), the BHHH method, the widespread BFGS method (suggested independently by Broyden, Fletcher, Goldfarb, and Shanno, in 1970), and its low-memory extension L-BFGS. Broyden's class is a linear combination of the DFP and BFGS methods.
The SR1 formula does not guarantee the update matrix to maintain positive-definiteness and can be used for indefinite problems. Broyden's method does not require the update matrix to be symmetric and is used to find the root of a general system of equations (rather than the gradient) by updating the Jacobian (rather than the Hessian).
One of the chief advantages of quasi-Newton methods over Newton's method is that the Hessian matrix (or, in the case of quasi-Newton methods, its approximation <math>B</math>) does not need to be inverted. Newton's method, and its derivatives such as interior point methods, require the Hessian to be inverted, which is typically implemented by solving a system of linear equations and is often quite costly. In contrast, quasi-Newton methods usually generate an estimate of <math>B^{-1}</math> directly.
As in Newton's method, one uses a second-order approximation to find the minimum of a function <math>f(x)</math>. The Taylor series of <math>f(x)</math> around an iterate is
:<math>f(x_k + \Delta x) \approx f(x_k) + \nabla f(x_k)^{\mathrm T} \Delta x + \tfrac{1}{2} \Delta x^{\mathrm T} B \,\Delta x,</math>
where (<math>\nabla f</math>) is the gradient, and <math>B</math> an approximation to the Hessian matrix. The gradient of this approximation (with respect to <math>\Delta x</math>) is
:<math>\nabla f(x_k + \Delta x) \approx \nabla f(x_k) + B \,\Delta x,</math>
and setting this gradient to zero (which is the goal of optimization) provides the Newton step:
:<math>\Delta x = -B^{-1} \nabla f(x_k).</math>
The Hessian approximation <math>B</math> is chosen to satisfy
:<math>\nabla f(x_k + \Delta x) = \nabla f(x_k) + B \,\Delta x,</math>
which is called the ''secant equation'' (the Taylor series of the gradient itself). In more than one dimension <math>B</math> is underdetermined. In one dimension, solving for <math>B</math> and applying the Newton's step with the updated value is equivalent to the secant method. The various quasi-Newton methods differ in their choice of the solution to the secant equation (in one dimension, all the variants are equivalent). Most methods (but with exceptions, such as Broyden's method) seek a symmetric solution (<math>B^{\mathrm T} = B</math>); furthermore, the variants listed below can be motivated by finding an update <math>B_{k+1}</math> that is as close as possible to <math>B_k</math> in some norm; that is, <math>B_{k+1} = \operatorname{argmin}_B \|B - B_k\|_V</math> among the matrices satisfying the secant equation, where <math>V</math> is some positive-definite matrix that defines the norm. An approximate initial value <math>B_0 = \beta I</math> is often sufficient to achieve rapid convergence, although there is no general strategy to choose <math>\beta</math>. Note that <math>B_0</math> should be positive-definite. The unknown <math>x_k</math> is updated applying the Newton's step calculated using the current approximate Hessian matrix <math>B_k</math>:
* <math>\Delta x_k = -\alpha_k B_k^{-1} \nabla f(x_k)</math>, with <math>\alpha_k</math> chosen to satisfy the Wolfe conditions;
* <math>x_{k+1} = x_k + \Delta x_k</math>;
* The gradient computed at the new point <math>\nabla f(x_{k+1})</math>, and
::<math>y_k = \nabla f(x_{k+1}) - \nabla f(x_k)</math>
is used to update the approximate Hessian <math>B_{k+1}</math>, or directly its inverse <math>H_{k+1} = B_{k+1}^{-1}</math> using the Sherman–Morrison formula.
* A key property of the BFGS and DFP updates is that if <math>B_k</math> is positive-definite and <math>\alpha_k</math> is chosen to satisfy the Wolfe conditions, then <math>B_{k+1}</math> is also positive-definite.
The most popular update formulas are:
:BFGS: <math>B_{k+1} = B_k + \frac{y_k y_k^{\mathrm T}}{y_k^{\mathrm T} \Delta x_k} - \frac{B_k \Delta x_k \Delta x_k^{\mathrm T} B_k}{\Delta x_k^{\mathrm T} B_k \Delta x_k}</math>
:DFP: <math>B_{k+1} = \left(I - \frac{y_k \Delta x_k^{\mathrm T}}{y_k^{\mathrm T} \Delta x_k}\right) B_k \left(I - \frac{\Delta x_k y_k^{\mathrm T}}{y_k^{\mathrm T} \Delta x_k}\right) + \frac{y_k y_k^{\mathrm T}}{y_k^{\mathrm T} \Delta x_k}</math>
:SR1: <math>B_{k+1} = B_k + \frac{(y_k - B_k \Delta x_k)(y_k - B_k \Delta x_k)^{\mathrm T}}{(y_k - B_k \Delta x_k)^{\mathrm T} \Delta x_k}</math>
:Broyden: <math>B_{k+1} = B_k + \frac{(y_k - B_k \Delta x_k)\, \Delta x_k^{\mathrm T}}{\Delta x_k^{\mathrm T} \Delta x_k}</math>
Other methods are Pearson's method, McCormick's method, the Powell symmetric Broyden (PSB) method and Greenstadt's method.
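The steps above can be assembled into a complete iteration. The following is a minimal sketch of a BFGS-style quasi-Newton loop; it substitutes a simple backtracking (Armijo) line search for a full Wolfe-conditions search, and the test problem, tolerances, and function names are hypothetical.
<syntaxhighlight lang="python">
import numpy as np

def quasi_newton_bfgs(f, grad, x0, tol=1e-8, max_iter=200):
    """Minimal BFGS sketch: maintains an approximation B of the Hessian."""
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)                          # B_0 = I (i.e. beta = 1)
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = -np.linalg.solve(B, g)              # quasi-Newton step direction
        alpha = 1.0                             # backtracking line search
        while f(x + alpha * p) > f(x) + 1e-4 * alpha * (g @ p):
            alpha *= 0.5
        dx = alpha * p
        x_new = x + dx
        g_new = grad(x_new)
        y = g_new - g
        if y @ dx > 1e-12:                      # curvature condition keeps B positive-definite
            B = (B + np.outer(y, y) / (y @ dx)
                   - np.outer(B @ dx, B @ dx) / (dx @ B @ dx))
        x, g = x_new, g_new
    return x

# Hypothetical test: the Rosenbrock function, minimum at (1, 1)
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                           200 * (x[1] - x[0]**2)])
print(quasi_newton_bfgs(f, grad, np.array([-1.2, 1.0])))   # approx. [1, 1]
</syntaxhighlight>
In practice, library implementations usually update the inverse <math>H_k = B_k^{-1}</math> (or a factorization) directly rather than solving a linear system at each step.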
Relationship to matrix inversion
When <math>f</math> is a convex quadratic function with positive-definite Hessian <math>B</math>, one would expect the matrices <math>H_k</math> generated by a quasi-Newton method to converge to the inverse Hessian <math>H = B^{-1}</math>. This is indeed the case for the class of quasi-Newton methods based on least-change updates.
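This behaviour can be checked numerically. The sketch below, which assumes SciPy is available and uses a hypothetical quadratic, compares the inverse-Hessian approximation returned by SciPy's BFGS implementation with the exact inverse; after convergence on a quadratic the two are typically close.
<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize

# Hypothetical convex quadratic f(x) = 1/2 x^T A x - b^T x with positive-definite A
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b

res = minimize(f, np.zeros(2), jac=grad, method="BFGS")
print(res.hess_inv)       # BFGS estimate of the inverse Hessian
print(np.linalg.inv(A))   # exact inverse Hessian of the quadratic
</syntaxhighlight>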
Notable implementations
Implementations of quasi-Newton methods are available in many programming languages.
Notable open source implementations include:
* GNU Octave uses a form of BFGS in its fsolve function, with trust region extensions.
* GNU Scientific Library implements the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm.
* ALGLIB implements (L)BFGS in C++ and C#.
* R's optim general-purpose optimizer routine uses the BFGS method by specifying method="BFGS".
* In the SciPy extension to Python, the scipy.optimize.minimize function includes, among other methods, a BFGS implementation (a usage sketch follows this list); the older scipy.optimize.fmin_bfgs interface is also available.
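For example, a typical call to the SciPy routine mentioned above looks like the following minimal sketch (the objective is hypothetical):
<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize

# Hypothetical objective: the two-dimensional Rosenbrock function
def f(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

# BFGS; the gradient is estimated by finite differences when jac is omitted
res = minimize(f, x0=np.array([-1.2, 1.0]), method="BFGS")
print(res.x)          # approximately [1, 1]
print(res.hess_inv)   # final approximation of the inverse Hessian
</syntaxhighlight>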
Notable proprietary implementations include:
* Mathematica includes quasi-Newton solvers.
* The NAG Library contains several routines for minimizing or maximizing a function which use quasi-Newton algorithms.
* In MATLAB's Optimization Toolbox, the fminunc function uses (among other methods) the BFGS quasi-Newton method. Many of the constrained methods of the Optimization Toolbox use BFGS and the variant L-BFGS.
See also
* BFGS method
** L-BFGS
** OWL-QN
* Broyden's method
* DFP updating formula
* Newton's method
* Newton's method in optimization
* SR1 formula
References
{{Optimization algorithms|unconstrained}}
Optimization algorithms and methods