In numerical analysis, a quasi-Newton method is an iterative numerical method used either to find zeroes or to find local maxima and minima of functions via a recurrence formula much like the one for Newton's method, except that it uses approximations of the derivatives of the functions in place of exact derivatives. Newton's method requires the Jacobian matrix of all partial derivatives of a multivariate function when used to search for zeros, or the Hessian matrix when used for finding extrema. Quasi-Newton methods, on the other hand, can be used when the Jacobian or Hessian matrix is unavailable or impractical to compute at every iteration. Some iterative methods that reduce to Newton's method, such as sequential quadratic programming, may also be considered quasi-Newton methods.


Search for zeros: root finding

Newton's method to find zeroes of a function g of multiple variables is given by x_{n+1} = x_n - [J_g(x_n)]^{-1} g(x_n), where [J_g(x_n)]^{-1} is the left inverse of the Jacobian matrix J_g(x_n) of g evaluated at x_n. Strictly speaking, any method that replaces the exact Jacobian J_g(x_n) with an approximation is a quasi-Newton method. For instance, the chord method (where J_g(x_n) is replaced by J_g(x_0) for all iterations) is a simple example. The methods given below for optimization refer to an important subclass of quasi-Newton methods, secant methods.

Using methods developed to find extrema in order to find zeroes is not always a good idea, as the majority of the methods used to find extrema require that the matrix that is used is symmetric. While this holds in the context of the search for extrema, it rarely holds when searching for zeroes. Broyden's "good" and "bad" methods are two methods commonly used to find extrema that can also be applied to find zeroes. Other methods that can be used are the column-updating method, the inverse column-updating method, the quasi-Newton least squares method and the quasi-Newton inverse least squares method.

More recently quasi-Newton methods have been applied to find the solution of multiple coupled systems of equations (e.g. fluid–structure interaction problems or interaction problems in physics). They allow the solution to be found by solving each constituent system separately (which is simpler than the global system) in a cyclic, iterative fashion until the solution of the global system is found.
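As an illustration of replacing the exact Jacobian with a cheap rank-one-updated approximation, the following is a minimal Python sketch of Broyden's "good" method. The test system, the identity initial Jacobian and the tolerances are illustrative choices rather than part of any standard specification; SciPy's scipy.optimize.broyden1 provides a more robust library implementation of the same idea.

import numpy as np

def broyden_good(g, x0, tol=1e-10, max_iter=100):
    # Solve g(x) = 0 with Broyden's "good" method: keep an approximation J of
    # the Jacobian and correct it by a rank-one update after every step.
    x = np.asarray(x0, dtype=float)
    gx = g(x)
    J = np.eye(x.size)                      # crude initial Jacobian approximation
    for _ in range(max_iter):
        dx = np.linalg.solve(J, -gx)        # quasi-Newton step: J dx = -g(x)
        x_new = x + dx
        g_new = g(x_new)
        if np.linalg.norm(g_new) < tol:
            return x_new
        dg = g_new - gx
        # "Good" Broyden update: J <- J + (dg - J dx) dx^T / (dx^T dx)
        J += np.outer(dg - J @ dx, dx) / (dx @ dx)
        x, gx = x_new, g_new
    return x

# Mildly nonlinear 2x2 test system (illustrative only).
def system(x):
    return np.array([x[0] + 0.5 * np.cos(x[1]) - 1.0,
                     x[1] + 0.5 * np.sin(x[0]) - 1.0])

root = broyden_good(system, np.zeros(2))
print(root, system(root))                   # residual should be near zero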


Search for extrema: optimization

The search for a minimum or maximum of a scalar-valued function is closely related to the search for the zeroes of the gradient of that function. Therefore, quasi-Newton methods can be readily applied to find extrema of a function. In other words, if g is the gradient of f, then searching for the zeroes of the vector-valued function g corresponds to the search for the extrema of the scalar-valued function f; the Jacobian of g now becomes the Hessian of f. The main difference is that the Hessian matrix is a symmetric matrix, unlike the Jacobian when searching for zeroes. Most quasi-Newton methods used in optimization exploit this symmetry.

In optimization, quasi-Newton methods (a special case of variable-metric methods) are algorithms for finding local maxima and minima of functions. Quasi-Newton methods for optimization are based on Newton's method for finding the stationary points of a function, points where the gradient is 0. Newton's method assumes that the function can be locally approximated as a quadratic in the region around the optimum, and uses the first and second derivatives to find the stationary point. In higher dimensions, Newton's method uses the gradient and the Hessian matrix of second derivatives of the function to be minimized.

In quasi-Newton methods the Hessian matrix does not need to be computed. The Hessian is updated by analyzing successive gradient vectors instead. Quasi-Newton methods are a generalization of the secant method to find the root of the first derivative for multidimensional problems. In multiple dimensions the secant equation is underdetermined, and quasi-Newton methods differ in how they constrain the solution, typically by adding a simple low-rank update to the current estimate of the Hessian.
The first quasi-Newton algorithm was proposed by William C. Davidon, a physicist working at Argonne National Laboratory, who developed it in 1959: the DFP updating formula, which was later popularized by Fletcher and Powell in 1963 but is rarely used today. The most common quasi-Newton algorithms are currently the SR1 formula (for "symmetric rank-one"), the BHHH method, the widespread BFGS method (suggested independently by Broyden, Fletcher, Goldfarb, and Shanno in 1970), and its low-memory extension L-BFGS. The Broyden class is a linear combination of the DFP and BFGS methods. The SR1 formula does not guarantee the update matrix to maintain positive-definiteness and can be used for indefinite problems. Broyden's method does not require the update matrix to be symmetric and is used to find the root of a general system of equations (rather than the gradient) by updating the Jacobian (rather than the Hessian).

One of the chief advantages of quasi-Newton methods over Newton's method is that the Hessian matrix (or, in the case of quasi-Newton methods, its approximation) B does not need to be inverted. Newton's method, and its derivatives such as interior point methods, require the Hessian to be inverted, which is typically implemented by solving a system of linear equations and is often quite costly. In contrast, quasi-Newton methods usually generate an estimate of B^{-1} directly.
As in Newton's method, one uses a second-order approximation to find the minimum of a function f(x). The Taylor series of f(x) around an iterate x_k is

f(x_k + \Delta x) \approx f(x_k) + \nabla f(x_k)^{\mathrm T} \,\Delta x + \tfrac{1}{2} \Delta x^{\mathrm T} B \,\Delta x,

where \nabla f is the gradient and B an approximation to the Hessian matrix. The gradient of this approximation (with respect to \Delta x) is

\nabla f(x_k + \Delta x) \approx \nabla f(x_k) + B \,\Delta x,

and setting this gradient to zero (which is the goal of optimization) provides the Newton step

\Delta x = -B^{-1} \nabla f(x_k).

The Hessian approximation B is chosen to satisfy

\nabla f(x_k + \Delta x) = \nabla f(x_k) + B \,\Delta x,

which is called the ''secant equation'' (the Taylor series of the gradient itself). In more than one dimension B is underdetermined. In one dimension, solving for B and applying the Newton step with the updated value is equivalent to the secant method. The various quasi-Newton methods differ in their choice of the solution to the secant equation (in one dimension, all the variants are equivalent). Most methods (but with exceptions, such as Broyden's method) seek a symmetric solution (B^{\mathrm T} = B); furthermore, the variants can be motivated by finding an update B_{k+1} that is as close as possible to B_k in some norm; that is, B_{k+1} = \operatorname{argmin}_B \|B - B_k\|_V, where V is some positive-definite matrix that defines the norm. An approximate initial value B_0 = \beta I is often sufficient to achieve rapid convergence, although there is no general strategy to choose \beta. Note that B_0 should be positive-definite.

The unknown x_k is updated by applying the Newton step calculated using the current approximate Hessian matrix B_k:
* \Delta x_k = -\alpha_k B_k^{-1} \nabla f(x_k), with \alpha_k chosen to satisfy the Wolfe conditions;
* x_{k+1} = x_k + \Delta x_k;
* the gradient computed at the new point, \nabla f(x_{k+1}), and
  y_k = \nabla f(x_{k+1}) - \nabla f(x_k)
  is used to update the approximate Hessian B_{k+1}, or directly its inverse H_{k+1} = B_{k+1}^{-1} using the Sherman–Morrison formula;
* a key property of the BFGS and DFP updates is that if B_k is positive-definite and \alpha_k is chosen to satisfy the Wolfe conditions, then B_{k+1} is also positive-definite.

The most popular update formulas are those already mentioned above (DFP, BFGS, SR1, and the Broyden family). Other methods are Pearson's method, McCormick's method, the Powell symmetric Broyden (PSB) method and Greenstadt's method. These recursive low-rank matrix updates can also be represented as an initial matrix plus a low-rank correction. This is the compact quasi-Newton representation, which is particularly effective for constrained and/or large problems.
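The iteration above translates almost line by line into code. The following is a minimal Python sketch of a BFGS-style quasi-Newton loop that maintains the inverse Hessian approximation H_k directly; the Armijo-only backtracking line search, the curvature safeguard and the Rosenbrock test problem are illustrative simplifications (a production implementation would enforce the full Wolfe conditions).

import numpy as np

def bfgs_minimize(f, grad, x0, tol=1e-8, max_iter=200):
    # Quasi-Newton minimization with the BFGS update of H = B^{-1}.
    x = np.asarray(x0, dtype=float)
    n = x.size
    H = np.eye(n)                        # corresponds to B_0 = beta*I with beta = 1
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = -H @ g                       # search direction -B_k^{-1} grad f(x_k)
        # Backtracking line search enforcing only the Armijo condition.
        alpha, fx, gp = 1.0, f(x), g @ p
        while f(x + alpha * p) > fx + 1e-4 * alpha * gp and alpha > 1e-10:
            alpha *= 0.5
        s = alpha * p                    # Delta x_k
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g                    # y_k = grad f(x_{k+1}) - grad f(x_k)
        sy = s @ y
        if sy > 1e-12:                   # skip the update if the curvature condition fails
            rho = 1.0 / sy
            I = np.eye(n)
            # BFGS update of the inverse Hessian approximation
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

# Illustrative test: the Rosenbrock function, minimized at (1, 1).
f = lambda x: (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2
grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0] ** 2),
                           200 * (x[1] - x[0] ** 2)])
print(bfgs_minimize(f, grad, [-1.2, 1.0]))   # should approach (1, 1)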


Relationship to matrix inversion

When f is a convex quadratic function with positive-definite Hessian B, one would expect the matrices H_k generated by a quasi-Newton method to converge to the inverse Hessian H = B^{-1}. This is indeed the case for the class of quasi-Newton methods based on least-change updates.
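This behaviour can be checked numerically. The sketch below, which relies on the SciPy routine listed in the next section, minimizes a small convex quadratic and compares the returned inverse-Hessian approximation with the exact inverse; the particular matrix and vector are arbitrary illustrative choices, and the agreement is approximate because the line search is inexact.

import numpy as np
from scipy.optimize import minimize

# Convex quadratic f(x) = 1/2 x^T B x - b^T x with positive-definite Hessian B.
B = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])

f = lambda x: 0.5 * x @ B @ x - b @ x
grad = lambda x: B @ x - b

res = minimize(f, x0=np.zeros(2), jac=grad, method="BFGS")

# The final BFGS approximation of the inverse Hessian should be close to B^{-1}.
print(res.hess_inv)
print(np.linalg.inv(B))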


Notable implementations

Implementations of quasi-Newton methods are available in many programming languages. Notable open source implementations include:
* GNU Octave uses a form of BFGS in its fsolve function, with trust region extensions.
* The GNU Scientific Library implements the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm.
* ALGLIB implements (L)BFGS in C++ and C#.
* R's optim general-purpose optimizer routine uses the BFGS method by using method="BFGS".
* SciPy's scipy.optimize has fmin_bfgs; in the SciPy extension to Python, the scipy.optimize.minimize function includes, among other methods, a BFGS implementation.
Notable proprietary implementations include:
* Mathematica includes quasi-Newton solvers.
* The NAG Library contains several routines for minimizing or maximizing a function which use quasi-Newton algorithms.
* In MATLAB's Optimization Toolbox, the fminunc function uses (among other methods) the BFGS quasi-Newton method. Many of the constrained methods of the Optimization Toolbox use BFGS and the variant L-BFGS.
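As a brief illustration of the SciPy routine listed above, a minimal call might look as follows; the objective function is an arbitrary example, and when no gradient is supplied scipy.optimize.minimize approximates it by finite differences.

from scipy.optimize import minimize

# Minimize a simple smooth function with the BFGS quasi-Newton method.
f = lambda x: (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2
res = minimize(f, x0=[0.0, 0.0], method="BFGS")
print(res.x)         # approximately [3, -1]
print(res.hess_inv)  # final approximation of the inverse Hessian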


See also

* BFGS method
** L-BFGS
** OWL-QN
* Broyden's method
* DFP updating formula
* Newton's method
* Newton's method in optimization
* SR1 formula
* Compact quasi-Newton representation

