In numerical analysis, a quasi-Newton method is an iterative numerical method used either to find zeroes or to find local maxima and minima of functions via a recurrence formula much like the one for Newton's method, except that it uses approximations of the derivatives of the functions in place of exact derivatives. Newton's method requires the Jacobian matrix of all partial derivatives of a multivariate function when used to search for zeros, or the Hessian matrix when used for finding extrema. Quasi-Newton methods, on the other hand, can be used when the Jacobian or Hessian matrices are unavailable or are impractical to compute at every iteration.

Some iterative methods that reduce to Newton's method, such as sequential quadratic programming, may also be considered quasi-Newton methods.
Search for zeros: root finding
Newton's method to find zeroes of a function $g$ of multiple variables is given by $x_{n+1} = x_n - [J_g(x_n)]^{-1} g(x_n)$, where $[J_g(x_n)]^{-1}$ is the left inverse of the Jacobian matrix $J_g(x_n)$ of $g$ evaluated for $x_n$.
Strictly speaking, any method that replaces the exact Jacobian $J_g(x_n)$ with an approximation is a quasi-Newton method. For instance, the chord method (where $J_g(x_n)$ is replaced by $J_g(x_0)$ for all iterations) is a simple example. The methods given below for optimization refer to an important subclass of quasi-Newton methods, secant methods.
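For illustration, here is a minimal sketch of the chord method in Python (the example system, its Jacobian, and the starting point are illustrative choices, not taken from the text above): the Jacobian is evaluated once at the starting point $x_0$ and its inverse is reused at every iteration.

```python
import numpy as np

def chord_method(g, J0_inv, x0, tol=1e-10, max_iter=100):
    """Chord method: reuse the inverse Jacobian evaluated at x0 for every iteration."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = J0_inv @ g(x)
        x = x - step
        if np.linalg.norm(step) < tol:
            break
    return x

# Example system g(x, y) = (x^2 + y^2 - 4, x*y - 1), with a root near (1.93, 0.52)
def g(v):
    x, y = v
    return np.array([x**2 + y**2 - 4.0, x * y - 1.0])

def jacobian(v):
    x, y = v
    return np.array([[2.0 * x, 2.0 * y],
                     [      y,       x]])

x0 = np.array([2.0, 0.5])
root = chord_method(g, np.linalg.inv(jacobian(x0)), x0)
print(root, g(root))           # g(root) should be close to zero
```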
Using methods developed to find extrema in order to find zeroes is not always a good idea, as the majority of the methods used to find extrema require the matrix that is used to be symmetric. While this holds in the context of the search for extrema, it rarely holds when searching for zeroes.

Broyden's "good" and "bad" methods are two methods commonly used to find extrema that can also be applied to find zeroes. Other methods that can be used are the column-updating method, the inverse column-updating method, the quasi-Newton least squares method and the quasi-Newton inverse least squares method.

More recently, quasi-Newton methods have been applied to find the solution of multiple coupled systems of equations (e.g. fluid–structure interaction problems or interaction problems in physics). They allow the solution to be found by solving each constituent system separately (which is simpler than the global system) in a cyclic, iterative fashion until the solution of the global system is found.
Search for extrema: optimization
The search for a minimum or maximum of a scalar-valued function is closely related to the search for the zeroes of the gradient of that function. Therefore, quasi-Newton methods can be readily applied to find extrema of a function. In other words, if $g$ is the gradient of $f$, then searching for the zeroes of the vector-valued function $g$ corresponds to the search for the extrema of the scalar-valued function $f$; the Jacobian of $g$ now becomes the Hessian of $f$. The main difference is that the Hessian matrix is a symmetric matrix, unlike the Jacobian when searching for zeroes. Most quasi-Newton methods used in optimization exploit this symmetry.
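This correspondence can be illustrated with a short sketch (assuming SciPy; the function and starting point are arbitrary illustrative choices): minimizing $f$ with a quasi-Newton optimizer and finding a zero of its gradient with a quasi-Newton root finder reach the same point, but the optimizer can exploit the symmetry of the Hessian while the root finder treats the Jacobian as a general matrix.

```python
import numpy as np
from scipy.optimize import minimize, root

def f(v):                      # scalar-valued function to minimize
    x, y = v
    return (x - 1.0)**2 + (y + 2.0)**2 + x * y

def grad_f(v):                 # its gradient, a vector-valued function g = grad f
    x, y = v
    return np.array([2.0 * (x - 1.0) + y,
                     2.0 * (y + 2.0) + x])

x0 = np.zeros(2)

# Extremum search: BFGS works on f and exploits the symmetric Hessian
res_min = minimize(f, x0, jac=grad_f, method="BFGS")

# Zero search: Broyden's method works on grad_f and assumes no symmetry
res_root = root(grad_f, x0, method="broyden1")

print(res_min.x, res_root.x)   # both approach the stationary point (8/3, -10/3)
```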
In optimization, quasi-Newton methods (a special case of variable-metric methods) are algorithms for finding local maxima and minima of functions. Quasi-Newton methods for optimization are based on Newton's method to find the stationary points of a function, points where the gradient is 0. Newton's method assumes that the function can be locally approximated as a quadratic in the region around the optimum, and uses the first and second derivatives to find the stationary point. In higher dimensions, Newton's method uses the gradient and the Hessian matrix of second derivatives of the function to be minimized.

In quasi-Newton methods the Hessian matrix does not need to be computed. The Hessian is updated by analyzing successive gradient vectors instead. Quasi-Newton methods are a generalization of the secant method to find the root of the first derivative for multidimensional problems. In multiple dimensions the secant equation is under-determined, and quasi-Newton methods differ in how they constrain the solution, typically by adding a simple low-rank update to the current estimate of the Hessian.
The first quasi-Newton algorithm was proposed by William C. Davidon, a physicist working at Argonne National Laboratory, who developed it in 1959: the DFP updating formula, which was later popularized by Fletcher and Powell in 1963 but is rarely used today. The most common quasi-Newton algorithms are currently the SR1 formula (for "symmetric rank-one"), the BHHH method, the widespread BFGS method (suggested independently by Broyden, Fletcher, Goldfarb, and Shanno in 1970), and its low-memory extension L-BFGS. The Broyden class is a linear combination of the DFP and BFGS methods.

The SR1 formula does not guarantee that the update matrix maintains positive-definiteness and can be used for indefinite problems. Broyden's method does not require the update matrix to be symmetric and is used to find the root of a general system of equations (rather than the gradient) by updating the Jacobian (rather than the Hessian).
One of the chief advantages of quasi-Newton methods over Newton's method is that the Hessian matrix (or, in the case of quasi-Newton methods, its approximation $B$) does not need to be inverted. Newton's method, and its derivatives such as interior point methods, require the Hessian to be inverted, which is typically implemented by solving a system of linear equations and is often quite costly. In contrast, quasi-Newton methods usually generate an estimate of $B^{-1}$ directly.
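The cost difference can be illustrated with a small sketch (an illustrative comparison only; in a real quasi-Newton method the inverse estimate is maintained by low-rank updates rather than by explicit inversion): a Newton-type step solves a linear system, whereas a quasi-Newton step with a stored inverse estimate is a single matrix–vector product.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
B = rng.standard_normal((n, n))
B = B @ B.T + n * np.eye(n)     # a symmetric positive-definite "Hessian"
g = rng.standard_normal(n)      # a gradient vector

# Newton-type step: solve B * step = g at every iteration, O(n^3)
step_newton = np.linalg.solve(B, g)

# Quasi-Newton step with a stored estimate H of B^{-1}: one product, O(n^2)
H = np.linalg.inv(B)            # stand-in for the maintained inverse approximation
step_qn = H @ g

print(np.allclose(step_newton, step_qn))   # same step, cheaper per iteration once H is available
```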
As in Newton's method, one uses a second-order approximation to find the minimum of a function $f(x)$. The Taylor series of $f(x)$ around an iterate $x_k$ is

: $f(x_k + \Delta x) \approx f(x_k) + \nabla f(x_k)^{\mathrm T} \Delta x + \tfrac{1}{2} \Delta x^{\mathrm T} B \, \Delta x,$
where $\nabla f(x_k)$ is the gradient and $B$ is an approximation to the Hessian matrix. The gradient of this approximation (with respect to $\Delta x$) is

: $\nabla f(x_k + \Delta x) \approx \nabla f(x_k) + B \, \Delta x,$
and setting this gradient to zero (which is the goal of optimization) provides the Newton step:

: $\Delta x = -B^{-1} \nabla f(x_k).$
The Hessian approximation $B$ is chosen to satisfy

: $\nabla f(x_k + \Delta x) = \nabla f(x_k) + B \, \Delta x,$

which is called the ''secant equation'' (the Taylor series of the gradient itself). In more than one dimension $B$ is underdetermined. In one dimension, solving for $B$ and applying the Newton step with the updated value is equivalent to the secant method. The various quasi-Newton methods differ in their choice of the solution to the secant equation (in one dimension, all the variants are equivalent). Most methods (but with exceptions, such as Broyden's method) seek a symmetric solution ($B^{\mathrm T} = B$); furthermore, the variants listed below can be motivated by finding an update $B_{k+1}$ that is as close as possible to $B_k$ in some norm; that is, $B_{k+1} = \operatorname{argmin}_B \|B - B_k\|_V$, where $V$ is some positive-definite matrix that defines the norm. An approximate initial value $B_0 = \beta I$ is often sufficient to achieve rapid convergence, although there is no general strategy to choose $\beta$. Note that $B_0$ should be positive-definite. The unknown $x_k$ is updated by applying the Newton step calculated using the current approximate Hessian matrix $B_k$:
* $\Delta x_k = -\alpha_k B_k^{-1} \nabla f(x_k)$, with $\alpha_k$ chosen to satisfy the Wolfe conditions;
* $x_{k+1} = x_k + \Delta x_k$;
* The gradient computed at the new point $\nabla f(x_{k+1})$, and
:: $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$
is used to update the approximate Hessian $B_{k+1}$, or directly its inverse $H_{k+1} = B_{k+1}^{-1}$ using the Sherman–Morrison formula (a concrete code sketch of these steps is given after the update formulas below).
* A key property of the BFGS and DFP updates is that if $B_k$ is positive-definite and $\alpha_k$ is chosen to satisfy the Wolfe conditions, then $B_{k+1}$ is also positive-definite.
The most popular update formulas are:
In the notation above ($\Delta x_k = x_{k+1} - x_k$ and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$):
* BFGS: $B_{k+1} = B_k + \dfrac{y_k y_k^{\mathrm T}}{y_k^{\mathrm T} \Delta x_k} - \dfrac{B_k \Delta x_k \Delta x_k^{\mathrm T} B_k}{\Delta x_k^{\mathrm T} B_k \Delta x_k}$
* DFP: $B_{k+1} = \left(I - \dfrac{y_k \Delta x_k^{\mathrm T}}{y_k^{\mathrm T} \Delta x_k}\right) B_k \left(I - \dfrac{\Delta x_k y_k^{\mathrm T}}{y_k^{\mathrm T} \Delta x_k}\right) + \dfrac{y_k y_k^{\mathrm T}}{y_k^{\mathrm T} \Delta x_k}$
* SR1: $B_{k+1} = B_k + \dfrac{(y_k - B_k \Delta x_k)(y_k - B_k \Delta x_k)^{\mathrm T}}{(y_k - B_k \Delta x_k)^{\mathrm T} \Delta x_k}$
* Broyden's method: $B_{k+1} = B_k + \dfrac{(y_k - B_k \Delta x_k)\, \Delta x_k^{\mathrm T}}{\Delta x_k^{\mathrm T} \Delta x_k}$
* Broyden family: $B_{k+1} = (1 - \varphi_k)\, B_{k+1}^{\mathrm{BFGS}} + \varphi_k\, B_{k+1}^{\mathrm{DFP}}$, with $\varphi_k \in [0, 1]$
The corresponding updates of the inverse $H_{k+1} = B_{k+1}^{-1}$ are obtained from these via the Sherman–Morrison formula.
Other methods are Pearson's method, McCormick's method, the Powell symmetric Broyden (PSB) method and Greenstadt's method.
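The steps listed above can be made concrete with a short sketch. The following is a minimal BFGS implementation in Python (an illustrative sketch, not a reference implementation: it uses a simple backtracking line search instead of a full Wolfe line search, skips the update when the curvature condition fails, and the test function and starting point are arbitrary choices). It maintains the inverse approximation $H_k$ directly, so no linear system is solved.

```python
import numpy as np

def bfgs_minimize(f, grad, x0, tol=1e-8, max_iter=200):
    """Minimal BFGS: update the inverse Hessian approximation H_k directly."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    H = np.eye(n)                      # initial inverse Hessian approximation H_0
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = -H @ g                     # quasi-Newton direction: -H_k grad f(x_k)
        # Backtracking line search (sufficient-decrease condition only)
        alpha = 1.0
        while f(x + alpha * p) > f(x) + 1e-4 * alpha * (g @ p):
            alpha *= 0.5
        s = alpha * p                  # step Delta x_k
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g                  # gradient difference y_k
        sy = s @ y
        if sy > 1e-12:                 # curvature condition; otherwise skip the update
            rho = 1.0 / sy
            I = np.eye(n)
            # BFGS update of the inverse:
            # H_{k+1} = (I - rho s y^T) H_k (I - rho y s^T) + rho s s^T
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

# Example: Rosenbrock function
def rosen(v):
    x, y = v
    return (1 - x)**2 + 100 * (y - x**2)**2

def rosen_grad(v):
    x, y = v
    return np.array([-2 * (1 - x) - 400 * x * (y - x**2), 200 * (y - x**2)])

print(bfgs_minimize(rosen, rosen_grad, [-1.2, 1.0]))   # approaches (1, 1)
```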
These recursive low-rank matrix updates can also be represented as an initial matrix plus a low-rank correction. This is the compact quasi-Newton representation, which is particularly effective for constrained and/or large problems.
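For BFGS, for example, such a compact form can be written (a sketch in the standard notation of the limited-memory literature, with $s_i = \Delta x_i$ and the stored pairs collected column-wise) as

: $B_k = B_0 - \begin{bmatrix} B_0 S_k & Y_k \end{bmatrix} \begin{bmatrix} S_k^{\mathrm T} B_0 S_k & L_k \\ L_k^{\mathrm T} & -D_k \end{bmatrix}^{-1} \begin{bmatrix} S_k^{\mathrm T} B_0 \\ Y_k^{\mathrm T} \end{bmatrix},$

where $S_k = [s_0, \ldots, s_{k-1}]$, $Y_k = [y_0, \ldots, y_{k-1}]$, and $D_k$ and $L_k$ are, respectively, the diagonal part and the strictly lower-triangular part of $S_k^{\mathrm T} Y_k$.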
Relationship to matrix inversion
When $f$ is a convex quadratic function with positive-definite Hessian $B$, one would expect the matrices $H_k$ generated by a quasi-Newton method to converge to the inverse Hessian $B^{-1}$. This is indeed the case for the class of quasi-Newton methods based on least-change updates.
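This behaviour can be checked numerically, for example with SciPy's BFGS implementation (a minimal sketch; the quadratic below is an arbitrary example), whose result object exposes the final inverse-Hessian approximation as hess_inv:

```python
import numpy as np
from scipy.optimize import minimize

# Convex quadratic f(x) = 1/2 x^T A x - b^T x, whose Hessian is the positive-definite A
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])

f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b

res = minimize(f, x0=np.zeros(2), jac=grad, method="BFGS")

print(res.hess_inv)         # BFGS estimate of the inverse Hessian
print(np.linalg.inv(A))     # exact inverse Hessian, for comparison
```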
Notable implementations
Implementations of quasi-Newton methods are available in many programming languages.
Notable open source implementations include:
* GNU Octave uses a form of BFGS in its fsolve function, with trust region extensions.
* The GNU Scientific Library implements the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm.
* ALGLIB implements (L)BFGS in C++ and C#.
* R's optim general-purpose optimizer routine uses the BFGS method by specifying method="BFGS".
* SciPy's scipy.optimize module has fmin_bfgs, and the scipy.optimize.minimize function includes, among other methods, a BFGS implementation.
Notable proprietary implementations include:
* Mathematica includes quasi-Newton solvers.
* The NAG Library contains several routines for minimizing or maximizing a function which use quasi-Newton algorithms.
* In MATLAB's Optimization Toolbox, the fminunc function uses (among other methods) the BFGS quasi-Newton method. Many of the constrained methods of the Optimization Toolbox use BFGS and the variant L-BFGS.
See also
* BFGS method
** L-BFGS
** OWL-QN
* Broyden's method
* DFP updating formula
* Newton's method
* Newton's method in optimization
* SR1 formula
* Compact quasi-Newton representation