Quasi-Newton methods are methods used to find either zeroes or local maxima and minima of functions, as an alternative to Newton's method. They can be used if the Jacobian or Hessian is unavailable or is too expensive to compute at every iteration. The "full" Newton's method requires the Jacobian in order to search for zeros, or the Hessian for finding extrema.
Search for zeros: root finding
Newton's method to find zeroes of a function <math>g</math> of multiple variables is given by
:<math>x_{n+1} = x_n - [J_g(x_n)]^{-1} g(x_n),</math>
where <math>[J_g(x_n)]^{-1}</math> is the left inverse of the Jacobian matrix <math>J_g(x_n)</math> of <math>g</math> evaluated for <math>x_n</math>.
Strictly speaking, any method that replaces the exact Jacobian <math>J_g(x_n)</math> with an approximation is a quasi-Newton method. For instance, the chord method (where <math>J_g(x_n)</math> is replaced by <math>J_g(x_0)</math> for all iterations) is a simple example. The methods given below for optimization refer to an important subclass of quasi-Newton methods, secant methods.
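The chord method is easy to state in code. The following is a minimal sketch, not a reference implementation: the Jacobian is evaluated once at the starting point and reused at every iteration; the test system and tolerances are hypothetical.
<syntaxhighlight lang="python">
import numpy as np

def chord_method(g, jac, x0, tol=1e-10, max_iter=100):
    """Find a zero of g, reusing the Jacobian evaluated once at x0 (chord method)."""
    x = np.asarray(x0, dtype=float)
    J0 = jac(x)                               # Jacobian fixed for all iterations
    for _ in range(max_iter):
        x = x - np.linalg.solve(J0, g(x))     # solve J0 * step = g(x)
        if np.linalg.norm(g(x)) < tol:
            break
    return x

# Hypothetical example: intersect the unit circle with the line x0 = x1
g = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
jac = lambda x: np.array([[2 * x[0], 2 * x[1]], [1.0, -1.0]])
print(chord_method(g, jac, np.array([1.0, 0.5])))   # approx. [0.7071, 0.7071]
</syntaxhighlight>
Because the Jacobian is never refreshed, the chord method trades Newton's quadratic convergence for cheaper iterations with only linear convergence.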
Using methods developed to find extrema in order to find zeroes is not always a good idea, as the majority of the methods used to find extrema require that the approximation matrix be symmetric. While this holds in the context of the search for extrema, it rarely holds when searching for zeroes.
Broyden's "good" and "bad" methods are two methods commonly used to find extrema that can also be applied to find zeroes. Other methods that can be used are the
column-updating method, the
inverse column-updating method
Inverse or invert may refer to:
Science and mathematics
* Inverse (logic), a type of conditional sentence which is an immediate inference made from another conditional sentence
* Additive inverse (negation), the inverse of a number that, when a ...
, the
quasi-Newton least squares method Quasi-Newton methods are methods used to either find zeroes or local maxima and minima of functions, as an alternative to Newton's method. They can be used if the Jacobian or Hessian is unavailable or is too expensive to compute at every iteration ...
and the
quasi-Newton inverse least squares method Quasi-Newton methods are methods used to either find zeroes or local maxima and minima of functions, as an alternative to Newton's method. They can be used if the Jacobian or Hessian is unavailable or is too expensive to compute at every iteration. ...
.
More recently, quasi-Newton methods have been applied to find the solution of multiple coupled systems of equations (e.g. fluid–structure interaction problems or interaction problems in physics). They allow the solution to be found by solving each constituent system separately (which is simpler than the global system) in a cyclic, iterative fashion until the solution of the global system is found.
Search for extrema: optimization
The search for a minimum or maximum of a scalar-valued function is nothing other than the search for the zeroes of the gradient of that function. Therefore, quasi-Newton methods can be readily applied to find extrema of a function. In other words, if <math>g</math> is the gradient of <math>f</math>, then searching for the zeroes of the vector-valued function <math>g</math> corresponds to the search for the extrema of the scalar-valued function <math>f</math>; the Jacobian of <math>g</math> now becomes the Hessian of <math>f</math>. The main difference is that the Hessian matrix is a symmetric matrix, unlike the Jacobian when searching for zeroes. Most quasi-Newton methods used in optimization exploit this property.
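This correspondence can be seen directly: handing the gradient of a scalar function to a quasi-Newton root finder locates a stationary point. The sketch below assumes SciPy is available and uses a hypothetical quadratic objective; it is an illustration, not an example taken from the article.
<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import broyden1

# Hypothetical objective: f(x) = (x0 - 1)^2 + 10 * (x1 + 2)^2, minimum at (1, -2)
def grad_f(x):
    return np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)])

# A zero of the gradient is a stationary point of f
x_star = broyden1(grad_f, np.zeros(2), f_tol=1e-10)
print(x_star)   # approximately [1, -2]
</syntaxhighlight>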
In optimization, quasi-Newton methods (a special case of variable-metric methods) are algorithms for finding local maxima and minima of functions. Quasi-Newton methods are based on Newton's method to find the stationary point of a function, where the gradient is 0. Newton's method assumes that the function can be locally approximated as a quadratic in the region around the optimum, and uses the first and second derivatives to find the stationary point. In higher dimensions, Newton's method uses the gradient and the Hessian matrix of second derivatives of the function to be minimized.
In quasi-Newton methods the Hessian matrix does not need to be computed. The Hessian is updated by analyzing successive gradient vectors instead. Quasi-Newton methods are a generalization of the secant method to find the root of the first derivative for multidimensional problems. In multiple dimensions the secant equation is under-determined, and quasi-Newton methods differ in how they constrain the solution, typically by adding a simple low-rank update to the current estimate of the Hessian.
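To make the low-rank idea concrete, the following is a small sketch (with hypothetical data, and notation chosen here rather than taken from the article) of the symmetric rank-one (SR1) correction: given a step <math>\Delta x_k</math> and a gradient difference <math>y_k</math>, a rank-one term is added to <math>B_k</math> so that the updated matrix satisfies the secant equation <math>B_{k+1} \Delta x_k = y_k</math>.
<syntaxhighlight lang="python">
import numpy as np

def sr1_update(B, dx, y, eps=1e-8):
    """Symmetric rank-one (SR1) update of a Hessian approximation B."""
    r = y - B @ dx                          # residual of the secant equation
    denom = r @ dx
    if abs(denom) < eps * np.linalg.norm(r) * np.linalg.norm(dx):
        return B                            # skip the update when it is ill-defined
    return B + np.outer(r, r) / denom       # rank-one correction

# Check that the secant equation B_new @ dx == y holds (hypothetical data)
B = np.eye(3)
dx = np.array([1.0, 0.5, -2.0])
y = np.array([2.0, -1.0, 0.3])
B_new = sr1_update(B, dx, y)
print(np.allclose(B_new @ dx, y))           # True
</syntaxhighlight>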
The first quasi-Newton algorithm was proposed by William C. Davidon, a physicist working at Argonne National Laboratory. He developed the first quasi-Newton algorithm in 1959: the DFP updating formula, which was later popularized by Fletcher and Powell in 1963, but is rarely used today. The most common quasi-Newton algorithms are currently the SR1 formula (for "symmetric rank-one"), the BHHH method, the widespread BFGS method (suggested independently by Broyden, Fletcher, Goldfarb, and Shanno, in 1970), and its low-memory extension L-BFGS. Broyden's class is a linear combination of the DFP and BFGS methods.
The SR1 formula does not guarantee the update matrix to maintain positive-definiteness and can be used for indefinite problems. Broyden's method does not require the update matrix to be symmetric and is used to find the root of a general system of equations (rather than the gradient) by updating the Jacobian (rather than the Hessian).
One of the chief advantages of quasi-Newton methods over Newton's method is that the Hessian matrix (or, in the case of quasi-Newton methods, its approximation <math>B</math>) does not need to be inverted. Newton's method, and its derivatives such as interior point methods, require the Hessian to be inverted, which is typically implemented by solving a system of linear equations and is often quite costly. In contrast, quasi-Newton methods usually generate an estimate of <math>B^{-1}</math> directly.
As in Newton's method, one uses a second-order approximation to find the minimum of a function <math>f(x)</math>. The Taylor series of <math>f(x)</math> around an iterate is
:<math>f(x_k + \Delta x) \approx f(x_k) + \nabla f(x_k)^{\mathrm T} \Delta x + \tfrac{1}{2} \Delta x^{\mathrm T} B \,\Delta x,</math>
where (<math>\nabla f</math>) is the gradient, and <math>B</math> an approximation to the Hessian matrix. The gradient of this approximation (with respect to <math>\Delta x</math>) is
:<math>\nabla f(x_k + \Delta x) \approx \nabla f(x_k) + B \,\Delta x,</math>
and setting this gradient to zero (which is the goal of optimization) provides the Newton step:
:<math>\Delta x = -B^{-1} \nabla f(x_k).</math>
The Hessian approximation <math>B</math> is chosen to satisfy
:<math>\nabla f(x_k + \Delta x) = \nabla f(x_k) + B \,\Delta x,</math>
which is called the ''secant equation'' (the Taylor series of the gradient itself). In more than one dimension <math>B</math> is underdetermined. In one dimension, solving for <math>B</math> and applying the Newton's step with the updated value is equivalent to the secant method. The various quasi-Newton methods differ in their choice of the solution to the secant equation (in one dimension, all the variants are equivalent). Most methods (but with exceptions, such as Broyden's method) seek a symmetric solution (<math>B^{\mathrm T} = B</math>); furthermore, the variants listed below can be motivated by finding an update <math>B_{k+1}</math> that is as close as possible to <math>B_k</math> in some norm; that is, <math>B_{k+1} = \operatorname{argmin}_B \|B - B_k\|_V</math> among the matrices satisfying the secant equation, where <math>V</math> is some positive-definite matrix that defines the norm. An approximate initial value <math>B_0 = \beta I</math> is often sufficient to achieve rapid convergence, although there is no general strategy to choose <math>\beta</math>. Note that <math>B_0</math> should be positive-definite. The unknown <math>x_k</math> is updated applying the Newton's step calculated using the current approximate Hessian matrix <math>B_k</math>:
* <math>\Delta x_k = -\alpha_k B_k^{-1} \nabla f(x_k)</math>, with <math>\alpha_k</math> chosen to satisfy the Wolfe conditions;
* <math>x_{k+1} = x_k + \Delta x_k</math>;
* The gradient computed at the new point <math>\nabla f(x_{k+1})</math>, and
::<math>y_k = \nabla f(x_{k+1}) - \nabla f(x_k)</math>
is used to update the approximate Hessian <math>B_{k+1}</math>, or directly its inverse <math>H_{k+1} = B_{k+1}^{-1}</math> using the Sherman–Morrison formula.
* A key property of the BFGS and DFP updates is that if <math>B_k</math> is positive-definite and <math>\alpha_k</math> is chosen to satisfy the Wolfe conditions, then <math>B_{k+1}</math> is also positive-definite.
The most popular update formulas are:
:BFGS: <math>B_{k+1} = B_k + \frac{y_k y_k^{\mathrm T}}{y_k^{\mathrm T} \Delta x_k} - \frac{B_k \Delta x_k \Delta x_k^{\mathrm T} B_k}{\Delta x_k^{\mathrm T} B_k \Delta x_k}</math>
:DFP: <math>B_{k+1} = \left(I - \frac{y_k \Delta x_k^{\mathrm T}}{y_k^{\mathrm T} \Delta x_k}\right) B_k \left(I - \frac{\Delta x_k y_k^{\mathrm T}}{y_k^{\mathrm T} \Delta x_k}\right) + \frac{y_k y_k^{\mathrm T}}{y_k^{\mathrm T} \Delta x_k}</math>
:SR1: <math>B_{k+1} = B_k + \frac{(y_k - B_k \Delta x_k)(y_k - B_k \Delta x_k)^{\mathrm T}}{(y_k - B_k \Delta x_k)^{\mathrm T} \Delta x_k}</math>
:Broyden: <math>B_{k+1} = B_k + \frac{(y_k - B_k \Delta x_k)\, \Delta x_k^{\mathrm T}}{\Delta x_k^{\mathrm T} \Delta x_k}</math>
Other methods are Pearson's method, McCormick's method, the Powell symmetric Broyden (PSB) method and Greenstadt's method.
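The steps above can be assembled into a complete iteration. The following is a minimal sketch of a BFGS-style quasi-Newton loop; it substitutes a simple backtracking (Armijo) line search for a full Wolfe-conditions search, and the test problem, tolerances, and function names are hypothetical.
<syntaxhighlight lang="python">
import numpy as np

def quasi_newton_bfgs(f, grad, x0, tol=1e-8, max_iter=200):
    """Minimal BFGS sketch: maintains an approximation B of the Hessian."""
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)                          # B_0 = I (i.e. beta = 1)
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = -np.linalg.solve(B, g)              # quasi-Newton step direction
        alpha = 1.0                             # backtracking line search
        while f(x + alpha * p) > f(x) + 1e-4 * alpha * (g @ p):
            alpha *= 0.5
        dx = alpha * p
        x_new = x + dx
        g_new = grad(x_new)
        y = g_new - g
        if y @ dx > 1e-12:                      # curvature condition keeps B positive-definite
            B = (B + np.outer(y, y) / (y @ dx)
                   - np.outer(B @ dx, B @ dx) / (dx @ B @ dx))
        x, g = x_new, g_new
    return x

# Hypothetical test: the Rosenbrock function, minimum at (1, 1)
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                           200 * (x[1] - x[0]**2)])
print(quasi_newton_bfgs(f, grad, np.array([-1.2, 1.0])))   # approx. [1, 1]
</syntaxhighlight>
In practice, library implementations usually update the inverse <math>H_k = B_k^{-1}</math> (or a factorization) directly rather than solving a linear system at each step.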
Relationship to matrix inversion
When <math>f</math> is a convex quadratic function with positive-definite Hessian <math>B</math>, one would expect the matrices <math>H_k</math> generated by a quasi-Newton method to converge to the inverse Hessian <math>H = B^{-1}</math>. This is indeed the case for the class of quasi-Newton methods based on least-change updates.
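This behaviour can be checked numerically. The sketch below, which assumes SciPy is available and uses a hypothetical quadratic, compares the inverse-Hessian approximation returned by SciPy's BFGS implementation with the exact inverse; after convergence on a quadratic the two are typically close.
<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize

# Hypothetical convex quadratic f(x) = 1/2 x^T A x - b^T x with positive-definite A
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b

res = minimize(f, np.zeros(2), jac=grad, method="BFGS")
print(res.hess_inv)       # BFGS estimate of the inverse Hessian
print(np.linalg.inv(A))   # exact inverse Hessian of the quadratic
</syntaxhighlight>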
Notable implementations
Implementations of quasi-Newton methods are available in many programming languages.
Notable open source implementations include:
* GNU Octave uses a form of BFGS in its fsolve function, with trust region extensions.
* GNU Scientific Library implements the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm.
* ALGLIB implements (L)BFGS in C++ and C#.
* R's optim general-purpose optimizer routine uses the BFGS method by specifying method="BFGS".
* In the SciPy extension to Python, the scipy.optimize.minimize function includes, among other methods, a BFGS implementation (a usage sketch follows this list); the older scipy.optimize.fmin_bfgs interface is also available.
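For example, a typical call to the SciPy routine mentioned above looks like the following minimal sketch (the objective is hypothetical):
<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize

# Hypothetical objective: the two-dimensional Rosenbrock function
def f(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

# BFGS; the gradient is estimated by finite differences when jac is omitted
res = minimize(f, x0=np.array([-1.2, 1.0]), method="BFGS")
print(res.x)          # approximately [1, 1]
print(res.hess_inv)   # final approximation of the inverse Hessian
</syntaxhighlight>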
Notable proprietary implementations include:
* Mathematica includes quasi-Newton solvers.
* The NAG Library contains several routines for minimizing or maximizing a function which use quasi-Newton algorithms.
* In MATLAB's Optimization Toolbox, the fminunc function uses (among other methods) the BFGS quasi-Newton method. Many of the constrained methods of the Optimization Toolbox use BFGS and the variant L-BFGS.
See also
* BFGS method
** L-BFGS
** OWL-QN
* Broyden's method
* DFP updating formula
* Newton's method
* Newton's method in optimization
* SR1 formula
References
{{Optimization algorithms|unconstrained}}
Optimization algorithms and methods