In [[numerical optimization]], the '''nonlinear conjugate gradient method''' generalizes the [[conjugate gradient method]] to [[nonlinear programming|nonlinear optimization]]. For a quadratic function <math>f(x)</math>
:: <math>f(x) = \|Ax - b\|^2,</math>
the minimum of <math>f</math> is obtained when the [[gradient]] is 0:
:: <math>\nabla_x f = 2 A^T (Ax - b) = 0.</math>
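As a small numerical illustration (a sketch in NumPy; the matrix <math>A</math> and vector <math>b</math> are chosen arbitrarily and are not from any particular application), the gradient <math>2A^T(Ax-b)</math> indeed vanishes at the least-squares solution of <math>Ax = b</math>:
<syntaxhighlight lang="python">
import numpy as np

# Illustrative quadratic objective f(x) = ||Ax - b||^2; A and b are arbitrary.
A = np.array([[3.0, 1.0],
              [1.0, 2.0],
              [0.0, 1.0]])
b = np.array([1.0, 0.0, -1.0])

def f(x):
    return np.sum((A @ x - b) ** 2)

def grad(x):
    return 2.0 * A.T @ (A @ x - b)

# The minimizer solves the normal equations A^T A x = A^T b.
x_star = np.linalg.solve(A.T @ A, A.T @ b)
print(grad(x_star))   # approximately [0, 0]: the gradient vanishes at the minimum
</syntaxhighlight>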
Whereas linear conjugate gradient seeks a solution to the linear equation <math>A^T A x = A^T b</math>, the nonlinear conjugate gradient method is generally used to find the [[maxima and minima|local minimum]] of a nonlinear function using its [[gradient]] <math>\nabla_x f</math> alone. It works when the function is approximately quadratic near the minimum, which is the case when the function is twice differentiable at the minimum and the second derivative is non-singular there.
Given a function <math>f(x)</math> of <math>N</math> variables to minimize, its gradient <math>\nabla_x f</math> indicates the direction of maximum increase. One simply starts in the opposite ([[gradient descent|steepest descent]]) direction:
:: <math>\Delta x_0 = -\nabla_x f(x_0)</math>
with an adjustable step length <math>\alpha</math> and performs a [[line search]] in this direction until it reaches the minimum of <math>f</math>:
:: <math>\alpha_0 := \arg\min_\alpha f(x_0 + \alpha \Delta x_0),</math>
:: <math>x_1 = x_0 + \alpha_0 \Delta x_0.</math>
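This first step can be sketched in a few lines of Python (assuming <code>f</code> and <code>grad</code> are the objective and its gradient, as in the sketch above; the helper name <code>first_step</code> is purely illustrative), using SciPy's one-dimensional minimizer for the line search:
<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize_scalar

def first_step(f, grad, x0):
    """One steepest-descent step with an (approximately) exact line search."""
    x0 = np.asarray(x0, dtype=float)
    d0 = -grad(x0)                            # Delta x_0: steepest-descent direction
    phi = lambda alpha: f(x0 + alpha * d0)    # restriction of f to the search line
    alpha0 = minimize_scalar(phi).x           # alpha_0 = argmin_alpha f(x0 + alpha*d0)
    return x0 + alpha0 * d0                   # x_1
</syntaxhighlight>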
After this first iteration in the steepest direction <math>\Delta x_0</math>, the following steps constitute one iteration of moving along a subsequent conjugate direction <math>s_n</math>, where <math>s_0 = \Delta x_0</math> (a code sketch of the whole loop is given after the list):
# Calculate the steepest direction: <math>\Delta x_n = -\nabla_x f(x_n)</math>,
# Compute <math>\beta_n</math> according to one of the formulas below,
# Update the conjugate direction: <math>s_n = \Delta x_n + \beta_n s_{n-1}</math>,
# Perform a line search: optimize <math>\alpha_n = \arg\min_\alpha f(x_n + \alpha s_n)</math>,
# Update the position: <math>x_{n+1} = x_n + \alpha_n s_n</math>.
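Putting the steps together, a minimal Python sketch of the full loop might look as follows (using the Polak–Ribière formula with an automatic reset, described in the next section, and SciPy's one-dimensional minimizer as the line search; the function name and defaults are illustrative):
<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize_scalar

def nonlinear_cg(f, grad, x0, tol=1e-8, max_iter=1000):
    """Minimal nonlinear conjugate gradient sketch (Polak-Ribiere with reset)."""
    x = np.asarray(x0, dtype=float)
    g = -grad(x)                 # Delta x_0: negative gradient (steepest descent)
    s = g.copy()                 # first search direction s_0 = Delta x_0
    for n in range(max_iter):
        # Line search: alpha_n = argmin_alpha f(x + alpha*s)
        alpha = minimize_scalar(lambda a: f(x + a * s)).x
        x = x + alpha * s                       # update the position
        g_new = -grad(x)                        # new steepest direction Delta x_{n+1}
        if np.linalg.norm(g_new) < tol:         # stop when the gradient is (nearly) zero
            break
        # Polak-Ribiere beta, clipped at 0 so the direction resets automatically
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))
        s = g_new + beta * s                    # update the conjugate direction
        g = g_new
    return x
</syntaxhighlight>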
With a pure quadratic function the minimum is reached within ''N'' iterations (excepting roundoff error), but a non-quadratic function makes slower progress. Subsequent search directions lose conjugacy, requiring the search direction to be reset to the steepest descent direction at least every ''N'' iterations, or sooner if progress stops. However, resetting every iteration turns the method into [[gradient descent|steepest descent]]. The algorithm stops when it finds the minimum, determined when no progress is made after a direction reset (i.e. in the steepest descent direction), or when some tolerance criterion is reached.
Within a linear approximation, the parameters <math>\alpha</math> and <math>\beta</math> are the same as in the linear conjugate gradient method, but have been obtained with line searches.
The conjugate gradient method can follow narrow ([[condition number|ill-conditioned]]) valleys, where the [[gradient descent|steepest descent]] method slows down and follows a criss-cross pattern.
Four of the best known formulas for <math>\beta_n</math> are named after their developers:
* Fletcher–Reeves:
:: <math>\beta_n^{FR} = \frac{\Delta x_n^T \Delta x_n}{\Delta x_{n-1}^T \Delta x_{n-1}}</math>
* Polak–Ribière:
:: <math>\beta_n^{PR} = \frac{\Delta x_n^T \left(\Delta x_n - \Delta x_{n-1}\right)}{\Delta x_{n-1}^T \Delta x_{n-1}}</math>
* Hestenes–Stiefel:
:: <math>\beta_n^{HS} = \frac{\Delta x_n^T \left(\Delta x_n - \Delta x_{n-1}\right)}{-s_{n-1}^T \left(\Delta x_n - \Delta x_{n-1}\right)}</math>
* Dai–Yuan:
:: <math>\beta_n^{DY} = \frac{\Delta x_n^T \Delta x_n}{-s_{n-1}^T \left(\Delta x_n - \Delta x_{n-1}\right)}.</math>
These formulas are equivalent for a quadratic function, but for nonlinear optimization the preferred formula is a matter of heuristics or taste. A popular choice is <math>\beta = \max\{0, \beta^{PR}\}</math>, which provides a direction reset automatically.
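In code, the four choices can be written directly from the definitions above (a sketch; <code>dx</code>, <code>dx_prev</code> and <code>s_prev</code> stand for <math>\Delta x_n</math>, <math>\Delta x_{n-1}</math> and <math>s_{n-1}</math> stored as NumPy vectors, and the function names are illustrative):
<syntaxhighlight lang="python">
import numpy as np

# dx      = current steepest direction  Delta x_n   (= -grad f(x_n))
# dx_prev = previous steepest direction Delta x_{n-1}
# s_prev  = previous conjugate direction s_{n-1}

def beta_fletcher_reeves(dx, dx_prev, s_prev):
    return (dx @ dx) / (dx_prev @ dx_prev)

def beta_polak_ribiere(dx, dx_prev, s_prev):
    return (dx @ (dx - dx_prev)) / (dx_prev @ dx_prev)

def beta_hestenes_stiefel(dx, dx_prev, s_prev):
    return (dx @ (dx - dx_prev)) / (-s_prev @ (dx - dx_prev))

def beta_dai_yuan(dx, dx_prev, s_prev):
    return (dx @ dx) / (-s_prev @ (dx - dx_prev))

# Popular "PR+" choice: resets the direction automatically when beta would be negative.
def beta_pr_plus(dx, dx_prev, s_prev):
    return max(0.0, beta_polak_ribiere(dx, dx_prev, s_prev))
</syntaxhighlight>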
Algorithms based on [[Newton's method]] potentially converge much faster. There, both step direction and length are computed from the gradient as the solution of a linear system of equations, with the coefficient matrix being the exact [[Hessian matrix]] (for Newton's method proper) or an estimate thereof (in the [[quasi-Newton method]]s, where the observed change in the gradient during the iterations is used to update the Hessian estimate). For high-dimensional problems, the exact computation of the Hessian is usually prohibitively expensive, and even its storage can be problematic, requiring <math>O(N^2)</math> memory (but see the limited-memory [[Limited-memory BFGS|L-BFGS]] quasi-Newton method).
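Both families are available in standard libraries; for instance, SciPy's <code>scipy.optimize.minimize</code> provides nonlinear conjugate gradient as <code>method='CG'</code> and a limited-memory quasi-Newton method as <code>method='L-BFGS-B'</code>. A brief usage example on the Rosenbrock test function:
<syntaxhighlight lang="python">
from scipy.optimize import minimize, rosen, rosen_der

x0 = [1.3, 0.7, 0.8, 1.9, 1.2]

# Nonlinear conjugate gradient: gradient only, low memory per iteration.
res_cg = minimize(rosen, x0, jac=rosen_der, method='CG')

# Limited-memory quasi-Newton: maintains a compact Hessian approximation.
res_lbfgs = minimize(rosen, x0, jac=rosen_der, method='L-BFGS-B')

print(res_cg.x, res_cg.nit)
print(res_lbfgs.x, res_lbfgs.nit)
</syntaxhighlight>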
The conjugate gradient method can also be derived using [[optimal control theory]]. In this accelerated optimization theory, the conjugate gradient method falls out as a nonlinear optimal feedback controller,
:: <math>u = k(x, \dot x) := -\gamma_a \nabla_x f(x) - \gamma_b \dot x</math>
for the [[double integrator]] system,
:: <math>\ddot x = u.</math>
The quantities <math>\gamma_a > 0</math> and <math>\gamma_b > 0</math> are variable feedback gains.
==See also==
* [[Broyden–Fletcher–Goldfarb–Shanno algorithm]]
* [[Conjugate gradient method]]
* [[Limited-memory BFGS|L-BFGS]] (limited-memory BFGS)
* [[Nelder–Mead method]]
* [[Wolfe conditions]]
{{Optimization algorithms}}
[[Category:Optimization algorithms and methods]]
[[Category:Gradient methods]]