Wolfe conditions

In the unconstrained minimization problem, the Wolfe conditions are a set of inequalities for performing inexact line search, especially in quasi-Newton methods, first published by Philip Wolfe in 1969.

In these methods the idea is to find

::\min_{\mathbf{x}} f(\mathbf{x})

for some smooth f\colon\mathbb R^n\to\mathbb R. Each step often involves approximately solving the subproblem

::\min_{\alpha} f(\mathbf{x}_k + \alpha \mathbf{p}_k)

where \mathbf{x}_k is the current best guess, \mathbf{p}_k \in \mathbb R^n is a search direction, and \alpha \in \mathbb R is the step length.

The inexact line searches provide an efficient way of computing an acceptable step length \alpha that reduces the objective function 'sufficiently', rather than minimizing the objective function over \alpha\in\mathbb R^+ exactly. A line search algorithm can use the Wolfe conditions as a requirement for any guessed \alpha, before finding a new search direction \mathbf{p}_k.
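
As a rough illustration of where such an inexact line search sits in practice, the sketch below (not part of the original text) shows a generic descent loop in Python; the helper line_search is a hypothetical routine that returns an acceptable step length, for instance one satisfying the Wolfe conditions defined in the next section.

import numpy as np

def descent_with_line_search(f, grad_f, x0, line_search, max_iter=100, tol=1e-8):
    # Generic descent loop: choose a direction p_k, then an inexact step length alpha_k.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:           # stop once the gradient is small enough
            break
        p = -g                                # steepest-descent direction (any descent direction works)
        alpha = line_search(f, grad_f, x, p)  # inexact line search returning an acceptable alpha
        x = x + alpha * p                     # x_{k+1} = x_k + alpha_k * p_k
    return x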


Armijo rule and curvature

A step length \alpha_k is said to satisfy the ''Wolfe conditions'', restricted to the direction \mathbf{p}_k, if the following two inequalities hold:

: \begin{align}
\textbf{i)} & \quad f(\mathbf{x}_k+\alpha_k\mathbf{p}_k) \leq f(\mathbf{x}_k) + c_1\alpha_k \mathbf{p}_k^{\mathrm T} \nabla f(\mathbf{x}_k), \\
\textbf{ii)} & \quad -\mathbf{p}_k^{\mathrm T}\nabla f(\mathbf{x}_k+\alpha_k\mathbf{p}_k) \leq -c_2\mathbf{p}_k^{\mathrm T}\nabla f(\mathbf{x}_k),
\end{align}

with 0 < c_1 < c_2 < 1. (In examining condition (ii), recall that to ensure that \mathbf{p}_k is a descent direction, we have \mathbf{p}_k^{\mathrm T}\nabla f(\mathbf{x}_k) < 0, as in the case of gradient descent, where \mathbf{p}_k = -\nabla f(\mathbf{x}_k), or Newton–Raphson, where \mathbf{p}_k = -\mathbf{H}^{-1} \nabla f(\mathbf{x}_k) with \mathbf{H} positive definite.)

c_1 is usually chosen to be quite small while c_2 is much larger; Nocedal and Wright give example values of c_1 = 10^{-4} and c_2 = 0.9 for Newton or quasi-Newton methods and c_2 = 0.1 for the nonlinear conjugate gradient method. Inequality i) is known as the Armijo rule and ii) as the curvature condition; i) ensures that the step length \alpha_k decreases f 'sufficiently', and ii) ensures that the slope has been reduced sufficiently. Conditions i) and ii) can be interpreted as respectively providing an upper and lower bound on the admissible step length values.
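
As a concrete (illustrative, not canonical) check, the Python sketch below tests both inequalities for a candidate step length, using the example constants c_1 = 10^{-4} and c_2 = 0.9 quoted above; f and grad_f are assumed to be callables evaluating the objective and its gradient.

import numpy as np

def satisfies_wolfe(f, grad_f, x, p, alpha, c1=1e-4, c2=0.9):
    # i)  Armijo rule:         f(x + alpha*p) <= f(x) + c1*alpha * p^T grad_f(x)
    # ii) curvature condition: p^T grad_f(x + alpha*p) >= c2 * p^T grad_f(x)
    slope0 = np.dot(p, grad_f(x))   # directional derivative at alpha = 0; negative for a descent direction
    x_new = x + alpha * p
    armijo = f(x_new) <= f(x) + c1 * alpha * slope0
    curvature = np.dot(p, grad_f(x_new)) >= c2 * slope0
    return armijo and curvature

Condition ii) is written here in the equivalent form \mathbf{p}_k^{\mathrm T}\nabla f(\mathbf{x}_k+\alpha_k\mathbf{p}_k) \geq c_2\mathbf{p}_k^{\mathrm T}\nabla f(\mathbf{x}_k), obtained by multiplying both sides of the inequality above by -1.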


Strong Wolfe condition on curvature

Denote a univariate function \varphi restricted to the direction \mathbf{p}_k as \varphi(\alpha)=f(\mathbf{x}_k+\alpha\mathbf{p}_k). The Wolfe conditions can result in a value for the step length that is not close to a minimizer of \varphi. If we modify the curvature condition to the following,

: \textbf{iii)} \quad \big|\mathbf{p}_k^{\mathrm T}\nabla f(\mathbf{x}_k+\alpha_k\mathbf{p}_k)\big| \leq c_2\big|\mathbf{p}_k^{\mathrm T}\nabla f(\mathbf{x}_k)\big|

then i) and iii) together form the so-called strong Wolfe conditions, and force \alpha_k to lie close to a critical point of \varphi.
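
A minimal sketch of the corresponding test (same assumptions as the checker above): the only change is that the one-sided curvature inequality becomes a bound on the absolute value of the directional derivative.

import numpy as np

def satisfies_strong_wolfe(f, grad_f, x, p, alpha, c1=1e-4, c2=0.9):
    # i)   Armijo rule, as before
    # iii) strong curvature: |p^T grad_f(x + alpha*p)| <= c2 * |p^T grad_f(x)|
    slope0 = np.dot(p, grad_f(x))
    x_new = x + alpha * p
    armijo = f(x_new) <= f(x) + c1 * alpha * slope0
    strong_curvature = abs(np.dot(p, grad_f(x_new))) <= c2 * abs(slope0)
    return armijo and strong_curvature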


Rationale

The principal reason for imposing the Wolfe conditions in an optimization algorithm where \mathbf{x}_{k+1} = \mathbf{x}_k + \alpha \mathbf{p}_k is to ensure convergence of the gradient to zero. In particular, if the cosine of the angle between \mathbf{p}_k and the gradient,

:: \cos \theta_k = \frac{\nabla f(\mathbf{x}_k)^{\mathrm T}\mathbf{p}_k}{\|\nabla f(\mathbf{x}_k)\|\,\|\mathbf{p}_k\|}

is bounded away from zero and the i) and ii) conditions hold, then \nabla f(\mathbf{x}_k) \rightarrow 0.

An additional motivation, in the case of a quasi-Newton method, is that if \mathbf{p}_k = -B_k^{-1} \nabla f(\mathbf{x}_k), where the matrix B_k is updated by the BFGS or DFP formula, then if B_k is positive definite, condition ii) implies that B_{k+1} is also positive definite.
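
The standard argument behind this last claim (sketched here as a filled-in step, not spelled out in the original text) uses \mathbf{s}_k = \alpha_k\mathbf{p}_k and \mathbf{y}_k = \nabla f(\mathbf{x}_{k+1})-\nabla f(\mathbf{x}_k): condition ii) gives

:: \mathbf{y}_k^{\mathrm T}\mathbf{s}_k = \alpha_k\bigl(\nabla f(\mathbf{x}_{k+1})-\nabla f(\mathbf{x}_k)\bigr)^{\mathrm T}\mathbf{p}_k \geq \alpha_k(c_2-1)\,\mathbf{p}_k^{\mathrm T}\nabla f(\mathbf{x}_k) > 0,

since \alpha_k > 0, c_2 < 1 and \mathbf{p}_k^{\mathrm T}\nabla f(\mathbf{x}_k) < 0; and \mathbf{y}_k^{\mathrm T}\mathbf{s}_k > 0 is exactly the curvature requirement under which the BFGS and DFP updates preserve positive definiteness.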


Comments

Wolfe's conditions are more complicated than Armijo's condition, and a gradient descent algorithm based on Armijo's condition has a better theoretical guarantee than one based on Wolfe conditions (see the sections on "Upper bound for learning rates" and "Theoretical guarantee" in the backtracking line search article).


See also

* Backtracking line search

