The Frank–Wolfe algorithm is an iterative first-order optimization algorithm for constrained convex optimization. Also known as the conditional gradient method, the reduced gradient algorithm and the convex combination algorithm, the method was originally proposed by Marguerite Frank and Philip Wolfe in 1956. In each iteration, the Frank–Wolfe algorithm considers a linear approximation of the objective function, and moves towards a minimizer of this linear function (taken over the same domain).
Problem statement
Suppose $\mathcal{D}$ is a compact convex set in a vector space and $f \colon \mathcal{D} \to \mathbb{R}$ is a convex, differentiable real-valued function. The Frank–Wolfe algorithm solves the optimization problem

:Minimize $f(\mathbf{x})$
:subject to $\mathbf{x} \in \mathcal{D}$.
Algorithm

:''Initialization:'' Let $k \leftarrow 0$, and let $\mathbf{x}_0$ be any point in $\mathcal{D}$.
:Step 1. ''Direction-finding subproblem:'' Find $\mathbf{s}_k$ solving
::Minimize $\mathbf{s}^T \nabla f(\mathbf{x}_k)$
::Subject to $\mathbf{s} \in \mathcal{D}$
:''(Interpretation: Minimize the linear approximation of the problem given by the first-order Taylor approximation of $f$ around $\mathbf{x}_k$, constrained to stay within $\mathcal{D}$.)''
:Step 2. ''Step size determination:'' Set $\alpha \leftarrow \frac{2}{k+2}$, or alternatively find $\alpha$ that minimizes $f(\mathbf{x}_k + \alpha(\mathbf{s}_k - \mathbf{x}_k))$ subject to $0 \le \alpha \le 1$.
:Step 3. ''Update:'' Let $\mathbf{x}_{k+1} \leftarrow \mathbf{x}_k + \alpha(\mathbf{s}_k - \mathbf{x}_k)$, let $k \leftarrow k + 1$ and go to Step 1.
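The steps above translate almost line for line into code. The following is a minimal sketch, not part of the original article: it assumes the feasible set is the probability simplex, for which the direction-finding subproblem has the closed-form solution $\mathbf{s}_k = \mathbf{e}_i$ with $i = \arg\min_i [\nabla f(\mathbf{x}_k)]_i$, uses the default step size $\alpha = \frac{2}{k+2}$, and all function names are illustrative.

```python
# Sketch of the Frank-Wolfe loop above (illustrative, not from the article).
import numpy as np

def frank_wolfe(grad, lmo, x0, num_iters=100):
    """Generic Frank-Wolfe loop with the default step size 2/(k+2).

    grad(x) -- gradient of the objective at x
    lmo(g)  -- linear minimization oracle: argmin_{s in D} <s, g>
    x0      -- starting point in D
    """
    x = x0
    for k in range(num_iters):
        g = grad(x)              # Step 1: gradient at x_k
        s = lmo(g)               # Step 1: direction-finding subproblem
        alpha = 2.0 / (k + 2.0)  # Step 2: default step size
        x = x + alpha * (s - x)  # Step 3: convex combination, stays in D
    return x

# Example problem: least squares over the probability simplex.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

def grad(x):
    return A.T @ (A @ x - b)

def simplex_lmo(g):
    # argmin over the simplex of <s, g> is the vertex at the smallest
    # coordinate of g.
    s = np.zeros_like(g)
    s[np.argmin(g)] = 1.0
    return s

x0 = np.full(5, 1.0 / 5)  # any point in the simplex
x_approx = frank_wolfe(grad, simplex_lmo, x0, num_iters=500)
```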
Properties
While competing methods such as gradient descent for constrained optimization require a projection step back to the feasible set in each iteration, the Frank–Wolfe algorithm only needs the solution of a linear problem over the same set in each iteration, and automatically stays in the feasible set.
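To make this contrast concrete, the following sketch (illustrative, not from the article; the helper names and the sort-and-threshold projection routine are assumptions of this sketch) compares the two per-iteration subproblems on the probability simplex: projected gradient descent must compute a Euclidean projection back onto the simplex, while Frank–Wolfe only has to pick the cheapest vertex.

```python
# Contrast of per-iteration subproblems on the probability simplex
# (illustrative sketch, helper names not from the article).
import numpy as np

def project_simplex(v):
    # Euclidean projection onto {x : x >= 0, sum(x) = 1} via the standard
    # sort-and-threshold method; this is the extra work projected gradient
    # descent must do every iteration.
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def projected_gradient_step(x, g, eta=0.1):
    return project_simplex(x - eta * g)      # gradient step, then project back

def frank_wolfe_step(x, g, k):
    s = np.zeros_like(g)
    s[np.argmin(g)] = 1.0                    # linear subproblem: cheapest vertex
    return x + (2.0 / (k + 2.0)) * (s - x)   # convex combination stays feasible
```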
The convergence of the Frank–Wolfe algorithm is sublinear in general: the error in the objective function to the optimum is $O(1/k)$ after $k$ iterations, so long as the gradient is Lipschitz continuous with respect to some norm. The same convergence rate can also be shown if the sub-problems are only solved approximately.
The iterates of the algorithm can always be represented as a sparse convex combination of the extreme points of the feasible set, which has contributed to the popularity of the algorithm for sparse greedy optimization in machine learning and signal processing problems, as well as, for example, the optimization of minimum-cost flows in transportation networks.
If the feasible set is given by a set of linear constraints, then the subproblem to be solved in each iteration becomes a linear program.
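For some polytopes this linear program need not be handed to a generic LP solver. For example, over the $\ell_1$-ball $\{\mathbf{x} : \|\mathbf{x}\|_1 \le \tau\}$ the minimizer of a linear function is always attained at one of the $2n$ vertices $\pm\tau\mathbf{e}_i$, giving the closed form sketched below (an illustration, not from the article):

```python
# Closed-form linear minimization oracle over the l1-ball, a common special
# case of the per-iteration linear program (illustrative sketch).
import numpy as np

def l1_ball_lmo(g, tau=1.0):
    # argmin_{||s||_1 <= tau} <s, g>: put all mass tau on the coordinate of g
    # with the largest magnitude, with opposite sign.
    i = np.argmax(np.abs(g))
    s = np.zeros_like(g)
    s[i] = -tau * np.sign(g[i])
    return s
```

If $\mathbf{x}_0$ is itself a vertex, each iterate is a convex combination of at most $k+1$ such vertices after $k$ iterations, which is the sparsity property mentioned above.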
While the worst-case convergence rate of $O(1/k)$ cannot be improved in general, faster convergence can be obtained for special problem classes, such as some strongly convex problems.
Lower bounds on the solution value, and primal-dual analysis
Since $f$ is convex, for any two points $\mathbf{x}, \mathbf{y} \in \mathcal{D}$ we have:

: $f(\mathbf{y}) \geq f(\mathbf{x}) + (\mathbf{y} - \mathbf{x})^T \nabla f(\mathbf{x})$

This also holds for the (unknown) optimal solution $\mathbf{x}^*$. That is, $f(\mathbf{x}^*) \geq f(\mathbf{x}) + (\mathbf{x}^* - \mathbf{x})^T \nabla f(\mathbf{x})$. The best lower bound with respect to a given point $\mathbf{x}$ is given by

: $f(\mathbf{x}^*) \geq f(\mathbf{x}) + (\mathbf{x}^* - \mathbf{x})^T \nabla f(\mathbf{x}) \geq \min_{\mathbf{y} \in \mathcal{D}} \left\{ f(\mathbf{x}) + (\mathbf{y} - \mathbf{x})^T \nabla f(\mathbf{x}) \right\} = f(\mathbf{x}) - \mathbf{x}^T \nabla f(\mathbf{x}) + \min_{\mathbf{y} \in \mathcal{D}} \mathbf{y}^T \nabla f(\mathbf{x})$
The latter optimization problem is solved in every iteration of the Frank–Wolfe algorithm, therefore the solution $\mathbf{s}_k$ of the direction-finding subproblem of the $k$-th iteration can be used to determine increasing lower bounds $l_k$ during each iteration by setting $l_0 = -\infty$ and

: $l_k := \max(l_{k-1},\ f(\mathbf{x}_k) + (\mathbf{s}_k - \mathbf{x}_k)^T \nabla f(\mathbf{x}_k))$

Such lower bounds on the unknown optimal value are important in practice because they can be used as a stopping criterion, and give an efficient certificate of the approximation quality in every iteration, since always $l_k \leq f(\mathbf{x}^*) \leq f(\mathbf{x}_k)$.
It has been shown that this corresponding duality gap, that is, the difference between $f(\mathbf{x}_k)$ and the lower bound $l_k$, decreases with the same convergence rate, i.e. $f(\mathbf{x}_k) - l_k = O(1/k)$.
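In code, this certificate is cheap: the gap $(\mathbf{x}_k - \mathbf{s}_k)^T \nabla f(\mathbf{x}_k)$ is built from quantities the iteration already computes. A minimal sketch, under the same assumptions as the earlier examples:

```python
# Frank-Wolfe with the duality gap as stopping criterion (sketch; grad and
# lmo play the same roles as in the earlier examples).
import numpy as np

def frank_wolfe_with_gap(grad, lmo, x0, eps=1e-6, max_iters=10000):
    x = x0
    gap = np.inf
    for k in range(max_iters):
        g = grad(x)
        s = lmo(g)
        gap = float((x - s) @ g)  # upper bound on f(x_k) - f(x*)
        if gap <= eps:            # certified eps-approximate solution
            break
        x = x + (2.0 / (k + 2.0)) * (s - x)
    return x, gap
```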
External links
* https://conditional-gradients.org/: a survey of Frank–Wolfe algorithms.
* Marguerite Frank giving a personal account of the history of the algorithm
See also
* Proximal gradient methods