Mirror Descent

In mathematics, mirror descent is an iterative optimization algorithm for finding a local minimum of a differentiable function. It generalizes algorithms such as gradient descent and multiplicative weights.


History

Mirror descent was originally proposed by Nemirovski and Yudin in 1983.


Motivation

In gradient descent with the sequence of learning rates (\eta_n)_{n \ge 0} applied to a differentiable function F, one starts with a guess \mathbf{x}_0 for a local minimum of F, and considers the sequence \mathbf{x}_0, \mathbf{x}_1, \mathbf{x}_2, \ldots such that

:\mathbf{x}_{n+1} = \mathbf{x}_n - \eta_n \nabla F(\mathbf{x}_n),\ n \ge 0.

This can be reformulated by noting that

:\mathbf{x}_{n+1} = \arg\min_{\mathbf{x}} \left( F(\mathbf{x}_n) + \nabla F(\mathbf{x}_n)^T (\mathbf{x} - \mathbf{x}_n) + \frac{1}{2\eta_n} \| \mathbf{x} - \mathbf{x}_n \|^2 \right)

In other words, \mathbf{x}_{n+1} minimizes the first-order approximation to F at \mathbf{x}_n with an added proximity term \| \mathbf{x} - \mathbf{x}_n \|^2. This squared Euclidean distance term is a particular example of a Bregman distance. Using other Bregman distances will yield other algorithms, such as Hedge, which may be better suited to optimization over particular geometries.
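The equivalence above can be checked numerically. The sketch below uses a hypothetical one-dimensional example F(x) = (x - 3)^2 (not from the article) and compares the closed-form gradient step against a grid minimization of the first-order model with the proximity term:

```python
import numpy as np

# Hypothetical example function and its gradient.
def F(x):
    return (x - 3.0) ** 2

def grad_F(x):
    return 2.0 * (x - 3.0)

x_n, eta = 0.0, 0.25

# Closed-form gradient descent step: x_{n+1} = x_n - eta * grad F(x_n).
x_gd = x_n - eta * grad_F(x_n)

# Minimize the proximal model F(x_n) + grad_F(x_n)*(x - x_n) + (x - x_n)^2 / (2*eta)
# over a fine grid; its minimizer should coincide with the gradient step.
grid = np.linspace(-5.0, 5.0, 200001)
model = F(x_n) + grad_F(x_n) * (grid - x_n) + (grid - x_n) ** 2 / (2.0 * eta)
x_prox = grid[np.argmin(model)]

print(x_gd, x_prox)  # both approximately 1.5
```

The agreement holds because setting the model's derivative to zero gives exactly x = x_n - \eta \nabla F(x_n).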


Formulation

We are given a convex function f to optimize over a convex set K \subset \mathbb{R}^n, and a norm \| \cdot \| on \mathbb{R}^n. We are also given a differentiable convex function h \colon \mathbb{R}^n \to \mathbb{R} that is \alpha-strongly convex with respect to the given norm. This is called the ''distance-generating function'', and its gradient \nabla h \colon \mathbb{R}^n \to \mathbb{R}^n is known as the ''mirror map''.

Starting from an initial x_0 \in K, each iteration of mirror descent:
* Maps to the dual space: \theta_t \leftarrow \nabla h(x_t)
* Updates in the dual space using a gradient step: \theta_{t+1} \leftarrow \theta_t - \eta_t \nabla f(x_t)
* Maps back to the primal space: x'_{t+1} \leftarrow (\nabla h)^{-1}(\theta_{t+1})
* Projects back onto the feasible region K: x_{t+1} \leftarrow \arg\min_{x \in K} D_h(x \parallel x'_{t+1}), where D_h is the Bregman divergence.
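The four steps above can be sketched for one well-known instantiation: taking h(x) = \sum_i x_i \log x_i (negative entropy) over the probability simplex. Then \nabla h(x) = 1 + \log x, (\nabla h)^{-1}(\theta) = e^{\theta - 1}, and the Bregman projection onto the simplex reduces to normalization, recovering a multiplicative-weights-style update. The objective f(x) = c \cdot x below is a hypothetical example for illustration:

```python
import numpy as np

def mirror_descent_step(x_t, grad, eta):
    """One mirror descent iteration with negative-entropy h over the simplex."""
    theta = 1.0 + np.log(x_t)        # map to the dual space: theta = grad h(x_t)
    theta = theta - eta * grad       # gradient step in the dual space
    x_prime = np.exp(theta - 1.0)    # map back to the primal: (grad h)^{-1}
    return x_prime / x_prime.sum()   # Bregman projection onto the simplex

# Example: minimize the linear cost f(x) = c . x over the simplex;
# the iterates concentrate mass on the cheapest coordinate (index 1).
c = np.array([1.0, 0.2, 0.5])
x = np.full(3, 1.0 / 3.0)
for _ in range(100):
    x = mirror_descent_step(x, c, eta=0.1)

print(x)  # most mass on index 1
```

Note that with this choice of h the combined update is x_{t+1} \propto x_t e^{-\eta_t \nabla f(x_t)}, which is exactly the multiplicative weights update mentioned in the introduction.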


Extensions

Mirror descent in the online optimization setting is known as Online Mirror Descent (OMD).


See also

* Gradient descent
* Multiplicative weight update method
* Hedge algorithm
* Bregman divergence

