Bregman Method

The Bregman method is an iterative algorithm to solve certain convex optimization problems involving regularization. The original version is due to Lev M. Bregman, who published it in 1967. The algorithm is a row-action method: it accesses the constraint functions one by one, which makes it particularly suited to large optimization problems whose constraints can be enumerated efficiently. The method works particularly well for regularizers such as the \ell_1 norm, where it converges very quickly because of an error-cancellation effect.


Algorithm

In order to use the Bregman method, one must frame the problem of interest as finding \min_u J(u) + f(u), where J is a regularizing function such as \ell_1. The ''Bregman distance'' is defined as D^p(u, v) := J(u) - (J(v) + \langle p, u - v\rangle), where p belongs to the subdifferential of J at v (which we denote \partial J(v)). One performs the iteration u_{k+1} := \min_u(\alpha D(u, u_k) + f(u)), with \alpha a constant to be chosen by the user (and the minimization performed by an ordinary convex optimization algorithm), or u_{k+1} := \min_u(D^{p_k}(u, u_k) + f(u)), with p_k chosen each time to be a member of \partial J(u_k).

The algorithm starts with a pair of primal and dual variables. Then, for each constraint, a generalized projection onto its feasible set is performed, updating both the constraint's dual variable and all primal variables with non-zero coefficients in the gradient of the constraint function. If the objective is strictly convex and all constraint functions are convex, the limit of this iterative projection converges to the optimal primal-dual pair.

In the case of a basis pursuit-type problem \min_{x: Ax = b}(\|x\|_1 + \tfrac{1}{2\alpha}\|x\|_2^2), the Bregman method is equivalent to ordinary gradient descent on the dual problem \min_y(-b^T y + \tfrac{\alpha}{2}\|A^T y - \operatorname{Proj}_{[-1,1]^n}(A^T y)\|_2^2). An exact regularization-type effect also occurs in this case: if \alpha exceeds a certain threshold, the optimum value of x is precisely the optimal solution of \min_{x: Ax = b}\|x\|_1.


Applications

The Bregman method or its generalizations can be applied to:

* Image deblurring or denoising (including total variation denoising)
* MR image reconstruction
* Magnetic resonance imaging
* Radar
* Hyperspectral imaging
* Compressed sensing
* Least absolute deviations or \ell_1-regularized linear regression
* Covariance selection (learning a sparse covariance matrix)
* Matrix completion
* Structural risk minimization


Generalizations and drawbacks

The method has links to the method of multipliers and the dual ascent method (through the so-called ''Bregman alternating direction method of multipliers'', which generalizes the ''alternating direction method of multipliers''), and multiple generalizations exist. One drawback of the method is that it is only provably convergent if the objective function is ''strictly'' convex. In cases where this cannot be ensured, such as linear programs or non-strictly convex quadratic programs, additional methods such as proximal gradient methods have been developed. In the case of the Rudin–Osher–Fatemi model of image denoising, the Bregman method provably converges.

Some generalizations of the Bregman method include:

* Inverse scale space method
* Linearized Bregman
* Logistic Bregman
* Split Bregman


Linearized Bregman

In the Linearized Bregman method, one linearizes the intermediate objective functions D^p(u, u_k) + f(u) by replacing the second term with f(u_k) + \langle f'(u_k), u - u_k\rangle (which approximates the second term near u_k) and adding the penalty term \frac{1}{2\delta}\|u - u_k\|_2^2 for a constant \delta. The result is much more computationally tractable, especially in basis pursuit-type problems. In the case of a generic basis pursuit problem \min_u \mu\|u\|_1 + \frac{1}{2}\|Au - f\|_2^2, one can express the iteration as v_{k+1} := v_k + A^T(f - Au_k), \; u^{(i)}_{k+1} := \delta\operatorname{shrink}(v^{(i)}_{k+1}, \mu) for each component i, where we define \operatorname{shrink}(y, a) := \begin{cases} y - a, & y\in(a,\infty) \\ 0, & y\in[-a,a] \\ y + a, & y\in(-\infty, -a). \end{cases}

Sometimes, when running the Linearized Bregman method, there are periods of "stagnation" where the residual is almost constant. To alleviate this issue, one can use the Linearized Bregman method with ''kicking'', where one essentially detects the beginning of a stagnation period, then predicts and skips to the end of it. Since Linearized Bregman is mathematically equivalent to gradient descent, it can be accelerated with methods that accelerate gradient descent, such as line search, L-BFGS, Barzilai–Borwein steps, or the Nesterov method; the last has been proposed as the ''accelerated linearized Bregman method''.
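The iteration above fits in a few lines of code. The sketch below (an illustration, not from the source; the test problem and the step-size choice are assumptions) drives the constraint Au = f to hold in the limit; \delta is kept small relative to \|A\|_2^2 so that the equivalent dual gradient descent remains stable:

```python
import numpy as np

def shrink(y, a):
    # Componentwise soft thresholding, as defined above.
    return np.sign(y) * np.maximum(np.abs(y) - a, 0.0)

def linearized_bregman(A, f, mu, delta, iters=5000):
    # Linearized Bregman iteration:
    #   v_{k+1} = v_k + A^T (f - A u_k)
    #   u_{k+1} = delta * shrink(v_{k+1}, mu)
    # For suitable delta this converges to the minimiser of
    #   mu*||u||_1 + (1/(2*delta))*||u||_2^2  subject to  A u = f.
    u = np.zeros(A.shape[1])
    v = np.zeros(A.shape[1])
    for _ in range(iters):
        v += A.T @ (f - A @ u)
        u = delta * shrink(v, mu)
    return u

A = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5]])
f = np.array([1.0, -0.5])
u = linearized_bregman(A, f, mu=0.1, delta=0.5)  # delta * ||A||_2^2 < 1
# In the limit the constraint A u = f is satisfied.
```

Note that the only per-iteration work is two matrix-vector products and a componentwise threshold, which is what makes the linearized variant attractive for large basis pursuit problems.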


Split Bregman

The Split Bregman method solves problems of the form \min_u \|\Phi(u)\|_1 + H(u), where \Phi and H are both convex, particularly problems of the form \min_u \|\Phi u\|_1 + \|Ku - f\|^2. We start by rewriting it as the constrained optimization problem \min_{u, d : d = \Phi(u)} \|d\|_1 + H(u), then relax it into \min_{u, d} \|d\|_1 + H(u) + \frac{\lambda}{2}\|d - \Phi(u)\|_2^2, where \lambda is a constant. By defining J(u, d) := \|d\|_1 + H(u), one reduces the problem to one that can be solved with the ordinary Bregman algorithm. The Split Bregman method has been generalized to optimization over complex numbers using Wirtinger derivatives.
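To sketch how the splitting plays out in practice, the following illustrative code (an assumption of this edit, not from the source) applies Split Bregman to 1D total variation denoising, taking \Phi to be the forward-difference operator and H(u) = \frac{\mu}{2}\|u - f\|_2^2; the u-step is an exact linear solve and the d-step decouples into componentwise soft thresholding:

```python
import numpy as np

def shrink(y, a):
    # Componentwise soft thresholding.
    return np.sign(y) * np.maximum(np.abs(y) - a, 0.0)

def split_bregman_tv(f, mu=2.0, lam=1.0, iters=200):
    # Split Bregman for 1D total variation denoising:
    #   min_u ||Phi u||_1 + (mu/2)||u - f||_2^2,
    # with Phi the forward-difference operator. The split variable d
    # carries the l1 term; b is the Bregman (dual) variable.
    n = len(f)
    Phi = np.diff(np.eye(n), axis=0)           # (n-1) x n difference matrix
    M = mu * np.eye(n) + lam * Phi.T @ Phi     # normal matrix for the u-step
    d = np.zeros(n - 1)
    b = np.zeros(n - 1)
    u = f.copy()
    for _ in range(iters):
        # u-step: quadratic subproblem, solved exactly
        u = np.linalg.solve(M, mu * f + lam * Phi.T @ (d - b))
        # d-step: decoupled l1 subproblem, solved by soft thresholding
        d = shrink(Phi @ u + b, 1.0 / lam)
        # Bregman update of the dual variable
        b = b + Phi @ u - d
    return u

rng = np.random.default_rng(0)
step = np.concatenate([np.zeros(25), np.ones(25)])  # piecewise-constant signal
f = step + 0.1 * rng.standard_normal(50)            # noisy observation
u = split_bregman_tv(f)
```

The point of the split is visible in the loop: the nonsmooth \ell_1 term never meets the operator \Phi directly, so each subproblem stays cheap regardless of how complicated \Phi is.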

