Augmented Lagrangian Method

Augmented Lagrangian methods are a certain class of algorithms for solving constrained optimization problems. They have similarities to penalty methods in that they replace a constrained optimization problem by a series of unconstrained problems and add a penalty term to the objective, but the augmented Lagrangian method adds yet another term designed to mimic a Lagrange multiplier. The augmented Lagrangian is related to, but not identical with, the method of Lagrange multipliers. Viewed differently, the unconstrained objective is the Lagrangian of the constrained problem, with an additional penalty term (the augmentation).

The method was originally known as the method of multipliers and was studied in the 1970s and 1980s as a potential alternative to penalty methods. It was first discussed by Magnus Hestenes and then by Michael J. D. Powell in 1969. The method was studied by R. Tyrrell Rockafellar in relation to Fenchel duality, particularly in relation to proximal-point methods, Moreau–Yosida regularization, and maximal monotone operators; these methods were used in structural optimization. The method was also studied by Dimitri Bertsekas, notably in his 1982 book, together with extensions involving non-quadratic regularization functions (e.g., entropic regularization). This combined study gives rise to the "exponential method of multipliers", which handles inequality constraints with a twice-differentiable augmented Lagrangian function.

Since the 1970s, sequential quadratic programming (SQP) and interior point methods (IPM) have been given more attention, in part because they more easily use sparse matrix subroutines from numerical software libraries, and in part because IPMs possess proven complexity results via the theory of self-concordant functions. The augmented Lagrangian method was rejuvenated by the optimization systems LANCELOT, ALGENCAN and AMPL, which allowed sparse matrix techniques to be used on seemingly dense but "partially separable" problems. The method is still useful for some problems.

Around 2007, there was a resurgence of augmented Lagrangian methods in fields such as total variation denoising and compressed sensing. In particular, a variant of the standard augmented Lagrangian method that uses partial updates (similar to the Gauss–Seidel method for solving linear equations), known as the alternating direction method of multipliers or ADMM, gained some attention.


General method

Consider solving the following constrained optimization problem:

: \min f(\mathbf{x})

subject to

: c_i(\mathbf{x}) = 0 ~~\forall i \in \mathcal{I},

where \mathcal{I} denotes the indices for equality constraints. This problem can be solved as a series of unconstrained minimization problems. For reference, we first list the ''k''th step of the penalty method approach:

: \min \Phi_k (\mathbf{x}) = f(\mathbf{x}) + \mu_k \sum_{i \in \mathcal{I}} c_i(\mathbf{x})^2.

The penalty method solves this problem, then at the next iteration it re-solves the problem using a larger value of \mu_k, with the old solution used as the initial guess or "warm start". The augmented Lagrangian method instead uses the following unconstrained objective:

: \min \Phi_k (\mathbf{x}) = f(\mathbf{x}) + \frac{\mu_k}{2} \sum_{i \in \mathcal{I}} c_i(\mathbf{x})^2 + \sum_{i \in \mathcal{I}} \lambda_i c_i(\mathbf{x}),

and after each iteration, in addition to updating \mu_k, the variable \lambda is also updated according to the rule

: \lambda_i \leftarrow \lambda_i + \mu_k c_i(\mathbf{x}_k),

where \mathbf{x}_k is the solution to the unconstrained problem at the ''k''th step, i.e. \mathbf{x}_k = \operatorname{argmin}_{\mathbf{x}} \Phi_k(\mathbf{x}).

The variable \lambda is an estimate of the Lagrange multiplier, and the accuracy of this estimate improves at every step. The major advantage of the method is that, unlike the penalty method, it is not necessary to take \mu \rightarrow \infty in order to solve the original constrained problem. Because of the presence of the Lagrange multiplier term, \mu can stay much smaller, thus avoiding ill-conditioning. Nevertheless, it is common in practical implementations to project multiplier estimates onto a large bounded set (safeguards), which avoids numerical instabilities and leads to strong theoretical convergence guarantees. The method can be extended to handle inequality constraints; for a discussion of practical improvements, see the references.
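
The following is a minimal sketch of this iteration in Python, using scipy.optimize.minimize (BFGS) as the inner unconstrained solver. The test problem, the penalty schedule (doubling \mu_k), and the stopping tolerance are illustrative assumptions, not part of the method itself:

```python
import numpy as np
from scipy.optimize import minimize

def augmented_lagrangian(f, c, x0, mu=10.0, n_outer=20, tol=1e-8):
    """Minimize f(x) subject to c(x) = 0, where c returns a vector."""
    x = np.asarray(x0, dtype=float)
    lam = np.zeros_like(c(x))                    # multiplier estimates
    for _ in range(n_outer):
        # k-th subproblem: f(x) + (mu/2) * sum_i c_i(x)^2 + sum_i lam_i c_i(x)
        def phi(x):
            cx = c(x)
            return f(x) + 0.5 * mu * cx @ cx + lam @ cx
        x = minimize(phi, x, method="BFGS").x    # warm start from previous x
        cx = c(x)
        lam = lam + mu * cx                      # multiplier update
        if np.linalg.norm(cx) < tol:             # constraints satisfied
            break
        mu *= 2.0                                # grow mu; it need not go to infinity
    return x, lam

# Illustrative example: min x^2 + y^2 subject to x + y = 1 (solution (0.5, 0.5))
x_star, lam_star = augmented_lagrangian(
    f=lambda x: x @ x,
    c=lambda x: np.array([x[0] + x[1] - 1.0]),
    x0=np.zeros(2),
)
```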


Alternating direction method of multipliers

The alternating direction method of multipliers (ADMM) is a variant of the augmented Lagrangian scheme that uses partial updates for the dual variables. This method is often applied to solve problems such as

: \min_x f(x) + g(Mx).

This is equivalent to the constrained problem

: \min_{x,y} f(x) + g(y), \quad \text{subject to} \quad Mx = y.

Though this change may seem trivial, the problem can now be attacked using methods of constrained optimization (in particular, the augmented Lagrangian method), and the objective function is separable in ''x'' and ''y''. The dual update requires solving a proximity function in ''x'' and ''y'' at the same time; the ADMM technique allows this problem to be solved approximately by first solving for ''x'' with ''y'' fixed and then solving for ''y'' with ''x'' fixed. Rather than iterating this process until convergence (like the Jacobi method), the ADMM algorithm proceeds directly to updating the dual variable and then repeats the process. This is not equivalent to exact minimization, but the method still converges to the correct solution under some assumptions. Because it does not minimize or approximately minimize the augmented Lagrangian, the algorithm is distinct from the ordinary augmented Lagrangian method.

The ADMM can be viewed as an application of the Douglas–Rachford splitting algorithm, and the Douglas–Rachford algorithm is in turn an instance of the proximal point algorithm; details can be found in the references. There are several modern software packages, including YALL1 (2009), SpaRSA (2009) and SALSA (2009), which solve basis pursuit and variants and use the ADMM. There are also packages which use the ADMM to solve more general problems, some of which can exploit multiple computing cores (e.g., SNAPVX (2015), parADMM (2016)).
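
As a concrete illustration of the alternating x-step, y-step, and dual-step pattern, here is a minimal sketch for the lasso problem, min (1/2)||Ax - b||^2 + alpha*||x||_1, written in the split form above with M = I. The penalty rho, the fixed iteration count, and the use of the scaled dual variable u = z/rho are illustrative choices:

```python
import numpy as np

def admm_lasso(A, b, alpha, rho=1.0, n_iter=200):
    """ADMM for min_x 0.5*||A x - b||^2 + alpha*||x||_1,
    split as f(x) = 0.5*||A x - b||^2, g(y) = alpha*||y||_1, x = y."""
    n = A.shape[1]
    x = np.zeros(n)
    y = np.zeros(n)
    u = np.zeros(n)                        # scaled dual variable, u = z / rho
    AtA_rhoI = A.T @ A + rho * np.eye(n)   # x-step matrix is fixed across iterations
    Atb = A.T @ b
    for _ in range(n_iter):
        # x-step: minimize f(x) + (rho/2)*||x - y + u||^2 with y, u fixed
        x = np.linalg.solve(AtA_rhoI, Atb + rho * (y - u))
        # y-step: prox of g, i.e. soft-thresholding, with x, u fixed
        v = x + u
        y = np.sign(v) * np.maximum(np.abs(v) - alpha / rho, 0.0)
        # dual step: one multiplier update, then repeat (no inner convergence loop)
        u = u + x - y
    return y
```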


Stochastic optimization

Stochastic optimization considers the problem of minimizing a loss function with access to noisy samples of the (gradient of the) function. The goal is to have an estimate of the optimal parameter (minimizer) per new sample. With some modifications, ADMM can be used for stochastic optimization. In a stochastic setting, only noisy samples of a gradient are accessible, so an inexact approximation of the Lagrangian is used:

: \hat{\mathcal{L}}_{\rho,k} = f_1(x_k) + \langle \nabla f(x_k, \zeta_{k+1}), x \rangle + g(y) - z^T (Ax + By - c) + \frac{\rho}{2} \Vert Ax + By - c \Vert^2 + \frac{\Vert x - x_k \Vert^2}{2 \eta_{k+1}},

where \eta_{k+1} is a time-varying step size.

ADMM has been applied to solve regularized problems, where the function optimization and regularization can be carried out locally and then coordinated globally via constraints. Regularized optimization problems are especially relevant in the high-dimensional regime, since regularization is a natural mechanism to overcome ill-posedness and to encourage parsimony in the optimal solution (e.g., sparsity and low rank). ADMM's effectiveness for solving regularized problems may mean it could be useful for solving high-dimensional stochastic optimization problems.
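
To illustrate how the linearized, proximally regularized x-step of this inexact Lagrangian can be carried out in closed form, here is a sketch for the special consensus case A = I, B = -I, c = 0. The stochastic gradient oracle grad_f and the step-size schedule are assumptions for illustration; a general A would require solving a linear system instead:

```python
import numpy as np

def stochastic_admm_x_step(x_k, y, z, grad_f, zeta, rho, eta):
    """One x-update for the inexact augmented Lagrangian with A = I, B = -I,
    c = 0, i.e.
        x_{k+1} = argmin_x <grad_f(x_k, zeta), x> - z.(x - y)
                  + (rho/2)*||x - y||^2 + ||x - x_k||^2 / (2*eta).
    Setting the gradient of this strongly convex quadratic to zero gives the
    closed form returned below."""
    g_hat = grad_f(x_k, zeta)          # noisy gradient sample at the current iterate
    return (x_k / eta + rho * y + z - g_hat) / (1.0 / eta + rho)
```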


Alternative approaches

* Sequential quadratic programming
* Sequential linear programming
* Sequential linear-quadratic programming


Software

Open source and non-free/commercial implementations of the augmented Lagrangian method:
* Accord.NET (C# implementation of augmented Lagrangian optimizer)
* ALGLIB (C# and C++ implementations of preconditioned augmented Lagrangian solver)
* PENNON (GPL 3, commercial license available)
* LANCELOT (free "internal use" license, paid commercial options)
* MINOS (also uses an augmented Lagrangian method for some types of problems)
* REASON (Apache 2.0 license; code available online)
* ALGENCAN (Fortran implementation of augmented Lagrangian method with safeguards; available online)
* NLOPT (C++ implementation of augmented Lagrangian optimizer, accessible from different programming languages)
* PyProximal (Python implementation of augmented Lagrangian method)


See also

* Barrier function
* Interior-point method
* Lagrange multiplier
* Penalty method

