Compact Quasi-Newton Representation

The compact representation for quasi-Newton methods is a matrix decomposition, which is typically used in gradient-based optimization algorithms or for solving nonlinear systems. The decomposition uses a low-rank representation for the direct and/or inverse Hessian or the Jacobian of a nonlinear system. Because of this, the compact representation is often used for large problems and constrained optimization.


Definition

The compact representation of a quasi-Newton matrix for the inverse Hessian H_k or direct Hessian B_k of a nonlinear objective function f(x) \colon \mathbb{R}^n \to \mathbb{R} expresses a sequence of recursive rank-1 or rank-2 matrix updates as one rank-k or rank-2k update of an initial matrix. Because it is derived from quasi-Newton updates, it uses the differences of iterates and gradients \nabla f(x_k) = g_k , namely s_i = x_{i+1} - x_i and y_i = g_{i+1} - g_i , in its definition \{ (s_i, y_i) \}_{i=0}^{k-1} . In particular, for r = k or r = 2k the rectangular n \times r matrices U_k, J_k and the r \times r square symmetric systems M_k, N_k depend on the s_i, y_i 's and define the quasi-Newton representations

: H_k = H_0 + U_k M_k^{-1} U_k^T, \quad \text{and} \quad B_k = B_0 + J_k N_k^{-1} J_k^T
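For illustration, here is a minimal sketch (NumPy; collect_pairs is a hypothetical helper, not a library routine) of how the defining difference pairs are formed from stored iterates and gradients:

```python
import numpy as np

# Hypothetical helper: form the difference pairs s_i = x_{i+1} - x_i and
# y_i = g_{i+1} - g_i that define the compact representation.
def collect_pairs(xs, gs):
    """xs: iterates x_0, ..., x_k; gs: gradients g_0, ..., g_k (1-D arrays)."""
    S = np.column_stack([xs[i + 1] - xs[i] for i in range(len(xs) - 1)])
    Y = np.column_stack([gs[i + 1] - gs[i] for i in range(len(gs) - 1)])
    return S, Y  # n x k blocks from which U_k, J_k, M_k, N_k are built
```

Storing an n \times r factor together with a small r \times r system, in place of a dense n \times n matrix, is what makes the representation attractive when n is large.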


Applications

Because of this special matrix decomposition the compact representation is implemented in state-of-the-art optimization software. When combined with limited-memory techniques it is a popular tool for constrained optimization with gradients. Linear algebra operations such as matrix-vector products, linear solves, or eigendecompositions can be carried out efficiently. The representation can be combined with line-search and trust region techniques, and it has been developed for many quasi-Newton updates. For instance, the matrix-vector product of the direct quasi-Newton Hessian with an arbitrary vector g \in \mathbb{R}^n is computed in five steps:

: \begin{align} p^{(1)}_k &= J_k^T g \\ N_k p^{(2)}_k &= p^{(1)}_k \quad \text{(solve the small } r \times r \text{ system with } N_k \text{)} \\ p^{(3)}_k &= J_k p^{(2)}_k \\ p^{(4)}_k &= B_0 g \\ B_k g &= p^{(3)}_k + p^{(4)}_k \end{align}
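A sketch of these five steps in NumPy (assuming J and N are given as dense arrays and a scalar initialization B_0 = b_0 I ; compact_matvec is an illustrative name, not a library function):

```python
import numpy as np

def compact_matvec(J, N, b0, g):
    """Return B_k @ g for B_k = b0*I + J @ inv(N) @ J.T (a sketch)."""
    p1 = J.T @ g                  # step 1: project to r dimensions
    p2 = np.linalg.solve(N, p1)   # step 2: small r x r solve with N_k
    p3 = J @ p2                   # step 3: map back to n dimensions
    p4 = b0 * g                   # step 4: B_0 g with B_0 = b0 * I
    return p3 + p4                # step 5: B_k g = p3 + p4
```

One product costs O(nr + r^3), or O(nr + r^2) if a factorization of N_k is reused across products, versus O(n^2) for a dense Hessian-vector product.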


Background

In the context of the GMRES method, Walker showed that a product of Householder transformations (an identity plus rank-1 matrix) can be expressed as a compact matrix formula. This led to the derivation of an explicit matrix expression for the product of k identity plus rank-1 matrices. Specifically, for

: S_k = \begin{bmatrix} s_0 & s_1 & \ldots & s_{k-1} \end{bmatrix}, \quad Y_k = \begin{bmatrix} y_0 & y_1 & \ldots & y_{k-1} \end{bmatrix}, \quad \big(R_k\big)_{ij} = s_{i-1}^T y_{j-1} \text{ for } 1 \le i \le j \le k, \quad \rho_i = 1/s_i^T y_i \quad \text{and} \quad V_i = I - \rho_i y_i s_i^T,

the product of k rank-1 updates to the identity is

: \prod_{i=0}^{k-1} V_i = \left( I - \rho_0 y_0 s_0^T \right) \cdots \left( I - \rho_{k-1} y_{k-1} s_{k-1}^T \right) = I - Y_k R_k^{-1} S_k^T

The BFGS update can be expressed in terms of products of the V_i 's, which have a compact matrix formula. Therefore, the BFGS recursion can exploit these block matrix representations.
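The identity can be checked numerically; the following sketch (random data, NumPy) verifies it for a small example:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 3
S = rng.standard_normal((n, k))   # columns s_0, ..., s_{k-1}
Y = rng.standard_normal((n, k))   # columns y_0, ..., y_{k-1}
R = np.triu(S.T @ Y)              # (R_k)_{ij} = s_{i-1}^T y_{j-1}, i <= j

prod = np.eye(n)
for i in range(k):                # accumulate V_0 V_1 ... V_{k-1}
    rho = 1.0 / (S[:, i] @ Y[:, i])
    prod = prod @ (np.eye(n) - rho * np.outer(Y[:, i], S[:, i]))

compact = np.eye(n) - Y @ np.linalg.solve(R, S.T)
assert np.allclose(prod, compact)
```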


Recursive quasi-Newton updates

A parametric family of quasi-Newton updates includes many of the best known formulas. For arbitrary vectors v_k and c_k such that v_k^T y_k \ne 0 and c_k^T s_k \ne 0 , the general recursive update formulas for the inverse and direct Hessian estimates are

: H_{k+1} = H_k + \frac{ (s_k - H_k y_k) v_k^T + v_k (s_k - H_k y_k)^T }{ v_k^T y_k } - \frac{ y_k^T (s_k - H_k y_k) }{ \big( v_k^T y_k \big)^2 } v_k v_k^T \quad (1)

: B_{k+1} = B_k + \frac{ (y_k - B_k s_k) c_k^T + c_k (y_k - B_k s_k)^T }{ c_k^T s_k } - \frac{ s_k^T (y_k - B_k s_k) }{ \big( c_k^T s_k \big)^2 } c_k c_k^T \quad (2)

By making specific choices for the parameter vectors v_k and c_k , well known methods are recovered (see Table 1); for instance, v_k = s_k yields the BFGS update, v_k = s_k - H_k y_k yields the SR1 update, and c_k = s_k yields the PSB update.
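The recursion (1) can be written as a short function; the sketch below (NumPy; general_inverse_update is an illustrative name) also checks that the choice v = s reproduces the textbook BFGS inverse update:

```python
import numpy as np

def general_inverse_update(H, s, y, v):
    """One step of recursion (1); v^T y must be nonzero (a sketch)."""
    r = s - H @ y                  # residual of the secant condition
    vy = v @ y
    return (H + (np.outer(r, v) + np.outer(v, r)) / vy
              - (y @ r) / vy ** 2 * np.outer(v, v))

rng = np.random.default_rng(1)
n = 5
H0 = np.eye(n)
s, y = rng.standard_normal(n), rng.standard_normal(n)

rho = 1.0 / (s @ y)
W = np.eye(n) - rho * np.outer(s, y)            # I - rho s y^T
H_bfgs = W @ H0 @ W.T + rho * np.outer(s, s)    # BFGS inverse update
assert np.allclose(general_inverse_update(H0, s, y, v=s), H_bfgs)
```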


Compact Representations

Collecting the updating vectors of the recursive formulas (1) and (2) into matrices, define

: S_k = \begin{bmatrix} s_0 & s_1 & \ldots & s_{k-1} \end{bmatrix}, \quad Y_k = \begin{bmatrix} y_0 & y_1 & \ldots & y_{k-1} \end{bmatrix}, \quad V_k = \begin{bmatrix} v_0 & v_1 & \ldots & v_{k-1} \end{bmatrix}, \quad C_k = \begin{bmatrix} c_0 & c_1 & \ldots & c_{k-1} \end{bmatrix},

the upper triangular matrices

: \big(R_k\big)_{ij} := \big(R^{SY}_k\big)_{ij} = s_{i-1}^T y_{j-1}, \quad \big(R^{VY}_k\big)_{ij} = v_{i-1}^T y_{j-1}, \quad \big(R^{CS}_k\big)_{ij} = c_{i-1}^T s_{j-1}, \quad \text{for } 1 \le i \le j \le k,

the strictly lower triangular matrices

: \big(L_k\big)_{ij} := \big(L^{SY}_k\big)_{ij} = s_{i-1}^T y_{j-1}, \quad \big(L^{VY}_k\big)_{ij} = v_{i-1}^T y_{j-1}, \quad \big(L^{CS}_k\big)_{ij} = c_{i-1}^T s_{j-1}, \quad \text{for } 1 \le j < i \le k,

and the diagonal matrix

: \big(D_k\big)_{ii} := \big(D^{SY}_k\big)_{ii} = s_{i-1}^T y_{i-1}, \quad \text{for } 1 \le i \le k.

With these definitions the compact representations of the general rank-2 updates (1) and (2) (including the well known quasi-Newton updates in Table 1) have been developed in Brust. The inverse representation is

: U_k = \begin{bmatrix} V_k & S_k - H_0 Y_k \end{bmatrix}, \quad M_k = \begin{bmatrix} 0_{k \times k} & R^{VY}_k \\ \big( R^{VY}_k \big)^T & R_k + R_k^T - \big( D_k + Y_k^T H_0 Y_k \big) \end{bmatrix}, \quad (3)

and the formula for the direct Hessian is

: J_k = \begin{bmatrix} C_k & Y_k - B_0 S_k \end{bmatrix}, \quad N_k = \begin{bmatrix} 0_{k \times k} & R^{CS}_k \\ \big( R^{CS}_k \big)^T & D_k + L_k + L_k^T - S_k^T B_0 S_k \end{bmatrix}. \quad (4)

For instance, when V_k = S_k the representation (3) is the compact formula for the BFGS recursion (1).
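A sketch (NumPy; inverse_compact_blocks is an illustrative name) of how the blocks of the inverse representation (3) are assembled from the stored matrices:

```python
import numpy as np

def inverse_compact_blocks(S, Y, V, H0):
    """Assemble U_k and M_k of (3); S, Y, V are n x k, H0 is n x n."""
    k = S.shape[1]
    SY = S.T @ Y
    R = np.triu(SY)                  # upper triangular (incl. diagonal) of S^T Y
    D = np.diag(np.diag(SY))         # diagonal of S^T Y
    R_VY = np.triu(V.T @ Y)          # upper triangular of V^T Y
    U = np.hstack([V, S - H0 @ Y])
    M = np.block([[np.zeros((k, k)), R_VY],
                  [R_VY.T, R + R.T - (D + Y.T @ H0 @ Y)]])
    return U, M                      # H_k = H0 + U @ inv(M) @ U.T
```

The direct blocks of (4) are assembled analogously from C_k , Y_k - B_0 S_k , R^{CS}_k , L_k and D_k .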


Specific Representations

Prior to the development of the general compact representations (3) and (4), equivalent representations had been discovered for most known updates (see Table 1).


BFGS

Along with the SR1 representation, the BFGS (Broyden–Fletcher–Goldfarb–Shanno) compact representation was the first compact formula known. In particular, the inverse representation is given by

: H_k = H_0 + U_k M_k^{-1} U_k^T, \quad U_k = \begin{bmatrix} S_k & H_0 Y_k \end{bmatrix}, \quad M_k^{-1} = \begin{bmatrix} R_k^{-T} \big( D_k + Y_k^T H_0 Y_k \big) R_k^{-1} & -R_k^{-T} \\ -R_k^{-1} & 0 \end{bmatrix}.

The direct Hessian approximation can be found by applying the Sherman–Morrison–Woodbury identity to the inverse Hessian:

: B_k = B_0 + J_k N_k^{-1} J_k^T, \quad J_k = \begin{bmatrix} B_0 S_k & Y_k \end{bmatrix}, \quad N_k = -\begin{bmatrix} S_k^T B_0 S_k & L_k \\ L_k^T & -D_k \end{bmatrix}.
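The equivalence between the BFGS recursion and the compact formula can be verified numerically; a sketch with random curvature pairs:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 3
H0 = np.eye(n)
S = rng.standard_normal((n, k))
Y = S + 0.1 * rng.standard_normal((n, k))   # keeps s_i^T y_i > 0 here

H = H0.copy()
for i in range(k):                          # k recursive BFGS inverse updates
    s, y = S[:, i], Y[:, i]
    rho = 1.0 / (s @ y)
    W = np.eye(n) - rho * np.outer(s, y)
    H = W @ H @ W.T + rho * np.outer(s, s)

SY = S.T @ Y
R, D = np.triu(SY), np.diag(np.diag(SY))
Rinv = np.linalg.inv(R)
U = np.hstack([S, H0 @ Y])
Minv = np.block([[Rinv.T @ (D + Y.T @ H0 @ Y) @ Rinv, -Rinv.T],
                 [-Rinv, np.zeros((k, k))]])
assert np.allclose(H, H0 + U @ Minv @ U.T)
```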


SR1

The SR1 (Symmetric Rank-1) compact representation was, together with the BFGS formula, among the first to be derived. Using the definitions of D_k, L_k and R_k from above, the inverse Hessian formula is given by

: H_k = H_0 + U_k M_k^{-1} U_k^T, \quad U_k = S_k - H_0 Y_k, \quad M_k = R_k + R_k^T - D_k - Y_k^T H_0 Y_k.

The direct Hessian is obtained by the Sherman–Morrison–Woodbury identity and has the form

: B_k = B_0 + J_k N_k^{-1} J_k^T, \quad J_k = Y_k - B_0 S_k, \quad N_k = D_k + L_k + L_k^T - S_k^T B_0 S_k.
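A sketch of the direct SR1 formula (NumPy; sr1_direct is an illustrative name; N_k is assumed invertible, which practical SR1 implementations safeguard):

```python
import numpy as np

def sr1_direct(S, Y, B0):
    """Direct SR1 Hessian approximation from its compact representation."""
    SY = S.T @ Y
    D = np.diag(np.diag(SY))         # diagonal of S^T Y
    L = np.tril(SY, k=-1)            # strictly lower triangular part of S^T Y
    J = Y - B0 @ S
    N = D + L + L.T - S.T @ B0 @ S   # assumed invertible
    return B0 + J @ np.linalg.solve(N, J.T)
```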


MSS

The multipoint symmetric secant (MSS) method is a method that aims to satisfy multiple secant equations. The recursive update formula was originally developed by Burdakov. The compact representation for the direct Hessian was derived as

: B_k = B_0 + J_k N_k^{-1} J_k^T, \quad J_k = \begin{bmatrix} S_k & Y_k - B_0 S_k \end{bmatrix}, \quad N_k = \begin{bmatrix} W_k \big( S_k^T B_0 S_k - (R_k - D_k + R_k^T) \big) W_k & W_k \\ W_k & 0 \end{bmatrix}, \quad W_k = \big( S_k^T S_k \big)^{-1}.

Another equivalent compact representation for the MSS matrix is derived by rewriting J_k in the form J_k = \begin{bmatrix} S_k & B_0 Y_k \end{bmatrix} . The inverse representation can be obtained by application of the Sherman–Morrison–Woodbury identity.


DFP

Since the DFP (Davidon–Fletcher–Powell) update is the dual of the BFGS formula (i.e., it is obtained by swapping H_k \leftrightarrow B_k , H_0 \leftrightarrow B_0 and y_k \leftrightarrow s_k in the BFGS update), the compact representation for DFP can be obtained immediately from the one for BFGS.


PSB

The PSB (Powell–Symmetric–Broyden) compact representation was developed for the direct Hessian approximation. It is equivalent to substituting C_k = S_k in (4):

: B_k = B_0 + J_k N_k^{-1} J_k^T, \quad J_k = \begin{bmatrix} S_k & Y_k - B_0 S_k \end{bmatrix}, \quad N_k = \begin{bmatrix} 0_{k \times k} & R^{SS}_k \\ \big( R^{SS}_k \big)^T & D_k + L_k + L_k^T - S_k^T B_0 S_k \end{bmatrix}, \quad \big( R^{SS}_k \big)_{ij} = s_{i-1}^T s_{j-1} \text{ for } 1 \le i \le j \le k.
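The PSB compact formula can likewise be checked against the PSB recursion; a small random sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 5, 2
B0 = np.eye(n)
S = rng.standard_normal((n, k))
Y = rng.standard_normal((n, k))

B = B0.copy()
for i in range(k):                  # recursive PSB updates
    s, y = S[:, i], Y[:, i]
    r = y - B @ s
    ss = s @ s
    B = (B + (np.outer(r, s) + np.outer(s, r)) / ss
           - (s @ r) / ss ** 2 * np.outer(s, s))

SY = S.T @ Y
D, L = np.diag(np.diag(SY)), np.tril(SY, k=-1)
R_SS = np.triu(S.T @ S)
J = np.hstack([S, Y - B0 @ S])
N = np.block([[np.zeros((k, k)), R_SS],
              [R_SS.T, D + L + L.T - S.T @ B0 @ S]])
assert np.allclose(B, B0 + J @ np.linalg.solve(N, J.T))
```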


Structured BFGS

For structured optimization problems in which the objective function can be decomposed into two parts, f(x) = \widehat{f}(x) + \widetilde{f}(x) , where the gradients and Hessians of \widehat{f}(x) are known but only the gradients of \widetilde{f}(x) are known, structured BFGS formulas exist. The compact representation of these methods has the general form of (4), with specific J_k and N_k .


Reduced BFGS

The reduced compact representation (RCR) of BFGS applies to linear equality constrained optimization, \min f(x) \text{ subject to } Ax = b , where A is underdetermined (it has fewer rows than columns). In addition to the matrices S_k, Y_k , the RCR also stores Z_k , the projections of the y_i 's onto the nullspace of A . With the initializations B_0 = \frac{1}{\gamma_k} I and H_0 = \gamma_k I , \gamma_k > 0 , the KKT matrix

: K_k = \begin{bmatrix} B_k & A^T \\ A & 0 \end{bmatrix}

has the compact representation

: \big( K_k^{-1} \big)_{11} = H_0 + U_k M_k^{-1} U_k^T, \quad U_k = \begin{bmatrix} A^T & S_k & Z_k \end{bmatrix}, \quad M_k = \begin{bmatrix} -AA^T/\gamma_k & 0 \\ 0 & G_k \end{bmatrix}, \quad G_k = \begin{bmatrix} R_k^{-T} \big( D_k + Y_k^T H_0 Y_k \big) R_k^{-1} & -H_0 R_k^{-T} \\ -H_0 R_k^{-1} & 0 \end{bmatrix}.


Limited Memory

The most common use of the compact representations is in the limited-memory setting, where m \ll n denotes the memory parameter, with typical values around m \in [5, 12] . Instead of storing the whole history of vectors, one keeps only the m most recent pairs \{ (s_i, y_i) \}_{i=k-m}^{k-1} and possibly \{ v_i \}_{i=k-m}^{k-1} or \{ c_i \}_{i=k-m}^{k-1} . Further, the initialization is typically chosen as an adaptive multiple of the identity, H^{(0)}_k = \gamma_k I with \gamma_k = y_{k-1}^T s_{k-1} / y_{k-1}^T y_{k-1} , and B^{(0)}_k = \frac{1}{\gamma_k} I . Limited-memory methods are frequently used for large-scale problems with many variables (i.e., n can be large), in which the limited-memory matrices S_k \in \mathbb{R}^{n \times m} and Y_k \in \mathbb{R}^{n \times m} (and possibly V_k, C_k ) are tall and very skinny: S_k = \begin{bmatrix} s_{k-m} & \ldots & s_{k-1} \end{bmatrix} and Y_k = \begin{bmatrix} y_{k-m} & \ldots & y_{k-1} \end{bmatrix} .
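A sketch of the limited-memory bookkeeping (Python; the helper names are hypothetical): only the m most recent pairs are retained, and the scaling \gamma_k is recomputed from the newest pair:

```python
import numpy as np
from collections import deque

m = 5                                        # memory parameter
s_hist, y_hist = deque(maxlen=m), deque(maxlen=m)

def push_pair(s, y):
    """Store a new pair; deque(maxlen=m) discards the oldest automatically."""
    s_hist.append(s)
    y_hist.append(y)

def blocks():
    """Rebuild the tall-and-skinny blocks and the adaptive scaling."""
    S = np.column_stack(list(s_hist))        # n x min(k, m)
    Y = np.column_stack(list(y_hist))
    s, y = s_hist[-1], y_hist[-1]
    gamma = (y @ s) / (y @ y)                # H_k^(0) = gamma I, B_k^(0) = I / gamma
    return S, Y, gamma
```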


Implementations

Open source implementations include:
* ACM TOMS algorithm 1030, which implements an L-SR1 solver
* R's optim general-purpose optimizer routine, which uses the L-BFGS-B method
* SciPy's optimization module, whose minimize method includes an option to use L-BFGS-B
* IPOPT, when used with first order information

Non open source implementations include:
* The Artelys Knitro nonlinear programming (NLP) solvers, which use compact quasi-Newton matrices
* L-BFGS-B (ACM TOMS algorithm 778)

