Compact Quasi-Newton Representation

The compact representation for quasi-Newton methods is a matrix decomposition, which is typically used in gradient-based optimization algorithms or for solving nonlinear systems. The decomposition uses a low-rank representation for the direct and/or inverse Hessian or the Jacobian of a nonlinear system. Because of this, the compact representation is often used for large problems and constrained optimization.


Definition

The compact representation of a quasi-Newton matrix for the inverse Hessian H_k or direct Hessian B_k of a nonlinear objective function f(x) \colon \mathbb{R}^n \to \mathbb{R} expresses a sequence of recursive rank-1 or rank-2 matrix updates as one rank-k or rank-2k update of an initial matrix. Because it is derived from quasi-Newton updates, it uses the differences of iterates and gradients \nabla f(x_k) = g_k , namely s_i = x_{i+1} - x_i and y_i = g_{i+1} - g_i , in its definition \{ (s_i, y_i) \}_{i=0}^{k-1} . In particular, for r = k or r = 2k the rectangular n \times r matrices U_k, J_k and the r \times r square symmetric systems M_k, N_k depend on the s_i, y_i 's and define the quasi-Newton representations

: H_k = H_0 + U_k M_k^{-1} U_k^T, \quad \text{and} \quad B_k = B_0 + J_k N_k^{-1} J_k^T
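For illustration, here is a minimal sketch (NumPy; collect_pairs is a hypothetical helper, not a library routine) of how the defining difference pairs are formed from stored iterates and gradients:

```python
import numpy as np

# Hypothetical helper: form the difference pairs s_i = x_{i+1} - x_i and
# y_i = g_{i+1} - g_i that define the compact representation.
def collect_pairs(xs, gs):
    """xs: iterates x_0, ..., x_k; gs: gradients g_0, ..., g_k (1-D arrays)."""
    S = np.column_stack([xs[i + 1] - xs[i] for i in range(len(xs) - 1)])
    Y = np.column_stack([gs[i + 1] - gs[i] for i in range(len(gs) - 1)])
    return S, Y  # n x k blocks from which U_k, J_k, M_k, N_k are built
```

Storing an n \times r factor together with a small r \times r system, in place of a dense n \times n matrix, is what makes the representation attractive when n is large.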


Applications

Because of this special matrix decomposition the compact representation is implemented in state-of-the-art optimization software. When combined with limited-memory techniques it is a popular tool for constrained optimization with gradients. Linear algebra operations such as matrix-vector products, linear solves, or eigendecompositions can be carried out efficiently. The representation can be combined with line-search and trust region techniques, and it has been developed for many quasi-Newton updates. For instance, the matrix-vector product of the direct quasi-Newton Hessian with an arbitrary vector g \in \mathbb{R}^n is computed in five steps:

: \begin{align} p^{(1)}_k &= J_k^T g \\ N_k p^{(2)}_k &= p^{(1)}_k \quad \text{(solve the small } r \times r \text{ system with } N_k \text{)} \\ p^{(3)}_k &= J_k p^{(2)}_k \\ p^{(4)}_k &= B_0 g \\ B_k g &= p^{(3)}_k + p^{(4)}_k \end{align}
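A sketch of these five steps in NumPy (assuming J and N are given as dense arrays and a scalar initialization B_0 = b_0 I ; compact_matvec is an illustrative name, not a library function):

```python
import numpy as np

def compact_matvec(J, N, b0, g):
    """Return B_k @ g for B_k = b0*I + J @ inv(N) @ J.T (a sketch)."""
    p1 = J.T @ g                  # step 1: project to r dimensions
    p2 = np.linalg.solve(N, p1)   # step 2: small r x r solve with N_k
    p3 = J @ p2                   # step 3: map back to n dimensions
    p4 = b0 * g                   # step 4: B_0 g with B_0 = b0 * I
    return p3 + p4                # step 5: B_k g = p3 + p4
```

One product costs O(nr + r^3), or O(nr + r^2) if a factorization of N_k is reused across products, versus O(n^2) for a dense Hessian-vector product.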


Background

In the context of the GMRES method, Walker showed that a product of Householder transformations (an identity plus rank-1 matrix) can be expressed as a compact matrix formula. This led to the derivation of an explicit matrix expression for the product of k identity plus rank-1 matrices. Specifically, for

: S_k = \begin{bmatrix} s_0 & s_1 & \ldots & s_{k-1} \end{bmatrix}, \quad Y_k = \begin{bmatrix} y_0 & y_1 & \ldots & y_{k-1} \end{bmatrix}, \quad \big(R_k\big)_{ij} = s_{i-1}^T y_{j-1} \text{ for } 1 \le i \le j \le k, \quad \rho_i = 1/s_i^T y_i \quad \text{and} \quad V_i = I - \rho_i y_i s_i^T,

the product of k rank-1 updates to the identity is

: \prod_{i=0}^{k-1} V_i = \left( I - \rho_0 y_0 s_0^T \right) \cdots \left( I - \rho_{k-1} y_{k-1} s_{k-1}^T \right) = I - Y_k R_k^{-1} S_k^T

The BFGS update can be expressed in terms of products of the V_i 's, which have a compact matrix formula. Therefore, the BFGS recursion can exploit these block matrix representations.
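The identity can be checked numerically; the following sketch (random data, NumPy) verifies it for a small example:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 3
S = rng.standard_normal((n, k))   # columns s_0, ..., s_{k-1}
Y = rng.standard_normal((n, k))   # columns y_0, ..., y_{k-1}
R = np.triu(S.T @ Y)              # (R_k)_{ij} = s_{i-1}^T y_{j-1}, i <= j

prod = np.eye(n)
for i in range(k):                # accumulate V_0 V_1 ... V_{k-1}
    rho = 1.0 / (S[:, i] @ Y[:, i])
    prod = prod @ (np.eye(n) - rho * np.outer(Y[:, i], S[:, i]))

compact = np.eye(n) - Y @ np.linalg.solve(R, S.T)
assert np.allclose(prod, compact)
```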


Recursive quasi-Newton updates

A parametric family of quasi-Newton updates includes many of the best known formulas. For arbitrary vectors v_k and c_k such that v_k^T y_k \ne 0 and c_k^T s_k \ne 0 , the general recursive update formulas for the inverse and direct Hessian estimates are

: H_{k+1} = H_k + \frac{ (s_k - H_k y_k) v_k^T + v_k (s_k - H_k y_k)^T }{ v_k^T y_k } - \frac{ y_k^T (s_k - H_k y_k) }{ \big( v_k^T y_k \big)^2 } v_k v_k^T \quad (1)

: B_{k+1} = B_k + \frac{ (y_k - B_k s_k) c_k^T + c_k (y_k - B_k s_k)^T }{ c_k^T s_k } - \frac{ s_k^T (y_k - B_k s_k) }{ \big( c_k^T s_k \big)^2 } c_k c_k^T \quad (2)

By making specific choices for the parameter vectors v_k and c_k , well known methods are recovered (see Table 1); for instance, v_k = s_k yields the BFGS update, v_k = s_k - H_k y_k yields the SR1 update, and c_k = s_k yields the PSB update.
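The recursion (1) can be written as a short function; the sketch below (NumPy; general_inverse_update is an illustrative name) also checks that the choice v = s reproduces the textbook BFGS inverse update:

```python
import numpy as np

def general_inverse_update(H, s, y, v):
    """One step of recursion (1); v^T y must be nonzero (a sketch)."""
    r = s - H @ y                  # residual of the secant condition
    vy = v @ y
    return (H + (np.outer(r, v) + np.outer(v, r)) / vy
              - (y @ r) / vy ** 2 * np.outer(v, v))

rng = np.random.default_rng(1)
n = 5
H0 = np.eye(n)
s, y = rng.standard_normal(n), rng.standard_normal(n)

rho = 1.0 / (s @ y)
W = np.eye(n) - rho * np.outer(s, y)            # I - rho s y^T
H_bfgs = W @ H0 @ W.T + rho * np.outer(s, s)    # BFGS inverse update
assert np.allclose(general_inverse_update(H0, s, y, v=s), H_bfgs)
```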


Compact Representations

Collecting the updating vectors of the recursive formulas (1) and (2) into matrices, define

: S_k = \begin{bmatrix} s_0 & s_1 & \ldots & s_{k-1} \end{bmatrix}, \quad Y_k = \begin{bmatrix} y_0 & y_1 & \ldots & y_{k-1} \end{bmatrix}, \quad V_k = \begin{bmatrix} v_0 & v_1 & \ldots & v_{k-1} \end{bmatrix}, \quad C_k = \begin{bmatrix} c_0 & c_1 & \ldots & c_{k-1} \end{bmatrix},

the upper triangular matrices

: \big(R_k\big)_{ij} := \big(R^{SY}_k\big)_{ij} = s_{i-1}^T y_{j-1}, \quad \big(R^{VY}_k\big)_{ij} = v_{i-1}^T y_{j-1}, \quad \big(R^{CS}_k\big)_{ij} = c_{i-1}^T s_{j-1}, \quad \text{for } 1 \le i \le j \le k,

the strictly lower triangular matrices

: \big(L_k\big)_{ij} := \big(L^{SY}_k\big)_{ij} = s_{i-1}^T y_{j-1}, \quad \big(L^{VY}_k\big)_{ij} = v_{i-1}^T y_{j-1}, \quad \big(L^{CS}_k\big)_{ij} = c_{i-1}^T s_{j-1}, \quad \text{for } 1 \le j < i \le k,

and the diagonal matrix

: \big(D_k\big)_{ii} := \big(D^{SY}_k\big)_{ii} = s_{i-1}^T y_{i-1}, \quad \text{for } 1 \le i \le k.

With these definitions the compact representations of the general rank-2 updates (1) and (2) (including the well known quasi-Newton updates in Table 1) have been developed in Brust. The inverse representation is

: U_k = \begin{bmatrix} V_k & S_k - H_0 Y_k \end{bmatrix}, \quad M_k = \begin{bmatrix} 0_{k \times k} & R^{VY}_k \\ \big( R^{VY}_k \big)^T & R_k + R_k^T - \big( D_k + Y_k^T H_0 Y_k \big) \end{bmatrix}, \quad (3)

and the formula for the direct Hessian is

: J_k = \begin{bmatrix} C_k & Y_k - B_0 S_k \end{bmatrix}, \quad N_k = \begin{bmatrix} 0_{k \times k} & R^{CS}_k \\ \big( R^{CS}_k \big)^T & D_k + L_k + L_k^T - S_k^T B_0 S_k \end{bmatrix}. \quad (4)

For instance, when V_k = S_k the representation (3) is the compact formula for the BFGS recursion (1).
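A sketch (NumPy; inverse_compact_blocks is an illustrative name) of how the blocks of the inverse representation (3) are assembled from the stored matrices:

```python
import numpy as np

def inverse_compact_blocks(S, Y, V, H0):
    """Assemble U_k and M_k of (3); S, Y, V are n x k, H0 is n x n."""
    k = S.shape[1]
    SY = S.T @ Y
    R = np.triu(SY)                  # upper triangular (incl. diagonal) of S^T Y
    D = np.diag(np.diag(SY))         # diagonal of S^T Y
    R_VY = np.triu(V.T @ Y)          # upper triangular of V^T Y
    U = np.hstack([V, S - H0 @ Y])
    M = np.block([[np.zeros((k, k)), R_VY],
                  [R_VY.T, R + R.T - (D + Y.T @ H0 @ Y)]])
    return U, M                      # H_k = H0 + U @ inv(M) @ U.T
```

The direct blocks of (4) are assembled analogously from C_k , Y_k - B_0 S_k , R^{CS}_k , L_k and D_k .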


Specific Representations

Prior to the development of the general compact representations (3) and (4), equivalent representations had been discovered for most known updates (see Table 1).


BFGS

Along with the SR1 representation, the BFGS (Broyden–Fletcher–Goldfarb–Shanno) compact representation was the first compact formula known. In particular, the inverse representation is given by

: H_k = H_0 + U_k M_k^{-1} U_k^T, \quad U_k = \begin{bmatrix} S_k & H_0 Y_k \end{bmatrix}, \quad M_k^{-1} = \begin{bmatrix} R_k^{-T} \big( D_k + Y_k^T H_0 Y_k \big) R_k^{-1} & -R_k^{-T} \\ -R_k^{-1} & 0 \end{bmatrix}.

The direct Hessian approximation can be found by applying the Sherman–Morrison–Woodbury identity to the inverse Hessian:

: B_k = B_0 + J_k N_k^{-1} J_k^T, \quad J_k = \begin{bmatrix} B_0 S_k & Y_k \end{bmatrix}, \quad N_k = -\begin{bmatrix} S_k^T B_0 S_k & L_k \\ L_k^T & -D_k \end{bmatrix}.
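The equivalence between the BFGS recursion and the compact formula can be verified numerically; a sketch with random curvature pairs:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 3
H0 = np.eye(n)
S = rng.standard_normal((n, k))
Y = S + 0.1 * rng.standard_normal((n, k))   # keeps s_i^T y_i > 0 here

H = H0.copy()
for i in range(k):                          # k recursive BFGS inverse updates
    s, y = S[:, i], Y[:, i]
    rho = 1.0 / (s @ y)
    W = np.eye(n) - rho * np.outer(s, y)
    H = W @ H @ W.T + rho * np.outer(s, s)

SY = S.T @ Y
R, D = np.triu(SY), np.diag(np.diag(SY))
Rinv = np.linalg.inv(R)
U = np.hstack([S, H0 @ Y])
Minv = np.block([[Rinv.T @ (D + Y.T @ H0 @ Y) @ Rinv, -Rinv.T],
                 [-Rinv, np.zeros((k, k))]])
assert np.allclose(H, H0 + U @ Minv @ U.T)
```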


SR1

The SR1 (Symmetric Rank-1) compact representation was, together with the BFGS formula, among the first to be derived. Using the definitions of D_k, L_k and R_k from above, the inverse Hessian formula is given by

: H_k = H_0 + U_k M_k^{-1} U_k^T, \quad U_k = S_k - H_0 Y_k, \quad M_k = R_k + R_k^T - D_k - Y_k^T H_0 Y_k.

The direct Hessian is obtained by the Sherman–Morrison–Woodbury identity and has the form

: B_k = B_0 + J_k N_k^{-1} J_k^T, \quad J_k = Y_k - B_0 S_k, \quad N_k = D_k + L_k + L_k^T - S_k^T B_0 S_k.
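A sketch of the direct SR1 formula (NumPy; sr1_direct is an illustrative name; N_k is assumed invertible, which practical SR1 implementations safeguard):

```python
import numpy as np

def sr1_direct(S, Y, B0):
    """Direct SR1 Hessian approximation from its compact representation."""
    SY = S.T @ Y
    D = np.diag(np.diag(SY))         # diagonal of S^T Y
    L = np.tril(SY, k=-1)            # strictly lower triangular part of S^T Y
    J = Y - B0 @ S
    N = D + L + L.T - S.T @ B0 @ S   # assumed invertible
    return B0 + J @ np.linalg.solve(N, J.T)
```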


MSS

The multipoint symmetric secant (MSS) method is a method that aims to satisfy multiple secant equations. The recursive update formula was originally developed by Burdakov. The compact representation for the direct Hessian was derived as

: B_k = B_0 + J_k N_k^{-1} J_k^T, \quad J_k = \begin{bmatrix} S_k & Y_k - B_0 S_k \end{bmatrix}, \quad N_k = \begin{bmatrix} W_k \big( S_k^T B_0 S_k - (R_k - D_k + R_k^T) \big) W_k & W_k \\ W_k & 0 \end{bmatrix}, \quad W_k = \big( S_k^T S_k \big)^{-1}.

Another equivalent compact representation for the MSS matrix is derived by rewriting J_k in the form J_k = \begin{bmatrix} S_k & B_0 Y_k \end{bmatrix} . The inverse representation can be obtained by application of the Sherman–Morrison–Woodbury identity.


DFP

Since the DFP (Davidon–Fletcher–Powell) update is the dual of the BFGS formula (i.e., it is obtained by swapping H_k \leftrightarrow B_k , H_0 \leftrightarrow B_0 and y_k \leftrightarrow s_k in the BFGS update), the compact representation for DFP can be obtained immediately from the one for BFGS.


PSB

The PSB (Powell–Symmetric–Broyden) compact representation was developed for the direct Hessian approximation. It is equivalent to substituting C_k = S_k in (4):

: B_k = B_0 + J_k N_k^{-1} J_k^T, \quad J_k = \begin{bmatrix} S_k & Y_k - B_0 S_k \end{bmatrix}, \quad N_k = \begin{bmatrix} 0_{k \times k} & R^{SS}_k \\ \big( R^{SS}_k \big)^T & D_k + L_k + L_k^T - S_k^T B_0 S_k \end{bmatrix}, \quad \big( R^{SS}_k \big)_{ij} = s_{i-1}^T s_{j-1} \text{ for } 1 \le i \le j \le k.
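The PSB compact formula can likewise be checked against the PSB recursion; a small random sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 5, 2
B0 = np.eye(n)
S = rng.standard_normal((n, k))
Y = rng.standard_normal((n, k))

B = B0.copy()
for i in range(k):                  # recursive PSB updates
    s, y = S[:, i], Y[:, i]
    r = y - B @ s
    ss = s @ s
    B = (B + (np.outer(r, s) + np.outer(s, r)) / ss
           - (s @ r) / ss ** 2 * np.outer(s, s))

SY = S.T @ Y
D, L = np.diag(np.diag(SY)), np.tril(SY, k=-1)
R_SS = np.triu(S.T @ S)
J = np.hstack([S, Y - B0 @ S])
N = np.block([[np.zeros((k, k)), R_SS],
              [R_SS.T, D + L + L.T - S.T @ B0 @ S]])
assert np.allclose(B, B0 + J @ np.linalg.solve(N, J.T))
```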


Structured BFGS

For structured optimization problems in which the objective function can be decomposed into two parts, f(x) = \widehat{f}(x) + \widetilde{f}(x) , where the gradients and Hessians of \widehat{f}(x) are known but only the gradients of \widetilde{f}(x) are known, structured BFGS formulas exist. The compact representation of these methods has the general form of (4), with specific J_k and N_k .


Reduced BFGS

The reduced compact representation (RCR) of BFGS applies to linear equality constrained optimization, \min f(x) \text{ subject to } Ax = b , where A is underdetermined (it has fewer rows than columns). In addition to the matrices S_k, Y_k , the RCR also stores Z_k , the projections of the y_i 's onto the nullspace of A . With the initializations B_0 = \frac{1}{\gamma_k} I and H_0 = \gamma_k I , \gamma_k > 0 , the KKT matrix

: K_k = \begin{bmatrix} B_k & A^T \\ A & 0 \end{bmatrix}

has the compact representation

: \big( K_k^{-1} \big)_{11} = H_0 + U_k M_k^{-1} U_k^T, \quad U_k = \begin{bmatrix} A^T & S_k & Z_k \end{bmatrix}, \quad M_k = \begin{bmatrix} -AA^T/\gamma_k & 0 \\ 0 & G_k \end{bmatrix}, \quad G_k = \begin{bmatrix} R_k^{-T} \big( D_k + Y_k^T H_0 Y_k \big) R_k^{-1} & -H_0 R_k^{-T} \\ -H_0 R_k^{-1} & 0 \end{bmatrix}.


Limited Memory

The most common use of the compact representations is in the limited-memory setting, where m \ll n denotes the memory parameter, with typical values around m \in [5, 12] . Instead of storing the whole history of vectors, one keeps only the m most recent pairs \{ (s_i, y_i) \}_{i=k-m}^{k-1} and possibly \{ v_i \}_{i=k-m}^{k-1} or \{ c_i \}_{i=k-m}^{k-1} . Further, the initialization is typically chosen as an adaptive multiple of the identity, H^{(0)}_k = \gamma_k I with \gamma_k = y_{k-1}^T s_{k-1} / y_{k-1}^T y_{k-1} , and B^{(0)}_k = \frac{1}{\gamma_k} I . Limited-memory methods are frequently used for large-scale problems with many variables (i.e., n can be large), in which the limited-memory matrices S_k \in \mathbb{R}^{n \times m} and Y_k \in \mathbb{R}^{n \times m} (and possibly V_k, C_k ) are tall and very skinny: S_k = \begin{bmatrix} s_{k-m} & \ldots & s_{k-1} \end{bmatrix} and Y_k = \begin{bmatrix} y_{k-m} & \ldots & y_{k-1} \end{bmatrix} .
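A sketch of the limited-memory bookkeeping (Python; the helper names are hypothetical): only the m most recent pairs are retained, and the scaling \gamma_k is recomputed from the newest pair:

```python
import numpy as np
from collections import deque

m = 5                                        # memory parameter
s_hist, y_hist = deque(maxlen=m), deque(maxlen=m)

def push_pair(s, y):
    """Store a new pair; deque(maxlen=m) discards the oldest automatically."""
    s_hist.append(s)
    y_hist.append(y)

def blocks():
    """Rebuild the tall-and-skinny blocks and the adaptive scaling."""
    S = np.column_stack(list(s_hist))        # n x min(k, m)
    Y = np.column_stack(list(y_hist))
    s, y = s_hist[-1], y_hist[-1]
    gamma = (y @ s) / (y @ y)                # H_k^(0) = gamma I, B_k^(0) = I / gamma
    return S, Y, gamma
```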


Implementations

Open source implementations include:
* ACM TOMS algorithm 1030, which implements an L-SR1 solver
* R's optim general-purpose optimizer routine, which uses the L-BFGS-B method
* SciPy's optimization module, whose minimize method includes an option to use L-BFGS-B
* IPOPT, when used with first order information

Non open source implementations include:
* The Artelys Knitro nonlinear programming (NLP) solvers, which use compact quasi-Newton matrices
* L-BFGS-B (ACM TOMS algorithm 778)

