Mehrotra predictor–corrector method

Mehrotra's predictor–corrector method in optimization is a specific interior point method for linear programming. It was proposed in 1989 by Sanjay Mehrotra.

The method is based on the fact that at each iteration of an interior point algorithm it is necessary to compute the Cholesky decomposition (factorization) of a large matrix in order to find the search direction. The factorization step is the most computationally expensive step in the algorithm, so it makes sense to use the same decomposition more than once before recomputing it. At each iteration, Mehrotra's predictor–corrector method uses the same Cholesky decomposition to find two different directions: a predictor and a corrector. The idea is to first compute an optimizing search direction based on a first-order term (the predictor). The step size that can be taken in this direction is used to evaluate how much centrality correction is needed. Then a corrector term is computed, which contains both a centrality term and a second-order term. The complete search direction is the sum of the predictor direction and the corrector direction.

Although there is no theoretical complexity bound on it yet, Mehrotra's predictor–corrector method is widely used in practice: "In 1989, Mehrotra described a practical algorithm for linear programming that remains the basis of most current software; his work appeared in 1992." The corrector step reuses the Cholesky decomposition found during the predictor step, and is therefore only marginally more expensive than a standard interior point iteration. The additional overhead per iteration is usually paid off by a reduction in the number of iterations needed to reach an optimal solution. The method also appears to converge very fast when close to the optimum.
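The factor-once, solve-twice pattern described above can be illustrated with a minimal sketch. Everything here is an illustrative stand-in rather than part of Mehrotra's original description: the matrix M plays the role of the positive-definite normal-equations matrix that arises in interior point methods, and rhs_predictor / rhs_corrector are placeholder right-hand sides for the two solves.

    # Minimal sketch: factor an SPD matrix once, reuse it for two solves.
    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 6))
    D = np.diag(rng.uniform(0.5, 2.0, size=6))   # stands in for a positive diagonal scaling
    M = A @ D @ A.T                              # SPD "normal equations" matrix

    factor = cho_factor(M)                       # expensive step, performed once
    rhs_predictor = rng.standard_normal(3)       # placeholder predictor right-hand side
    rhs_corrector = rng.standard_normal(3)       # placeholder corrector right-hand side

    d_predictor = cho_solve(factor, rhs_predictor)   # cheap triangular solves that
    d_corrector = cho_solve(factor, rhs_corrector)   # reuse the same factorization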


Derivation

The derivation in this section follows the outline given by Nocedal and Wright.


Predictor step - Affine scaling direction

A linear program can always be formulated in the standard form

    \begin{aligned}
    \min_x \quad & q(x) = c^T x, \\
    \text{subject to} \quad & Ax = b, \\
    & x \geq 0,
    \end{aligned}

where c \in \mathbb{R}^n, A \in \mathbb{R}^{m \times n} and b \in \mathbb{R}^m define a problem with m equality constraints in the n variables collected in the vector x \in \mathbb{R}^n. The Karush–Kuhn–Tucker (KKT) conditions for the problem are

    \begin{aligned}
    A^T \lambda + s &= c, && \text{(dual feasibility)} \\
    Ax &= b, && \text{(primal feasibility)} \\
    XSe &= 0, && \text{(complementarity)} \\
    (x, s) &\geq 0,
    \end{aligned}

where \lambda \in \mathbb{R}^m and s \in \mathbb{R}^n are the dual variables, X = \operatorname{diag}(x), S = \operatorname{diag}(s) and e = (1, 1, \dots, 1)^T \in \mathbb{R}^n. These conditions can be reformulated as a mapping F : \mathbb{R}^{2n+m} \rightarrow \mathbb{R}^{2n+m} as follows:

    F(x, \lambda, s) = \begin{bmatrix} A^T \lambda + s - c \\ Ax - b \\ XSe \end{bmatrix} = 0, \qquad (x, s) \geq 0.

The predictor–corrector method then works by using Newton's method to obtain the affine scaling direction. This is achieved by solving the following system of linear equations

    J(x, \lambda, s) \begin{bmatrix} \Delta x^{\text{aff}} \\ \Delta \lambda^{\text{aff}} \\ \Delta s^{\text{aff}} \end{bmatrix} = -F(x, \lambda, s),

where J, defined as

    J(x, \lambda, s) = \begin{bmatrix} \nabla_x F & \nabla_\lambda F & \nabla_s F \end{bmatrix},

is the Jacobian of F. Thus, the system becomes

    \begin{bmatrix} 0 & A^T & I \\ A & 0 & 0 \\ S & 0 & X \end{bmatrix} \begin{bmatrix} \Delta x^{\text{aff}} \\ \Delta \lambda^{\text{aff}} \\ \Delta s^{\text{aff}} \end{bmatrix} = \begin{bmatrix} -r_c \\ -r_b \\ -XSe \end{bmatrix}, \qquad r_c = A^T \lambda + s - c, \quad r_b = Ax - b.
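The predictor solve can be sketched densely with NumPy for a small instance. The data below are made up so that the starting point is feasible, and the names (dx_aff, dlam_aff, ds_aff) are illustrative; a practical solver would reduce the system to the normal equations and exploit sparsity rather than forming J as a dense matrix.

    # Sketch of the affine-scaling (predictor) step on a small random instance.
    import numpy as np

    rng = np.random.default_rng(1)
    m, n = 3, 6
    A = rng.standard_normal((m, n))
    x = rng.uniform(0.5, 2.0, n)                # strictly positive primal point
    s = rng.uniform(0.5, 2.0, n)                # strictly positive dual slacks
    lam = rng.standard_normal(m)
    b = A @ x                                   # chosen so that r_b = 0 initially
    c = A.T @ lam + s                           # chosen so that r_c = 0 initially

    X, S = np.diag(x), np.diag(s)
    e = np.ones(n)
    r_c = A.T @ lam + s - c
    r_b = A @ x - b

    # Jacobian of F evaluated at (x, lam, s)
    J = np.block([
        [np.zeros((n, n)), A.T,              np.eye(n)],
        [A,                np.zeros((m, m)), np.zeros((m, n))],
        [S,                np.zeros((n, m)), X],
    ])
    rhs_aff = np.concatenate([-r_c, -r_b, -X @ S @ e])
    d_aff = np.linalg.solve(J, rhs_aff)
    dx_aff, dlam_aff, ds_aff = d_aff[:n], d_aff[n:n + m], d_aff[n + m:]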


Centering step

The average value of the products x_i s_i, \; i = 1, 2, \dots, n, is an important measure of the desirability of a certain iterate (x^k, s^k) (the superscript k denotes the iteration number of the method). It is called the duality measure and is defined by

    \mu = \frac{1}{n} \sum_{i=1}^{n} x_i s_i = \frac{x^T s}{n}.

For a value of the centering parameter \sigma \in [0, 1], the centering step can be computed as the solution to

    \begin{bmatrix} 0 & A^T & I \\ A & 0 & 0 \\ S & 0 & X \end{bmatrix} \begin{bmatrix} \Delta x^{\text{cen}} \\ \Delta \lambda^{\text{cen}} \\ \Delta s^{\text{cen}} \end{bmatrix} = \begin{bmatrix} -r_c \\ -r_b \\ -XSe + \sigma \mu e \end{bmatrix}.
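Continuing the sketch above, the duality measure and a centering solve might look as follows; the value sigma = 0.5 is only a placeholder, since the adaptive choice of the centering parameter is described further below.

    mu = x @ s / n                               # duality measure at the current iterate
    sigma = 0.5                                  # illustrative placeholder value

    rhs_cen = np.concatenate([-r_c, -r_b, -X @ S @ e + sigma * mu * e])
    d_cen = np.linalg.solve(J, rhs_cen)          # same system matrix as the predictor
    dx_cen, dlam_cen, ds_cen = d_cen[:n], d_cen[n:n + m], d_cen[n + m:]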


Corrector step

Considering the system used to compute the affine scaling direction above, one can note that taking a full step in the affine scaling direction leaves the complementarity condition unsatisfied: since the third block row of that system gives s_i \Delta x_i^{\text{aff}} + x_i \Delta s_i^{\text{aff}} = -x_i s_i, one has

    \left( x_i + \Delta x_i^{\text{aff}} \right) \left( s_i + \Delta s_i^{\text{aff}} \right) = x_i s_i + x_i \Delta s_i^{\text{aff}} + s_i \Delta x_i^{\text{aff}} + \Delta x_i^{\text{aff}} \Delta s_i^{\text{aff}} = \Delta x_i^{\text{aff}} \Delta s_i^{\text{aff}} \neq 0.

A system can therefore be defined to compute a step that attempts to correct for this error. It relies on the previously computed affine scaling direction:

    \begin{bmatrix} 0 & A^T & I \\ A & 0 & 0 \\ S & 0 & X \end{bmatrix} \begin{bmatrix} \Delta x^{\text{cor}} \\ \Delta \lambda^{\text{cor}} \\ \Delta s^{\text{cor}} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ -\Delta X^{\text{aff}} \Delta S^{\text{aff}} e \end{bmatrix},

where \Delta X^{\text{aff}} = \operatorname{diag}(\Delta x^{\text{aff}}) and \Delta S^{\text{aff}} = \operatorname{diag}(\Delta s^{\text{aff}}).
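Continuing the same sketch, the corrector right-hand side only needs the componentwise product of the affine components, so \Delta X^{\text{aff}} \Delta S^{\text{aff}} e reduces to dx_aff * ds_aff:

    rhs_cor = np.concatenate([np.zeros(n), np.zeros(m), -dx_aff * ds_aff])
    d_cor = np.linalg.solve(J, rhs_cor)          # again the same system matrix
    dx_cor, dlam_cor, ds_cor = d_cor[:n], d_cor[n:n + m], d_cor[n + m:]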


Aggregated system - Center-corrector direction

The predictor, corrector and centering contributions to the right-hand side can be aggregated into a single system. This system depends on the previously computed affine scaling direction, but its matrix is identical to that of the predictor step, so the same factorization can be reused. The aggregated system is

    \begin{bmatrix} 0 & A^T & I \\ A & 0 & 0 \\ S & 0 & X \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta \lambda \\ \Delta s \end{bmatrix} = \begin{bmatrix} -r_c \\ -r_b \\ -XSe - \Delta X^{\text{aff}} \Delta S^{\text{aff}} e + \sigma \mu e \end{bmatrix}.

The predictor–corrector algorithm therefore first computes the affine scaling direction and then solves the aggregated system to obtain the search direction of the current iteration.
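Continuing the sketch, both solves can share a single factorization of J. A dense LU factorization is used here purely for illustration; practical codes instead form the reduced normal equations and compute one Cholesky factorization per iteration.

    from scipy.linalg import lu_factor, lu_solve

    lu = lu_factor(J)                            # factor the system matrix once
    # Predictor solve reusing the factorization (reproduces d_aff from above).
    d_aff = lu_solve(lu, np.concatenate([-r_c, -r_b, -X @ S @ e]))
    dx_aff, ds_aff = d_aff[:n], d_aff[n + m:]

    # Aggregated right-hand side: predictor, corrector and centering terms.
    rhs_agg = np.concatenate([-r_c, -r_b,
                              -X @ S @ e - dx_aff * ds_aff + sigma * mu * e])
    d = lu_solve(lu, rhs_agg)                    # second solve, same factorization
    dx, dlam, ds = d[:n], d[n:n + m], d[n + m:]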


Adaptive selection of centering parameter

The affine scaling direction can be used to define a heuristic that adaptively chooses the centering parameter as

    \sigma = \left( \frac{\mu_{\text{aff}}}{\mu} \right)^3,

where

    \begin{aligned}
    \mu_{\text{aff}} &= \left( x + \alpha^{\text{aff}}_{\text{pri}} \Delta x^{\text{aff}} \right)^T \left( s + \alpha^{\text{aff}}_{\text{dual}} \Delta s^{\text{aff}} \right) / n, \\
    \alpha^{\text{aff}}_{\text{pri}} &= \min \left( 1, \; \min_{i : \Delta x_i^{\text{aff}} < 0} -\frac{x_i}{\Delta x_i^{\text{aff}}} \right), \\
    \alpha^{\text{aff}}_{\text{dual}} &= \min \left( 1, \; \min_{i : \Delta s_i^{\text{aff}} < 0} -\frac{s_i}{\Delta s_i^{\text{aff}}} \right).
    \end{aligned}

Here, \mu_{\text{aff}} is the duality measure of the trial point obtained by the affine step and \mu is the duality measure of the current iterate (x^k, s^k).
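Continuing the sketch, this heuristic can be written as below; max_step is a small helper introduced here for illustration. In a full implementation, this value of sigma would be computed before assembling the aggregated right-hand side shown in the previous section.

    def max_step(v, dv):
        """Largest alpha in (0, 1] such that v + alpha * dv >= 0 componentwise."""
        neg = dv < 0
        return min(1.0, np.min(-v[neg] / dv[neg])) if np.any(neg) else 1.0

    alpha_aff_pri = max_step(x, dx_aff)
    alpha_aff_dual = max_step(s, ds_aff)
    mu_aff = (x + alpha_aff_pri * dx_aff) @ (s + alpha_aff_dual * ds_aff) / n
    sigma = (mu_aff / mu) ** 3                   # Mehrotra's adaptive centering parameter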


Step lengths

In practical implementations, a version of line search is performed to obtain the maximal step length that can be taken in the search direction without violating the nonnegativity constraint (x, s) \geq 0.
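A minimal version of this ratio test, continuing the sketch and reusing the max_step helper from above, is shown below. The damping factor eta keeps the new iterate strictly positive; values such as 0.99 are commonly used in practice, though the exact choice is implementation dependent.

    eta = 0.99                                   # back-off factor to stay strictly interior
    alpha_pri = min(1.0, eta * max_step(x, dx))
    alpha_dual = min(1.0, eta * max_step(s, ds))

    x_new = x + alpha_pri * dx
    lam_new = lam + alpha_dual * dlam
    s_new = s + alpha_dual * ds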


Adaptation to Quadratic Programming

Although the modifications presented by Mehrotra were intended for interior point algorithms for linear programming, the ideas have been extended and successfully applied to quadratic programming as well.
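As a sketch of how the machinery carries over, consider a convex quadratic program in standard form; the formulation and sign convention below follow a standard primal–dual framework (such as the one in Nocedal and Wright) rather than Mehrotra's original paper:

    \begin{aligned}
    \min_x \quad & \tfrac{1}{2} x^T G x + c^T x, \\
    \text{subject to} \quad & Ax = b, \quad x \geq 0,
    \end{aligned}

with G symmetric positive semidefinite. The only structural change to the systems above is that the first block row of the KKT matrix acquires a -G block and the dual residual becomes r_c = A^T \lambda + s - Gx - c, so the aggregated system reads

    \begin{bmatrix} -G & A^T & I \\ A & 0 & 0 \\ S & 0 & X \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta \lambda \\ \Delta s \end{bmatrix} = \begin{bmatrix} -r_c \\ -r_b \\ -XSe - \Delta X^{\text{aff}} \Delta S^{\text{aff}} e + \sigma \mu e \end{bmatrix}.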

