Self-concordant Barrier

A self-concordant function is a function satisfying a certain differential inequality, which makes it particularly easy to optimize using Newton's method. A self-concordant barrier is a particular self-concordant function that is also a barrier function for a particular convex set. Self-concordant barriers are important ingredients in interior-point methods for optimization.


Self-concordant functions


Multivariate self-concordant function

Here is the general definition of a self-concordant function. Let ''C'' be a convex nonempty open set in R''n''. Let ''f'' be a three-times continuously differentiable function defined on ''C''. We say that ''f'' is self-concordant on ''C'' if it satisfies the following properties:

1. ''Barrier property'': on any sequence of points in ''C'' that converges to a boundary point of ''C'', ''f'' converges to ∞.

2. ''Differential inequality'': for every point x in ''C'' and every direction h in R''n'', let ''g''h be the restriction of ''f'' to the direction h, that is: ''g''h(''t'') = ''f''(x+''t''·h). Then the one-dimensional function ''g''h should satisfy the following differential inequality:

|g_h'''(t)| \leq 2\, g_h''(t)^{3/2}.

Equivalently:

\left. \frac{d}{d\alpha} \nabla^2 f(x + \alpha y) \right|_{\alpha=0} \preceq 2 \sqrt{y^T \nabla^2 f(x)\, y} \; \nabla^2 f(x)


Univariate self-concordant function

A function f:\mathbb{R} \rightarrow \mathbb{R} is self-concordant on \mathbb{R} if:

|f'''(x)| \leq 2 f''(x)^{3/2}

Equivalently: wherever f''(x) > 0 it satisfies:

\left| \frac{d}{dx} \frac{1}{\sqrt{f''(x)}} \right| \leq 1

and f'''(x) = 0 elsewhere.
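The univariate condition can be illustrated numerically. The sketch below (step size and tolerance are arbitrary choices, and the finite-difference slack is a heuristic) approximates f'' and f''' by central differences and tests |f'''(x)| ≤ 2 f''(x)^{3/2} for -\log x, which satisfies it with equality, and for e^x, which violates it at negative arguments:

```python
import math

def check_self_concordant(f, xs, h=1e-3, tol=1e-3):
    """Check |f'''(x)| <= 2 f''(x)^(3/2) at each x, via central differences."""
    for x in xs:
        d2 = (f(x + h) - 2 * f(x) + f(x - h)) / h**2
        d3 = (f(x + 2*h) - 2*f(x + h) + 2*f(x - h) - f(x - 2*h)) / (2 * h**3)
        bound = 2 * max(d2, 0.0) ** 1.5
        if abs(d3) > bound + tol * (1.0 + bound):  # small slack for FD error
            return False
    return True

print(check_self_concordant(lambda x: -math.log(x), [0.5, 1.0, 2.0, 5.0]))  # True
print(check_self_concordant(lambda x: math.exp(x), [-5.0, -2.0]))           # False
```

For -\log x the two sides are equal (|f'''| = 2/x^3 = 2 f''^{3/2}), which is why a small tolerance is needed in the numerical check.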


Examples

* Linear and convex quadratic functions are self-concordant, since their third derivative is zero.
* Any function f(x) = -\log(-g(x)) - \log x, where g(x) is defined and convex for all x > 0 and satisfies |g'''(x)| \leq 3g''(x)/x, is self-concordant on its domain, which is \{x > 0 : g(x) < 0\}. Some examples are:
** g(x) = -x^p for 0 < p \leq 1
** g(x) = -\log x
** g(x) = x^p for -1 \leq p \leq 0
** g(x) = (ax+b)^2 / x
** for any function g satisfying the conditions, the function g(x) + a x^2 + bx + c with a \geq 0 also satisfies the conditions.

Some functions that are not self-concordant:
* f(x) = e^x
* f(x) = \frac{1}{x^p}, x > 0, p > 0
* f(x) = |x^p|, p > 2


Self-concordant barriers

Here is the general definition of a self-concordant barrier (SCB). Let ''C'' be a closed convex set in R''n'' with a non-empty interior. Let ''f'' be a function from interior(''C'') to R. Let ''M''>0 be a real parameter. We say that ''f'' is an ''M''-self-concordant barrier for ''C'' if it satisfies the following:

1. ''f'' is a self-concordant function on interior(''C'').

2. For every point x in interior(''C'') and every direction h in R''n'', let ''g''h be the restriction of ''f'' to the direction h, that is: ''g''h(''t'') = ''f''(x+''t''·h). Then the one-dimensional function ''g''h should satisfy the following differential inequality:

|g_h'(t)| \leq M^{1/2} \cdot g_h''(t)^{1/2}.


Constructing SCBs

Due to the importance of SCBs in interior-point methods, it is useful to know how to construct SCBs for various domains. In theory, it can be proved that ''every'' closed convex domain in R''n'' has a self-concordant barrier with parameter O(''n''). But this "universal barrier" is given by some multivariate integrals, and it is too complicated for actual computations. Hence, the main goal is to construct SCBs that are efficiently computable. SCBs can be constructed from some ''basic SCBs'', which are combined to produce SCBs for more complex domains using several ''combination rules''.


Basic SCBs

Every constant is a self-concordant barrier for all of R''n'', with parameter M=0. It is the only self-concordant barrier for the entire space, and the only self-concordant barrier with ''M'' < 1. [Note that linear and quadratic functions are self-concordant functions, but they are ''not'' self-concordant barriers.] For the positive half-line \mathbb R_+ (x > 0), f(x) = -\ln x is a self-concordant barrier with parameter M = 1. This can be proved directly from the definition.
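The claim for f(x) = -\ln x can be illustrated with analytic derivatives: the ratio f'(x)^2 / f''(x), whose supremum over the domain is the barrier parameter, equals 1 at every point. A minimal sketch:

```python
# Sketch: for f(x) = -ln x on x > 0, the ratio f'(x)^2 / f''(x) equals 1 at
# every point, so f is an M-SCB with the smallest possible parameter M = 1.

def barrier_ratio(x):
    """f'(x)^2 / f''(x) for f(x) = -ln x; an M-SCB needs this <= M everywhere."""
    fp = -1.0 / x       # f'(x)
    fpp = 1.0 / x**2    # f''(x)
    return fp * fp / fpp

for x in [0.1, 1.0, 10.0, 1e6]:
    print(barrier_ratio(x))  # 1.0 (up to rounding) at every x
```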


Substitution rule

Let ''G'' be a closed convex domain in R''n'', and ''g'' an ''M''-SCB for ''G''. Let ''x'' = ''Ay''+''b'' be an affine mapping from R''k'' to R''n'' with its image intersecting the interior of ''G''. Let ''H'' be the inverse image of ''G'' under the mapping: ''H'' = \{y : Ay+b \in G\}. Let ''h'' be the composite function ''h''(''y'') := ''g''(''Ay''+''b''). Then, ''h'' is an ''M''-SCB for ''H''.

For example, take ''n''=1, ''G'' the positive half-line, and g(x) = -\ln x. For any ''k'', let ''a'' be a ''k''-element vector and ''b'' a scalar. Let ''H'' = \{y : a^T y + b > 0\}, a ''k''-dimensional half-space. By the substitution rule, h(y) = -\ln (a^T y+b) is a 1-SCB for ''H''. A more common format is ''H'' = \{y : a^T y \leq b\}, for which the SCB is h(y) = -\ln (b - a^T y).

The substitution rule can be extended from affine mappings to a certain class of "appropriate" mappings, and to quadratic mappings.
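The half-space example can be sketched as code: starting from g(x) = -\ln x, the substitution x = b - a^T y yields the 1-SCB h(y) = -\ln(b - a^T y). The particular vector a and scalar b below are arbitrary illustrative choices.

```python
import math

def make_halfspace_scb(a, b):
    """Return h(y) = -ln(b - a^T y), the 1-SCB for the half-space {y : a^T y <= b}."""
    def h(y):
        slack = b - sum(ai * yi for ai, yi in zip(a, y))
        if slack <= 0:
            return float("inf")  # outside the open domain of the barrier
        return -math.log(slack)
    return h

h = make_halfspace_scb(a=[1.0, 2.0], b=4.0)
print(h([0.0, 1.0]))  # interior point (slack 2): finite value -ln 2
print(h([0.0, 2.0]))  # boundary point (slack 0): inf
```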


Cartesian product rule

For all ''i'' in 1,...,''m'', let ''G''''i'' be a closed convex domain in R''n''''i'', and let ''g''''i'' be an ''M''''i''-SCB for ''G''''i''. Let ''G'' be the Cartesian product of all ''G''''i''. Let g(x_1,...,x_m) := \sum_i g_i(x_i). Then, ''g'' is an SCB for ''G'', with parameter \sum_i M_i.

For example, take all ''G''''i'' to be the positive half-line, so that ''G'' is the positive orthant \mathbb R_+^m. Then g(x) = -\sum_{i=1}^m \ln x_i is an ''m''-SCB for ''G''. We can now apply the substitution rule. We get that, for the polytope defined by the linear inequalities a_j^T x \leq b_j for ''j'' in 1,...,''m'', if it satisfies Slater's condition, then f(x) = -\sum_{j=1}^m \ln (b_j-a_j^T x) is an ''m''-SCB. The linear functions b_j-a_j^T x can be replaced by quadratic functions.
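The polytope barrier above is easy to implement. The sketch below computes the value, gradient and Hessian of f(x) = -\sum_j \ln(b_j - a_j^T x); the constraint data (a unit box in R^2, so M = 4) is an illustrative choice.

```python
import numpy as np

def polytope_barrier(A, b, x):
    """Value, gradient and Hessian of f(x) = -sum_j ln(b_j - A_j x)."""
    s = b - A @ x  # constraint slacks, must be positive inside the polytope
    if np.any(s <= 0):
        return np.inf, None, None
    f = -np.sum(np.log(s))
    grad = A.T @ (1.0 / s)                 # sum_j a_j / s_j
    hess = A.T @ np.diag(1.0 / s**2) @ A   # sum_j a_j a_j^T / s_j^2
    return f, grad, hess

# Unit box 0 <= x_i <= 1 written as four inequalities a_j^T x <= b_j.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.array([1.0, 0.0, 1.0, 0.0])
f, g, H = polytope_barrier(A, b, np.array([0.5, 0.5]))
print(f)  # -4*ln(0.5) = 2.7725887... ; gradient is zero at the box center
```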


Intersection rule

Let ''G''1,...,''G''''m'' be closed convex domains in R''n''. For each ''i'' in 1,...,''m'', let ''g''''i'' be an ''M''''i''-SCB for ''G''''i'', and ''r''''i'' a real number. Let ''G'' be the intersection of all ''G''''i'', and suppose its interior is nonempty. Let g := \sum_i r_i \cdot g_i. Then, ''g'' is an SCB for ''G'', with parameter \sum_i r_i \cdot M_i.

Therefore, if ''G'' is defined by a list of constraints, we can find an SCB for each constraint separately, and then simply sum them to get an SCB for ''G''. For example, suppose the domain is defined by ''m'' linear constraints of the form a_j^T x \leq b_j, for ''j'' in 1,...,''m''. Then we can use the Intersection rule to construct the ''m''-SCB f(x) = -\sum_{j=1}^m \ln (b_j-a_j^T x) (the same one that we previously computed using the Cartesian product rule).


SCBs for epigraphs

The epigraph of a function ''f''(''x'') is the area above the graph of the function, that is, \{(x,t) : t \geq f(x)\}. The epigraph of ''f'' is a convex set if and only if ''f'' is a convex function. The following theorems present some functions ''f'' for which the epigraph has an SCB.

Let ''g''(''t'') be a 3-times continuously-differentiable concave function on ''t''>0, such that t\cdot |g'''(t)| / |g''(t)| is bounded by a constant (denoted 3''b'') for all ''t''>0. Let ''G'' be the 2-dimensional convex domain G=\text{closure}(\{(x,t) : t>0, x \leq g(t)\}). Then, the function f(x,t) = -\ln(g(t)-x) - \max[1,b^2]\cdot\ln(t) is a self-concordant barrier for ''G'', with parameter (1+\max[1,b^2]). Examples:
* Let ''g''(''t'') = ''t''1/''p'', for some ''p''≥1, and ''b''=(2''p''-1)/(3''p''). Then G_1=\{(x,t) : t \geq 0, x \leq t^{1/p}\} has a 2-SCB. Similarly, G_2=\{(x,t) : t \geq 0, -t^{1/p} \leq x\} has a 2-SCB. Using the Intersection rule, we get that G = G_1\cap G_2 = \{(x,t) : t \geq 0, |x| \leq t^{1/p}\} has a 4-SCB.
* Let ''g''(''t'')=ln(''t'') and ''b''=2/3. Then G=\text{closure}(\{(x,t) : t>0, x \leq \ln t\}) has a 2-SCB.

We can now construct an SCB for the problem of minimizing the ''p''-norm: \min_x \sum_{j=1}^n |v_j - x^T u_j|^p, where ''v''''j'' are constant scalars, ''u''''j'' are constant vectors, and ''p''>0 is a constant. We first convert it into minimization of a linear objective, \min_x \sum_{j=1}^n t_j, with the constraints t_j \geq |v_j - x^T u_j|^p for all ''j''. For each constraint, we have a 4-SCB by the affine substitution rule. Using the Intersection rule, we get a (4''n'')-SCB for the entire feasible domain.

Similarly, let ''g'' be a 3-times continuously-differentiable convex function on the ray ''x''>0, such that x\cdot |g'''(x)| / |g''(x)| \leq 3 b for all ''x''>0. Let ''G'' be the 2-dimensional convex domain G=\text{closure}(\{(x,t) : x>0, t \geq g(x)\}). Then, the function f(x,t) = -\ln(t-g(x)) - \max[1,b^2]\cdot\ln(x) is a self-concordant barrier for ''G'', with parameter (1+\max[1,b^2]). Examples:
* Let ''g''(''x'') = ''x''−''p'', for some ''p''>0, and ''b''=(2+''p'')/3. Then G_1=\{(x,t) : x>0, t \geq x^{-p}\} has a 2-SCB.
* Let ''g''(''x'')=''x'' ln(''x'') and ''b''=1/3. Then G=\{(x,t) : x \geq 0, t \geq x \ln x\} has a 2-SCB.
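Following the example with ''g''(''t'') = ln ''t'' (where ''b'' = 2/3, so max[1,''b''2] = 1), the barrier f(x,t) = -\ln(\ln t - x) - \ln t can be checked numerically for the barrier property: its value grows without bound as (x,t) approaches the boundary x = \ln t. The fixed value t = 2 and the gap schedule are arbitrary choices.

```python
import math

def epigraph_barrier(x, t):
    # SCB for closure({(x, t) : t > 0, x <= ln t}), from the g(t) = ln t example
    return -math.log(math.log(t) - x) - math.log(t)

# Fix t = 2 and let x approach the boundary value ln 2 from below:
for gap in [1.0, 1e-2, 1e-4, 1e-8]:
    print(epigraph_barrier(math.log(2.0) - gap, 2.0))
# the printed values grow without bound as gap -> 0
```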


SCBs for cones

* For the second-order cone \{(x,y) \in \mathbb R^n \times \mathbb R : y \geq \sqrt{x^T x}\}, the function f(x,y) = -\log(y^2 - x^T x) is a self-concordant barrier.
* For the cone of positive semidefinite m\times m symmetric matrices, the function f(A) = - \log \det A is a self-concordant barrier.
* For the quadratic region defined by \phi(x) > 0, where \phi(x) = \alpha + \langle a, x \rangle - \frac{1}{2} \langle Ax, x \rangle with A = A^T \succeq 0 a positive semidefinite symmetric matrix, the logarithmic barrier f(x) = -\log \phi(x) is self-concordant with M = 2.
* For the exponential cone \{(x,y,z) : y > 0, z > 0, y e^{x/y} \leq z\}, the function f(x,y,z) = -\log (y \log(z/y) - x) - \log z - \log y is a self-concordant barrier.
* For the power cone \{(x_1,x_2,y) : x_1 > 0, x_2 > 0, |y| \leq x_1^{\alpha} x_2^{1-\alpha}\}, the function f(x_1,x_2,y) = -\log(x_1^{2\alpha} x_2^{2-2\alpha} - y^2) - \log x_1 - \log x_2 is a self-concordant barrier.
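The log-det barrier for the positive semidefinite cone can be sketched as follows; the family of diagonal matrices used to approach the cone's boundary is an illustrative choice.

```python
import numpy as np

def logdet_barrier(A):
    """f(A) = -log det A, the SCB for the cone of positive definite matrices."""
    sign, logabsdet = np.linalg.slogdet(A)
    if sign <= 0:
        return np.inf  # outside the interior of the PSD cone
    return -logabsdet

# diag(1, eps) approaches the cone's boundary (a singular matrix) as eps -> 0,
# and the barrier value -ln(eps) grows without bound: ~0.69, ~4.61, ~9.21, ~18.42
for eps in [0.5, 1e-2, 1e-4, 1e-8]:
    print(logdet_barrier(np.diag([1.0, eps])))
```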


History

As mentioned in the "Bibliography Comments" of their 1994 book, self-concordant functions were introduced in 1988 by Yurii Nesterov and further developed with Arkadi Nemirovski. As they explain, their basic observation was that the Newton method is affine invariant, in the sense that if for a function f(x) we have Newton steps

x_{k+1} = x_k - [f''(x_k)]^{-1} f'(x_k)

then for a function \phi(y) = f(Ay), where A is a non-degenerate linear transformation, starting from y_0 = A^{-1} x_0 we have the Newton steps y_k = A^{-1} x_k, which can be shown recursively:

y_{k+1} = y_k - [\phi''(y_k)]^{-1} \phi'(y_k) = y_k - [A^T f''(A y_k) A]^{-1} A^T f'(A y_k) = A^{-1} x_k - A^{-1} [f''(x_k)]^{-1} f'(x_k) = A^{-1} x_{k+1}.

However, the standard analysis of the Newton method supposes that the Hessian of f is Lipschitz continuous, that is, \|f''(x) - f''(y)\| \leq M \|x-y\| for some constant M. If we suppose that f is 3 times continuously differentiable, then this is equivalent to

|\langle f'''(x)[u]\, v, v \rangle| \leq M \|u\| \|v\|^2 for all u,v \in \mathbb R^n,

where f'''(x)[u] = \lim_{\alpha \to 0} \alpha^{-1} [f''(x + \alpha u) - f''(x)].

The left-hand side of the above inequality is invariant under the affine transformation f(x) \to \phi(y) = f(A y), u \to A^{-1} u, v \to A^{-1} v, but the right-hand side is not. The authors note that the right-hand side can also be made invariant if we replace the Euclidean metric by the scalar product defined by the Hessian of f, defined as \|w\|_{f''(x)} = \langle f''(x) w, w \rangle^{1/2} for w \in \mathbb R^n. They then arrive at the definition of a self-concordant function as:

|\langle f'''(x)[u]\, u, u \rangle| \leq M \langle f''(x) u, u \rangle^{3/2}.


Properties


Linear combination

If f_1 and f_2 are self-concordant with constants M_1 and M_2 and \alpha,\beta>0, then \alpha f_1 + \beta f_2 is self-concordant with constant \max(\alpha^{-1/2} M_1, \beta^{-1/2} M_2).


Affine transformation

If f is self-concordant with constant M and Ax + b is an affine transformation of \mathbb R^n, then \phi(x) = f(Ax+b) is also self-concordant with parameter M.


Convex conjugate

If f is self-concordant, then its convex conjugate f^* is also self-concordant.


Non-singular Hessian

If f is self-concordant and the domain of f contains no straight line (infinite in both directions), then f'' is non-singular.

Conversely, if for some x in the domain of f and some u \in \mathbb R^n, u \neq 0, we have \langle f''(x) u, u \rangle = 0, then \langle f''(x + \alpha u) u, u \rangle = 0 for all \alpha for which x + \alpha u is in the domain of f. Then f(x + \alpha u) is linear in \alpha and cannot have a maximum, so all of x + \alpha u, \alpha \in \mathbb R, is in the domain of f. We note also that f cannot have a minimum inside its domain.


Applications

Among other things, self-concordant functions are useful in the analysis of Newton's method. Self-concordant ''barrier functions'' are used to develop the barrier functions used in interior-point methods for convex and nonlinear optimization. The usual analysis of the Newton method does not work for barrier functions, as their second derivative cannot be Lipschitz continuous; otherwise they would be bounded on any compact subset of \mathbb R^n.

Self-concordant barrier functions:
* are a class of functions that can be used as barriers in constrained optimization methods;
* can be minimized using the Newton algorithm with provable convergence properties analogous to the usual case (but these results are somewhat more difficult to derive);
* replace the usual constant bound on the third derivative of the function (required to get the usual convergence results for the Newton method) with a bound relative to the Hessian, which is what makes both of the above possible.


Minimizing a self-concordant function

A self-concordant function may be minimized with a modified Newton method where we have a bound on the number of steps required for convergence. We suppose here that f is a ''standard'' self-concordant function, that is, it is self-concordant with parameter M = 2.

We define the ''Newton decrement'' \lambda_f(x) of f at x as the size of the Newton step [f''(x)]^{-1} f'(x) in the local norm defined by the Hessian of f at x:

\lambda_f(x) = \langle f''(x) [f''(x)]^{-1} f'(x), [f''(x)]^{-1} f'(x) \rangle^{1/2} = \langle [f''(x)]^{-1} f'(x), f'(x) \rangle^{1/2}

Then for x in the domain of f, if \lambda_f(x) < 1, it is possible to prove that the Newton iterate

x_+ = x - [f''(x)]^{-1} f'(x)

will also be in the domain of f. This is because, based on the self-concordance of f, it is possible to give some finite bounds on the value of f(x_+). We further have

\lambda_f(x_+) \leq \Bigg( \frac{\lambda_f(x)}{1 - \lambda_f(x)} \Bigg)^2

Then if we have

\lambda_f(x) < \bar\lambda = \frac{3 - \sqrt 5}{2}

then it is also guaranteed that \lambda_f(x_+) < \lambda_f(x), so that we can continue to use the Newton method until convergence. Note that for \lambda_f(x_+) < \beta for some \beta \in (0, \bar\lambda) we have quadratic convergence of \lambda_f to 0, as \lambda_f(x_+) \leq (1-\beta)^{-2} \lambda_f(x)^2. This then gives quadratic convergence of f(x_k) to f(x^*) and of x to x^*, where x^* = \arg\min f(x), by the following theorem. If \lambda_f(x) < 1 then

\omega(\lambda_f(x)) \leq f(x)-f(x^*) \leq \omega_*(\lambda_f(x))

\omega'(\lambda_f(x)) \leq \| x-x^* \|_x \leq \omega_*'(\lambda_f(x))

with the following definitions:

\omega(t) = t - \log(1+t)

\omega_*(t) = -t-\log(1-t)

\| u \|_x = \langle f''(x) u, u \rangle^{1/2}

If we start the Newton method from some x_0 with \lambda_f(x_0) \geq \bar\lambda, then we have to start by using a ''damped Newton method'' defined by

x_{k+1} = x_k - \frac{1}{1+\lambda_f(x_k)} [f''(x_k)]^{-1} f'(x_k)

For this it can be shown that f(x_{k+1}) \leq f(x_k) - \omega(\lambda_f(x_k)), with \omega as defined previously.
Note that \omega(t) is an increasing function for t > 0 so that \omega(t) \geq \omega(\bar\lambda) for any t \geq \bar\lambda, so the value of f is guaranteed to decrease by a certain amount in each iteration, which also proves that x_ is in the domain of f.
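The damped Newton iteration above can be sketched for a concrete standard self-concordant function, f(x) = -\ln x - \ln(1-x) on (0,1), whose minimizer is x^* = 1/2. The derivatives are analytic and the starting point is an arbitrary choice; for simplicity the sketch uses the damped update throughout rather than switching to the undamped step once \lambda_f(x) is small.

```python
import math

def fp(x):   return -1.0 / x + 1.0 / (1.0 - x)       # f'(x)
def fpp(x):  return 1.0 / x**2 + 1.0 / (1.0 - x)**2  # f''(x) > 0 on (0, 1)

def damped_newton(x, tol=1e-12, max_iter=100):
    for _ in range(max_iter):
        step = fp(x) / fpp(x)             # Newton step [f''(x)]^{-1} f'(x)
        lam = math.sqrt(fp(x) * step)     # Newton decrement lambda_f(x)
        if lam < tol:
            break
        x = x - step / (1.0 + lam)        # damped Newton update
    return x

print(damped_newton(0.01))  # converges to the minimizer, approximately 0.5
```

The damping factor 1/(1+\lambda_f(x)) keeps every iterate inside (0,1) even from a starting point very close to the boundary, which a plain Newton step would not guarantee.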


References

{{Reflist Functions and mappings