Karush–Kuhn–Tucker conditions
   HOME

TheInfoList



OR:

In mathematical optimization, the Karush–Kuhn–Tucker (KKT) conditions, also known as the Kuhn–Tucker conditions, are first derivative tests (sometimes called first-order necessary conditions) for a solution in
nonlinear programming In mathematics, nonlinear programming (NLP) is the process of solving an optimization problem where some of the constraints or the objective function are nonlinear. An optimization problem is one of calculation of the extrema (maxima, minima or s ...
to be
optimal Mathematical optimization (alternatively spelled ''optimisation'') or mathematical programming is the selection of a best element, with regard to some criterion, from some set of available alternatives. It is generally divided into two subfi ...
, provided that some regularity conditions are satisfied. Allowing inequality constraints, the KKT approach to nonlinear programming generalizes the method of
Lagrange multipliers In mathematical optimization, the method of Lagrange multipliers is a strategy for finding the local maxima and minima of a function subject to equality constraints (i.e., subject to the condition that one or more equations have to be satisfied e ...
, which allows only equality constraints. Similar to the Lagrange approach, the constrained maximization (minimization) problem is rewritten as a Lagrange function whose optimal point is a saddle point, i.e. a global maximum (minimum) over the domain of the choice variables and a global minimum (maximum) over the multipliers, which is why the Karush–Kuhn–Tucker theorem is sometimes referred to as the saddle-point theorem. The KKT conditions were originally named after Harold W. Kuhn and Albert W. Tucker, who first published the conditions in 1951. Later scholars discovered that the necessary conditions for this problem had been stated by
William Karush William Karush (1 March 1917 – 22 February 1997) was an American professor of mathematics at California State University at Northridge and was a mathematician best known for his contribution to Karush–Kuhn–Tucker conditions. In his master's ...
in his master's thesis in 1939.


Nonlinear optimization problem

Consider the following nonlinear minimization or maximization problem: :optimize f(\mathbf) :subject to :: g_i(\mathbf) \leq 0, :: h_j(\mathbf) =0. where \mathbf \in \mathbf is the optimization variable chosen from a
convex subset In geometry, a subset of a Euclidean space, or more generally an affine space over the reals, is convex if, given any two points in the subset, the subset contains the whole line segment that joins them. Equivalently, a convex set or a convex ...
of \mathbb^, f is the
objective Objective may refer to: * Objective (optics), an element in a camera or microscope * ''The Objective'', a 2008 science fiction horror film * Objective pronoun, a personal pronoun that is used as a grammatical object * Objective Productions, a Brit ...
or
utility As a topic of economics, utility is used to model worth or value. Its usage has evolved significantly over time. The term was introduced initially as a measure of pleasure or happiness as part of the theory of utilitarianism by moral philosoph ...
function, g_i \ (i = 1, \ldots, m) are the inequality constraint functions and h_j \ (j = 1, \ldots, \ell) are the equality constraint functions. The numbers of inequalities and equalities are denoted by m and \ell respectively. Corresponding to the constrained optimization problem one can form the Lagrangian function :\mathcal(\mathbf,\mathbf,\mathbf) = f(\mathbf) + \mathbf^\top \mathbf(\mathbf) + \mathbf^\top \mathbf(\mathbf)=L(\mathbf, \mathbf)=f(\mathbf)+\mathbf^\top \begin \mathbf(\mathbf) \\ \mathbf(\mathbf) \end where \mathbf(\mathbf) = \left( g_(\mathbf), \ldots, g_(\mathbf) \right)^\top, \mathbf(\mathbf) = \left( h_(\mathbf), \ldots, h_(\mathbf) \right)^\top. The Karush–Kuhn–Tucker theorem then states the following. Theorem. If (\mathbf^,\mathbf^\ast) is a saddle point of L(\mathbf,\mathbf) in \mathbf \in \mathbf, \mathbf \geq \mathbf, then \mathbf^ is an optimal vector for the above optimization problem. Suppose that f(\mathbf) and g_i(\mathbf), i = 1, \ldots, m, are
convex Convex or convexity may refer to: Science and technology * Convex lens, in optics Mathematics * Convex set, containing the whole line segment that joins points ** Convex polygon, a polygon which encloses a convex set of points ** Convex polytop ...
in \mathbf and that there exists \mathbf_ \in \mathbf such that \mathbf(\mathbf_) < 0. Then with an optimal vector \mathbf^ for the above optimization problem there is associated a non-negative vector \mathbf^\ast such that L(\mathbf^,\mathbf^\ast) is a saddle point of L(\mathbf,\mathbf). Since the idea of this approach is to find a
supporting hyperplane In geometry, a supporting hyperplane of a set S in Euclidean space \mathbb R^n is a hyperplane that has both of the following two properties: * S is entirely contained in one of the two closed half-spaces bounded by the hyperplane, * S has at leas ...
on the feasible set \mathbf = \left\, the proof of the Karush–Kuhn–Tucker theorem makes use of the
hyperplane separation theorem In geometry, the hyperplane separation theorem is a theorem about disjoint convex sets in ''n''-dimensional Euclidean space. There are several rather similar versions. In one version of the theorem, if both these sets are closed and at least one ...
. The system of equations and inequalities corresponding to the KKT conditions is usually not solved directly, except in the few special cases where a closed-form solution can be derived analytically. In general, many optimization algorithms can be interpreted as methods for numerically solving the KKT system of equations and inequalities.


Necessary conditions

Suppose that the
objective function In mathematical optimization and decision theory, a loss function or cost function (sometimes also called an error function) is a function that maps an event or values of one or more variables onto a real number intuitively representing some "cost ...
f : \mathbb^n \rightarrow \mathbb and the constraint functions g_i : \,\!\mathbb^n \rightarrow \mathbb and h_j : \,\!\mathbb^n \rightarrow \mathbb have subderivatives at a point x^* \in \mathbb^n. If x^* is a
local optimum In applied mathematics and computer science, a local optimum of an optimization problem is a solution that is optimal (either maximal or minimal) within a neighboring set of candidate solutions. This is in contrast to a global optimum, which ...
and the optimization problem satisfies some regularity conditions (see below), then there exist constants \mu_i\ (i = 1,\ldots,m) and \lambda_j\ (j = 1,\ldots,\ell), called KKT multipliers, such that the following four groups of conditions hold: ;Stationarity :For minimizing f(x): \partial f(x^*) + \sum_^\ell \lambda_j \partial h_j(x^*) + \sum_^m \mu_i \partial g_i(x^*) \ni \mathbf 0 :For maximizing f(x): -\partial f(x^*) + \sum_^\ell \lambda_j \partial h_j(x^*) + \sum_^m \mu_i \partial g_i(x^*) \ni \mathbf 0 ;Primal feasibility :h_j(x^*) = 0, \text j = 1, \ldots, \ell \,\! :g_i(x^*) \le 0, \text i = 1, \ldots, m ;Dual feasibility :\mu_i \ge 0, \text i = 1, \ldots, m ;Complementary slackness :\sum_^m \mu_i g_i (x^*) = 0. The last condition is sometimes written in the equivalent form: \mu_i g_i (x^*) = 0, \text i = 1, \ldots, m. In the particular case m=0, i.e., when there are no inequality constraints, the KKT conditions turn into the Lagrange conditions, and the KKT multipliers are called
Lagrange multipliers In mathematical optimization, the method of Lagrange multipliers is a strategy for finding the local maxima and minima of a function subject to equality constraints (i.e., subject to the condition that one or more equations have to be satisfied e ...
.


Note on subdifferentials

Note that even if f is differentiable, its subdifferential does not necessarily need to be \. Indeed, \partial f(x) = \ if f is lower-bounded by its linear approximation at x, asf(x') \geq f(x) + \nabla f(x) (x' - x) and otherwise, \partial f(x) = \emptyset.


Proof


Interpretation: KKT conditions as balancing constraint-forces in state space

The primal problem can be interpreted as moving a particle in the space of x, and subjecting it to three kinds of force fields: * f is a potential field that the particle is minimizing. The force generated by f is -\partial f. * g_i are one-sided constraint surfaces. The particle is allowed to move inside g_i \leq 0, but whenever it touches g_i=0, it is pushed inwards. * h_j are two-sided constraint surfaces. The particle is allowed to move only on the surface h_j. Primal stationarity states that the "force" of \partial f(x^*) is exactly balanced by a linear sum of forces \partial h_j(x^*) and \partial g_i(x^*). Dual feasibility additionally states that all the \partial g_i(x^*) forces must be one-sided, pointing inwards into the feasible set for x. Dual slackness states that if g_i(x^*) < 0, then the \partial g_i(x^*) force must be zero, since the particle is not on the boundary, the one-sided constraint force cannot activate.


Matrix representation

The necessary conditions can be written with Jacobian matrices of the constraint functions. Let \mathbf g(x) : \,\!\mathbb^n \rightarrow \mathbb^m be defined as \mathbf(x) = \left( g_(x), \ldots, g_(x) \right)^\top and let \mathbf h(x) : \,\!\mathbb^n \rightarrow \mathbb^ be defined as \mathbf(x) = \left( h_(x), \ldots, h_(x) \right)^\top. Let \boldsymbol = \left( \mu_1, \ldots, \mu_m \right)^\top and \boldsymbol = \left( \lambda_1, \ldots, \lambda_ \right)^\top. Then the necessary conditions can be written as: ;Stationarity :For maximizing f(x): \partial f(x^*) - D \mathbf g(x^*)^ \boldsymbol - D \mathbf h(x^*)^ \boldsymbol = \mathbf 0 :For minimizing f(x): \partial f(x^*) + D \mathbf g(x^*)^ \boldsymbol + D \mathbf h(x^*)^ \boldsymbol = \mathbf 0 ;Primal feasibility :\mathbf g(x^*) \le \mathbf 0 :\mathbf h(x^*) = \mathbf 0 ;Dual feasibility :\boldsymbol \mu \ge \mathbf 0 ;Complementary slackness :\boldsymbol \mu^ \mathbf g (x^*) = 0.


Regularity conditions (or constraint qualifications)

One can ask whether a minimizer point x^* of the original, constrained optimization problem (assuming one exists) has to satisfy the above KKT conditions. This is similar to asking under what conditions the minimizer x^* of a function f(x) in an unconstrained problem has to satisfy the condition \nabla f(x^*)=0. For the constrained case, the situation is more complicated, and one can state a variety of (increasingly complicated) "regularity" conditions under which a constrained minimizer also satisfies the KKT conditions. Some common examples for conditions that guarantee this are tabulated in the following, with the LICQ the most frequently used one: The strict implications can be shown : LICQ ⇒ MFCQ ⇒ CPLD ⇒ QNCQ and : LICQ ⇒ CRCQ ⇒ CPLD ⇒ QNCQ In practice weaker constraint qualifications are preferred since they apply to a broader selection of problems.


Sufficient conditions

In some cases, the necessary conditions are also sufficient for optimality. In general, the necessary conditions are not sufficient for optimality and additional information is required, such as the Second Order Sufficient Conditions (SOSC). For smooth functions, SOSC involve the second derivatives, which explains its name. The necessary conditions are sufficient for optimality if the objective function f of a maximization problem is a concave function, the inequality constraints g_j are continuously differentiable convex functions and the equality constraints h_i are affine functions. Similarly, if the objective function f of a minimization problem is a convex function, the necessary conditions are also sufficient for optimality. It was shown by Martin in 1985 that the broader class of functions in which KKT conditions guarantees global optimality are the so-called Type 1 invex functions.


Second-order sufficient conditions

For smooth, non-linear optimization problems, a second order sufficient condition is given as follows. The solution x^*, \lambda^*, \mu^* found in the above section is a constrained local minimum if for the Lagrangian, : L(x,\lambda,\mu) = f(x) + \sum_^m \mu_i g_i(x) + \sum_^\ell \lambda_j h_j(x) then, :s^T\nabla ^2_L(x^*,\lambda^*,\mu^*)s \ge 0 where s \ne 0 is a vector satisfying the following, :\left \nabla_x g_i(x^*), \nabla_x h_j(x^*) \rightT s = 0 where only those active inequality constraints g_i(x) corresponding to strict complementarity (i.e. where \mu_i > 0) are applied. The solution is a strict constrained local minimum in the case the inequality is also strict. If s^T\nabla ^2_L(x^*,\lambda^*,\mu^*)s = 0, the third order Taylor expansion of the Lagrangian should be used to verify if x^* is a local minimum. The minimization of f(x_1,x_2)=(x_2-x_1^2)(x_2-3x_1^2) is a good counter-example, see also
Peano surface In mathematics, the Peano surface is the graph of the two-variable function :f(x,y)=(2x^2-y)(y-x^2). It was proposed by Giuseppe Peano in 1899 as a counterexample to a conjectured criterion for the existence of maxima and minima of functions of tw ...
.


Economics

Often in mathematical economics the KKT approach is used in theoretical models in order to obtain qualitative results. For example,Chiang, Alpha C. ''Fundamental Methods of Mathematical Economics'', 3rd edition, 1984, pp. 750–752. consider a firm that maximizes its sales revenue subject to a minimum profit constraint. Letting Q be the quantity of output produced (to be chosen), R(Q) be sales revenue with a positive first derivative and with a zero value at zero output, C(Q) be production costs with a positive first derivative and with a non-negative value at zero output, and G_ be the positive minimal acceptable level of profit, then the problem is a meaningful one if the revenue function levels off so it eventually is less steep than the cost function. The problem expressed in the previously given minimization form is :Minimize -R(Q) :subject to : G_ \le R(Q) - C(Q) : Q \ge 0, and the KKT conditions are : \begin & \left(\frac \right) (1+\mu ) - \mu \left( \frac \right) \le 0, \\ pt& Q \ge 0, \\ pt& Q \left \left( \frac \right) (1+\mu ) - \mu \left( \frac\right) \right= 0, \\ pt& R(Q) - C(Q) - G_ \ge 0, \\ pt& \mu \ge 0, \\ pt& \mu (Q) - C(Q) - G_= 0. \end Since Q = 0 would violate the minimum profit constraint, we have Q > 0 and hence the third condition implies that the first condition holds with equality. Solving that equality gives : \frac = \frac \left( \frac \right). Because it was given that \text R / \text Q and \text C / \text Q are strictly positive, this inequality along with the non-negativity condition on \mu guarantees that \mu is positive and so the revenue-maximizing firm operates at a level of output at which
marginal revenue Marginal revenue (or marginal benefit) is a central concept in microeconomics that describes the additional total revenue generated by increasing product sales by 1 unit.Bradley R. chiller, "Essentials of Economics", New York: McGraw-Hill, Inc., ...
\text R / \text Q is less than marginal cost \text C / \text Q — a result that is of interest because it contrasts with the behavior of a profit maximizing firm, which operates at a level at which they are equal.


Value function

If we reconsider the optimization problem as a maximization problem with constant inequality constraints: : \text\; f(x) : \text\ :: g_i(x) \le a_i , h_j(x) = 0. The value function is defined as :V(a_1, \ldots, a_n) = \sup\limits_x f(x) : \text\ :: g_i(x) \le a_i , h_j(x) = 0 :: j \in \, i\in\, so the domain of V is \. Given this definition, each coefficient \mu_i is the rate at which the value function increases as a_i increases. Thus if each a_i is interpreted as a resource constraint, the coefficients tell you how much increasing a resource will increase the optimum value of our function f. This interpretation is especially important in economics and is used, for instance, in
utility maximization problem Utility maximization was first developed by utilitarian philosophers Jeremy Bentham and John Stuart Mill. In microeconomics, the utility maximization problem is the problem consumers face: "How should I spend my money in order to maximize my u ...
s.


Generalizations

With an extra multiplier \mu_0\geq0, which may be zero (as long as (\mu_0,\mu,\lambda)\neq0), in front of \nabla f(x^*) the KKT stationarity conditions turn into : \begin & \mu_0 \,\nabla f(x^*) + \sum_^m \mu_i \,\nabla g_i(x^*) + \sum_^\ell \lambda_j \,\nabla h_j(x^*) = 0, \\ pt& \mu_jg_i(x^*)=0, \quad i=1,\dots,m, \end which are called the Fritz John conditions. This optimality conditions holds without constraint qualifications and it is equivalent to the optimality condition ''KKT or (not-MFCQ)''. The KKT conditions belong to a wider class of the first-order necessary conditions (FONC), which allow for non-smooth functions using
subderivative In mathematics, the subderivative, subgradient, and subdifferential generalize the derivative to convex functions which are not necessarily differentiable. Subderivatives arise in convex analysis, the study of convex functions, often in connectio ...
s.


See also

* Farkas' lemma *
Lagrange multiplier In mathematical optimization, the method of Lagrange multipliers is a strategy for finding the local maxima and minima of a function subject to equality constraints (i.e., subject to the condition that one or more equations have to be satisfied e ...
* The
Big M method In operations research, the Big M method is a method of solving linear programming problems using the simplex algorithm. The Big M method extends the simplex algorithm to problems that contain "greater-than" constraints. It does so by associating ...
, for linear problems, which extends the
simplex algorithm In mathematical optimization, Dantzig's simplex algorithm (or simplex method) is a popular algorithm for linear programming. The name of the algorithm is derived from the concept of a simplex and was suggested by T. S. Motzkin. Simplices are n ...
to problems that contain "greater-than" constraints. *
Interior-point method Interior-point methods (also referred to as barrier methods or IPMs) are a certain class of algorithms that solve linear and nonlinear convex optimization problems. An interior point method was discovered by Soviet mathematician I. I. Dikin in 1 ...
a method to solve the KKT conditions. *
Slack variable In an optimization problem, a slack variable is a variable that is added to an inequality constraint to transform it into an equality. Introducing a slack variable replaces an inequality constraint with an equality constraint and a non-negativity c ...


References


Further reading

* * * * * * * *


External links


Karush–Kuhn–Tucker conditions with derivation and examplesExamples and Tutorials on the KKT Conditions
{{DEFAULTSORT:Karush-Kuhn-Tucker conditions Mathematical optimization Mathematical economics