Within computer science and operations research, many combinatorial optimization problems are computationally intractable to solve exactly (to optimality). Many such problems do admit fast (polynomial time) approximation algorithms, that is, algorithms that are guaranteed to return an approximately optimal solution given any input. Randomized rounding is a widely used approach for designing and analyzing such approximation algorithms. The basic idea is to use the probabilistic method to convert an optimal solution of a relaxation of the problem into an approximately optimal solution to the original problem.

Overview

The basic approach has three steps:
# Formulate the problem to be solved as an integer linear program (ILP).
# Compute an optimal fractional solution $x$ to the linear programming relaxation (LP) of the ILP.
# Round the fractional solution $x$ of the LP to an integer solution $x'$ of the ILP.

(Although the approach is most commonly applied with linear programs, other kinds of relaxations are sometimes used. For example, see Goemans' and Williamson's semidefinite programming-based Max-Cut approximation algorithm.)

The challenge in the first step is to choose a suitable integer linear program. Familiarity with linear programming, in particular modelling using linear programs and integer linear programs, is required. For many problems, there is a natural integer linear program that works well, such as in the Set Cover example below. (The integer linear program should have a small integrality gap; indeed randomized rounding is often used to prove bounds on integrality gaps.)

In the second step, the optimal fractional solution can typically be computed in polynomial time using any standard linear programming algorithm.

In the third step, the fractional solution must be converted into an integer solution (and thus a solution to the original problem). This is called ''rounding'' the fractional solution. The resulting integer solution should (provably) have cost not much larger than the cost of the fractional solution. This will ensure that the cost of the integer solution is not much larger than the cost of the optimal integer solution.

The main technique used to do the third step (rounding) is to use randomization, and then to use probabilistic arguments to bound the increase in cost due to the rounding (following the probabilistic method from combinatorics). In combinatorics, probabilistic arguments are used to show the existence of discrete structures with desired properties. In this context, one uses such arguments to show the following:
: ''Given any fractional solution $x$ of the LP, with positive probability the randomized rounding process produces an integer solution $x'$ that approximates $x$'' according to some desired criterion.

Finally, to make the third step computationally efficient, one either shows that $x'$ approximates $x$ with high probability (so that the step can remain randomized) or one derandomizes the rounding step, typically using the method of conditional probabilities. The latter method converts the randomized rounding process into an efficient deterministic process that is guaranteed to reach a good outcome.

Comparison to other applications of the probabilistic method

The randomized rounding step differs from most applications of the probabilistic method in two respects:
# The computational complexity of the rounding step is important. It should be implementable by a fast (e.g. polynomial time) algorithm.
# The probability distribution underlying the random experiment is a function of the solution $x$ of a relaxation of the problem instance. This fact is crucial to proving the performance guarantee of the approximation algorithm: for any problem instance, the algorithm returns a solution that approximates the ''optimal solution for that specific instance''. In comparison, applications of the probabilistic method in combinatorics typically show the existence of structures whose features depend on other parameters of the input. For example, consider Turán's theorem, which can be stated as "any graph with $n$ vertices of average degree $d$ must have an independent set of size at least $n/(d+1)$." (Turán's theorem can be proved with the probabilistic method.) While there are graphs for which this bound is tight, there are also graphs which have independent sets much larger than $n/(d+1)$. Thus, the size of the independent set shown to exist by Turán's theorem in a graph may, in general, be much smaller than the maximum independent set for that graph.
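The gap between the Turán bound and the true maximum can be checked by brute force on small graphs. The sketch below is only an illustration: the function names and the two example graphs (a six-leaf star and a triangle) are assumptions chosen here, not part of the original discussion.

```python
from itertools import combinations

def max_independent_set_size(n, edges):
    """Brute-force the largest independent set of a graph on vertices 0..n-1."""
    edge_set = {frozenset(e) for e in edges}
    for k in range(n, 0, -1):
        for subset in combinations(range(n), k):
            # A subset is independent if no pair of its vertices is an edge.
            if all(frozenset(pair) not in edge_set
                   for pair in combinations(subset, 2)):
                return k
    return 0

def turan_bound(n, edges):
    """Lower bound n/(d+1), where d is the average degree."""
    d = 2 * len(edges) / n
    return n / (d + 1)

# Hypothetical example 1: a star with centre 0 and six leaves.
star_edges = [(0, leaf) for leaf in range(1, 7)]
star_bound = turan_bound(7, star_edges)               # 49/19, about 2.58
star_best = max_independent_set_size(7, star_edges)   # the six leaves

# Hypothetical example 2: a triangle, where the bound is tight.
triangle_edges = [(0, 1), (1, 2), (0, 2)]
triangle_bound = turan_bound(3, triangle_edges)
triangle_best = max_independent_set_size(3, triangle_edges)
```

On the star the bound guarantees fewer than three vertices while the six leaves form an independent set; on the triangle the bound and the maximum coincide.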

Set cover example

The following example illustrates how randomized rounding can be used to design an approximation algorithm for the Set Cover problem.

Fix any instance $\langle c, \mathcal S\rangle$ of set cover over a universe $\mathcal U$.

For step 1, let IP be the standard integer linear program for set cover for this instance.

For step 2, let LP be the linear programming relaxation of IP, and compute an optimal solution $x^*$ to LP using any standard linear programming algorithm. (This takes time polynomial in the input size.)

(The feasible solutions to LP are the vectors $x$ that assign each set $s \in\mathcal S$ a non-negative weight $x_s$, such that, for each element $e\in\mathcal U$, $x$ ''covers'' $e$: the total weight assigned to the sets containing $e$ is at least 1, that is,
:: $\sum_{s\ni e} x_s \ge 1.$
The optimal solution $x^*$ is a feasible solution whose cost
:: $\sum_{s\in\mathcal S} c(s)\,x^*_s$
is as small as possible.)

Note that any set cover $\mathcal C$ for $\mathcal S$ gives a feasible solution $x$ (where $x_s=1$ for $s\in\mathcal C$, $x_s=0$ otherwise). The cost of this $\mathcal C$ equals the cost of $x$, that is,
:: $\sum_{s\in\mathcal C} c(s) = \sum_{s\in\mathcal S} c(s)\,x_s.$
In other words, the linear program LP is a relaxation of the given set-cover problem. Since $x^*$ has minimum cost among feasible solutions to the LP, ''the cost of $x^*$ is a lower bound on the cost of the optimal set cover''.
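The lower-bound relationship can be seen on a tiny instance. The sketch below uses a hypothetical instance (three elements, three unit-cost sets) and a fractional solution supplied by hand rather than by an LP solver; all names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical instance: three elements, three unit-cost sets.
universe = {1, 2, 3}
sets = {"A": {1, 2}, "B": {2, 3}, "C": {1, 3}}
cost = {"A": 1, "B": 1, "C": 1}

def is_feasible(x):
    """LP constraints: x_s >= 0 and, for each e, the weights of sets containing e sum to >= 1."""
    if any(v < 0 for v in x.values()):
        return False
    return all(sum(x[s] for s in sets if e in sets[s]) >= 1 for e in universe)

def total_cost(x):
    """Cost of a (fractional or integral) solution: sum of c(s) * x_s."""
    return sum(cost[s] * x[s] for s in sets)

# A feasible fractional solution, written down by hand (no LP solver is used).
x_frac = {s: Fraction(1, 2) for s in sets}

# Cheapest integral solution, found by enumerating all 0/1 vectors.
best_integral = min(
    total_cost(dict(zip(sets, bits)))
    for bits in product((0, 1), repeat=len(sets))
    if is_feasible(dict(zip(sets, bits)))
)
```

Here the fractional solution costs 3/2 while every integral cover needs two sets. Adding the three covering constraints gives $2(x_A+x_B+x_C)\ge 3$, so 3/2 is in fact the LP optimum for this instance and lies strictly below the integral optimum of 2.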

Step 3: The randomized rounding step

Here is a description of the third step, the rounding step, which must convert the minimum-cost fractional set cover $x^*$ into a feasible integer solution $x'$ (corresponding to a true set cover).

The rounding step should produce an $x'$ that, with positive probability, has cost within a small factor of the cost of $x^*$. Then (since the cost of $x^*$ is a lower bound on the cost of the optimal set cover), the cost of $x'$ will be within a small factor of the optimal cost.

As a starting point, consider the most natural rounding scheme:
:: ''For each set $s\in\mathcal S$ in turn, take $x'_s = 1$ with probability $\min(1,x^*_s)$, otherwise take $x'_s = 0$.''

With this rounding scheme, the expected cost of the chosen sets is at most $\sum_s c(s) x^*_s$, the cost of the fractional cover. This is good. Unfortunately the coverage is not good. When the variables $x^*_s$ are small, the probability that an element $e$ is not covered is about
: $\prod_{s\ni e} (1-x^*_s) \approx \prod_{s\ni e} \exp(-x^*_s) = \exp\Big(-\sum_{s\ni e} x^*_s\Big) \approx \exp(-1).$

So only a constant fraction of the elements will be covered in expectation.

To make $x'$ cover every element with high probability, the standard rounding scheme first ''scales up'' the rounding probabilities by an appropriate factor $\lambda > 1$. Here is the standard rounding scheme:
:: ''Fix a parameter $\lambda \ge 1$. For each set $s\in\mathcal S$ in turn,''
:: ''take $x'_s = 1$ with probability $\min(\lambda x^*_s, 1)$, otherwise take $x'_s = 0$.''

Scaling the probabilities up by $\lambda$ increases the expected cost by a factor of $\lambda$, but makes coverage of all elements likely. The idea is to choose $\lambda$ as small as possible so that all elements are provably covered with non-zero probability. Here is a detailed analysis.
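The scaled rounding scheme can be sketched in a few lines. This is a minimal illustration under stated assumptions: the fractional solution `x_star` is taken as given (no LP solver is included), the function names are hypothetical, and the retry loop with its `max_tries` cap is an addition made here, since the analysis only guarantees a positive probability of success per attempt. The value $\lambda = \ln(2|\mathcal U|)$ and the cost budget $2\lambda\, c\cdot x^*$ are the ones used in the analysis of this scheme.

```python
import math
import random

def randomized_round(x_star, lam, rng):
    """Set x'_s = 1 with probability min(lam * x*_s, 1), else 0, independently per set."""
    return {s: 1 if rng.random() < min(lam * xs, 1.0) else 0
            for s, xs in x_star.items()}

def is_cover(x_int, sets, universe):
    """True if every element lies in at least one chosen set."""
    chosen = [sets[s] for s, v in x_int.items() if v == 1]
    return all(any(e in members for members in chosen) for e in universe)

def round_until_success(x_star, sets, cost, universe, rng, max_tries=1000):
    """Repeat the rounding until it yields a cover of cost <= 2 * lam * (c . x*)."""
    lam = math.log(2 * len(universe))
    budget = 2 * lam * sum(cost[s] * x_star[s] for s in sets)
    for _ in range(max_tries):
        x_int = randomized_round(x_star, lam, rng)
        within_budget = sum(cost[s] * x_int[s] for s in sets) <= budget
        if within_budget and is_cover(x_int, sets, universe):
            return x_int
    return None  # no successful attempt within max_tries
```

A call such as `round_until_success(x_star, sets, cost, universe, random.Random(0))` returns a 0/1 dictionary keyed by set name, or `None` if every attempt failed.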

Lemma (approximation guarantee for rounding scheme)

:: ''Fix $\lambda = \ln(2|\mathcal U|)$. With positive probability, the rounding scheme returns a set cover $x'$ of cost at most $2\ln(2|\mathcal U|)\, c\cdot x^*$ (and thus of cost $O(\log|\mathcal U|)$ times the cost of the optimal set cover).''

(Note: with care the $O(\log|\mathcal U|)$ can be reduced to $\ln(|\mathcal U|)+O(\log\log|\mathcal U|)$.)

Proof

The output $x'$ of the random rounding scheme has the desired properties as long as none of the following "bad" events occur:
# the cost $c\cdot x'$ of $x'$ exceeds $2\lambda\, c\cdot x^*$, or
# for some element $e$, $x'$ fails to cover $e$.

The expectation of each $x'_s$ is at most $\lambda x_s^*$. By linearity of expectation, the expectation of $c\cdot x'$ is at most $\sum_s c(s)\lambda x_s^*=\lambda\, c\cdot x^*$. Thus, by Markov's inequality, the probability of the first bad event above is at most $1/2$.

For the remaining bad events (one for each element $e$), note that, since $\sum_{s\ni e} x^*_s \ge 1$ for any given element $e$, the probability that $e$ is not covered is
: $\begin{aligned} \prod_{s\ni e} \big(1-\min(\lambda x^*_s,1)\big) & < \prod_{s\ni e} \exp(-\lambda x^*_s) = \exp\Big(-\lambda \sum_{s\ni e} x^*_s\Big) \\ & \le \exp(-\lambda) = 1/(2|\mathcal U|). \end{aligned}$

(This uses the inequality $1+z\le e^z$, which is strict for $z \ne 0$.)

Thus, for each of the $|\mathcal U|$ elements, the probability that the element is not covered is less than $1/(2|\mathcal U|)$.

By the union bound, the probability that one of the $1+|\mathcal U|$ bad events happens is less than $1/2 + |\mathcal U|/(2|\mathcal U|)=1$. Thus, with positive probability there are no bad events and $x'$ is a set cover of cost at most $2\lambda\, c\cdot x^*$. QED
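As a worked illustration of the numbers in this proof, take an assumed universe size $|\mathcal U| = 100$ (a value chosen here only for concreteness):

```latex
\begin{align*}
\lambda &= \ln(2\cdot 100) = \ln 200 \approx 5.30,\\
\Pr[\,c\cdot x' > 2\lambda\, c\cdot x^*\,] &\le \tfrac{1}{2},\\
\Pr[\,e \text{ not covered}\,] &< e^{-\lambda} = \tfrac{1}{200} \quad \text{for each } e\in\mathcal U,\\
\Pr[\,\text{some bad event}\,] &< \tfrac{1}{2} + 100\cdot\tfrac{1}{200} = 1 .
\end{align*}
```

In this case the cover whose existence is shown has cost at most $2\lambda \approx 10.6$ times $c\cdot x^*$.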

Derandomization using the method of conditional probabilities

The lemma above shows the ''existence'' of a set cover of cost $O(\log(|\mathcal U|)\, c\cdot x^*)$. In this context our goal is an efficient approximation algorithm, not just an existence proof, so we are not done.

One approach would be to increase $\lambda$ a little bit, then show that the probability of success is at least, say, 1/4. With this modification, repeating the random rounding step a few times is enough to ensure a successful outcome with high probability.

That approach weakens the approximation ratio. We next describe a different approach that yields a deterministic algorithm that is guaranteed to match the approximation ratio of the existence proof above. The approach is called the method of conditional probabilities.

The deterministic algorithm emulates the randomized rounding scheme: it considers each set $s\in\mathcal S$ in turn, and chooses $x'_s \in\{0,1\}$. But instead of making each choice ''randomly'' based on $x^*$, it makes the choice ''deterministically'', so as to ''keep the conditional probability of failure, given the choices so far, below 1''.

Bounding the conditional probability of failure

We want to be able to set each variable $x'_s$ in turn so as to keep the conditional probability of failure below 1. To do this, we need a good bound on the conditional probability of failure. The bound will come by refining the original existence proof. That proof implicitly bounds the probability of failure by the expectation of the random variable
: $F = \frac{c\cdot x'}{2\lambda\, c\cdot x^*} + |\mathcal U^{(m)}|$,
where
: $\mathcal U^{(m)} = \Big\{ e : \prod_{s\ni e} (1-x'_s) = 1 \Big\}$
is the set of elements left uncovered at the end (after all $m=|\mathcal S|$ sets have been considered).

The random variable $F$ may appear a bit mysterious, but it mirrors the probabilistic proof in a systematic way. The first term in $F$ comes from applying Markov's inequality to bound the probability of the first bad event (the cost is too high). It contributes at least 1 to $F$ if the cost of $x'$ is too high. The second term counts the number of bad events of the second kind (uncovered elements). It contributes at least 1 to $F$ if $x'$ leaves any element uncovered. Thus, in any outcome where $F$ is less than 1, $x'$ must cover all the elements and have cost meeting the desired bound from the lemma. In short, if the rounding step fails, then $F \ge 1$. This implies (by Markov's inequality) that ''$E[F]$ is an upper bound on the probability of failure.'' Note that the argument above is implicit already in the proof of the lemma, which also shows by calculation that $E[F] < 1$.

To apply the method of conditional probabilities, we need to extend the argument to bound the ''conditional'' probability of failure as the rounding step proceeds. Usually, this can be done in a systematic way, although it can be technically tedious.

So, what about the ''conditional'' probability of failure as the rounding step iterates through the sets? Since $F \ge 1$ in any outcome where the rounding step fails, by Markov's inequality, the ''conditional'' probability of failure is at most the ''conditional'' expectation of $F$.

Next we calculate the conditional expectation of $F$, much as we calculated the unconditioned expectation of $F$ in the original proof. Consider the state of the rounding process at the end of some iteration $t$. Let $S^{(t)}$ denote the sets considered so far (the first $t$ sets in $\mathcal S$). Let $x^{(t)}$ denote the (partially assigned) vector $x'$ (so $x^{(t)}_s$ is determined only if $s\in S^{(t)}$). For each set $s\not\in S^{(t)}$, let $p_s = \min(\lambda x^*_s, 1)$ denote the probability with which $x'_s$ will be set to 1. Let $\mathcal U^{(t)}$ contain the not-yet-covered elements. Then the conditional expectation of $F$, given the choices made so far, that is, given $x^{(t)}$, is
: $E[F \mid x^{(t)}] ~=~ \frac{\sum_{s\in S^{(t)}} c(s)\, x'_s + \sum_{s\not\in S^{(t)}} c(s)\, p_s}{2\lambda\, c\cdot x^*} ~+~ \sum_{e\in\mathcal U^{(t)}} \prod_{s\not\in S^{(t)},\, s\ni e} (1-p_s).$

Note that $E[F \mid x^{(t)}]$ is determined only after iteration $t$.
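The conditional expectation can be evaluated directly from a partial assignment. The sketch below is a minimal illustration with hypothetical names: `decided` maps the sets considered so far to 0 or 1, `x_star` is assumed to be supplied by an LP solver that is not shown, and $\lambda = \ln(2|\mathcal U|)$ as in the lemma.

```python
import math

def conditional_expectation(sets, cost, x_star, universe, decided):
    """E[F | x^(t)] for the partial assignment `decided` (set name -> 0 or 1)."""
    lam = math.log(2 * len(universe))
    p = {s: min(lam * x_star[s], 1.0) for s in sets}
    denom = 2 * lam * sum(cost[s] * x_star[s] for s in sets)
    # Cost term: decided sets contribute c(s) * x'_s, undecided ones c(s) * p_s.
    cost_term = sum(cost[s] * (decided[s] if s in decided else p[s])
                    for s in sets) / denom
    # Elements not covered by any set that has already been chosen.
    uncovered = [e for e in universe
                 if not any(decided.get(s) == 1 and e in sets[s] for s in sets)]
    # Such an element stays uncovered only if every undecided set containing it is rejected.
    uncovered_term = sum(
        math.prod(1 - p[s] for s in sets if s not in decided and e in sets[s])
        for e in uncovered)
    return cost_term + uncovered_term
```

Calling it with `decided={}` gives the unconditioned expectation $E[F]$, and the value for a partial assignment is the probability-weighted average of the two values obtained by fixing the next set to 0 or to 1.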

Keeping the conditional probability of failure below 1

To keep the conditional probability of failure below 1, it suffices to keep the conditional expectation of $F$ below 1. To do this, it suffices to keep the conditional expectation of $F$ from increasing. This is what the algorithm will do. It will set $x'_s$ in each iteration to ensure that
:: $E[F \mid x^{(m)}] \le E[F \mid x^{(m-1)}] \le \cdots \le E[F \mid x^{(1)}] \le E[F \mid x^{(0)}] < 1$
(where $m=|\mathcal S|$).

In the $t$th iteration, let $s'$ be the set under consideration. How can the algorithm set $x'_{s'}$ to ensure that $E[F \mid x^{(t)}] \le E[F \mid x^{(t-1)}]$? It turns out that it can simply set $x'_{s'}$ so as to ''minimize'' the resulting value of $E[F \mid x^{(t)}]$.

To see why, focus on the point in time when iteration $t$ starts. At that time, $E[F \mid x^{(t-1)}]$ is determined, but $E[F \mid x^{(t)}]$ is not yet determined: it can take two possible values depending on how $x'_{s'}$ is set in iteration $t$. Let $E^{(t-1)}$ denote the value of $E[F \mid x^{(t-1)}]$. Let $E^{(t)}_0$ and $E^{(t)}_1$ denote the two possible values of $E[F \mid x^{(t)}]$, depending on whether $x'_{s'}$ is set to 0 or 1, respectively. By the definition of conditional expectation,
:: $E^{(t-1)} ~=~ \Pr[x'_{s'}=0]\, E^{(t)}_0 + \Pr[x'_{s'}=1]\, E^{(t)}_1.$
Since a weighted average of two quantities is always at least the minimum of those two quantities, it follows that
:: $E^{(t-1)} ~\ge~ \min( E^{(t)}_0, E^{(t)}_1 ).$
Thus, setting $x'_{s'}$ so as to minimize the resulting value of $E[F \mid x^{(t)}]$ will guarantee that $E[F \mid x^{(t)}] \le E[F \mid x^{(t-1)}]$. This is what the algorithm will do.

In detail, what does this mean? Considered as a function of $x'_{s'}$ (with all other quantities fixed), $E[F \mid x^{(t)}]$ is a linear function of $x'_{s'}$, and the coefficient of $x'_{s'}$ in that function is
: $\frac{c(s')}{2\lambda\, c\cdot x^*} ~-~ \sum_{e\in s'\cap\mathcal U^{(t-1)}} \prod_{s\not\in S^{(t)},\, s\ni e} (1-p_s).$
Thus, the algorithm should set $x'_{s'}$ to 0 if this expression is positive, and 1 otherwise. This gives the following algorithm.

Randomized-rounding algorithm for set cover

input: set system $\mathcal S$, universe $\mathcal U$, cost vector $c$

output: set cover $x'$ (a solution to the standard integer linear program for set cover)

# Compute a min-cost fractional set cover $x^*$ (an optimal solution to the LP relaxation).
# Let $\lambda \leftarrow \ln(2|\mathcal U|)$. Let $p_s \leftarrow \min(\lambda x^*_s,1)$ for each $s\in\mathcal S$.
# For each $s'\in\mathcal S$ do:
## Let $\mathcal S \leftarrow \mathcal S - \{s'\}$. ($\mathcal S$ contains the not-yet-decided sets.)
## If $\frac{c(s')}{2\lambda\, c\cdot x^*} > \sum_{e\in s'\cap\mathcal U} \prod_{s\in\mathcal S,\, s\ni e}(1-p_s)$
##: then set $x'_{s'}\leftarrow 0$,
##: else set $x'_{s'}\leftarrow 1$ and $\mathcal U\leftarrow\mathcal U - s'$.
##: ($\mathcal U$ contains the not-yet-covered elements.)
# Return $x'$.
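The deterministic rounding loop can be sketched as follows. This is an illustrative sketch, not a reference implementation: the fractional cover `x_star` is assumed to be computed elsewhere (step 1 is not included), set contents are assumed to be Python `set` objects, costs are assumed positive, and the function and variable names are hypothetical.

```python
import math

def derandomized_set_cover(sets, cost, x_star, universe):
    """Round a fractional cover x_star deterministically (method of conditional probabilities)."""
    lam = math.log(2 * len(universe))            # lambda = ln(2|U|), fixed up front
    p = {s: min(lam * x_star[s], 1.0) for s in sets}
    denom = 2 * lam * sum(cost[s] * x_star[s] for s in sets)
    undecided = set(sets)                        # names of the not-yet-decided sets
    uncovered = set(universe)                    # not-yet-covered elements
    x_int = {}
    for s_prime in sets:
        undecided.discard(s_prime)
        # Amount by which choosing s' lowers the uncovered-elements term of E[F | x^(t)].
        saving = sum(
            math.prod(1 - p[s] for s in undecided if e in sets[s])
            for e in sets[s_prime] & uncovered)
        if cost[s_prime] / denom > saving:
            x_int[s_prime] = 0                   # rejecting s' keeps E[F | x^(t)] smaller
        else:
            x_int[s_prime] = 1
            uncovered -= sets[s_prime]
    return x_int
```

On the hypothetical instance with elements 1, 2, 3, unit-cost sets A = {1, 2}, B = {2, 3}, C = {1, 3} and `x_star` equal to 1/2 on every set, the loop chooses A, rejects B (its only uncovered element can still be covered by C), and then chooses C.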

Lemma (approximation guarantee for algorithm)

:: ''The algorithm above returns a set cover $x'$ of cost at most $2\ln(2|\mathcal U|)$ times the minimum cost of any (fractional) set cover.''

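Proof

The algorithm ensures that the conditional expectation of $F$, $E[F \mid x^{(t)}]$, does not increase at any iteration. Since this conditional expectation is initially less than 1 (as shown previously), it stays below 1 throughout. Since the conditional probability of failure is at most the conditional expectation of $F$, the conditional probability of failure also stays below 1. Thus, at the end, when all choices are determined, the algorithm reaches a successful outcome. That is, the algorithm above returns a set cover $x'$ of cost at most $2\ln(2|\mathcal U|)$ times the minimum cost of any (fractional) set cover.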

Remarks

In the example above, the algorithm was guided by the conditional expectation of a random variable $F$. In some cases, instead of an exact conditional expectation, an ''upper bound'' (or sometimes a lower bound) on some conditional expectation is used. This is called a pessimistic estimator.
