Within computer science and operations research, many combinatorial optimization problems are computationally intractable to solve exactly (to optimality).
Many such problems do admit fast (polynomial time) approximation algorithms; that is, algorithms that are guaranteed to return an approximately optimal solution given any input.
Randomized rounding is a widely used approach for designing and analyzing such approximation algorithms.
The basic idea is to use the probabilistic method to convert an optimal solution of a relaxation of the problem into an approximately optimal solution to the original problem.
Overview
The basic approach has three steps:
# Formulate the problem to be solved as an integer linear program (ILP).
# Compute an optimal fractional solution $x$ to the linear programming relaxation (LP) of the ILP.
# Round the fractional solution $x$ of the LP to an integer solution $x'$ of the ILP.
(Although the approach is most commonly applied with linear programs,
other kinds of relaxations are sometimes used.
For example, see Goemans' and Williamson's
semidefinite programming-based
Max-Cut approximation algorithm.)
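As a rough sketch, the three steps can be wired together as follows; the helper functions named here are hypothetical placeholders for whatever ILP formulation, LP solver, and rounding scheme a particular problem calls for.

```python
# A minimal sketch of the three-step template (hypothetical helper names).
def randomized_rounding(instance, formulate_ilp, solve_lp_relaxation, round_fractional, rng):
    ilp = formulate_ilp(instance)          # step 1: pose the problem as an ILP
    x = solve_lp_relaxation(ilp)           # step 2: optimal fractional solution x
    return round_fractional(x, rng)        # step 3: round x to an integer solution x'
```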
The challenge in the first step is to choose a suitable integer linear program.
Familiarity with linear programming, in particular modelling using linear programs and integer linear programs, is required. For many problems, there is a natural integer linear program that works well,
such as in the Set Cover example below. (The integer linear program should have a small
integrality gap; indeed, randomized rounding is often used to prove bounds on integrality gaps.)
In the second step, the optimal fractional solution can typically be computed
in
polynomial time
using any standard
linear programming
algorithm.
In the third step, the fractional solution must be converted into an integer solution
(and thus a solution to the original problem).
This is called ''rounding'' the fractional solution.
The resulting integer solution should (provably) have cost
not much larger than the cost of the fractional solution.
This will ensure that the cost of the integer solution
is not much larger than the cost of the optimal integer solution.
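In symbols, for a minimization problem: if the rounding step is shown to lose at most a factor $\rho \ge 1$ (a symbol used only in this restatement), then, since the optimal fractional solution costs no more than the optimal integer solution,
:: $\mathrm{cost}(x') \;\le\; \rho\cdot\mathrm{cost}(x) \;\le\; \rho\cdot\mathrm{OPT}.$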
The main technique used to do the third step (rounding) is to use randomization,
and then to use probabilistic arguments to bound the increase in cost due to the rounding
(following the
probabilistic method from combinatorics).
Therein, probabilistic arguments are used to show the existence of discrete structures with
desired properties. In this context, one uses such arguments to show the following:
: ''Given any fractional solution $x$
of the LP, with positive probability the randomized rounding process produces an integer solution $x'$
that approximates $x$''
according to some desired criterion.
Finally, to make the third step computationally efficient,
one either shows that $x'$
approximates $x$
with high probability (so that the step can remain randomized),
or one
derandomizes the rounding step,
typically using the
method of conditional probabilities.
The latter method converts the randomized rounding process
into an efficient deterministic process that is guaranteed
to reach a good outcome.
Comparison to other applications of the probabilistic method
The randomized rounding step differs from most applications of the
probabilistic method in two respects:
# The
computational complexity
of the rounding step is important. It should be implementable by a fast (e.g.
polynomial time)
algorithm.
# The probability distribution underlying the random experiment is a function of the solution $x$
of a
relaxation of the problem instance. This fact is crucial to proving the
performance guarantee of the approximation algorithm; that is, for any problem instance, the algorithm returns a solution that approximates the ''optimal solution for that specific instance''. In comparison,
applications of the probabilistic method in combinatorics typically show the existence of structures whose features depend on other parameters of the input. For example, consider
Turán's theorem, which can be stated as "any graph with $n$ vertices of average degree $d$ must have an independent set of size at least $n/(d+1)$".
(See this for a probabilistic proof of Turán's theorem.) While there are graphs for which this bound is tight, there are also graphs that have independent sets much larger than $n/(d+1)$.
Thus, the size of the independent set shown to exist by Turán's theorem in a graph may, in general, be much smaller than the maximum independent set for that graph.
Set cover example
The following example illustrates how randomized rounding can be used to design an approximation algorithm for the
Set Cover
problem.
Fix any instance of set cover over a universe $\mathcal{U}$: a collection $\mathcal{S}$ of subsets of $\mathcal{U}$, where each set $s \in \mathcal{S}$ has a non-negative cost $c_s$.
For step 1, let IP be the
standard integer linear program for set cover for this instance.
For step 2, let LP be the
linear programming relaxation of IP,
and compute an optimal solution $x^*$
to LP
using any standard
linear programming
algorithm.
(This takes time polynomial in the input size.)
(The feasible solutions to LP are the vectors $x$
that assign each set $s \in \mathcal{S}$
a non-negative weight $x_s$,
such that, for each element $e \in \mathcal{U}$,
$x$ ''covers'' $e$
(the total weight assigned to the sets containing $e$
is at least 1), that is,
:: $\sum_{s \ni e} x_s \;\ge\; 1.$
The optimal solution $x^*$
is a feasible solution whose cost
:: $c \cdot x^* \;=\; \sum_{s \in \mathcal{S}} c_s\, x^*_s$
is as small as possible.)
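For concreteness, here is one way the fractional optimum $x^*$ might be computed with an off-the-shelf LP solver. This is a sketch using SciPy; the instance encoding and the function name are illustrative choices, not part of the formulation itself.

```python
import numpy as np
from scipy.optimize import linprog

def solve_set_cover_lp(sets, costs, n_elements):
    """Solve the LP relaxation of set cover.

    sets: list of collections of element indices (one collection per set s).
    costs: list of non-negative costs c_s.
    Returns an optimal fractional solution x* and its cost.
    """
    m = len(sets)
    # Coverage constraints: for each element e, sum_{s containing e} x_s >= 1,
    # written as -sum_{s containing e} x_s <= -1 in linprog's A_ub form.
    A_ub = np.zeros((n_elements, m))
    for j, s in enumerate(sets):
        for e in s:
            A_ub[e, j] = -1.0
    b_ub = -np.ones(n_elements)
    res = linprog(c=costs, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * m, method="highs")
    return res.x, res.fun
```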
----
Note that any set cover $\mathcal{C}$
for the instance
gives a feasible solution $x$
(where $x_s = 1$
for $s \in \mathcal{C}$,
$x_s = 0$ otherwise).
The cost of this $\mathcal{C}$
equals the cost of $x$, that is,
:: $\sum_{s \in \mathcal{C}} c_s \;=\; \sum_{s \in \mathcal{S}} c_s\, x_s \;=\; c \cdot x.$
In other words, the linear program LP is a
relaxation
of the given set-cover problem.
Since $x^*$
has minimum cost among feasible solutions to the LP,
''the cost of $x^*$
is a lower bound on the cost of the optimal set cover''.
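Equivalently, writing $\mathrm{OPT}$ for the cost of an optimal set cover (shorthand used only in this note):
:: $c \cdot x^* \;\le\; \sum_{s \in \mathcal{C}} c_s \quad \text{for every set cover } \mathcal{C}, \qquad \text{so in particular} \quad c \cdot x^* \;\le\; \mathrm{OPT}.$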
Step 3: The randomized rounding step
Here is a description of the third step, the rounding step,
which must convert the minimum-cost fractional set cover $x^*$
into a feasible integer solution $x'$
(corresponding to a true set cover).
The rounding step should produce an $x'$
that, with positive probability,
has cost within a small factor of the cost of $x^*$.
Then (since the cost of $x^*$
is a lower bound on the cost of the optimal set cover),
the cost of $x'$
will be within a small factor of the optimal cost.
As a starting point, consider the most natural rounding scheme:
:: ''For each set $s \in \mathcal{S}$
in turn, take $x'_s = 1$
with probability $x^*_s$,
otherwise take $x'_s = 0$.''
With this rounding scheme,
the expected cost of the chosen sets is at most $c \cdot x^*$,
the cost of the fractional cover.
This is good. Unfortunately the coverage is not good.
When the variables $x^*_s$
are small,
the probability that an element $e$
is not covered is about
: $\prod_{s \ni e} (1 - x^*_s) \;\approx\; \prod_{s \ni e} e^{-x^*_s} \;=\; e^{-\sum_{s \ni e} x^*_s} \;\approx\; e^{-1} \;\approx\; 0.37.$
So only a constant fraction of the elements will be covered in expectation.
To make $x'$
cover every element with high probability,
the standard rounding scheme
first ''scales up'' the rounding probabilities
by an appropriate factor $\lambda > 1$.
Here is the standard rounding scheme:
:: ''Fix a parameter $\lambda \ge 1$. For each set $s \in \mathcal{S}$
in turn,''
:: ''take $x'_s = 1$
with probability $\min(\lambda x^*_s,\, 1)$,
otherwise take $x'_s = 0$.''
Scaling the probabilities up by $\lambda$
increases the expected cost by at most a factor of $\lambda$,
but makes coverage of all elements likely.
The idea is to choose $\lambda$
as small
as possible so that all elements are provably
covered with non-zero probability.
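A sketch of this scaled rounding step (the function name and the use of NumPy's random generator are implementation choices, not part of the scheme):

```python
import numpy as np

def randomized_round(x_star, lam, rng):
    """Scaled randomized rounding: include each set s independently
    with probability min(lam * x*_s, 1); returns the 0-1 vector x'."""
    p = np.minimum(lam * np.asarray(x_star), 1.0)
    return rng.random(len(p)) < p
```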
Here is a detailed analysis.
----
Lemma (approximation guarantee for rounding scheme)
:: ''Fix $\lambda = \ln(2|\mathcal{U}|)$. With positive probability, the rounding scheme returns a set cover $x'$
of cost at most $2\ln(2|\mathcal{U}|)\; c \cdot x^*$
(and thus of cost $O(\log|\mathcal{U}|)$
times the cost of the optimal set cover).''
(Note: with care the $O(\log|\mathcal{U}|)$
can be reduced to $\ln(|\mathcal{U}|) + O(\log\log|\mathcal{U}|)$.)
Proof
The output $x'$
of the random rounding scheme has the desired properties
as long as none of the following "bad" events occur:
# the cost $c \cdot x'$
of $x'$
exceeds $2\lambda\, c \cdot x^*$, or
# for some element $e$,
$x'$
fails to cover $e$.
The expectation of each $x'_s$
is at most $\lambda x^*_s$.
By
linearity of expectation,
the expectation of $c \cdot x'$
is at most $\lambda\, c \cdot x^*$.
Thus, by
Markov's inequality, the probability of the first bad event
above is at most $1/2$.
For the remaining bad events (one for each element $e$), note that,
since $\sum_{s \ni e} x^*_s \ge 1$
for any given element $e$,
the probability that $e$
is not covered is
: $\prod_{s \ni e} \big(1 - \min(\lambda x^*_s,\, 1)\big) \;<\; \prod_{s \ni e} e^{-\lambda x^*_s} \;=\; e^{-\lambda \sum_{s \ni e} x^*_s} \;\le\; e^{-\lambda} \;=\; \frac{1}{2|\mathcal{U}|}.$
(This uses the inequality $1 + z \le e^z$,
which is strict for $z \ne 0$.)
Thus, for each of the $|\mathcal{U}|$
elements,
the probability that the element is not covered is less than $\frac{1}{2|\mathcal{U}|}$.
By the
union bound,
the probability that one of the $1 + |\mathcal{U}|$
bad events happens
is less than $\frac{1}{2} + |\mathcal{U}| \cdot \frac{1}{2|\mathcal{U}|} \;=\; 1$.
Thus, with positive probability there are no bad events
and $x'$
is a set cover of cost at most $2\lambda\, c \cdot x^*$.
QED
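As a usage illustration, the sketches above can be combined on a small made-up instance. Repeating the rounding until a cover appears is justified by the lemma, since each attempt succeeds with positive probability; the instance, seed, and variable names below are arbitrary.

```python
import math
import numpy as np

# Toy instance: universe {0,...,5} and four candidate sets.
sets = [{0, 1, 2}, {1, 3}, {2, 4, 5}, {0, 3, 5}]
costs = [3.0, 1.0, 2.0, 2.0]
n = 6

x_star, lp_cost = solve_set_cover_lp(sets, costs, n)
lam = math.log(2 * n)                      # lambda = ln(2|U|), as in the lemma
rng = np.random.default_rng(0)

while True:
    x_prime = randomized_round(x_star, lam, rng)
    chosen = [s for s, take in zip(sets, x_prime) if take]
    if set().union(*chosen) == set(range(n)):   # x' is a set cover
        break

print("LP lower bound:", lp_cost)
print("cover cost    :", sum(c for c, take in zip(costs, x_prime) if take))
```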
Derandomization using the method of conditional probabilities
The lemma above shows the ''existence'' of a set cover
of cost $O(\log(|\mathcal{U}|)\; c \cdot x^*)$.
In this context our goal is an efficient approximation algorithm,
not just an existence proof, so we are not done.
One approach would be to increase $\lambda$
a little bit, then show that the probability of success is at least, say, 1/4.
With this modification, repeating the random rounding step a few times
is enough to ensure a successful outcome with high probability.
That approach weakens the approximation ratio.
We next describe a different approach that yields
a deterministic algorithm that is guaranteed to
match the approximation ratio of the existence proof above.
The approach is called the
method of conditional probabilities.
The deterministic algorithm emulates the randomized rounding scheme:
it considers each set $s \in \mathcal{S}$
in turn,
and chooses $x'_s \in \{0, 1\}$.
But instead of making each choice ''randomly'' based on $x^*_s$,
it makes the choice ''deterministically'', so as to
''keep the conditional probability of failure, given the choices so far, below 1''.
Bounding the conditional probability of failure
We want to be able to set each variable $x'_s$
in turn
so as to keep the conditional probability of failure below 1.
To do this, we need a good bound on the conditional probability of failure.
The bound will come by refining the original existence proof.
That proof implicitly bounds the probability of failure
by the expectation of the random variable
: $F \;=\; \frac{c \cdot x'}{2\lambda\, c \cdot x^*} \;+\; |\mathcal{U}_{\mathrm{uncov}}|,$
where
: $\mathcal{U}_{\mathrm{uncov}} \;=\; \{\, e \in \mathcal{U} : e \text{ is not covered by } x' \,\}$
is the set of elements left uncovered at the end.
The random variable $F$
may appear a bit mysterious,
but it mirrors the probabilistic proof in a systematic way.
The first term in $F$
comes from applying
Markov's inequality
to bound the probability of the first bad event (the cost is too high).
It contributes at least 1 to $F$
if the cost of $x'$
is too high.
The second term
counts the number of bad events of the second kind (uncovered elements).
It contributes at least 1 to $F$
if $x'$
leaves any element uncovered.
Thus, in any outcome where $F$
is less than 1, $x'$
must cover all the elements
and have cost meeting the desired bound from the lemma.
In short, if the rounding step fails, then $F \ge 1$.
This implies (by
Markov's inequality) that ''the expectation of $F$ is an upper bound on the probability of failure''.
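Finally, a sketch of the derandomized rounding itself, using the exact conditional expectation of $F$ as the quantity to keep below 1. The function name and instance encoding follow the earlier sketches and are illustrative assumptions; $\lambda$ is assumed to be chosen as in the lemma, so that the initial expectation of $F$ is below 1.

```python
import numpy as np

def derandomized_round(sets, costs, x_star, lam):
    """Method of conditional probabilities for set cover.

    Sets are considered in order; each x'_s is fixed to the value that does not
    increase the conditional expectation of
        F = cost(x') / (2 * lam * cost(x*)) + (number of uncovered elements),
    with the not-yet-considered sets still rounded randomly.  If E[F] < 1 at the
    start (as the lemma guarantees for lam = ln(2|U|)), the final x' has F < 1:
    it covers every element and costs at most 2 * lam * cost(x*).
    """
    m = len(sets)
    p = np.minimum(lam * np.asarray(x_star), 1.0)     # rounding probabilities
    denom = 2.0 * lam * float(np.dot(costs, x_star))
    universe = set().union(*sets)

    def cond_exp_F(chosen, k):
        # E[F | x'_s fixed for s < k as recorded in `chosen`].
        exp_cost = sum(costs[s] for s in range(k) if chosen[s])
        exp_cost += sum(costs[s] * p[s] for s in range(k, m))
        covered = set().union(*[sets[s] for s in range(k) if chosen[s]])
        exp_uncovered = 0.0
        for e in universe - covered:
            prob_missed = 1.0
            for s in range(k, m):
                if e in sets[s]:
                    prob_missed *= 1.0 - p[s]
            exp_uncovered += prob_missed
        return exp_cost / denom + exp_uncovered

    chosen = [False] * m
    for s in range(m):
        chosen[s] = True
        with_s = cond_exp_F(chosen, s + 1)
        chosen[s] = False
        without_s = cond_exp_F(chosen, s + 1)
        chosen[s] = with_s <= without_s    # keep the branch with the smaller estimator
    return chosen
```

On the toy instance above, calling `derandomized_round(sets, costs, x_star, lam)` should return such a cover directly, with no retries.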