The set cover problem is a classical question in combinatorics, computer science, operations research, and complexity theory. It is one of Karp's 21 NP-complete problems, shown to be NP-complete in 1972.
It is a problem "whose study has led to the development of fundamental techniques for the entire field" of approximation algorithms.
Given a set of elements $\{1,2,\ldots,n\}$ (called the universe) and a collection $S$ of $m$ sets whose union equals the universe, the set cover problem is to identify the smallest sub-collection of $S$ whose union equals the universe. For example, consider the universe $U = \{1,2,3,4,5\}$ and the collection of sets $S = \{\{1,2,3\},\{2,4\},\{3,4\},\{4,5\}\}$. Clearly the union of $S$ is $U$. However, we can cover all of the elements with the following, smaller number of sets: $\{\{1,2,3\},\{4,5\}\}$.
More formally, given a universe $\mathcal{U}$ and a family $\mathcal{S}$ of subsets of $\mathcal{U}$,
a ''cover'' is a subfamily $\mathcal{C}\subseteq\mathcal{S}$ of sets whose union is $\mathcal{U}$. In the set covering decision problem, the input is a pair $(\mathcal{U},\mathcal{S})$ and an integer $k$; the question is whether
there is a set covering of size $k$ or less. In the set covering optimization problem, the input is a pair $(\mathcal{U},\mathcal{S})$, and the task is to find a set covering that uses the fewest sets.
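For very small instances, both versions can be checked by exhaustive search. The following Python sketch is illustrative only: the function names `has_cover_of_size` and `minimum_cover` and the sample instance are assumptions made for this example, and the running time is exponential in the number of sets.

```python
from itertools import combinations


def has_cover_of_size(universe, family, k):
    """Decision version: is there a cover that uses at most k sets?"""
    universe = set(universe)
    for size in range(0, min(k, len(family)) + 1):
        for combo in combinations(family, size):
            # The union of the chosen sets must contain every element.
            if set().union(*combo) >= universe:
                return True
    return False


def minimum_cover(universe, family):
    """Optimization version: return a smallest cover, or None if none exists."""
    universe = set(universe)
    for size in range(0, len(family) + 1):
        for combo in combinations(family, size):
            if set().union(*combo) >= universe:
                return list(combo)
    return None


# Hypothetical sample instance (an assumption for illustration).
universe = {1, 2, 3, 4, 5}
family = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
best = minimum_cover(universe, family)
print(best)                                     # a cover with two sets
print(has_cover_of_size(universe, family, 1))   # False
```

Trying sizes in increasing order means the first cover found is a minimum one, which is why the optimization routine can stop at its first hit.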
The decision version of set covering is NP-complete, and the optimization/search version of set cover is NP-hard.
If each set is assigned a cost, it becomes a ''weighted'' set cover problem.
Integer linear program formulation

The minimum set cover problem can be formulated as the following integer linear program (ILP).
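In its standard textbook form, the program uses one binary variable $x_S$ for each set $S \in \mathcal{S}$, equal to 1 when $S$ is chosen. The display below is a sketch of that standard form; it assumes that $\mathcal{U}$ denotes the universe and $\mathcal{S}$ the family of sets, as in the formal definition.

$$\begin{aligned}
\text{minimize} \quad & \sum_{S \in \mathcal{S}} x_S \\
\text{subject to} \quad & \sum_{S \colon e \in S} x_S \ge 1 \quad \text{for all } e \in \mathcal{U} \\
& x_S \in \{0,1\} \quad \text{for all } S \in \mathcal{S}
\end{aligned}$$

The objective counts the chosen sets, and each covering constraint requires that at least one chosen set contains the element $e$.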
This ILP belongs to the more general class of ILPs for covering problems.
The integrality gap of this ILP is at most $\log n$, so its relaxation gives a factor-$\log n$ approximation algorithm for the minimum set cover problem (where $n$ is the size of the universe).
In weighted set cover, the sets are assigned weights. Denote the weight of set $s\in \mathcal{S}$ by $w_s$. Then the integer linear program describing weighted set cover is identical to the one given above, except that the objective function to minimize is $\sum_{s \in \mathcal{S}} w_s x_s$.
Hitting set formulation

Set covering is equivalent to the hitting set problem. That is seen by observing that an instance of set covering can be viewed as an arbitrary bipartite graph, with sets represented by vertices on the left, the universe represented by vertices on the right, and edges representing the inclusion of elements in sets. The task is then to find a minimum cardinality subset of left-vertices which covers all of the right-vertices. In the hitting set problem, the objective is to cover the left-vertices using a minimum subset of the right-vertices. Converting from one problem to the other is therefore achieved by interchanging the two sets of vertices.
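The interchange of the two vertex sets can be sketched in a few lines of Python. The helper names `set_cover_to_hitting_set` and `is_hitting_set` and the sample instance are assumptions introduced for this illustration.

```python
def set_cover_to_hitting_set(universe, family):
    """Swap the two sides of the bipartite incidence graph.

    Returns (ground, targets): `ground` lists the indices of the input sets,
    which become the elements of the hitting set instance, and `targets[e]`
    is the set of indices of input sets that contain element e.
    """
    ground = list(range(len(family)))
    targets = {e: {i for i, s in enumerate(family) if e in s} for e in universe}
    return ground, targets


def is_hitting_set(chosen, targets):
    """True if `chosen` intersects every target set."""
    return all(chosen & t for t in targets.values())


# Hypothetical sample instance (an assumption for illustration).
universe = {1, 2, 3, 4, 5}
family = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
ground, targets = set_cover_to_hitting_set(universe, family)

print(is_hitting_set({0, 3}, targets))  # True: sets 0 and 3 cover the universe
print(is_hitting_set({0, 1}, targets))  # False: element 5 is left uncovered
```

A collection of set indices hits every target exactly when the corresponding sets cover the universe, so the two instances have the same optimal value.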
Greedy algorithm

There is a greedy algorithm for polynomial time approximation of set covering that chooses sets according to one rule: at each stage, choose the set that contains the largest number of uncovered elements. This method can be implemented in time linear in the sum of sizes of the input sets, using a bucket queue to prioritize the sets. It achieves an approximation ratio of $H(s)$, where $s$ is the size of the set to be covered. In other words, it finds a covering that may be $H(n)$ times as large as the minimum one, where $H(n)$ is the $n$-th harmonic number:
$$H(n) = \sum_{k=1}^{n} \frac{1}{k} \le \ln{n} + 1$$
This greedy algorithm actually achieves an approximation ratio of $H(s^\prime)$, where $s^\prime$ is the cardinality of the largest set in $S$. For $\delta$-dense instances, however, there exists a $c \ln{m}$-approximation algorithm for every $c > 0$.
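The greedy rule can be sketched as follows in Python. This is a simple version that rescans all sets at every stage, not the linear-time bucket-queue implementation; the names `greedy_set_cover` and `harmonic` and the sample instance are assumptions made for this example.

```python
def greedy_set_cover(universe, family):
    """Repeatedly pick the set covering the most still-uncovered elements.

    Returns the indices of the chosen sets, in the order they were picked.
    """
    family = [set(s) for s in family]
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(
            range(len(family)),
            key=lambda i: len(family[i] & uncovered),
            default=None,
        )
        if best is None or not (family[best] & uncovered):
            raise ValueError("the family does not cover the universe")
        chosen.append(best)
        uncovered -= family[best]
    return chosen


def harmonic(n):
    """The n-th harmonic number H(n) = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))


# Hypothetical sample instance (an assumption for illustration).
universe = {1, 2, 3, 4, 5}
family = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
cover = greedy_set_cover(universe, family)
print(cover)  # -> [0, 3]
```

On this instance the greedy choice happens to be optimal; in general the guarantee is only that the cover is at most $H(n)$ times larger than a minimum one.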
There is a standard example on which the greedy algorithm achieves an approximation ratio of $\log_2(n)/2$.
The universe consists of $n=2^{(k+1)}-2$ elements. The set system consists of $k$ pairwise disjoint sets
$S_1,\ldots,S_k$ with sizes $2,4,8,\ldots,2^k$ respectively, as well as two additional disjoint sets $T_0,T_1$,
each of which contains half of the elements from each $S_i$. On this input, the greedy algorithm takes the sets
$S_k,\ldots,S_1$, in that order, while the optimal solution consists only of $T_0$ and $T_1$.
An example of such an input for $k=3$ is pictured on the right.
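The behaviour on this instance can be reproduced with a short Python sketch. The helper names `tight_instance` and `greedy_set_cover`, and the choice of integers as universe elements, are assumptions made for this illustration.

```python
def tight_instance(k):
    """Build the sets S_1..S_k, T_0, T_1 over the universe {0, ..., 2^(k+1) - 3}."""
    sets_s, t0, t1 = [], set(), set()
    next_id = 0
    for i in range(1, k + 1):
        block = list(range(next_id, next_id + 2 ** i))  # elements of S_i
        next_id += 2 ** i
        half = len(block) // 2
        sets_s.append(set(block))
        t0.update(block[:half])   # T_0 gets half of S_i
        t1.update(block[half:])   # T_1 gets the other half
    universe = set(range(next_id))
    return universe, sets_s + [t0, t1]


def greedy_set_cover(universe, family):
    """Pick, at each stage, the set with the most uncovered elements."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(range(len(family)), key=lambda i: len(family[i] & uncovered))
        if not (family[best] & uncovered):
            raise ValueError("the family does not cover the universe")
        chosen.append(best)
        uncovered -= family[best]
    return chosen


k = 3
universe, family = tight_instance(k)   # indices 0..k-1 are S_1..S_k, then T_0, T_1
chosen = greedy_set_cover(universe, family)
print(len(universe))  # -> 14
print(chosen)         # -> [2, 1, 0], that is S_3, S_2, S_1
```

At every stage the largest remaining $S_i$ has exactly one more uncovered element than $T_0$ or $T_1$, so the greedy rule never picks either of them and ends with $k$ sets instead of 2.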
Inapproximability results show that the greedy algorithm is essentially the best-possible polynomial time approximation algorithm for set cover up to lower order terms
(see Inapproximability results below), under plausible complexity assumptions. A tighter analysis for the greedy algorithm shows that the approximation ratio is exactly $\ln{n} - \ln{\ln{n}} + \Theta(1)$.
Low-frequency systems

If each element occurs in at most $f$ sets, then a solution can be found in polynomial time that approximates the optimum to within a factor of $f$ using LP relaxation. If the constraint $x_S\in\{0,1\}$ is replaced by $x_S \geq 0$ for all $S$ in $\mathcal{S}$ in the integer linear program shown above, then it becomes a (non-integer) linear program $L$. The algorithm can be described as follows:
# Find an optimal solution $O$ for the program $L$ using some polynomial-time method of solving linear programs.
# Pick all sets $S$ for which the corresponding variable $x_S$ has value at least $1/f$ in the solution $O$.
Inapproximability results

When $n$ refers to the size of the universe, it has been shown that set covering cannot be approximated in polynomial time to within a factor of $\tfrac{1}{2}\log_2{n} \approx 0.72\ln{n}$, unless NP has quasi-polynomial time algorithms. Feige (1998) improved this lower bound to $\bigl(1-o(1)\bigr)\cdot\ln{n}$ under the same assumptions, which essentially matches the approximation ratio achieved by the greedy algorithm. A lower bound of $c\cdot\ln{n}$, where $c$ is a certain constant, has also been established under the weaker assumption that P$\neq$NP.
A similar result with a higher value of $c$ was proved later. Finally, optimal inapproximability was shown by proving that set cover cannot be approximated to within $\bigl(1 - o(1)\bigr) \cdot \ln{n}$ unless P$=$NP.
Weighted set cover

Relaxing the integer linear program for weighted set cover stated above, one may use randomized rounding to get an $O(\log n)$-factor approximation. The corresponding analysis for nonweighted set cover is outlined in the article on randomized rounding (section "Randomized-rounding algorithm for set cover") and can be adapted to the weighted case.
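One simple rounding scheme can be sketched in Python as follows. It is a hedged illustration, not necessarily the exact procedure analyzed in the randomized rounding article: the function name `randomized_rounding_cover` is hypothetical, the fractional solution `x` is assumed to be supplied by an LP solver, and the scheme simply repeats independent rounding passes until every element is covered. The weights influence only the LP solution, so they do not appear in the rounding step.

```python
import random


def randomized_rounding_cover(universe, family, x, rng=None):
    """Round a feasible fractional solution x into a set cover.

    x[i] is the LP value of family[i]. In each pass, set i is picked
    independently with probability x[i]; passes repeat until all
    elements are covered. Returns the sorted indices of the picked sets.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    family = [set(s) for s in family]
    # Feasibility check: every element needs total fractional coverage >= 1.
    for e in universe:
        if sum(x[i] for i, s in enumerate(family) if e in s) < 1 - 1e-9:
            raise ValueError("x is not a feasible fractional cover")
    uncovered = set(universe)
    chosen = set()
    while uncovered:
        for i, s in enumerate(family):
            if rng.random() < x[i]:
                chosen.add(i)
                uncovered -= s
    return sorted(chosen)


# Hypothetical instance with an assumed feasible fractional solution.
universe = {1, 2, 3}
family = [{1, 2}, {2, 3}, {1, 3}]
weights = [3.0, 1.0, 1.0]
x = [0.5, 0.5, 0.5]
cover = randomized_rounding_cover(universe, family, x)
cost = sum(weights[i] for i in cover)
```

In a single pass an element stays uncovered with probability at most $1/e$, so about $\ln n$ passes are expected to suffice, each with expected cost equal to the LP value; this is the intuition behind the $O(\log n)$ factor.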
Related problems

* Hitting set is an equivalent reformulation of Set Cover.
* Vertex cover is a special case of Hitting Set.
* Edge cover is a special case of Set Cover.
* Geometric set cover is a special case of Set Cover when the universe is a set of points in $\mathbb{R}^d$ and the sets are induced by the intersection of the universe and geometric shapes (e.g., disks, rectangles).
* Set packing
* Maximum coverage problem is to choose at most k sets to cover as many elements as possible.
* Dominating set is the problem of selecting a set of vertices (the dominating set) in a graph such that all other vertices are adjacent to at least one vertex in the dominating set. The dominating set problem was shown to be NP-complete through a reduction from set cover.
* Exact cover problem is to choose a set cover with no element included in more than one covering set.
* Red Blue Set Cover.
* Set-cover abduction.
External links

Benchmarks with Hidden Optimum Solutions for Set Covering, Set Packing and Winner Determination
