K-center

In graph theory, the metric ''k''-center problem or vertex ''k''-center problem is a classical combinatorial optimization problem studied in theoretical computer science that is NP-hard. Given ''n'' cities with specified distances, one wants to build ''k'' warehouses in different cities and minimize the maximum distance of a city to a warehouse. In graph theory, this means finding a set of ''k'' vertices for which the largest distance of any point to its closest vertex in the ''k''-set is minimized. The vertices must be in a metric space, providing a complete graph that satisfies the triangle inequality. The problem has applications in facility location and clustering.


Formal definition

The problem was first proposed by Hakimi in 1964. Let (\mathcal{X},d) be a metric space where \mathcal{X} is a set and d is a metric.

A set \mathbf{V}\subseteq\mathcal{X} is provided together with a parameter k. The goal is to find a subset \mathcal{C}\subseteq \mathbf{V} with |\mathcal{C}|=k such that the maximum distance of a point in \mathbf{V} to the closest point in \mathcal{C} is minimized. The problem can be formally defined as follows:
For a metric space (\mathcal{X},d),
* ''Input'': a set \mathbf{V}\subseteq\mathcal{X}, and a parameter k.
* ''Output'': a set \mathcal{C}\subseteq\mathbf{V} of k points.
* ''Goal'': Minimize the cost r^{\mathcal{C}}(\mathbf{V}) = \max_{v\in \mathbf{V}} d(v,\mathcal{C})
That is, every point of \mathbf{V} is at distance at most r^{\mathcal{C}}(\mathbf{V}) from its respective center.

The ''k''-center clustering problem can also be defined on a complete undirected graph ''G'' = (''V'', ''E'') as follows:
Given a complete undirected graph ''G'' = (''V'', ''E'') with distances ''d''(''v''''i'', ''v''''j'') ∈ ''N'' satisfying the triangle inequality, find a subset ''C'' ⊆ ''V'' with |''C''| = ''k'' while minimizing:
: \max_{v \in V} \min_{c\in C} d(v,c)
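As a concrete illustration of the cost being minimized, the following sketch (not part of the original formulation; the function names and the use of planar points with the Euclidean metric are illustrative assumptions) computes r^{\mathcal{C}}(\mathbf{V}) for a candidate set of centers:

<syntaxhighlight lang="python">
import math

def k_center_cost(points, centers, d=math.dist):
    """Return r^C(V): the largest distance from any point of V to its nearest center.
    Any metric d satisfying the triangle inequality may be supplied."""
    return max(min(d(v, c) for c in centers) for v in points)

# Hypothetical example with planar points and the Euclidean metric:
points = [(0, 0), (1, 1), (5, 0), (9, 0), (10, 0)]
print(k_center_cost(points, [(0, 0), (10, 0)]))  # 5.0, determined by the point (5, 0)
</syntaxhighlight>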


Computational complexity

In a complete undirected graph ''G'' = (''V'', ''E''), if we sort the edges in non-decreasing order of the distances, ''d''(''e''1) ≤ ''d''(''e''2) ≤ ... ≤ ''d''(''e''m), and let ''G''i = (''V'', ''E''i), where ''E''i = {''e''1, ''e''2, ..., ''e''i}, then the ''k''-center problem is equivalent to finding the smallest index ''i'' such that ''G''i has a dominating set of size at most ''k''. Although Dominating Set is NP-complete, the ''k''-center problem remains NP-hard. This is because the Dominating Set reduction can certify the optimality of a given feasible solution for the ''k''-center problem only if we already know the size of the optimal solution (i.e. the smallest index ''i'' such that ''G''i has a dominating set of size at most ''k''), which is precisely the difficult core of NP-hard problems. A Turing reduction can, however, get around this issue by trying all values of ''k''.
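The following brute-force sketch (exponential time, for illustration only; the names are ours and the graph is assumed to be given as a symmetric distance matrix) makes this equivalence explicit: it scans the sorted edge costs and reports the first bottleneck graph ''G''i that admits a dominating set of size at most ''k''.

<syntaxhighlight lang="python">
from itertools import combinations

def has_dominating_set(adj, k):
    """Brute-force check: does the graph (given as neighbor sets) have a dominating set of size <= k?"""
    n = len(adj)
    for size in range(1, min(k, n) + 1):
        for cand in combinations(range(n), size):
            dominated = set(cand)
            for v in cand:
                dominated |= adj[v]
            if len(dominated) == n:
                return True
    return False

def k_center_optimum(dist, k):
    """Smallest edge cost r such that the bottleneck graph G_r (edges of cost <= r)
    has a dominating set of size at most k; this value is the optimal k-center radius."""
    n = len(dist)
    if k >= n:
        return 0  # every vertex can be its own center
    for r in sorted({dist[u][v] for u in range(n) for v in range(u + 1, n)}):
        adj = [{v for v in range(n) if v != u and dist[u][v] <= r} for u in range(n)]
        if has_dominating_set(adj, k):
            return r
</syntaxhighlight>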


Approximations


A simple greedy algorithm

A simple greedy approximation algorithm that achieves an approximation factor of 2 builds \mathcal{C} using a farthest-first traversal in ''k'' iterations. This algorithm simply chooses the point farthest away from the current set of centers in each iteration as the new center. It can be described as follows (a code sketch is given after the list):
* Pick an arbitrary point \bar{c}_1 into C_1.
* For every point v\in \mathbf{V} compute d_1[v], its distance from \bar{c}_1.
* Pick the point \bar{c}_2 with the highest distance from \bar{c}_1.
* Add it to the set of centers and denote this expanded set of centers as C_2. Continue this until ''k'' centers are found.
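A minimal sketch of this farthest-first traversal (function and parameter names are our own; d is any metric on the points):

<syntaxhighlight lang="python">
def greedy_k_center(points, k, d):
    """Farthest-first traversal: a 2-approximation for the k-center problem."""
    centers = [points[0]]                          # pick an arbitrary first center
    nearest = [d(p, centers[0]) for p in points]   # distance of each point to its closest center
    while len(centers) < k:
        # The next center is the point farthest from the current set of centers.
        i = max(range(len(points)), key=lambda j: nearest[j])
        centers.append(points[i])
        # Refresh the nearest-center distances in O(n) time per iteration.
        nearest = [min(nearest[j], d(p, points[i])) for j, p in enumerate(points)]
    return centers
</syntaxhighlight>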


Running time

* The ''i''th iteration, choosing the ''i''th center, takes \mathcal{O}(n) time.
* There are ''k'' such iterations.
* Thus, overall the algorithm takes \mathcal{O}(nk) time.


Proving the approximation factor

The solution obtained using the simple greedy algorithm is a 2-approximation to the optimal solution. This section focuses on proving this approximation factor.

Given a set of ''n'' points \mathbf{V}\subseteq\mathcal{X}, belonging to a metric space (\mathcal{X},d), the greedy ''k''-center algorithm computes a set \mathcal{K} of ''k'' centers, such that \mathcal{K} is a 2-approximation to the optimal ''k''-center clustering of \mathbf{V}, ''i.e.'' r^{\mathcal{K}}(\mathbf{V})\leq 2r^{opt}(\mathbf{V},k).

This theorem can be proven using two cases as follows. Here \mathcal{C}_{opt} denotes an optimal clustering and \Pi(\mathcal{C}_{opt},\bar{c}) denotes the cluster of \mathcal{C}_{opt} whose center is \bar{c}.

Case 1: Every cluster of \mathcal{C}_{opt} contains exactly one point of \mathcal{K}.
* Consider a point v\in \mathbf{V}.
* Let \bar{c} be the center it belongs to in \mathcal{C}_{opt}.
* Let \bar{k} be the center of \mathcal{K} that is in \Pi(\mathcal{C}_{opt},\bar{c}).
* d(v,\bar{c})=d(v,\mathcal{C}_{opt})\leq r^{opt}(\mathbf{V},k)
* Similarly, d(\bar{k},\bar{c})=d(\bar{k},\mathcal{C}_{opt})\leq r^{opt}
* By the triangle inequality: d(v,\bar{k})\leq d(v,\bar{c})+d(\bar{c},\bar{k})\leq 2r^{opt}

Case 2: There are two centers \bar{k} and \bar{u} of \mathcal{K} that are both in \Pi(\mathcal{C}_{opt},\bar{c}), for some \bar{c}\in \mathcal{C}_{opt} (by the pigeonhole principle, this is the only other possibility).
* Assume, without loss of generality, that \bar{u} was added later than \bar{k} to the center set \mathcal{K} by the greedy algorithm, say in the ''i''th iteration.
* But since the greedy algorithm always chooses the point farthest away from the current set of centers, we have that \bar{k}\in\mathcal{K}_{i-1} and
: \begin{align} r^{\mathcal{K}}(\mathbf{V})\leq r^{\mathcal{K}_{i-1}}(\mathbf{V}) &= d(\bar{u},\mathcal{K}_{i-1})\\ &\leq d(\bar{u},\bar{k})\\ &\leq d(\bar{u},\bar{c})+d(\bar{c},\bar{k})\\ &\leq 2r^{opt} \end{align}


Another 2-factor approximation algorithm

Another algorithm with the same approximation factor takes advantage of the fact that the ''k''-center problem is equivalent to finding the smallest index ''i'' such that ''G''i has a dominating set of size at most ''k'': it computes a maximal independent set of the square of ''G''i, looking for the smallest index ''i'' for which this maximal independent set has size at most ''k'' (a sketch appears below). It is not possible to find an approximation algorithm with an approximation factor of 2 − ε for any ε > 0, unless P = NP. Furthermore, the distances of all edges in ''G'' must satisfy the triangle inequality if the ''k''-center problem is to be approximated within any constant factor, unless P = NP.
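A hedged sketch of this bottleneck idea (not the exact published procedure; names are illustrative and the input is a distance matrix): for increasing candidate radii taken from the edge costs, it greedily collects centers that are pairwise more than 2r apart, i.e. an independent set of the square graph; the first radius for which at most ''k'' centers are collected yields a solution of radius at most 2r.

<syntaxhighlight lang="python">
def bottleneck_k_center(dist, k):
    """2-approximation via parametric pruning over the sorted edge costs."""
    n = len(dist)
    for r in sorted({dist[u][v] for u in range(n) for v in range(u + 1, n)}):
        centers = []
        for v in range(n):
            # Keep v only if it is more than 2r away from every chosen center,
            # so the chosen vertices are pairwise independent in the square graph.
            if all(dist[v][c] > 2 * r for c in centers):
                centers.append(v)
        if len(centers) <= k:
            return centers, 2 * r  # every vertex is now within 2r of some center
</syntaxhighlight>

If r is at least the optimal radius, any two vertices more than 2r apart must lie in different optimal clusters, so at most ''k'' centers are collected; hence the first radius that succeeds is at most the optimum and the returned covering radius is at most twice the optimum.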


Parameterized approximations

It can be shown that the ''k''-center problem is W[2]-hard to approximate within a factor of 2 − ε for any ε > 0, when using ''k'' as the parameter. This is also true when parameterizing by the doubling dimension (in fact the dimension of a Manhattan metric), unless P = NP. When considering the combined parameter given by ''k'' and the doubling dimension, ''k''-center is still W[1]-hard, but it is possible to obtain a parameterized approximation scheme. This is even possible for the variant with vertex capacities, which bound how many vertices can be assigned to an opened center of the solution.


Approximation algorithms

If P \neq NP, the vertex ''k''-center problem cannot be (optimally) solved in polynomial time. However, there are some polynomial-time approximation algorithms that get near-optimal solutions, specifically 2-approximate ones; in fact, if P \neq NP, the best possible guarantee achievable by a polynomial-time algorithm is a 2-approximation. In the context of a minimization problem, such as the vertex ''k''-center problem, a 2-approximate solution is any solution C' such that r(C') \le 2 \times r(\text{OPT}), where r(\text{OPT}) is the size of an optimal solution. An algorithm guaranteed to generate 2-approximate solutions is known as a 2-approximation algorithm. The main 2-approximation algorithms for the vertex ''k''-center problem reported in the literature are the Sh algorithm, the HS algorithm, and the Gon algorithm. Even though these algorithms are the (polynomially) best possible ones, their performance on most benchmark datasets is poor. Because of this, many heuristics and metaheuristics have been developed over time. Perhaps counter-intuitively, one of the most practical (polynomial) heuristics for the vertex ''k''-center problem is based on the CDS algorithm, which is a 3-approximation algorithm.


The Sh algorithm

Formally characterized by David Shmoys in 1995, the Sh algorithm takes as input a complete undirected graph G=(V,E), a positive integer k, and an assumption r on what the optimal solution size is. The Sh algorithm works as follows: it selects the first center c_1 at random, so that the solution so far consists of only one vertex, C=\{c_1\}. Next, it selects a center c_2 at random from the set of all vertices whose distance from C is greater than 2 \times r; at this point, C=\{c_1, c_2\}. Finally, it selects the remaining k-2 centers the same way c_2 was selected. The complexity of the Sh algorithm is O(kn), where n is the number of vertices.
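A minimal sketch of this procedure, under the stated assumption r (hypothetical helper names; the graph is given as a distance matrix and random choices use Python's random module):

<syntaxhighlight lang="python">
import random

def sh_algorithm(dist, k, r):
    """Pick k centers; after the first, each new center is chosen at random among
    the vertices farther than 2*r from every center chosen so far."""
    n = len(dist)
    centers = [random.randrange(n)]  # first center chosen at random
    while len(centers) < k:
        far = [v for v in range(n) if min(dist[v][c] for c in centers) > 2 * r]
        if not far:
            break  # every vertex is already within 2*r of some center
        centers.append(random.choice(far))
    return centers
</syntaxhighlight>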


The HS algorithm

Proposed by Dorit Hochbaum and David Shmoys in 1985, the HS algorithm takes the Sh algorithm as its basis. By noticing that the value of r(\text{OPT}) must equal the cost of some edge in E, and that there are O(n^2) edges in E, the HS algorithm basically repeats the Sh algorithm with every edge cost. The complexity of the HS algorithm is O(n^4). However, by running a binary search over the ordered set of edge costs, its complexity is reduced to O(n^2 \log n).
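A hedged sketch of this speed-up (it reuses the sh_algorithm sketch above; a guess r counts as feasible when the returned centers cover every vertex within 2r, which is guaranteed whenever r is at least the optimal radius):

<syntaxhighlight lang="python">
def hs_algorithm(dist, k):
    """Binary-search the sorted edge costs for a small guess r whose Sh run
    covers every vertex within 2*r; the result is a 2-approximation."""
    n = len(dist)
    costs = sorted({dist[u][v] for u in range(n) for v in range(u + 1, n)})

    def try_radius(r):
        centers = sh_algorithm(dist, k, r)
        radius = max(min(dist[v][c] for c in centers) for v in range(n))
        return radius <= 2 * r, centers

    lo, hi, best = 0, len(costs) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        ok, centers = try_radius(costs[mid])
        if ok:
            best, hi = centers, mid - 1  # feasible: try a smaller edge cost
        else:
            lo = mid + 1                 # infeasible: the optimum exceeds this cost
    return best
</syntaxhighlight>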


The Gon algorithm

Proposed independently by Teofilo Gonzalez, and by Martin Dyer and Alan Frieze in 1985, the Gon algorithm is basically a more powerful version of the Sh algorithm. While the Sh algorithm requires a guess r on r(\text{OPT}), the Gon algorithm dispenses with such a guess by noticing that if any set of vertices at distance greater than 2 \times r(\text{OPT}) from the current centers exists, then the farthest vertex must be inside such a set. Therefore, instead of computing at each iteration the set of vertices at distance greater than 2 \times r and then selecting a random vertex, the Gon algorithm simply selects the farthest vertex from every partial solution C'. The complexity of the Gon algorithm is O(kn), where n is the number of vertices.


The CDS algorithm

Proposed by García Díaz et al. in 2017, the CDS algorithm is a 3-approximation algorithm that takes ideas from the Gon algorithm (farthest-point heuristic), the HS algorithm (parametric pruning), and the relationship between the vertex ''k''-center problem and the Dominating Set problem. The CDS algorithm has a complexity of O(n^4). However, by performing a binary search over the ordered set of edge costs, a more efficient heuristic named CDSh is obtained, with complexity O(n^2 \log n). Despite the weaker approximation guarantee of CDS and the purely heuristic nature of CDSh, both perform much better in practice than the Sh, HS, and Gon algorithms.


Experimental comparison

Some of the most widely used benchmark datasets for the vertex ''k''-center problem are the pmed instances from OR-Lib and some instances from TSP-Lib. Table 1 shows the mean and standard deviation of the experimental approximation factors of the solutions generated by each algorithm over the 40 pmed instances from OR-Lib.


Polynomial heuristics


Greedy pure algorithm

The greedy pure algorithm (or Gr) follows the core idea of greedy algorithms: make the locally optimal decision at each stage. In the case of the vertex ''k''-center problem, the locally optimal decision consists of selecting each center in such a way that the size of the solution (covering radius) is minimum at each iteration. In other words, the first center selected is the one that solves the 1-center problem, the second center selected is the one that, together with the previous center, generates a solution with minimum covering radius, and the remaining centers are selected the same way. The complexity of the Gr algorithm is O(kn^2). The empirical performance of the Gr algorithm is poor on most benchmark instances.
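A sketch of the Gr heuristic on a distance matrix (names are illustrative), which makes its O(kn^2) behaviour explicit:

<syntaxhighlight lang="python">
def gr_algorithm(dist, k):
    """Greedy pure heuristic: at each step add the vertex that minimizes the covering radius."""
    n = len(dist)
    centers = []
    nearest = [float("inf")] * n  # distance of each vertex to its closest chosen center
    for _ in range(min(k, n)):
        best_vertex, best_radius = None, float("inf")
        for cand in range(n):
            if cand in centers:
                continue
            # Covering radius of the solution if cand were added to the current centers.
            radius = max(min(nearest[v], dist[v][cand]) for v in range(n))
            if radius < best_radius:
                best_vertex, best_radius = cand, radius
        centers.append(best_vertex)
        nearest = [min(nearest[v], dist[v][best_vertex]) for v in range(n)]
    return centers
</syntaxhighlight>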


Scoring algorithm

The Scoring algorithm (or Scr) was introduced by Jurij Mihelič and Borut Robič in 2005. This algorithm takes advantage of the reduction from the vertex ''k''-center problem to the minimum dominating set problem: the input graph is pruned with every possible value of the optimal solution size, and the resulting minimum dominating set problem is solved heuristically. This heuristic follows the ''lazy principle'', which takes every decision as late as possible (as opposed to the greedy strategy). The complexity of the Scr algorithm is O(n^4). The empirical performance of the Scr algorithm is very good on most benchmark instances; however, its running time rapidly becomes impractical as the input grows, so it seems to be a good algorithm only for small instances.


See also

* Traveling salesman problem
* Minimum k-cut
* Dominating set
* Independent set (graph theory)
* Facility location problem

