The subset sum problem (SSP) is a decision problem in computer science. In its most general formulation, there is a multiset S of integers and a target-sum T, and the question is to decide whether any subset of the integers sums to precisely ''T''. The problem is known to be in NP. Moreover, some restricted variants of it are NP-complete too, for example:
* The variant in which all inputs are positive.
* The variant in which inputs may be positive or negative, and T = 0. For example, given the set \{-7, -3, -2, 9000, 5, 8\}, the answer is ''yes'' because the subset \{-3, -2, 5\} sums to zero.
* The variant in which all inputs are positive, and the target sum is exactly half the sum of all inputs, i.e., T = \frac{1}{2}(a_1+\dots+a_n). This special case of SSP is known as the partition problem.
SSP can also be regarded as an optimization problem: find a subset whose sum is at most ''T'', and, subject to that, as close as possible to ''T''. It is NP-hard, but there are several algorithms that can solve it reasonably quickly in practice. SSP is a special case of the knapsack problem and of the multiple subset sum problem.


Computational hardness

The run-time complexity of SSP depends on two parameters:
* ''n'' - the number of input integers. If ''n'' is a small fixed number, then an exhaustive search for the solution is practical.
* ''L'' - the precision of the problem, stated as the number of binary place values that it takes to state the problem. If ''L'' is a small fixed number, then there are dynamic programming algorithms that can solve it exactly.
As both ''n'' and ''L'' grow large, SSP is NP-hard. The complexity of the best known algorithms is exponential in the smaller of the two parameters ''n'' and ''L''. The problem is NP-hard even when all input integers are positive (and the target-sum ''T'' is a part of the input). This can be proved by a direct reduction from 3SAT. It can also be proved by reduction from 3-dimensional matching (3DM); see Section 3.1 and problem SP1 in Appendix A.3.1:
* We are given an instance of 3DM, where the vertex sets are W, X, Y. Each set has ''n'' vertices. There are ''m'' edges, where each edge contains exactly one vertex from each of W, X, Y. Denote ''L'' := ceiling(log2(''m''+1)), so that ''L'' is larger than the number of bits required to represent the number of edges.
* We construct an instance of SSP with ''m'' positive integers. The integers are described by their binary representation. Each input integer can be represented by 3''nL'' bits, divided into 3''n'' zones of ''L'' bits. Each zone corresponds to a vertex.
* For each edge (w,x,y) in the 3DM instance, there is an integer in the SSP instance, in which exactly three bits are "1": the least-significant bits in the zones of the vertices w, x, and y. For example, if ''n''=10 and ''L''=3, and W=(0,...,9), X=(10,...,19), Y=(20,...,29), then the edge (0, 10, 20) is represented by the number 2^0+2^{30}+2^{60}.
* The target sum ''T'' in the SSP instance is set to an integer with "1" in the least-significant bit of every zone, that is, T = 2^0+2^L+\dots+2^{(3n-1)L}.
* If the 3DM instance has a perfect matching, then summing the corresponding integers in the SSP instance yields exactly ''T''.
* Conversely, if the SSP instance has a subset with sum exactly ''T'', then, since the zones are large enough that there are no "carries" from one zone to the next, the sum must correspond to a perfect matching in the 3DM instance.
The following variants are also known to be NP-hard:
* The input integers can be both positive and negative, and the target-sum ''T'' = 0. This can be proved by reduction from the variant with positive integers. Denote that variant by SubsetSumPositive and the current variant by SubsetSumZero. Given an instance (''S'', ''T'') of SubsetSumPositive, construct an instance of SubsetSumZero by adding a single element with value −''T''. Given a solution to the SubsetSumPositive instance, adding the −''T'' yields a solution to the SubsetSumZero instance. Conversely, given a solution to the SubsetSumZero instance, it must contain the −''T'' (since all integers in ''S'' are positive); to get a sum of zero, it must also contain a subset of ''S'' with a sum of +''T'', which is a solution of the SubsetSumPositive instance.
* The input integers are positive, and ''T'' = sum(''S'')/2. This can also be proved by reduction from the general variant; see partition problem.
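The zone-based encoding used in the 3DM reduction above can be sketched in Python. This is an illustrative fragment (the helper names are made up here); vertices are numbered 0..3''n''−1 across W, X, Y as in the example:

```python
def edge_to_integer(edge, L):
    """Encode a 3DM edge (w, x, y) as an SSP input integer.

    Each vertex v owns an L-bit zone starting at bit v*L; the edge's
    integer has a "1" in the least-significant bit of the zones of its
    three vertices and "0" everywhere else.
    """
    w, x, y = edge
    return (1 << (w * L)) + (1 << (x * L)) + (1 << (y * L))


def target_sum(n, L):
    """The target T: a "1" in the least-significant bit of all 3n zones."""
    return sum(1 << (v * L) for v in range(3 * n))
```

With ''n''=10 and ''L''=3, `edge_to_integer((0, 10, 20), 3)` yields 2^0+2^{30}+2^{60}, matching the example above.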


Exponential time algorithms

There are several ways to solve SSP in time exponential in ''n''.


Inclusion–exclusion

The most naïve algorithm would be to cycle through all subsets of ''n'' numbers and, for every one of them, check if the subset sums to the right number. The running time is of order O(2^n \cdot n), since there are 2^n subsets and, to check each subset, we need to sum at most ''n'' elements. The algorithm can be implemented by depth-first search of a binary tree: each level in the tree corresponds to an input number; the left branch corresponds to excluding the number from the set, and the right branch corresponds to including the number (hence the name inclusion-exclusion). The memory required is O(n). The run-time can be improved by several heuristics:
* Process the input numbers in descending order.
* If the integers included in a given node exceed the sum of the best subset found so far, the node is pruned.
* If the integers included in a given node, plus all remaining integers, are less than the sum of the best subset found so far, the node is pruned.
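A minimal Python sketch of this search, for the decision version with positive inputs (the function name is illustrative; the pruning here is the feasibility analogue of the heuristics above: a branch is cut when the remaining integers cannot reach the target):

```python
def subset_sum_dfs(nums, target):
    """Depth-first inclusion-exclusion search for a subset summing to target.

    Assumes positive integers; returns one such subset, or None.
    """
    nums = sorted(nums, reverse=True)      # heuristic: largest numbers first
    suffix = [0] * (len(nums) + 1)         # suffix[i] = sum of nums[i:]
    for i in range(len(nums) - 1, -1, -1):
        suffix[i] = suffix[i + 1] + nums[i]

    def dfs(i, remaining, chosen):
        if remaining == 0:
            return chosen
        if i == len(nums) or remaining < 0 or suffix[i] < remaining:
            return None                    # prune: this branch is infeasible
        # Right branch (include nums[i]), then left branch (exclude it).
        return (dfs(i + 1, remaining - nums[i], chosen + [nums[i]])
                or dfs(i + 1, remaining, chosen))

    return dfs(0, target, [])
```

For example, `subset_sum_dfs([3, 34, 4, 12, 5, 2], 9)` finds a subset such as [5, 4].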


Horowitz and Sahni

In 1974, Horowitz and Sahni published a faster exponential-time algorithm, which runs in time O(2^{n/2} \cdot (n/2)), but requires much more space - O(2^{n/2}). The algorithm splits the ''n'' elements arbitrarily into two sets of n/2 each. For each of these two sets, it stores a list of the sums of all 2^{n/2} possible subsets of its elements. Each of these two lists is then sorted. Even with the fastest comparison sorting algorithm, mergesort, this step would take time O(2^{n/2} n). However, given a sorted list of subset sums for ''k'' elements, the list can be expanded to two sorted lists with the introduction of a (''k''+1)th element, and these two sorted lists can be merged in time O(2^k). Thus, each list can be generated in sorted form in time O(2^{n/2}). Given the two sorted lists, the algorithm can check if an element of the first array and an element of the second array sum up to ''T'' in time O(2^{n/2}). To do that, the algorithm passes through the first array in decreasing order (starting at the largest element) and the second array in increasing order (starting at the smallest element). Whenever the sum of the current element in the first array and the current element in the second array is more than ''T'', the algorithm moves to the next element in the first array. If it is less than ''T'', the algorithm moves to the next element in the second array. If two elements that sum to ''T'' are found, it stops. (The sub-problem for two elements' sum is known as two-sum.)
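A Python sketch of this meet-in-the-middle idea. For brevity it generates each half's subset sums with `itertools` rather than the incremental sorted-merge construction described above (so sum generation costs an extra factor of ''n''), but the two-pointer scan over the sorted lists is the same:

```python
from itertools import combinations


def meet_in_the_middle(nums, target):
    """Decide subset sum by splitting the input into two halves and
    scanning their sorted subset-sum lists with two pointers."""
    half = len(nums) // 2
    left, right = nums[:half], nums[half:]

    def all_sums(part):
        sums = set()
        for r in range(len(part) + 1):
            for combo in combinations(part, r):
                sums.add(sum(combo))
        return sorted(sums)

    a, b = all_sums(left), all_sums(right)
    i, j = len(a) - 1, 0              # a scanned descending, b ascending
    while i >= 0 and j < len(b):
        s = a[i] + b[j]
        if s == target:
            return True
        if s > target:
            i -= 1                    # sum too large: take a smaller a-element
        else:
            j += 1                    # sum too small: take a larger b-element
    return False
```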


Schroeppel and Shamir

In 1981, Schroeppel and Shamir presented an algorithm based on Horowitz and Sahni that requires a similar runtime - O(2^{n/2} \cdot (n/4)) - but much less space - O(2^{n/4}). Rather than generating and storing all subsets of ''n''/2 elements in advance, they partition the elements into 4 sets of ''n''/4 elements each, and generate subsets of ''n''/2 element pairs dynamically using a min heap, which yields the above time and space complexities, since this can be done in time O(k^2 \log(k)) and space O(k) given 4 lists of length k. Due to space requirements, the HS algorithm is practical for up to about 50 integers, and the SS algorithm is practical for up to 100 integers.
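The core trick - enumerating all pairwise sums of two sorted lists in ascending order while storing only O(k) heap entries rather than all k^2 sums - can be sketched in Python (an illustrative fragment, not the full Schroeppel-Shamir algorithm):

```python
import heapq


def sums_in_order(a, b):
    """Lazily yield all pairwise sums a[i] + b[j] of two sorted lists in
    ascending order, keeping at most len(a) entries in a min-heap."""
    if not a or not b:
        return
    # One heap entry per row i, initially pairing each a[i] with b[0].
    heap = [(a[i] + b[0], i, 0) for i in range(len(a))]
    heapq.heapify(heap)
    while heap:
        s, i, j = heapq.heappop(heap)
        yield s
        if j + 1 < len(b):
            # Advance row i to the next element of b.
            heapq.heappush(heap, (a[i] + b[j + 1], i, j + 1))
```

For example, `list(sums_in_order([0, 1], [0, 2]))` yields [0, 1, 2, 3].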


Howgrave-Graham and Joux

Howgrave-Graham and Joux presented a probabilistic algorithm that runs faster than all previous ones - in time O(2^{0.337n}) using space O(2^{0.256n}). It solves only the decision problem, cannot prove there is no solution for a given sum, and does not return the subset sum closest to ''T''. The techniques of Howgrave-Graham and Joux were subsequently extended, bringing the time complexity down to O(2^{0.291n}).


Pseudo-polynomial time dynamic programming solutions

SSP can be solved in pseudo-polynomial time using dynamic programming. Suppose we have the following sequence of elements in an instance:
:x_1, \ldots, x_N
We define a ''state'' as a pair (''i'', ''s'') of integers. This state represents the fact that
:"there is a nonempty subset of x_1, \ldots, x_i which sums to ''s''."
Each state (''i'', ''s'') has two next states:
* (''i''+1, ''s''), implying that x_{i+1} is not included in the subset;
* (''i''+1, ''s''+x_{i+1}), implying that x_{i+1} is included in the subset.
Starting from the initial state (0, 0), it is possible to use any graph search algorithm (e.g. BFS) to search for the state (''N'', ''T''). If the state is found, then by backtracking we can find a subset with a sum of exactly ''T''. The run-time of this algorithm is at most linear in the number of states. The number of states is at most ''N'' times the number of different possible sums. Let ''A'' be the sum of the negative values and ''B'' the sum of the positive values; the number of different possible sums is at most ''B''-''A'', so the total runtime is in O(N(B-A)). For example, if all input values are positive and bounded by some constant ''C'', then ''B'' is at most ''N C'', so the time required is O(N^2 C). This solution does not count as polynomial time in complexity theory, because B-A is not polynomial in the ''size'' of the problem, which is the number of bits used to represent it. This algorithm is polynomial in the values of ''A'' and ''B'', which are exponential in their numbers of bits. However, Subset Sum encoded in ''unary'' is in P, since then the size of the encoding is linear in B-A. Hence, Subset Sum is only ''weakly'' NP-complete. For the case that each x_i is positive and bounded by a fixed constant ''C'', Pisinger found a linear-time algorithm having time complexity O(NC) (note that this is for the version of the problem where the target sum is not necessarily zero, otherwise the problem would be trivial). In 2015, Koiliaris and Xu found a deterministic \tilde{O}(T \sqrt{N})-time algorithm for the subset sum problem where ''T'' is the sum we need to find. In 2017, Bringmann found a randomized \tilde{O}(T+N)-time algorithm. In 2014, Curtis and Sanches found a simple recursion highly scalable in SIMD machines having O(N(m-x_{\min})/p) time and O(N+m-x_{\min}) space, where ''p'' is the number of processing elements, m = \min(s, \sum x_i - s) and x_{\min} is the lowest integer. This is the best theoretical parallel complexity known so far. A comparison of practical results and the solution of hard instances of the SSP is discussed by Curtis and Sanches.
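A compact Python sketch of the dynamic program for positive inputs, using the standard one-dimensional table over sums - an equivalent view of the state graph above in which `reachable[s]` records whether some state (''i'', ''s'') has been visited:

```python
def subset_sum_dp(nums, target):
    """Pseudo-polynomial dynamic program for positive integers.

    reachable[s] is True when some subset of the elements seen so far
    sums to s.  Runtime O(N * target), space O(target).
    """
    reachable = [False] * (target + 1)
    reachable[0] = True                        # the empty subset sums to 0
    for x in nums:
        # Iterate sums downward so each element is used at most once.
        for s in range(target, x - 1, -1):
            if reachable[s - x]:
                reachable[s] = True
    return reachable[target]
```

Recovering the subset itself (the "backtracking" mentioned above) requires either keeping the full two-dimensional table or recording, for each reachable sum, which element last made it reachable.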


Polynomial time approximation algorithms

Suppose all inputs are positive. An
approximation algorithm
to SSP aims to find a subset of ''S'' with a sum of at most ''T'' and at least ''r'' times the optimal sum, where ''r'' is a number in (0,1) called the ''approximation ratio''.


Simple 1/2-approximation

The following very simple algorithm has an approximation ratio of 1/2:
* Order the inputs by descending value;
* Put the next-largest input into the subset, as long as it fits there.
When this algorithm terminates, either all inputs are in the subset (which is obviously optimal), or there is an input that does not fit. The first such input is smaller than all previous inputs that are in the subset, and the sum of inputs in the subset is more than ''T''/2; otherwise, that input would also be less than ''T''/2 and it would fit in the set. Such a sum, greater than ''T''/2, is obviously more than OPT/2.
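The two steps above can be sketched in a few lines of Python (the function name is illustrative):

```python
def greedy_half_approx(nums, target):
    """Greedy 1/2-approximation: scan inputs in descending order and
    add each one that still fits under the target sum."""
    total = 0
    chosen = []
    for x in sorted(nums, reverse=True):
        if total + x <= target:
            total += x
            chosen.append(x)
    return total, chosen
```

For instance, with inputs [7, 5, 4] and ''T'' = 10 the algorithm picks {7} for a sum of 7, while the optimum is {5, 4} = 9; the guarantee 7 > 10/2 holds.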


Fully-polynomial time approximation scheme

The following algorithm attains, for every \epsilon>0, an approximation ratio of (1-\epsilon). Its run time is polynomial in ''n'' and 1/\epsilon. Recall that ''n'' is the number of inputs and ''T'' is the upper bound to the subset sum.

 initialize a list ''L'' to contain one element 0.
 for each ''i'' from 1 to ''n'' do
     let ''Ui'' be a list containing all elements ''y'' in ''L'', and all sums ''xi'' + ''y'' for all ''y'' in ''L''.
     sort ''Ui'' in ascending order
     make ''L'' empty
     let ''y'' be the smallest element of ''Ui''
     add ''y'' to ''L''
     for each element ''z'' of ''Ui'' in increasing order do
         // Trim the list by eliminating numbers close to one another
         // and throw out elements greater than the target sum ''T''.
         if ''y'' + ''ε T''/''n'' < ''z'' ≤ ''T'' then
             ''y'' = ''z''
             add ''z'' to ''L''
 return the largest element in ''L''.

Note that without the trimming step (the inner "for each" loop), the list ''L'' would contain the sums of all 2^n subsets of inputs. The trimming step does two things:
* It ensures that all sums remaining in ''L'' are below ''T'', so they are feasible solutions to the subset-sum problem.
* It ensures that the list ''L'' is "sparse", that is, the difference between each two consecutive partial sums is at least \epsilon T/n.
These properties together guarantee that the list contains no more than n/\epsilon elements; therefore the run-time is polynomial in n/\epsilon. When the algorithm ends, if the optimal sum is in ''L'', then it is returned and we are done. Otherwise, it must have been removed in a previous trimming step. Each trimming step introduces an additive error of at most \epsilon T/n, so ''n'' steps together introduce an error of at most \epsilon T. Therefore, the returned solution is at least \text{OPT}-\epsilon T, which is at least (1-\epsilon)\text{OPT}. The above algorithm provides an ''exact'' solution to SSP in the case that the input numbers are small (and non-negative). If any sum of the numbers can be specified with at most ''P'' bits, then solving the problem approximately with \epsilon = 2^{-P} is equivalent to solving it exactly. Then, the polynomial time algorithm for approximate subset sum becomes an exact algorithm with running time polynomial in ''n'' and 2^P (i.e., exponential in ''P''). Kellerer, Mansini, Pferschy and Speranza and Kellerer, Pferschy and Pisinger present other FPTASs for subset sum.
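The pseudocode above translates to Python roughly as follows (a sketch; the trimming keeps a sum ''z'' only when it exceeds the last kept sum by more than \epsilon T/n and does not exceed ''T''):

```python
def fptas_subset_sum(nums, target, eps):
    """(1 - eps)-approximation for subset sum with positive inputs,
    via the merge-and-trim scheme: keep a sparse sorted list of
    achievable partial sums."""
    n = len(nums)
    delta = eps * target / n                 # trimming threshold
    sums = [0]
    for x in nums:
        # Merge the old sums with the old sums shifted by x (list U_i).
        merged = sorted(set(sums + [s + x for s in sums]))
        trimmed = []
        last = None
        for z in merged:
            if z > target:
                break                        # discard infeasible sums
            if last is None or z > last + delta:
                trimmed.append(z)
                last = z
        sums = trimmed
    return sums[-1] if sums else 0
```

For example, `fptas_subset_sum([104, 102, 201, 101], 308, 0.2)` returns 302, within a factor (1 - 0.2) of the exact optimum 307 = 104 + 102 + 101.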


See also

* Knapsack problem - a generalization of SSP in which each input item has both a value and a weight. The goal is to maximize the ''value'' such that the total ''weight'' is bounded.
* Multiple subset sum problem - a generalization of SSP in which one should choose several subsets.
* 3SUM
* Merkle–Hellman knapsack cryptosystem


References


Further reading

* A3.2: SP13, pg. 223.