In computational complexity theory, the element distinctness problem or element uniqueness problem is the problem of determining whether all the elements of a list are distinct.
It is a well studied problem in many different models of computation. The problem may be solved by sorting the list and then checking if there are any consecutive equal elements; it may also be solved in linear expected time by a randomized algorithm that inserts each item into a hash table and compares only those elements that are placed in the same hash table cell.
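For illustration, here is a minimal Python sketch of both approaches; the function names are invented for this example, and Python's built-in set stands in for an explicit hash table with per-cell comparisons:

<syntaxhighlight lang="python">
def distinct_by_sorting(items):
    # O(n log n): sort, then scan for consecutive equal elements.
    ordered = sorted(items)
    return all(ordered[i] != ordered[i + 1] for i in range(len(ordered) - 1))


def distinct_by_hashing(items):
    # Linear expected time: insert each item into a hash table and
    # compare only the items that land in the same cell. Python's set
    # performs that bucket-level comparison internally.
    seen = set()
    for x in items:
        if x in seen:
            return False
        seen.add(x)
    return True
</syntaxhighlight>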
Several lower bounds in computational complexity are proved by reducing the element distinctness problem to the problem in question, i.e., by demonstrating that the solution of the element uniqueness problem may be quickly found after solving the problem in question.
Decision tree complexity
The number of comparisons needed to solve the problem of size n, in a comparison-based model of computation such as a decision tree or algebraic decision tree, is Θ(n log n). Here, Θ invokes big theta notation, meaning that the problem can be solved in a number of comparisons proportional to n log n (a linearithmic function) and that all solutions require this many comparisons. In these models of computation, the input numbers may not be used to index the computer's memory (as in the hash table solution) but may only be accessed by computing and comparing simple algebraic functions of their values. For these models, an algorithm based on comparison sort solves the problem within a constant factor of the best possible number of comparisons. The same lower bound applies as well to the expected number of comparisons in the randomized algebraic decision tree model.
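To make the comparison-based restriction concrete, the following sketch accesses the inputs only through a counted comparison operation and uses merge sort; the counter and the names are illustrative devices, not part of the model's formal definition:

<syntaxhighlight lang="python">
comparisons = 0

def less_eq(a, b):
    # The only permitted access to input values: a single comparison.
    global comparisons
    comparisons += 1
    return a <= b

def merge_sort(seq):
    if len(seq) <= 1:
        return list(seq)
    mid = len(seq) // 2
    left, right = merge_sort(seq[:mid]), merge_sort(seq[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if less_eq(left[i], right[j]):
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

def all_distinct(seq):
    s = merge_sort(seq)
    # After sorting, s[k] <= s[k+1], so the two are equal iff s[k+1] <= s[k].
    return not any(less_eq(s[k + 1], s[k]) for k in range(len(s) - 1))
</syntaxhighlight>

Running all_distinct on a list of length n performs about n log n comparisons, matching the upper-bound side of the Θ(n log n) statement.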
Quantum complexity
Quantum algorithms can solve this problem faster, in Θ(n^{2/3}) queries. The optimal algorithm is by Andris Ambainis.
Yaoyun Shi first proved a tight lower bound when the size of the range is sufficiently large. Ambainis and Kutin independently (and via different proofs) extended his work to obtain the lower bound for all functions.
Generalization: Finding repeated elements
Elements that occur more than n/k times in a multiset of size n may be found by a comparison-based algorithm, the Misra–Gries heavy hitters algorithm, in time O(n log k). The element distinctness problem is a special case of this problem where k = n. This time is optimal under the decision tree model of computation.
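A minimal Python sketch of the Misra–Gries idea follows (two passes; the names are invented for this example, and the dict stands in for the balanced search tree that yields the comparison-based O(n log k) bound):

<syntaxhighlight lang="python">
def misra_gries_candidates(items, k):
    # One pass with at most k - 1 counters; returns a superset of the
    # elements that occur more than n/k times.
    counters = {}
    for x in items:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # Decrement every counter, discarding those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return set(counters)

def repeated_elements(items, k):
    # Second pass verifies which candidates really exceed the threshold.
    n = len(items)
    counts = {c: 0 for c in misra_gries_candidates(items, k)}
    for x in items:
        if x in counts:
            counts[x] += 1
    return {c for c, cnt in counts.items() if cnt > n / k}
</syntaxhighlight>

With k = n the threshold n/k is 1, so repeated_elements(items, len(items)) returns exactly the elements that appear at least twice, and the list is distinct exactly when that set is empty.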
See also
* Collision problem
References
{{reflist}}