In computer science, the cell-probe model is a model of computation similar to the random-access machine, except that every operation other than a memory access is free. This model is useful for proving lower bounds of algorithms for data structure problems.

Overview

The cell-probe model is a modification of the random-access machine model, in which computational cost is only assigned to accessing memory cells. The model is intended for proving lower bounds on the complexity of data structure problems.

One type of such problems has two phases: the preprocessing phase and the query phase. The input to the preprocessing phase is a data set, from which a structure is built out of memory cells. The input to the query phase is a query parameter. The query has to consult the data structure in order to compute its result; for example, a query may ask whether the query parameter was included in the original input data set. Another type of problem involves both update operations, which modify the data structure, and query operations. For example, an update may add an element to the structure or remove one.

In both cases, the cell-probe complexity of the data structure is characterized by the number of memory cells accessed during preprocessing, query and (if relevant) update. The cell-probe complexity is a lower bound on the time complexity of the corresponding operations on a random-access machine, where memory transfers are part of the operations counted in measuring time. An example of such a problem is the dynamic partial sum problem.
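The accounting described above can be sketched in a few lines of Python. This is only an illustration, not part of any cited construction: the names `CellMemory`, `preprocess` and `query_member` are hypothetical, and the query deliberately uses a plain linear scan so that the probe count is easy to follow.

```python
class CellMemory:
    """Array of memory cells; every read or write is charged as one probe."""

    def __init__(self, size, word_bits):
        self.cells = [0] * size
        self.word_bits = word_bits  # assumed cell width b, in bits
        self.probes = 0

    def read(self, i):
        self.probes += 1
        return self.cells[i]

    def write(self, i, value):
        # A cell may only hold a value that fits in word_bits bits.
        if not 0 <= value < (1 << self.word_bits):
            raise ValueError("value does not fit in one cell")
        self.probes += 1
        self.cells[i] = value


def preprocess(data, word_bits=16):
    """Preprocessing phase: store the input, one element per cell."""
    mem = CellMemory(len(data), word_bits)
    # Sorting is computation, not memory access, so it costs nothing here.
    for i, x in enumerate(sorted(data)):
        mem.write(i, x)
    return mem


def query_member(mem, x):
    """Query phase: return (found, probes used by this query)."""
    start = mem.probes
    found = False
    for i in range(len(mem.cells)):
        if mem.read(i) == x:  # the comparison itself is free
            found = True
            break
    return found, mem.probes - start
```

Only calls to `read` and `write` increase the cost; sorting, comparing and loop control are free, which is exactly what distinguishes the model from a random-access machine.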

History

Andrew Yao's 1981 paper "Should Tables Be Sorted?" is considered the introduction of the cell-probe model. Yao used it to give the minimum number of memory cell "probes" or accesses necessary to determine whether a given query datum exists within a table stored in memory. In 1989, Fredman and Saks initiated the study of cell-probe lower bounds for dynamic data-structure problems (i.e., problems involving updates and queries), and introduced the notation CPROBE($b$) for the cell-probe model in which a memory cell (word) consists of $b$ bits.

Notable results

Searching Tables

Yao considered a static data-structure problem where one has to build a data structure ("table") to represent a set $S$ of $n$ elements out of $\{1,\dots,m\}$. The query parameter is a number $x\le m$ and the query has to report whether $x$ is in the table. A crucial requirement is that the table consist of exactly $n$ entries, where each entry is an integer between $1$ and $m$. Yao showed that as long as the table size is bounded independently of $m$ and $m$ is large enough, a query must perform $\lceil\log (n+1)\rceil$ probes in the worst case. This shows that a sorted table together with binary search for queries is an optimal scheme in this restricted setting.

In the same paper (Theorem 3), Yao also showed that if the problem is relaxed to allow the data structure to store $n + 1$ entries, then queries can be performed using only two probes. This upper bound, like the lower bound described above, also requires $m$ to be sufficiently large as a function of $n$. Remarkably, this upper bound uses only one more table entry than the setting to which the lower bound applies.
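The matching upper bound for the sorted table can be checked with a small sketch. It assumes the logarithm in the bound is base 2 and counts one probe per table entry read; the helper names `binary_search_probes` and `worst_case_probes` are hypothetical.

```python
def binary_search_probes(table, x):
    """Search a sorted table for x; return (found, number of entries read)."""
    lo, hi = 0, len(table) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1            # reading table[mid] is the only charged step
        entry = table[mid]
        if entry == x:
            return True, probes
        if entry < x:
            lo = mid + 1
        else:
            hi = mid - 1
    return False, probes


def worst_case_probes(table, m):
    """Largest probe count over all queries x in 1..m."""
    return max(binary_search_probes(table, x)[1] for x in range(1, m + 1))
```

For a table of $n$ entries the worst case returned by `worst_case_probes` is $\lceil\log_2 (n+1)\rceil$, which in Python equals `n.bit_length()`.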

Dynamic Partial Sums

The dynamic partial sum problem defines two operations: Update($k$, $v$), which sets the value in an array $A$ at index $k$ to be $v$, and Sum($k$), which returns the sum of the values in $A$ at indices $0$ through $k$. A naïve implementation would take $O(1)$ time for Update and $O(n)$ time for Sum. Instead, values can be stored as leaves in a tree whose inner nodes store the sum over the subtree rooted at that node. In this structure Update requires $O(\log n)$ time to update each node on the leaf-to-root path, and Sum similarly requires $O(\log n)$ time to traverse the tree from leaf to root, summing the values of all subtrees to the left of the query index.

Improving on a result of Fredman and Saks, Mihai Pătraşcu used the cell-probe model and an information transfer argument to show that the partial sums problem requires $\Omega\left(\log n\right)$ time per operation in the worst case (i.e., the worse of query and update must consume such time), assuming $b=\Omega(\log n)$ bits per word. He further exhibited the trade-off curve between update time and query time, and investigated the case in which updates are restricted to small numbers (of $\delta=o(b)$ bits).
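The tree-based structure described above might look as follows. This is a minimal sketch under stated assumptions: the tree is stored as a complete binary tree in a flat list, the array length is rounded up to a power of two, and the class and method names (`PartialSums`, `update`, `prefix_sum`) are hypothetical.

```python
class PartialSums:
    """Complete binary tree over an array; inner nodes hold subtree sums."""

    def __init__(self, n):
        self.size = 1
        while self.size < n:
            self.size *= 2
        # Node 1 is the root, node i has children 2i and 2i+1,
        # and leaves occupy positions size .. 2*size - 1.
        self.tree = [0] * (2 * self.size)

    def update(self, k, v):
        """Set the value at index k to v, fixing sums on the leaf-to-root path."""
        i = self.size + k
        self.tree[i] = v
        i //= 2
        while i >= 1:
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def prefix_sum(self, k):
        """Return the sum of the values at indices 0 through k."""
        i = self.size + k
        total = self.tree[i]
        while i > 1:
            if i % 2 == 1:
                # i is a right child: its left sibling covers indices
                # entirely to the left of k, so add that whole subtree.
                total += self.tree[i - 1]
            i //= 2
        return total
```

Both operations touch one node per tree level, so each uses $O(\log n)$ memory accesses, matching the upper bound quoted in the text.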

Disjoint Set Maintenance (Union-Find)

A disjoint-set data structure represents a collection of disjoint sets; there is an update operation, called Union, which unites two sets, and a query operation, called Find, which identifies the set to which a given element belongs. Fredman and Saks proved that in the model CPROBE($\log n$), any solution for this problem requires $\Omega(m\alpha(m,n))$ probes in the worst case (even in expectation) to execute $n-1$ unions and $m\ge n$ finds. This shows that the classic data structure described in the article on the disjoint-set data structure is optimal.
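For reference, here is a minimal sketch of a disjoint-set forest, assuming that the "classic data structure" refers to the usual combination of union by rank and path compression. The class name `DisjointSets` is hypothetical.

```python
class DisjointSets:
    """Disjoint-set forest with union by rank and path compression."""

    def __init__(self, n):
        self.parent = list(range(n))  # every element starts as its own root
        self.rank = [0] * n

    def find(self, x):
        """Return the representative of the set containing x."""
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        """Unite the sets containing a and b; return False if already united."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        # Union by rank: attach the shallower tree under the deeper one.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True
```

Two elements are in the same set exactly when `find` returns the same representative for both.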

Approximate Nearest Neighbor Searching

The exact nearest neighbor search problem is to determine the point closest to a given query point in a set of input points. An approximate version of this problem is often considered, since many applications are in very high-dimensional spaces and solving the problem in high dimensions requires exponential time or space with respect to the dimension. Chakrabarti and Regev proved that the approximate nearest neighbor search problem on the Hamming cube using polynomial storage and $d^{O(1)}$ word size requires a worst-case query time of $\Omega\left(\frac{\log\log d}{\log\log\log d}\right)$. This proof used the cell-probe model and information-theoretic techniques from communication complexity.

The Cell-Probe Model versus Random Access Machines

In the cell-probe model, limiting the range of values that can be stored in a cell is paramount (otherwise one could encode the whole data structure in one cell). The idealized random-access machine used as a computational model in computer science does not impose a limit on the contents of each cell (in contrast to the word RAM). Thus cell-probe lower bounds apply to the word RAM, but do not apply to the idealized RAM. Certain techniques for cell-probe lower bounds can, however, be carried over to the idealized RAM with an algebraic instruction set, and similar lower bounds result.
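The remark about encoding the whole data structure in one cell can be made concrete with a toy sketch. It relies on Python integers being unbounded, which stands in for a cell with no size limit; the function names `pack_set` and `member` are hypothetical.

```python
def pack_set(elements):
    """Encode a set of non-negative integers as one unbounded integer (a bitmap)."""
    cell = 0
    for x in elements:
        cell |= 1 << x
    return cell


def member(cell, x):
    """Answer a membership query from the single cell: one probe, free computation."""
    return (cell >> x) & 1 == 1
```

With unbounded cells every membership query costs a single probe, so no non-trivial lower bound is possible. Once a cell is limited to $b$ bits, the bitmap for a universe of size $m$ no longer fits whenever $m > b$.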

External links

NIST's Dictionary of Algorithms and Data Structures entry on the cell-probe model
