HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, and more precisely regarding
data structure In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, a ...
s, a persistent array is a persistent data structure with properties similar to a (non-persistent) array. That is, after a value's update in a persistent array, there exist two persistent arrays: one persistent array in which the update is taken into account, and one which is equal to the array before the update.


Difference between persistent arrays and arrays

An array \mathrm= _0,\dots,e_/math> is a data structure, with a fixed number ''n'' of elements e_0, \dots, e_. It is expected that, given the array ''ar'' and an index 0\le i, the value e_i can be retrieved quickly. This operation is called a lookup. Furthermore, given the array ''ar'', an index 0\le i and a new value ''v'', a new array ''ar2'' with content _0,\dots,e_,v,e_,\dots,e_/math> can be created quickly. This operation is called an update. The main difference between persistent and non-persistent arrays being that, in non-persistent arrays, the array ''ar'' is destroyed during the creation of ''ar2''. For example, consider the following pseudocode. ''array'' =
, 0, 0 The comma is a punctuation mark that appears in several variants in different languages. It has the same shape as an apostrophe or single closing quotation mark () in many typefaces, but it differs from them in being placed on the baseline o ...
''updated_array'' = ''array''.update(0, 8) ''other_array'' = ''array''.update(1, 3)'' ''last_array'' = ''updated_array''.update(2, 5) At the end of execution, the value of ''array'' is still
, 0, 0 The comma is a punctuation mark that appears in several variants in different languages. It has the same shape as an apostrophe or single closing quotation mark () in many typefaces, but it differs from them in being placed on the baseline o ...
the value of ''updated_array'' is
, 0, 0 The comma is a punctuation mark that appears in several variants in different languages. It has the same shape as an apostrophe or single closing quotation mark () in many typefaces, but it differs from them in being placed on the baseline o ...
the value of ''other_array'' is , 3, 0 and the value of ''last_array'' is , 0, 5 There exist two kinds of persistent arrays. A persistent array may be either partially or fully persistent. A fully persistent array may be updated an arbitrary number of times while a partially persistent array may be updated at most once. In our previous example, if ''array'' were only partially persistent, the creation of ''other_array'' would be forbidden; however, the creation of ''last_array'' would still be valid. Indeed, ''updated_array'' is an array distinct from ''array'' and has never been updated before the creation of ''last_array''.


Lower Bound on Persistent Array Lookup Time

Given that non-persistent arrays support both updates and lookups in constant time, it is natural to ask whether the same is possible with persistent arrays. The following theorem shows that under mild assumptions about the space complexity of the array, lookups must take \Omega(\log \log n) time in the worst case, regardless of update time, in the
cell-probe model In computer science, the cell-probe model is a model of computation similar to the random-access machine, except that all operations are free except memory access. This model is useful for proving lower bounds of algorithms for data structure probl ...
.


Implementations

In this section, n is the number of elements of the array, and m is the number of updates.


Worst case log-time

The most straightforward implementation of a fully persistent array uses an arbitrary persistent map, whose keys are the numbers from ''0'' to ''n'' − 1. A persistent map may be implemented using a persistent
balanced tree In computer science, a self-balancing binary search tree (BST) is any node-based binary search tree that automatically keeps its height (maximal number of levels below the root) small in the face of arbitrary item insertions and deletions.Donald ...
, in which case both updates and lookups would take O(\log n) time. This implementation is optimal for the pointer machine model.


Shallow binding

A fully persistent array may be implemented using an array and the so-called Baker's trick. This implementation is used in the
OCaml OCaml ( , formerly Objective Caml) is a general-purpose programming language, general-purpose, multi-paradigm programming language which extends the Caml dialect of ML (programming language), ML with object-oriented programming, object-oriented ...
module parray.ml by Jean-Christophe Filliâtre. In order to define this implementation, a few other definitions must be given. An initial array is an array that is not generated by an update on another array. A child of an array ''ar'' is an array of the form ''ar.update(i,v)'', and ''ar'' is the parent of ''ar.update(i,v)''. A descendant of an array ''ar'' is either ''ar'' or the descendant of a child of ''ar''. The initial array of an array ''ar'' is either ''ar'' if ''ar'' is initial, or it is the initial array of the parent of ''ar''. That is, the initial array of ''ar'' is the unique array ''init'' such that \mathrm = init.update(i_0,v_0).\dots.update(i_m,v_m), with ''ar'' initial and i_0,\dots,i_m an arbitrary sequence of indexes and v_0,\dots,v_m an arbitrary sequence of value. A ''family'' of arrays is thus a set of arrays containing an initial array and all of its descendants. Finally, the tree of a family of arrays is the
tree In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, including only woody plants with secondary growth, plants that are ...
whose nodes are the arrays, and with an edge ''e'' from ''ar'' to each of its children ''ar.update(i,v)''. A persistent array using Baker's trick consists of a pair with an actual array called ''array'' and the tree of arrays. This tree admits an arbitrary root - not necessarily the initial array. The root may be moved to an arbitrary node of the tree. Changing the root from ''root'' to an arbitrary node ''ar'' takes time proportional to the depth of ''ar''. That is, in the distance between ''root'' and ''ar''. Similarly, looking up a value takes time proportional to the distance between the array and the root of its family. Thus, if the same array ''ar'' may be lookup multiple times, it is more efficient to move the root to ''ar'' before doing the lookup. Finally updating an array only takes constant time. Technically, given two adjacent arrays ''ar1'' and ''ar2'', with ''ar1'' closer to the root than ''ar2'', the edge from ''ar1'' to ''ar2'' is labelled by ''(i,ar2 '', where ''i'' the only position whose value differ between ''ar1'' and ''ar2''. Accessing an element ''i'' of an array ''ar'' is done as follows. If ''ar'' is the root, then ''ar ' equals ''root '. Otherwise, let ''e'' the edge leaving ''ar'' toward the root. If the label of ''e'' is ''(i,v)'' then ''ar ' equals ''v''. Otherwise, let ''ar2'' be the other node of the edge ''e''. Then ''ar ' equals ''ar2 '. The computation of ''ar2 ' is done recursively using the same definition. The creation of ''ar.update(i,v)'' consists in adding a new node ''ar2'' to the tree, and an edge ''e'' from ''ar'' to ''ar2'' labelled by ''(i,v)''. Finally, moving the root to a node ''ar'' is done as follows. If ''ar'' is already the root, there is nothing to do. Otherwise, let ''e'' the edge leaving ''ar'' toward the current root, ''(i,v)'' its label and ''ar2'' the other end of ''e''. Moving the root to ''ar'' is done by first moving the root to ''ar2'', changing the label of ''e'' to ''(i, ar2 '', and changing ''array ' to ''v''. Updates take O(1) time. Lookups take O(1) time if the root is the array being looked up, but \Theta(m) time in the worst case.


Expected amortized log-log-time

In 1989, Dietz gave an implementation of fully persistent arrays using O(m+n) space such that lookups can be done in O(\log \log m) worst-case time, and updates can be done in O(\log\log m) expected amortized time. By the lower bound from the previous section, this time complexity for lookup is optimal when m=n^ for \gamma\in (1,2]. This implementation is related to the order-maintenance problem and involves vEB trees, one for the entire array and one for each index. Straka showed that the times for both operations can be (slightly) improved to O(\log\log \min(m,n)).


Worst case log-log-time

Straka showed how to achieve O((\log \log m)^2/\log\log\log m) worst-case time and linear (O(m+n)) space, or O(\log\log m) worst-case time and super-linear space. It remains open whether it is possible to achieve worst-case time O(\log\log m) subject to linear space.


References

{{Reflist. Articles with example pseudocode Arrays