computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...

, and more precisely regarding

data structure In computer science, a data structure is a data organization and storage format that is usually chosen for Efficiency, efficient Data access, access to data. More precisely, a data structure is a collection of data values, the relationships amo ...

s, a persistent array is a

persistent data structure In computing, a persistent data structure or not ephemeral data structure is a data structure that always preserves the previous version of itself when it is modified. Such data structures are effectively immutable, as their operations do not (v ...

with properties similar to a (non-persistent)

array An array is a systematic arrangement of similar objects, usually in rows and columns. Things called an array include: {{TOC right Music * In twelve-tone and serial composition, the presentation of simultaneous twelve-tone sets such that the ...

. That is, after a value's update in a persistent array, there exist two persistent arrays: one persistent array in which the update is taken into account, and one which is equal to the array before the update.

Difference between persistent arrays and arrays

An array

\mathrm=_0,\dots,e_/math> is a data structure,
with a fixed number ''n'' of elements e_0, \dots,
e_. It is expected that, given the array ''ar'' and an
index 0\le i, the value e_i can be
retrieved quickly. This operation is called a
lookup. Furthermore, given the array ''ar'', an index 0\le i and a new value ''v'', a new array ''ar2'' with
content_0,\dots,e_,v,e_,\dots,e_/math> can
be created quickly. This operation is called an update.  The
main difference between persistent and non-persistent arrays being
that, in non-persistent arrays, the array ''ar'' is destroyed during
the creation of ''ar2''.

For example, consider the following pseudocode.
 ''array'' =

, 0, 0 The comma is a punctuation mark that appears in several variants in different languages. Some typefaces render it as a small line, slightly curved or straight, but inclined from the vertical; others give it the appearance of a miniature fille ...

''updated_array'' = ''array''.update(0, 8) ''other_array'' = ''array''.update(1, 3) ''last_array'' = ''updated_array''.update(2, 5) At the end of execution, the value of ''array'' is still

the value of ''updated_array'' is

the value of ''other_array'' is , 3, 0 and the value of ''last_array'' is

, 0, 5 The comma is a punctuation mark that appears in several variants in different languages. Some typefaces render it as a small line, slightly curved or straight, but inclined from the vertical; others give it the appearance of a miniature fille ...

There exist two kinds of persistent arrays. A persistent array may be either partially or fully persistent. A fully persistent array may be updated an arbitrary number of times while a partially persistent array may be updated at most once. In our previous example, if ''array'' were only partially persistent, the creation of ''other_array'' would be forbidden; however, the creation of ''last_array'' would still be valid. Indeed, ''updated_array'' is an array distinct from ''array'' and has never been updated before the creation of ''last_array''.

Lower Bound on Persistent Array Lookup Time

Given that non-persistent arrays support both updates and lookups in constant time, it is natural to ask whether the same is possible with persistent arrays. The following theorem shows that under mild assumptions about the space complexity of the array, lookups must take

\Omega(\log \log n)

time in the worst case, regardless of update time, in the

cell-probe model In computer science, the cell-probe model is a model of computation similar to the random-access machine, except that all operations are free except memory access. This model is useful for proving lower bounds of algorithms for data structure probl ...

Implementations

In this section,

n

is the number of elements of the array, and

m

is the number of updates.

Worst case log-time

The most straightforward implementation of a fully persistent array uses an arbitrary persistent map, whose keys are the numbers from ''0'' to ''n'' − 1. A persistent map may be implemented using a persistent

balanced tree In computer science, a self-balancing binary search tree (BST) is any node-based binary search tree that automatically keeps its height (maximal number of levels below the root) small in the face of arbitrary item insertions and deletions.Donald ...

, in which case both updates and lookups would take

O(\log n)

time. This implementation is optimal for the

pointer machine In theoretical computer science, a pointer machine is an atomistic abstract computational machine whose storage structure is a graph. A pointer algorithm could also be an algorithm restricted to the pointer machine model. Some particular types o ...

model.

Shallow binding

A fully persistent array may be implemented using an array and the so-called Baker's trick. This implementation is used in the

OCaml OCaml ( , formerly Objective Caml) is a General-purpose programming language, general-purpose, High-level programming language, high-level, Comparison of multi-paradigm programming languages, multi-paradigm programming language which extends the ...

module parray.ml by Jean-Christophe Filliâtre. In order to define this implementation, a few other definitions must be given. An initial array is an array that is not generated by an update on another array. A child of an array ''ar'' is an array of the form ''ar.update(i,v)'', and ''ar'' is the parent of ''ar.update(i,v)''. A descendant of an array ''ar'' is either ''ar'' or the descendant of a child of ''ar''. The initial array of an array ''ar'' is either ''ar'' if ''ar'' is initial, or it is the initial array of the parent of ''ar''. That is, the initial array of ''ar'' is the unique array ''init'' such that

\mathrm =
init.update(i_0,v_0).\dots.update(i_m,v_m)

, with ''ar'' initial and

i_0,\dots,i_m

an arbitrary sequence of indexes and

v_0,\dots,v_m

an arbitrary sequence of value. A ''family'' of arrays is thus a set of arrays containing an initial array and all of its descendants. Finally, the tree of a family of arrays is the

tree In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, e.g., including only woody plants with secondary growth, only ...

whose nodes are the arrays, and with an edge ''e'' from ''ar'' to each of its children ''ar.update(i,v)''. A persistent array using Baker's trick consists of a pair with an actual array called ''array'' and the tree of arrays. This tree admits an arbitrary root - not necessarily the initial array. The root may be moved to an arbitrary node of the tree. Changing the root from ''root'' to an arbitrary node ''ar'' takes time proportional to the depth of ''ar''. That is, in the distance between ''root'' and ''ar''. Similarly, looking up a value takes time proportional to the distance between the array and the root of its family. Thus, if the same array ''ar'' may be lookup multiple times, it is more efficient to move the root to ''ar'' before doing the lookup. Finally updating an array only takes

constant time In theoretical computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations p ...

. Technically, given two adjacent arrays ''ar1'' and ''ar2'', with ''ar1'' closer to the root than ''ar2'', the edge from ''ar1'' to ''ar2'' is labelled by ''(i,ar2 '', where ''i'' the only position whose value differ between ''ar1'' and ''ar2''. Accessing an element ''i'' of an array ''ar'' is done as follows. If ''ar'' is the root, then ''ar ' equals ''root '. Otherwise, let ''e'' the edge leaving ''ar'' toward the root. If the label of ''e'' is ''(i,v)'' then ''ar ' equals ''v''. Otherwise, let ''ar2'' be the other node of the edge ''e''. Then ''ar ' equals ''ar2 '. The computation of ''ar2 ' is done recursively using the same definition. The creation of ''ar.update(i,v)'' consists in adding a new node ''ar2'' to the tree, and an edge ''e'' from ''ar'' to ''ar2'' labelled by ''(i,v)''. Finally, moving the root to a node ''ar'' is done as follows. If ''ar'' is already the root, there is nothing to do. Otherwise, let ''e'' the edge leaving ''ar'' toward the current root, ''(i,v)'' its label and ''ar2'' the other end of ''e''. Moving the root to ''ar'' is done by first moving the root to ''ar2'', changing the label of ''e'' to ''(i, ar2 '', and changing ''array ' to ''v''. Updates take

O(1)

time. Lookups take

O(1)

time if the root is the array being looked up, but

\Theta(m)

time in the worst case.

Expected amortized log-log-time

In 1989, Dietz gave an implementation of fully persistent arrays using

O(m+n)

space such that lookups can be done in

O(\log \log m)

worst-case time, and updates can be done in

O(\log\log m)

expected amortized time. By the lower bound from the previous section, this time complexity for lookup is optimal when

m=n^

for

\gamma\in (1,2]

. This implementation is related to the order-maintenance problem and involves vEB trees, one for the entire array and one for each index. Straka showed that the times for both operations can be (slightly) improved to

O(\log\log \min(m,n))

Worst case log-log-time

Straka showed how to achieve

O((\log \log m)^2/\log\log\log m)

worst-case time and linear (

O(m+n)

) space, or

O(\log\log m)

worst-case time and super-linear space. It remains open whether it is possible to achieve worst-case time

O(\log\log m)

subject to linear space.

References

{{Reflist. Articles with example pseudocode Arrays