A version vector is a mechanism for tracking changes to data in a
distributed system
Distributed computing is a field of computer science that studies distributed systems, defined as computer systems whose inter-communicating components are located on different networked computers.
The components of a distributed system commun ...
, where multiple agents might update the data at different times. The version vector allows the participants to determine if one update preceded another (
happened-before
In computer science, the happened-before binary relation, relation (denoted: \to \;) is a relation between the result of two events, such that if one event should happen before another event, the result must reflect that, even if those events are i ...
), followed it, or if the two updates happened concurrently (and therefore might conflict with each other). In this way, version vectors enable
causality tracking among data replicas and are a basic mechanism for
optimistic replication. In mathematical terms, the version vector generates a
preorder
In mathematics, especially in order theory, a preorder or quasiorder is a binary relation that is reflexive relation, reflexive and Transitive relation, transitive. The name is meant to suggest that preorders are ''almost'' partial orders, ...
that tracks the events that precede, and may therefore influence, later updates.
Version vectors maintain state identical to that in a
vector clock
A vector clock is a data structure used for determining the partial ordering of events in a distributed system and detecting causality violations. Just as in Lamport timestamps, inter-process messages contain the state of the sending process's ...
, but the update rules differ slightly; in this example, replicas can either experience local updates (e.g., the user editing a file on the local node), or can synchronize with another replica:
* Initially all vector counters are zero.
* Each time a replica experiences a local update event, it increments its own counter in the vector by one.
* Each time two replicas and synchronize, they both set the elements in their copy of the vector to the maximum of the element across both counters:
. After synchronization, the two replicas have identical version vectors.
Pairs of replicas, , , can be compared by inspecting their version vectors and determined to be either: identical (
), concurrent (
), or ordered (
or
). The ordered relation is defined as: Vector
if and only if every element of
is less than or equal to its corresponding element in
, and at least one of the elements is strictly less than. If neither
or
, but the vectors are not identical, then the two vectors must be concurrent.
Version vectors or variants are used to track updates in many distributed file systems, such as
Coda (file system)
Coda is a distributed file system developed as a research project at Carnegie Mellon University since 1987 under the direction of Mahadev Satyanarayanan. It descended directly from an older version of Andrew File System (AFS-2) and offers many ...
and Ficus, and are the main data structure behind optimistic replication.
Other mechanisms
* Hash Histories avoid the use of counters by keeping a set of hashes of each updated version and comparing those sets by set inclusion. However this mechanism can only give probabilistic guarantees.
* Concise Version Vectors allow significant space savings when handling multiple replicated items, such as in directory structures in filesystems.
* Version Stamps allow tracking of a variable number of replicas and do not resort to counters. This mechanism can depict scalability problems in some settings, but can be replaced by Interval Tree Clocks.
* Interval Tree Clocks generalize version vectors and vector clocks and allows dynamic numbers of replicas/processes.
* Bounded Version Vectors allow a bounded implementation, with bounded size counters, as long as replica pairs can be atomically synchronized.
* Dotted Version Vectors
[Nuno Preguiça, Carlos Baquero, Paulo Almeida, Victor Fonte and Ricardo Gonçalves. Brief Announcement: Efficient Causality Tracking in Distributed Storage Systems With Dotted Version Vectors. ACM PODC, pp. 335-336, 2012.] address scalability with a small set of servers mediating replica access by a large number of concurrent clients.
References
{{reflist
External links
Why Logical Clocks are Easy (Compares Causal Histories, Vector Clocks and Version Vectors)
Data synchronization
Logical clock algorithms
Distributed computing problems