Conflict-free Replicated Data Type
   HOME

TheInfoList



OR:

In
distributed computing Distributed computing is a field of computer science that studies distributed systems, defined as computer systems whose inter-communicating components are located on different networked computers. The components of a distributed system commu ...
, a conflict-free replicated data type (CRDT) is a
data structure In computer science, a data structure is a data organization and storage format that is usually chosen for Efficiency, efficient Data access, access to data. More precisely, a data structure is a collection of data values, the relationships amo ...
that is replicated across multiple computers in a
network Network, networking and networked may refer to: Science and technology * Network theory, the study of graphs as a representation of relations between discrete objects * Network science, an academic field that studies complex networks Mathematics ...
, with the following features: # The application can update any replica independently, concurrently and without coordinating with other replicas. # An algorithm (itself part of the data type) automatically resolves any inconsistencies that might occur. # Although replicas may have different state at any particular point in time, they are guaranteed to eventually converge. The CRDT concept was formally defined in 2011 by Marc Shapiro, Nuno Preguiça, Carlos Baquero and Marek Zawirski. Development was initially motivated by collaborative text editing and
mobile computing Mobile computing is human–computer interaction in which a computer is expected to be transported during normal usage and allow for transmission of data, which can include voice and video transmissions. Mobile computing involves mobile commun ...
. CRDTs have also been used in
online chat Online chat is any direct text-, audio- or video-based (webcams), one-on-one or one-to-many ( group) chat (formally also known as synchronous conferencing), using tools such as instant messengers, Internet Relay Chat (IRC), talkers and possi ...
systems,
online gambling Online gambling (also known as iGaming or iGambling) is any kind of gambling conducted on the internet. This includes virtual poker, casinos, and sports betting. The first online gambling venue opened to the general public was ticketing for th ...
, and in the
SoundCloud SoundCloud is a German audio streaming service owned and operated by SoundCloud Global Limited & Co. KG. The service enables its users to upload, promote, and share audio. Founded in 2007 by Alexander Ljung and Eric Wahlforss, SoundCloud is ...
audio distribution platform. The
NoSQL NoSQL (originally meaning "Not only SQL" or "non-relational") refers to a type of database design that stores and retrieves data differently from the traditional table-based structure of relational databases. Unlike relational databases, which ...
distributed databases
Redis Redis (; Remote Dictionary Server) is an in-memory key–value database, used as a distributed cache and message broker, with optional durability. Because it holds all data in memory and because of its design, Redis offers low- latency reads ...
,
Riak Riak (pronounced "ree-ack" ) is a distributed NoSQL key-value data store that offers high availability, fault tolerance, operational simplicity, and scalability. Riak moved to an entirely open-source project in August 2017, with many of the ...
and Cosmos DB have CRDT data types.


Background

Concurrent updates to multiple replicas of the same data, without coordination between the computers hosting the replicas, can result in inconsistencies between the replicas, which in the general case may not be resolvable. Restoring consistency and data integrity when there are conflicts between updates may require some or all of the updates to be entirely or partially dropped. Accordingly, much of distributed computing focuses on the problem of how to prevent concurrent updates to replicated data. But another possible approach is optimistic replication, where all concurrent updates are allowed to go through, with inconsistencies possibly created, and the results are merged or "resolved" later. In this approach, consistency between the replicas is eventually re-established via "merges" of differing replicas. While optimistic replication might not work in the general case, there is a significant and practically useful class of data structures, CRDTs, where it does work — where it is always possible to merge or resolve concurrent updates on different replicas of the data structure without conflicts. This makes CRDTs ideal for optimistic replication. As an example, a one-way Boolean event flag is a trivial CRDT: one bit, with a value of true or false. True means some particular event has occurred at least once. False means the event has not occurred. Once set to true, the flag cannot be set back to false (an event having occurred cannot un-occur). The resolution method is "true wins": when merging a replica where the flag is true (that replica has observed the event), and another one where the flag is false (that replica hasn't observed the event), the resolved result is true — the event has been observed.


Types of CRDTs

There are two approaches to CRDTs, both of which can provide strong eventual consistency: state-based CRDTs and operation-based CRDTs.


State-based CRDTs

State-based CRDTs (also called ''convergent replicated data types'', or ''CvRDTs'') are defined by two types, a type for local states and a type for actions on the state, together with three functions: A function to produce an ''initial state'', a ''merge'' function of states, and a function to apply an action to ''update'' a state. State-based CRDTs simply send their full local state to other replicas on every update, where the received new state is then merged into the local state. To ensure eventual convergence the functions should fulfill the following properties: The ''merge'' function should compute the
join Join may refer to: * Join (law), to include additional counts or additional defendants on an indictment *In mathematics: ** Join (mathematics), a least upper bound of sets orders in lattice theory ** Join (topology), an operation combining two topo ...
for any pair of replica states, and should form a
semilattice In mathematics, a join-semilattice (or upper semilattice) is a partially ordered set that has a join (a least upper bound) for any nonempty finite subset. Dually, a meet-semilattice (or lower semilattice) is a partially ordered set which has ...
with the initial state as the neutral element. In particular this means, that the merge function must be
commutative In mathematics, a binary operation is commutative if changing the order of the operands does not change the result. It is a fundamental property of many binary operations, and many mathematical proofs depend on it. Perhaps most familiar as a pr ...
,
associative In mathematics, the associative property is a property of some binary operations that rearranging the parentheses in an expression will not change the result. In propositional logic, associativity is a valid rule of replacement for express ...
, and
idempotent Idempotence (, ) is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application. The concept of idempotence arises in a number of pl ...
. The intuition behind commutativity, associativity and idempotence is that these properties are used to make the CRDT invariant under package re-ordering and duplication. Furthermore, the ''update'' function must be monotone with regard to the
partial order In mathematics, especially order theory, a partial order on a set is an arrangement such that, for certain pairs of elements, one precedes the other. The word ''partial'' is used to indicate that not every pair of elements needs to be comparable ...
defined by said semilattice. ''Delta state'' CRDTs (or simply Delta CRDTs) are optimized state-based CRDTs where only recently applied changes to a state are disseminated instead of the entire state.


Operation-based CRDTs

Operation-based CRDTs (also called ''commutative replicated data types'', or ''CmRDTs'') are defined without a merge function. Instead of transmitting states, the update actions are transmitted directly to replicas and applied. For example, an operation-based CRDT of a single integer might broadcast the operations (+10) or (−20). The application of operations should still be
commutative In mathematics, a binary operation is commutative if changing the order of the operands does not change the result. It is a fundamental property of many binary operations, and many mathematical proofs depend on it. Perhaps most familiar as a pr ...
and
associative In mathematics, the associative property is a property of some binary operations that rearranging the parentheses in an expression will not change the result. In propositional logic, associativity is a valid rule of replacement for express ...
. However, instead of requiring that application of operations is
idempotent Idempotence (, ) is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application. The concept of idempotence arises in a number of pl ...
, stronger assumptions on the communications infrastructure are expected -- all operations must be delivered to the other replicas without duplication. ''Pure'' operation-based CRDTs are a variant of operation-based CRDTs that reduces the metadata size.


Comparison

The two alternatives are theoretically equivalent, as each can emulate the other. However, there are practical differences. State-based CRDTs are often simpler to design and to implement; their only requirement from the communication substrate is some kind of
gossip protocol A gossip protocol or epidemic protocol is a procedure or process of computer peer-to-peer communication that is based on the way epidemics spread. Some distributed systems use peer-to-peer gossip to ensure that data is disseminated to all members ...
. Their drawback is that the entire state of every CRDT must be transmitted eventually to every other replica, which may be costly. In contrast, operation-based CRDTs transmit only the update operations, which are typically small. However, operation-based CRDTs require guarantees from the communication middleware; that the operations are not dropped or duplicated when transmitted to the other replicas, and that they are delivered in causal order. While operations-based CRDTs place more requirements on the protocol for transmitting operations between replicas, they use less bandwidth than state-based CRDTs when the number of transactions is small in comparison to the size of internal state. However, since the state-based CRDT merge function is associative, merging with the state of some replica yields all previous updates to that replica.
Gossip protocol A gossip protocol or epidemic protocol is a procedure or process of computer peer-to-peer communication that is based on the way epidemics spread. Some distributed systems use peer-to-peer gossip to ensure that data is disseminated to all members ...
s work well for propagating state-based CRDT state to other replicas while reducing network use and handling topology changes. Some lower bounds on the storage complexity of state-based CRDTs are known.


Known CRDTs


G-Counter (Grow-only Counter)

This state-based CRDT implements a counter for a cluster of ''n'' nodes. Each node in the cluster is assigned an ID from 0 to ''n'' - 1, which is retrieved with a call to ''myId''(). Thus each node is assigned its own slot in the array ''P'', which it increments locally. Updates are propagated in the background, and merged by taking the ''max''() of every element in ''P''. The compare function is included to illustrate a partial order on the states. The merge function is commutative, associative, and idempotent. The update function monotonically increases the internal state according to the compare function. This is thus a correctly defined state-based CRDT and will provide strong eventual consistency. The operations-based CRDT equivalent broadcasts increment operations as they are received.


PN-Counter (Positive-Negative Counter)

A common strategy in CRDT development is to combine multiple CRDTs to make a more complex CRDT. In this case, two G-Counters are combined to create a data type supporting both increment and decrement operations. The "P" G-Counter counts increments; and the "N" G-Counter counts decrements. The value of the PN-Counter is the value of the P counter minus the value of the N counter. Merge is handled by letting the merged P counter be the merge of the two P G-Counters, and similarly for N counters. Note that the CRDT's internal state must increase monotonically, even though its external state as exposed through ''query'' can return to previous values.


G-Set (Grow-only Set)

The G-Set (grow-only set) is a set which only allows adds. An element, once added, cannot be removed. The merger of two G-Sets is their union.


2P-Set (Two-Phase Set)

Two G-Sets (grow-only sets) are combined to create the 2P-set. With the addition of a remove set (called the "tombstone" set), elements can be added and also removed. Once removed, an element cannot be re-added; that is, once an element ''e'' is in the tombstone set, query will never again return True for that element. The 2P-set uses "remove-wins" semantics, so ''remove''(''e'') takes precedence over ''add''(''e'').


LWW-Element-Set (Last-Write-Wins-Element-Set)

LWW-Element-Set is similar to 2P-Set in that it consists of an "add set" and a "remove set", with a timestamp for each element. Elements are added to an LWW-Element-Set by inserting the element into the add set, with a timestamp. Elements are removed from the LWW-Element-Set by being added to the remove set, again with a timestamp. An element is a member of the LWW-Element-Set if it is in the add set, and either not in the remove set, or in the remove set but with an earlier timestamp than the latest timestamp in the add set. Merging two replicas of the LWW-Element-Set consists of taking the union of the add sets and the union of the remove sets. When timestamps are equal, the "bias" of the LWW-Element-Set comes into play. A LWW-Element-Set can be biased towards adds or removals. The advantage of LWW-Element-Set over 2P-Set is that, unlike 2P-Set, LWW-Element-Set allows an element to be reinserted after having been removed.


OR-Set (Observed-Remove Set)

OR-Set resembles LWW-Element-Set, but using unique tags instead of timestamps. For each element in the set, a list of add-tags and a list of remove-tags are maintained. An element is inserted into the OR-Set by having a new unique tag generated and added to the add-tag list for the element. Elements are removed from the OR-Set by having all the tags in the element's add-tag list added to the element's remove-tag (tombstone) list. To merge two OR-Sets, for each element, let its add-tag list be the union of the two add-tag lists, and likewise for the two remove-tag lists. An element is a member of the set if and only if the add-tag list less the remove-tag list is nonempty. An optimization that eliminates the need for maintaining a tombstone set is possible; this avoids the potentially unbounded growth of the tombstone set. The optimization is achieved by maintaining a vector of timestamps for each replica.


Sequence CRDTs

A sequence, list, or
ordered set In mathematics, especially order theory, a partial order on a set is an arrangement such that, for certain pairs of elements, one precedes the other. The word ''partial'' is used to indicate that not every pair of elements needs to be comparable; ...
CRDT can be used to build a
collaborative real-time editor A collaborative real-time editor is a type of collaborative software or web application which enables real-time collaborative editing, simultaneous editing, or live editing of the same digital document, computer file or cloud-stored data – s ...
, as an alternative to operational transformation (OT). Some known Sequence CRDTs are Treedoc, RGA, Woot, Logoot, and LSEQ. CRATE is a decentralized real-time editor built on top of LSEQSplit (an extension of LSEQ) and runnable on a network of browsers using
WebRTC WebRTC (Web Real-Time Communication) is a free and open-source project providing web browsers and mobile applications with real-time communication (RTC) via application programming interfaces (APIs). It allows audio and video communication and ...
. LogootSplit was proposed as an extension of Logoot in order to reduce the metadata for sequence CRDTs. MUTE is an online web-based peer-to-peer real-time collaborative editor relying on the LogootSplit algorithm. Industrial sequence CRDTs, including open-source ones, are known to out-perform academic implementations due to optimizations and a more realistic testing methodology. The main popular example is Yjs CRDT, a pioneer in using a plainlist instead of a tree (ala Kleppmann's ''automerge'').


Industry use

* Fluid Framework is an open-source collaborative platform built by
Microsoft Microsoft Corporation is an American multinational corporation and technology company, technology conglomerate headquartered in Redmond, Washington. Founded in 1975, the company became influential in the History of personal computers#The ear ...
that provides both server reference implementations and client-side SDKs for creating modern real-time web applications using CRDTs. * Nimbus Note is a collaborative note-taking application that uses the Yjs CRDT for collaborative editing. *
Redis Redis (; Remote Dictionary Server) is an in-memory key–value database, used as a distributed cache and message broker, with optional durability. Because it holds all data in memory and because of its design, Redis offers low- latency reads ...
is a distributed, highly available, and scalable in-memory database with a "CRDT-enabled database" feature. *
SoundCloud SoundCloud is a German audio streaming service owned and operated by SoundCloud Global Limited & Co. KG. The service enables its users to upload, promote, and share audio. Founded in 2007 by Alexander Ljung and Eric Wahlforss, SoundCloud is ...
open-sourced Roshi, a LWW-element-set CRDT for the SoundCloud stream implemented on top of Redis. *
Riak Riak (pronounced "ree-ack" ) is a distributed NoSQL key-value data store that offers high availability, fault tolerance, operational simplicity, and scalability. Riak moved to an entirely open-source project in August 2017, with many of the ...
is a distributed NoSQL key-value data store based on CRDTs. ''
League of Legends ''League of Legends'' (''LoL'', commonly referred to as ''League'', is a multiplayer online battle arena video game developed and published by Riot Games. Inspired by ''Defense of the Ancients'', a Mod (video games), custom map for ''Warcraf ...
'' uses the Riak CRDT implementation for its in-game chat system, which handles 7.5 million concurrent users and 11,000 messages per second. * Bet365 stores hundreds of megabytes of data in the Riak implementation of OR-Set. * TomTom employs CRDTs to synchronize navigation data between the devices of a user. * Phoenix, a web framework written in
Elixir An elixir is a sweet liquid used for medical purposes, to be taken orally and intended to cure one's illness. When used as a dosage form, pharmaceutical preparation, an elixir contains at least one active ingredient designed to be taken orall ...
, uses CRDTs to support real-time multi-node information sharing in version 1.2. *
Facebook Facebook is a social media and social networking service owned by the American technology conglomerate Meta Platforms, Meta. Created in 2004 by Mark Zuckerberg with four other Harvard College students and roommates, Eduardo Saverin, Andre ...
implements CRDTs in their Apollo low-latency "consistency at scale" database. *
Facebook Facebook is a social media and social networking service owned by the American technology conglomerate Meta Platforms, Meta. Created in 2004 by Mark Zuckerberg with four other Harvard College students and roommates, Eduardo Saverin, Andre ...
uses CRDTs in their FlightTracker system for managing the Facebook graph internally. * Teletype for
Atom Atoms are the basic particles of the chemical elements. An atom consists of a atomic nucleus, nucleus of protons and generally neutrons, surrounded by an electromagnetically bound swarm of electrons. The chemical elements are distinguished fr ...
employs CRDTs to enable developers share their workspace with team members and collaborate on code in real time. *
Apple An apple is a round, edible fruit produced by an apple tree (''Malus'' spp.). Fruit trees of the orchard or domestic apple (''Malus domestica''), the most widely grown in the genus, are agriculture, cultivated worldwide. The tree originated ...
implements CRDTs in the Notes app for syncing offline edits between multiple devices. *
Novell, Inc. Novell, Inc. () was an American software and services company headquartered in Provo, Utah, that existed from 1980 until 2014. Its most significant product was the multi-platform network operating system known as NetWare. Novell technolog ...
introduced a state-based CRDT with "loosely consistent" directory replication (NetWare Directory Services), included in NetWare 4.0 in 1995. The successor product, eDirectory, delivered improvements to the replication process.


See also

*
Data synchronization Data synchronization is the process of establishing consistency between source and target data stores, and the continuous harmonization of the data over time. It is fundamental to a wide variety of applications, including file synchronization a ...
*
Collaborative real-time editor A collaborative real-time editor is a type of collaborative software or web application which enables real-time collaborative editing, simultaneous editing, or live editing of the same digital document, computer file or cloud-stored data – s ...
s *
Consistency model In computer science, a consistency model specifies a contract between the programmer and a system, wherein the system guarantees that if the programmer follows the rules for operations on memory, memory will be data consistency, consistent and th ...
s * Optimistic replication * Operational transformation * Self-stabilizing algorithms


References

{{cite web , url = https://pages.lip6.fr/Marc.Shapiro/papers/RR-7687.pdf , title = Conflict-free Replicated Data Types , date = July 19, 2011 , publisher = inria.fr


External links


A collection of resources and papers on CRDTs

"Strong Eventual Consistency and Conflict-free Replicated Data Types" (A talk on CRDTs)
by Marc Shapiro

by Christopher Meiklejohn
CAP theorem and CRDTs: CAP 12 years later. How the rules have changed
by Eric Brewer Distributed data structures Distributed algorithms Fault-tolerant computer systems