The content-addressable network (CAN) is a distributed, decentralized P2P infrastructure that provides hash table functionality on an Internet-like scale. CAN was one of the original four distributed hash table proposals, introduced concurrently with Chord, Pastry, and Tapestry.
Overview
Like other distributed hash tables, CAN is designed to be scalable, fault tolerant, and self-organizing. The architectural design is a virtual multi-dimensional Cartesian coordinate space, a type of overlay network, on a multi-torus. This ''n''-dimensional coordinate space is a virtual logical address space, completely independent of the physical location and physical connectivity of the nodes.
Points within the space are identified with coordinates. The entire coordinate space is dynamically partitioned among all the nodes in the system such that every node possesses at least one distinct zone within the overall space.
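The partitioning described above can be illustrated with a minimal sketch. It assumes a unit coordinate space, axis-aligned box zones, and a SHA-256 based mapping from keys to points; the names `Zone` and `key_to_point` and the choice of hash function are hypothetical and are not taken from the CAN design itself.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """Axis-aligned box [lo, hi) in the unit coordinate space (assumed shape)."""
    lo: tuple
    hi: tuple

    def contains(self, point):
        # A point belongs to the zone if every coordinate lies in [lo, hi).
        return all(l <= p < h for l, p, h in zip(self.lo, point, self.hi))


def key_to_point(key, dims=2):
    """Map a key to a point in [0, 1)^dims (illustrative mapping, dims <= 8)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use 4 bytes of the digest per dimension, scaled into [0, 1).
    return tuple(
        int.from_bytes(digest[4 * i:4 * i + 4], "big") / 2**32
        for i in range(dims)
    )


# Two nodes partition the whole 2D space; exactly one zone owns any point.
zones = {
    "node-a": Zone((0.0, 0.0), (0.5, 1.0)),
    "node-b": Zone((0.5, 0.0), (1.0, 1.0)),
}
point = key_to_point("example-key")
owner = next(name for name, zone in zones.items() if zone.contains(point))
```

Because the zones are half-open boxes that tile the space, every point falls in exactly one zone, which is what lets a key be assigned to a single responsible node.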
Routing
A CAN node maintains a routing table that holds the IP address and virtual coordinate zone of each of its neighbors. A node routes a message towards a destination point in the coordinate space. The node first determines which neighboring zone is closest to the destination point, and then looks up that zone's node's IP address via the routing table.
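The next-hop choice can be sketched as follows. This is a simplified illustration, assuming a unit coordinate space that wraps around in every dimension (a torus), zones stored as `(lo, hi)` corner tuples, and Euclidean distance from the destination point to the nearest edge of each zone; the function names and the distance metric are assumptions, not part of the source description.

```python
import math


def circular_distance(a, b):
    """Distance between two coordinates on a unit circle (torus axis)."""
    d = abs(a - b)
    return min(d, 1.0 - d)


def zone_distance(zone, point):
    """Distance from a point to the nearest part of a box zone on the torus."""
    lo, hi = zone
    total = 0.0
    for l, h, p in zip(lo, hi, point):
        if l <= p < h:
            continue  # the point is inside the zone's extent on this axis
        total += min(circular_distance(p, l), circular_distance(p, h)) ** 2
    return math.sqrt(total)


def next_hop(routing_table, destination):
    """Return the IP of the neighbor whose zone is closest to the destination.

    routing_table maps neighbor IP address -> zone, mirroring the
    zone-to-IP table described above.
    """
    return min(routing_table,
               key=lambda ip: zone_distance(routing_table[ip], destination))


# Hypothetical routing table for a node with two neighbors.
table = {
    "192.0.2.2": ((0.5, 0.0), (1.0, 0.5)),
    "192.0.2.3": ((0.0, 0.5), (0.5, 1.0)),
}
hop = next_hop(table, (0.9, 0.1))
```

Each node repeats this greedy step with its own table until the message reaches the node whose zone contains the destination point.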
Node joining
To join a CAN, a joining node must:
# Find a node already in the overlay network.
# Identify a zone that can be split.
# Update the routing tables of nodes neighboring the newly split zone.
To find a node already in the overlay network, bootstrapping nodes may be used to inform the joining node of the IP addresses of nodes currently in the overlay network.
After the joining node receives an IP address of a node already in the CAN, it can attempt to identify a zone for itself. The joining node randomly picks a point in the coordinate space and sends a join request, directed to the random point, to one of the received IP addresses. The nodes already in the overlay network route the join request to the correct device via their zone-to-IP routing tables. Once the node managing the destination point's zone receives the join request, it may honor the join request by splitting its zone in half, allocating itself the first half, and allocating the joining node the second half. If it does not honor the join request, the joining node keeps picking random points in the coordinate space and sending join requests directed to these random points until it successfully joins the network.
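The zone split can be sketched as below. The text does not say along which dimension a zone is halved, so this sketch assumes the longest side is split (ties going to the lowest dimension index); that rule and the name `split_zone` are illustrative assumptions.

```python
def split_zone(lo, hi):
    """Split a box zone in half and return (first_half, second_half).

    lo and hi are coordinate tuples. The first half stays with the existing
    node and the second half goes to the joining node. The split axis is the
    longest side, which is an assumption made for this sketch.
    """
    sides = [h - l for l, h in zip(lo, hi)]
    axis = sides.index(max(sides))
    mid = (lo[axis] + hi[axis]) / 2
    first_hi = hi[:axis] + (mid,) + hi[axis + 1:]
    second_lo = lo[:axis] + (mid,) + lo[axis + 1:]
    return (lo, first_hi), (second_lo, hi)


# The first node owns the whole space; a joining node takes the second half.
kept, given = split_zone((0.0, 0.0), (1.0, 1.0))
```

Splitting the longest side keeps zones close to square, so repeated joins do not produce long, thin zones.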
After the zone split and allocation are complete, the neighboring nodes are updated with the coordinates of the two new zones and the corresponding IP addresses. Routing tables are updated and updates are propagated across the network.
Node departing
To handle a node departing, the CAN must:
# Identify that a node is departing.
# Have the departing node's zone merged or taken over by a neighboring node.
# Update the routing tables across the network.
A node's departure can be detected, for instance, via heartbeat messages that neighbors periodically exchange and that carry routing table information. After a predetermined period of silence from a neighbor, that neighbor is deemed to have failed and is treated as a departing node.
Alternatively, a node that is willingly departing may broadcast such a notice to its neighbors.
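Timeout-based failure detection of the kind described above might look like the following sketch. The class name `NeighborMonitor`, the timeout value, and the injectable clock are hypothetical; a real node would also send its own heartbeats and parse the routing information they carry.

```python
import time


class NeighborMonitor:
    """Track the last heartbeat from each neighbor and flag silent ones."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the logic can be tested
        self.last_seen = {}

    def heartbeat(self, neighbor_ip):
        # Record the arrival time of a heartbeat from this neighbor.
        self.last_seen[neighbor_ip] = self.clock()

    def depart(self, neighbor_ip):
        # A neighbor that announces its departure is removed immediately.
        self.last_seen.pop(neighbor_ip, None)

    def failed_neighbors(self):
        # Neighbors silent for longer than the timeout are treated as departed.
        now = self.clock()
        return [ip for ip, seen in self.last_seen.items()
                if now - seen > self.timeout_s]
```

Using a monotonic clock avoids false failure reports when the system wall clock is adjusted.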
After a departing node is identified, its zone must be either merged or taken over. First the departed node's zone is analyzed to determine whether a neighboring node's zone can merge with the departed node's zone to form a valid zone. For example, a zone in a 2D coordinate space must be either a square or rectangle and cannot be L-shaped. The validation test may cycle through all neighboring zones to determine if a successful merge can occur. If one of the potential merges is deemed a valid merge, the zones are then merged. If none of the potential merges are deemed valid, then the neighboring node with the smallest zone takes over control of the departing node's zone.
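The merge-or-take-over decision can be sketched as follows. It assumes box zones stored as `(lo, hi)` corner tuples whose boundaries come from repeated halving, so exact equality comparisons are safe, and it ignores zones that touch only across the wrap-around edge of the torus. The function names are illustrative.

```python
from math import prod


def try_merge(a, b):
    """Return the merged box if zones a and b form a valid box, else None.

    Two boxes merge into a box only if they have the same extent in every
    dimension except one, and they abut along that one dimension.
    """
    (alo, ahi), (blo, bhi) = a, b
    differing = [i for i in range(len(alo))
                 if (alo[i], ahi[i]) != (blo[i], bhi[i])]
    if len(differing) != 1:
        return None
    i = differing[0]
    if ahi[i] == blo[i]:
        return (alo, bhi)
    if bhi[i] == alo[i]:
        return (blo, ahi)
    return None


def volume(zone):
    lo, hi = zone
    return prod(h - l for l, h in zip(lo, hi))


def handle_departure(departed, neighbors):
    """Pick a neighbor to absorb the departed zone.

    neighbors maps IP address -> zone. Returns (ip, action, zone), where
    action is "merge" or "takeover".
    """
    for ip, zone in neighbors.items():
        merged = try_merge(zone, departed)
        if merged is not None:
            return ip, "merge", merged
    # No valid merge: the neighbor with the smallest zone takes over.
    ip = min(neighbors, key=lambda n: volume(neighbors[n]))
    return ip, "takeover", departed
```

The single-differing-dimension check is what rules out L-shaped results: any pair of boxes that differ in two or more dimensions cannot combine into a box.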
After a take-over, the take-over node may periodically attempt to merge its additionally controlled zones with respective neighboring zones.
If the merge is successful, the routing tables of the nodes in neighboring zones are updated to reflect the merge. After a merge, the network sees that subsection of the overlay network as a single zone and routes accordingly. In a take-over, the take-over node updates the routing tables of the nodes in neighboring zones so that requests to either zone resolve to the take-over node. The network therefore still sees that subsection of the overlay network as two separate zones and routes accordingly.
Developers
Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, and Scott Shenker
See also
* Chord
* Content-addressable storage
* Pastry
* Tapestry