Reliability (computer networking)
   HOME

TheInfoList



OR:

In
computer networking A computer network is a set of computers sharing resources located on or provided by network nodes. The computers use common communication protocols over digital interconnections to communicate with each other. These interconnections are ...
, a reliable protocol is a
communication protocol A communication protocol is a system of rules that allows two or more entities of a communications system to transmit information via any kind of variation of a physical quantity. The protocol defines the rules, syntax, semantics and synchroniza ...
that notifies the sender whether or not the delivery of data to intended recipients was successful. Reliability is a synonym for assurance, which is the term used by the ITU and
ATM Forum The ATM Forum was founded in 1991 to be the industry consortium to promote Asynchronous Transfer Mode technology used in telecommunication networks; the founding president and chairman was Fred Sammartino of Sun Microsystems. It was a non-profit in ...
. Reliable protocols typically incur more overhead than unreliable protocols, and as a result, function more slowly and with less scalability. This often is not an issue for unicast protocols, but it may become a problem for reliable multicast protocols.
Transmission Control Protocol The Transmission Control Protocol (TCP) is one of the main protocols of the Internet protocol suite. It originated in the initial network implementation in which it complemented the Internet Protocol (IP). Therefore, the entire suite is common ...
(TCP), the main protocol used on the
Internet The Internet (or internet) is the global system of interconnected computer networks that uses the Internet protocol suite (TCP/IP) to communicate between networks and devices. It is a '' network of networks'' that consists of private, pub ...
, is a reliable unicast protocol; it provides the abstraction of a reliable byte stream to applications. UDP is an unreliable protocol and is often used in computer games, streaming media or in other situations where speed is an issue and some data loss may be tolerated because of the transitory nature of the data. Often, a reliable unicast protocol is also connection oriented. For example, TCP is connection oriented, with the virtual-circuit ID consisting of source and destination
IP address An Internet Protocol address (IP address) is a numerical label such as that is connected to a computer network that uses the Internet Protocol for communication.. Updated by . An IP address serves two main functions: network interface ident ...
es and port numbers. However, some unreliable protocols are connection oriented, such as
Asynchronous Transfer Mode Asynchronous Transfer Mode (ATM) is a telecommunications standard defined by American National Standards Institute (ANSI) and ITU-T (formerly CCITT) for digital transmission of multiple types of traffic. ATM was developed to meet the needs of ...
and
Frame Relay Frame Relay is a standardized wide area network (WAN) technology that specifies the physical and data link layers of digital telecommunications channels using a packet switching methodology. Originally designed for transport across Integrated Se ...
. In addition, some connectionless protocols, such as
IEEE 802.11 IEEE 802.11 is part of the IEEE 802 set of local area network (LAN) technical standards, and specifies the set of media access control (MAC) and physical layer (PHY) protocols for implementing wireless local area network (WLAN) computer commun ...
, are reliable.


History

Building on the
packet switching In telecommunications, packet switching is a method of grouping data into '' packets'' that are transmitted over a digital network. Packets are made of a header and a payload. Data in the header is used by networking hardware to direct the p ...
concepts proposed by
Donald Davies Donald Watts Davies, (7 June 1924 – 28 May 2000) was a Welsh computer scientist who was employed at the UK National Physical Laboratory (NPL). In 1965 he conceived of packet switching, which is today the dominant basis for data communic ...
, the first
communication protocol A communication protocol is a system of rules that allows two or more entities of a communications system to transmit information via any kind of variation of a physical quantity. The protocol defines the rules, syntax, semantics and synchroniza ...
on the
ARPANET The Advanced Research Projects Agency Network (ARPANET) was the first wide-area packet-switched network with distributed control and one of the first networks to implement the TCP/IP protocol suite. Both technologies became the technical fou ...
was a reliable packet delivery procedure to connect its hosts via the 1822 interface. A host computer simply arranged the data in the correct packet format, inserted the address of the destination host computer, and sent the message across the interface to its connected
Interface Message Processor The Interface Message Processor (IMP) was the packet switching node used to interconnect participant networks to the ARPANET from the late 1960s to 1989. It was the first generation of gateways, which are known today as routers. An IMP was a ...
(IMP). Once the message was delivered to the destination host, an acknowledgment was delivered to the sending host. If the network could not deliver the message, the IMP would send an error message back to the sending host. Meanwhile, the developers of
CYCLADES The Cyclades (; el, Κυκλάδες, ) are an island group in the Aegean Sea, southeast of mainland Greece and a former administrative prefecture of Greece. They are one of the island groups which constitute the Aegean archipelago. The name ...
and of
ALOHAnet ALOHAnet, also known as the ALOHA System, or simply ALOHA, was a pioneering computer networking system developed at the University of Hawaii. ALOHAnet became operational in June 1971, providing the first public demonstration of a wireless packe ...
demonstrated that it was possible to build an effective computer network without providing reliable packet transmission. This lesson was later embraced by the designers of
Ethernet Ethernet () is a family of wired computer networking technologies commonly used in local area networks (LAN), metropolitan area networks (MAN) and wide area networks (WAN). It was commercially introduced in 1980 and first standardized in 1 ...
. If a network does not guarantee packet delivery, then it becomes the host's responsibility to provide reliability by detecting and retransmitting lost packets. Subsequent experience on the ARPANET indicated that the network itself could not reliably detect all packet delivery failures, and this pushed responsibility for error detection onto the sending host in any case. This led to the development of the
end-to-end principle The end-to-end principle is a design framework in computer networking. In networks designed according to this principle, guaranteeing certain application-specific features, such as reliability and security, requires that they reside in the commu ...
, which is one of the
Internet The Internet (or internet) is the global system of interconnected computer networks that uses the Internet protocol suite (TCP/IP) to communicate between networks and devices. It is a '' network of networks'' that consists of private, pub ...
's fundamental design principles.


Reliability properties

A reliable service is one that notifies the user if delivery fails, while an ''unreliable'' one does not notify the user if delivery fails. For example,
Internet Protocol The Internet Protocol (IP) is the network layer communications protocol in the Internet protocol suite for relaying datagrams across network boundaries. Its routing function enables internetworking, and essentially establishes the Internet. ...
(IP) provides an unreliable service. Together,
Transmission Control Protocol The Transmission Control Protocol (TCP) is one of the main protocols of the Internet protocol suite. It originated in the initial network implementation in which it complemented the Internet Protocol (IP). Therefore, the entire suite is common ...
(TCP) and IP provide a reliable service, whereas
User Datagram Protocol In computer networking, the User Datagram Protocol (UDP) is one of the core communication protocols of the Internet protocol suite used to send messages (transported as datagrams in packets) to other hosts on an Internet Protocol (IP) network ...
(UDP) and IP provide an unreliable one. In the context of distributed protocols, reliability properties specify the guarantees that the protocol provides with respect to the delivery of messages to the intended recipient(s). An example of a reliability property for a unicast protocol is "at least once", i.e. at least one copy of the message is guaranteed to be delivered to the recipient. Reliability properties for multicast protocols can be expressed on a per-recipient basis (simple reliability properties), or they may relate the fact of delivery or the order of delivery among the different recipients (strong reliability properties). In the context of multicast protocols, strong reliability properties express the guarantees that the protocol provides with respect to the delivery of messages to different recipients. An example of a strong reliability property is ''last copy recall'', meaning that as long as at least a single copy of a message remains available at any of the recipients, every other recipient that does not fail eventually also receives a copy. Strong reliability properties such as this one typically require that messages are retransmitted or forwarded among the recipients. An example of a reliability property stronger than ''last copy recall'' is atomicity. The property states that if at least a single copy of a message has been delivered to a recipient, all other recipients will eventually receive a copy of the message. In other words, each message is always delivered to either all or none of the recipients. One of the most complex strong reliability properties is
virtual synchrony A reliable multicast is any computer networking protocol that provides a ''Reliability (computer networking), reliable'' sequence of packets to multiple recipients simultaneously, making it suitable for applications such as multi-receiver file tran ...
. Reliable messaging is the concept of
message passing In computer science, message passing is a technique for invoking behavior (i.e., running a program) on a computer. The invoking program sends a message to a process (which may be an actor or object) and relies on that process and its support ...
across an unreliable infrastructure whilst being able to make certain guarantees about the successful transmission of the messages. For example, that if the message is delivered, it is delivered at most once, or that all messages successfully delivered arrive in a particular order. Reliable delivery can be contrasted with
best-effort delivery Best-effort delivery describes a network service in which the network does ''not'' provide any guarantee that data is delivered or that delivery meets any quality of service. In a best-effort network, all users obtain best-effort service. Under b ...
, where there is no guarantee that messages will be delivered quickly, in order, or at all.


Implementations

A reliable delivery protocol can be built on an unreliable protocol. An extremely common example is the layering of
Transmission Control Protocol The Transmission Control Protocol (TCP) is one of the main protocols of the Internet protocol suite. It originated in the initial network implementation in which it complemented the Internet Protocol (IP). Therefore, the entire suite is common ...
on the
Internet Protocol The Internet Protocol (IP) is the network layer communications protocol in the Internet protocol suite for relaying datagrams across network boundaries. Its routing function enables internetworking, and essentially establishes the Internet. ...
, a combination known as
TCP/IP The Internet protocol suite, commonly known as TCP/IP, is a framework for organizing the set of communication protocols used in the Internet and similar computer networks according to functional criteria. The foundational protocols in the suit ...
. Strong reliability properties are offered by
group communication system {{Unreferenced, date=April 2021 The term Group Communication System (GCS) refers to a software platform that implements some form of group communication. Examples of group communication systems include IS-IS, Spread Toolkit The Spread Toolkit ...
s (GCSs) such as
IS-IS Intermediate System to Intermediate System (IS-IS, also written ISIS) is a routing protocol designed to move information efficiently within a computer network, a group of physically connected computers or similar devices. It accomplishes this b ...
, Appia framework,
Spread Spread may refer to: Places * Spread, West Virginia Arts, entertainment, and media * ''Spread'' (film), a 2009 film. * ''$pread'', a quarterly magazine by and for sex workers * "Spread", a song by OutKast from their 2003 album ''Speakerboxxx/T ...
, JGroups or QuickSilver Scalable Multicast. The QuickSilver Properties Framework is a flexible platform that allows strong reliability properties to be expressed in a purely declarative manner, using a simple rule-based language, and automatically translated into a hierarchical protocol. One protocol that implements reliable messaging is
WS-ReliableMessaging WS-ReliableMessaging describes a protocol that allows SOAP messages to be reliably delivered between distributed applications in the presence of software component, system, or network failures. The original specification was written by BEA Syste ...
, which handles reliable delivery of
SOAP Soap is a salt of a fatty acid used in a variety of cleansing and lubricating products. In a domestic setting, soaps are surfactants usually used for washing, bathing, and other types of housekeeping. In industrial settings, soaps are use ...
messages. The ATM Service-Specific Coordination Function provides for transparent assured delivery with AAL5.ATM Forum, The User Network Interface (UNI), v. 3.1, , Prentice Hall PTR, 1995.ITU-T, ''B-ISDN ATM Adaptation Layer specification: Type 5 AAL'', Recommendation I.363.5, International Telecommunication Union, 1998.
IEEE 802.11 IEEE 802.11 is part of the IEEE 802 set of local area network (LAN) technical standards, and specifies the set of media access control (MAC) and physical layer (PHY) protocols for implementing wireless local area network (WLAN) computer commun ...
attempts to provide reliable service for all traffic. The sending station will resend a frame if the sending station doesn't receive an ACK frame within a predetermined period of time.


Real-time systems

There is, however, a problem with the definition of reliability as "delivery or notification of failure" in
real-time computing Real-time computing (RTC) is the computer science term for hardware and software systems subject to a "real-time constraint", for example from event to system response. Real-time programs must guarantee response within specified time constra ...
. In such systems, failure to deliver the real-time data will adversely affect the performance of the systems, and some systems, e.g.
safety-critical A safety-critical system (SCS) or life-critical system is a system whose failure or malfunction may result in one (or more) of the following outcomes: * death or serious injury to people * loss or severe damage to equipment/property * environme ...
, safety-involved, and some secure mission-critical systems, must be proved to perform at some specified minimum level. This, in turn, requires that a specified minimum reliability for the delivery of the critical data be met. Therefore, in these cases, it is only the delivery that matters; notification of the failure to deliver does ameliorate the failure. In hard real-time systems, all data must be delivered by the deadline or it is considered a system failure. In firm real-time systems, late data is still valueless but the system can tolerate some amount of late or missing data.S., Schneider, G.,Pardo-Castellote, M., Hamilton. “Can Ethernet Be Real Time?”, Real-Time Innovations, Inc., 2001Dan Rubenstein, Jim Kurose, Don Towsley, ”Real-Time Reliable Multicast Using Proactive Forward Error Correction”, NOSSDAV ’98 There are a number of protocols that are capable of addressing real-time requirements for reliable delivery and timeliness:
MIL-STD-1553B MIL-STD-1553 is a military standard published by the United States Department of Defense that defines the mechanical, electrical, and functional characteristics of a serial data bus. It was originally designed as an avionic data bus for use with ...
and STANAG 3910 are well-known examples of such timely and reliable protocols for avionic data buses. MIL-1553 uses a 1 Mbit/s shared media for the transmission of data and the control of these transmissions, and is widely used in federated military
avionics Avionics (a blend of ''aviation'' and ''electronics'') are the electronic systems used on aircraft. Avionic systems include communications, navigation, the display and management of multiple systems, and the hundreds of systems that are fit ...
systems. It uses a bus controller (BC) to command the connected remote terminals (RTs) to receive or transmit this data. The BC can, therefore, ensure that there will be no congestion, and transfers are always timely. The MIL-1553 protocol also allows for automatic retries that can still ensure timely delivery and increase the reliability above that of the physical layer. STANAG 3910, also known as EFABus in its use on the
Eurofighter Typhoon The Eurofighter Typhoon is a European multinational twin-engine, canard delta wing, multirole fighter. The Typhoon was designed originally as an air-superiority fighter and is manufactured by a consortium of Airbus, BAE Systems and Leonardo ...
, is, in effect, a version of MIL-1553 augmented with a 20 Mbit/s shared media bus for data transfers, retaining the 1 Mbit/s shared media bus for control purposes. The
Asynchronous Transfer Mode Asynchronous Transfer Mode (ATM) is a telecommunications standard defined by American National Standards Institute (ANSI) and ITU-T (formerly CCITT) for digital transmission of multiple types of traffic. ATM was developed to meet the needs of ...
(ATM), the
Avionics Full-Duplex Switched Ethernet Avionics Full-Duplex Switched Ethernet (AFDX), also ARINC 664, is a data network, patented by international aircraft manufacturer Airbus, for safety-critical applications that utilizes dedicated bandwidth while providing deterministic quality of ...
(AFDX), and Time Triggered Ethernet (TTEthernet) are examples of packet-switched networks protocols where the timeliness and reliability of data transfers can be assured by the network. AFDX and TTEthernet are also based on IEEE 802.3 Ethernet, though not entirely compatible with it. ATM uses connection-oriented
virtual channel In most telecommunications organizations, a virtual channel is a method of remapping the ''program number'' as used in H.222 Program Association Tables and Program Mapping Tables to a channel number that can be entered via digits on a receiver's ...
s (VCs) which have fully deterministic paths through the network, and usage and network parameter control (UPC/NPC), which are implemented within the network, to limit the traffic on each VC separately. This allows the usage of the shared resources (switch buffers) in the network to be calculated from the parameters of the traffic to be carried in advance, i.e. at system design time. That they are implemented by the network means that these calculations remain valid even when other users of the network behave in unexpected ways, i.e. transmit more data than they are expected to. The calculated usages can then be compared with the capacities of these resources to show that, given the constraints on the routes and the bandwidths of these connections, the resource used for these transfers will never be over-subscribed. These transfers will therefore never be affected by congestion and there will be no losses due to this effect. Then, from the predicted maximum usages of the switch buffers, the maximum delay through the network can also be predicted. However, for the reliability and timeliness to be proved, and for the proofs to be tolerant of faults in and malicious actions by the equipment connected to the network, the calculations of these resource usages cannot be based on any parameters that are not actively enforced by the network, i.e. they cannot be based on what the sources of the traffic are expected to do or on statistical analyses of the traffic characteristics (see network calculus). AFDX uses frequency domain bandwidth allocation and traffic policing, that allows the traffic on each virtual link (VL) to be limited so that the requirements for shared resources can be predicted and congestion prevented so it can be proved not to affect the critical data. However, the techniques for predicting the resource requirements and proving that congestion is prevented are not part of the AFDX standard. TTEthernet provides the lowest possible latency in transferring data across the network by using time-domain control methods – each time triggered transfer is scheduled at a specific time so that contention for shared resources is controlled and thus the possibility of congestion is eliminated. The switches in the network enforce this timing to provide tolerance of faults in, and malicious actions on the part of, the other connected equipment. However, "synchronized local clocks are the fundamental prerequisite for time-triggered communication".Wilfried Steiner and Bruno Dutertre
''SMT-Based Formal Verification of a ''TTEthernet'' Synchronization Function''
S. Kowalewski and M. Roveri (Eds.), FMICS 2010, LNCS 6371, pp. 148–163, 2010.
This is because the sources of critical data will have to have the same view of time as the switch, in order that they can transmit at the correct time and the switch will see this as correct. This also requires that the sequence with which a critical transfer is scheduled has to be predictable to both source and switch. This, in turn, will limit the transmission schedule to a highly deterministic one, e.g. the cyclic executive. However, low latency in transferring data over the bus or network does not necessarily translate into low transport delays between the application processes that source and sink this data. This is especially true where the transfers over the bus or network are cyclically scheduled (as is commonly the case with MIL-STD-1553B and STANAG 3910, and necessarily so with AFDX and TTEthernet) but the application processes are not synchronized with this schedule. With both AFDX and TTEthernet, there are additional functions required of the interfaces, e.g. AFDX's Bandwidth Allocation Gap control, and TTEthernet's requirement for very close synchronization of the sources of time-triggered data, that make it difficult to use standard Ethernet interfaces. Other methods for control of the traffic in the network that would allow the use of such standard IEEE 802.3 network interfaces is a subject of current research.


References

{{reflist Network protocols Reliability engineering