The Transmission Control Protocol (TCP) is one of the main protocols of the Internet protocol suite. It originated in the initial network implementation in which it complemented the Internet Protocol (IP). Therefore, the entire suite is commonly referred to as TCP/IP. TCP provides reliable, ordered, and error-checked delivery of a stream of octets (bytes) between applications running on hosts communicating via an IP network. Major internet applications such as the World Wide Web, email, remote administration, and file transfer rely on TCP, which is part of the transport layer of the TCP/IP suite. SSL/TLS often runs on top of TCP.
TCP is connection-oriented, meaning that sender and receiver must first establish a connection based on agreed parameters; they do this through a three-way handshake procedure. The server must be listening (passive open) for connection requests from clients before a connection is established. The three-way handshake (active open), retransmission, and error detection add to reliability but lengthen latency. Applications that do not require a reliable data stream service may use the User Datagram Protocol (UDP) instead, which provides a connectionless datagram service that prioritizes time over reliability. TCP employs network congestion avoidance. However, there are vulnerabilities in TCP, including denial of service, connection hijacking, TCP veto, and reset attack.
Historical origin
In May 1974, Vint Cerf and Bob Kahn described an internetworking protocol for sharing resources using packet switching among network nodes. The authors had been working with Gérard Le Lann to incorporate concepts from the French CYCLADES project into the new network. The specification of the resulting protocol, ''Specification of Internet Transmission Control Program'', was written by Vint Cerf, Yogen Dalal, and Carl Sunshine, and published in December 1974. It contains the first attested use of the term ''internet'', as a shorthand for ''internetwork''.
The Transmission Control Program incorporated both connection-oriented links and datagram services between hosts. In version 4, the monolithic Transmission Control Program was divided into a modular architecture consisting of the ''Transmission Control Protocol'' and the ''Internet Protocol''. (See Abbate, ''Inventing the Internet'', 129–30.)
This resulted in a networking model that became known informally as ''TCP/IP'', although formally it was variously referred to as the ''DoD internet architecture model'' (''DoD model'' for short) or ''DARPA model''.
Later, it became part of, and synonymous with, the ''Internet Protocol Suite''.
The following Internet Experiment Note (IEN) documents describe the evolution of TCP into the modern version:
* IEN 5 ''Specification of Internet Transmission Control Program TCP Version 2'' (March 1977).
* IEN 21 ''Specification of Internetwork Transmission Control Program TCP Version 3'' (January 1978).
* IEN 27
* IEN 40
* IEN 44
* IEN 55
* IEN 81
* IEN 112
* IEN 124
TCP was standardized in January 1980 as RFC 761.
In 2004, Vint Cerf and Bob Kahn received the Turing Award for their foundational work on TCP/IP.
Network function
The Transmission Control Protocol provides a communication service at an intermediate level between an application program and the Internet Protocol. It provides host-to-host connectivity at the transport layer of the Internet model. An application does not need to know the particular mechanisms for sending data via a link to another host, such as the required IP fragmentation to accommodate the maximum transmission unit of the transmission medium. At the transport layer, TCP handles all handshaking and transmission details and presents an abstraction of the network connection to the application, typically through a network socket interface.
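As an illustration of the socket abstraction, the following minimal Python sketch opens a listening socket on the loopback address, connects to it, and echoes a payload back. The helper names `echo_once` and `round_trip` are hypothetical, and the use of a background thread and an OS-chosen port are assumptions made only to keep the example self-contained; the application code never deals with segments, acknowledgments, or retransmission.

```python
import socket
import threading


def echo_once(server: socket.socket) -> None:
    """Accept one connection and echo back whatever arrives until EOF."""
    conn, _addr = server.accept()
    with conn:
        while True:
            chunk = conn.recv(1024)
            if not chunk:  # empty read: the peer closed its sending side
                break
            conn.sendall(chunk)


def round_trip(payload: bytes) -> bytes:
    """Send payload over a loopback TCP connection and return the echo."""
    # Passive open: bind to loopback on an OS-chosen port and listen.
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]
        worker = threading.Thread(target=echo_once, args=(server,))
        worker.start()
        # Active open: the handshake happens inside create_connection.
        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            client.sendall(payload)
            client.shutdown(socket.SHUT_WR)  # no more data from this side
            received = bytearray()
            while True:
                chunk = client.recv(1024)
                if not chunk:
                    break
                received.extend(chunk)
        worker.join()
    return bytes(received)
```

Calling `round_trip(b"hello")` should return the same bytes, delivered in order by the operating system's TCP implementation.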
At the lower levels of the protocol stack, due to network congestion, traffic load balancing, or unpredictable network behavior, IP packets may be lost, duplicated, or delivered out of order. TCP detects these problems, requests re-transmission of lost data, rearranges out-of-order data and even helps minimize network congestion to reduce the occurrence of the other problems. If the data still remains undelivered, the source is notified of this failure. Once the TCP receiver has reassembled the sequence of octets originally transmitted, it passes them to the receiving application. Thus, TCP abstracts the application's communication from the underlying networking details.
TCP is used extensively by many internet applications, including the World Wide Web (WWW), email, File Transfer Protocol, Secure Shell, peer-to-peer file sharing, and streaming media.
TCP is optimized for accurate delivery rather than timely delivery and can incur relatively long delays (on the order of seconds) while waiting for out-of-order messages or re-transmissions of lost messages. Therefore, it is not particularly suitable for real-time applications such as voice over IP. For such applications, protocols like the Real-time Transport Protocol (RTP) operating over the User Datagram Protocol (UDP) are usually recommended instead.
TCP is a reliable byte stream delivery service that guarantees that all bytes received will be identical and in the same order as those sent. Since packet transfer by many networks is not reliable, TCP achieves this using a technique known as ''positive acknowledgment with re-transmission''. This requires the receiver to respond with an acknowledgment message as it receives the data. The sender keeps a record of each packet it sends and maintains a timer from when the packet was sent. The sender re-transmits a packet if the timer expires before receiving the acknowledgment. The timer is needed in case a packet gets lost or corrupted.
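The idea of positive acknowledgment with re-transmission can be sketched as a stop-and-wait loop. This is a deliberately simplified model, not how a real TCP stack is structured: the function name `send_reliably`, the `lost` set describing which transmissions the simulated channel drops, and the `max_attempts` limit are all assumptions made for illustration.

```python
def send_reliably(packets, lost, max_attempts=5):
    """Stop-and-wait sketch of positive acknowledgment with retransmission.

    `lost` is a set of (packet_index, attempt) pairs that the simulated
    channel drops; every other transmission is delivered and acknowledged.
    Returns the delivered packets and the total number of transmissions.
    """
    delivered = []
    transmissions = 0
    for index, packet in enumerate(packets):
        for attempt in range(1, max_attempts + 1):
            transmissions += 1
            if (index, attempt) in lost:
                # No ACK arrives, so the (simulated) timer expires and
                # the loop sends the same packet again.
                continue
            delivered.append(packet)  # receiver got it and returned an ACK
            break
        else:
            raise TimeoutError(
                f"packet {index} unacknowledged after {max_attempts} attempts"
            )
    return delivered, transmissions
```

With three packets and the second one dropped twice, the sender needs five transmissions in total, yet the receiver still sees all three packets in order.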
While IP handles actual delivery of the data, TCP keeps track of ''segments'' – the individual units of data transmission that a message is divided into for efficient routing through the network. For example, when an HTML file is sent from a web server, the TCP software layer of that server divides the file into segments and forwards them individually to the internet layer in the network stack. The internet layer software encapsulates each TCP segment into an IP packet by adding a header that includes (among other data) the destination IP address. When the client program on the destination computer receives them, the TCP software in the transport layer re-assembles the segments and ensures they are correctly ordered and error-free as it streams the file contents to the receiving application.
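The division of a file into segments and its reassembly at the destination can be sketched as follows. The function names and the representation of a segment as an `(offset, payload)` pair are assumptions for illustration; a real TCP segment carries a full header and uses 32-bit sequence numbers rather than plain offsets.

```python
def split_into_segments(data: bytes, max_payload: int):
    """Return (byte_offset, payload) pairs covering `data` in order."""
    return [
        (offset, data[offset:offset + max_payload])
        for offset in range(0, len(data), max_payload)
    ]


def reassemble(segments):
    """Rebuild the byte stream from reordered or duplicated segments."""
    by_offset = {}
    for offset, payload in segments:
        by_offset.setdefault(offset, payload)  # ignore duplicate copies
    stream = bytearray()
    for offset in sorted(by_offset):
        if offset != len(stream):
            raise ValueError(f"gap in stream at byte {len(stream)}")
        stream.extend(by_offset[offset])
    return bytes(stream)
```

Reversing the segment list or appending a duplicate before calling `reassemble` yields the original bytes, while a missing segment raises an error instead of silently returning corrupted data.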
TCP segment structure
Transmission Control Protocol accepts data from a data stream, divides it into chunks, and adds a TCP header creating a TCP segment. The TCP segment is then encapsulated into an Internet Protocol (IP) datagram, and exchanged with peers.
The term ''TCP packet'' appears in both informal and formal usage, whereas in more precise terminology ''segment'' refers to the TCP protocol data unit (PDU), ''datagram'' to the IP PDU, and ''frame'' to the data link layer PDU:
Processes transmit data by calling on the TCP and passing buffers of data as arguments. The TCP packages the data from these buffers into segments and calls on the internet module [e.g. IP] to transmit each segment to the destination TCP.
A TCP segment consists of a segment ''header'' and a ''data'' section. The segment header contains 10 mandatory fields and an optional extension field (''Options''). The data section follows the header and is the payload data carried for the application. The length of the data section is not specified in the segment header; it can be calculated by subtracting the combined length of the segment header and IP header from the total IP datagram length specified in the IP header.
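The fixed part of the header and the payload-length calculation described above can be sketched in Python. The function names and dictionary keys are hypothetical; the layout assumed here is the standard 20-byte fixed header (ports, sequence and acknowledgment numbers, a 16-bit word holding data offset, reserved bits and flags, then window, checksum and urgent pointer), and options are not decoded.

```python
import struct


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the 20-byte fixed part of a TCP header (network byte order)."""
    (src_port, dst_port, seq, ack,
     offset_and_flags, window, checksum, urgent) = struct.unpack(
        "!HHIIHHHH", segment[:20]
    )
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        # Data offset is the top 4 bits, counted in 32-bit words.
        "header_len": (offset_and_flags >> 12) * 4,
        # The low 8 bits hold the flags (FIN=0x01, SYN=0x02, ACK=0x10, ...).
        "flags": offset_and_flags & 0xFF,
        "window": window,
        "checksum": checksum,
        "urgent": urgent,
    }


def payload_length(ip_total_length: int, ip_header_length: int,
                   tcp_header_length: int) -> int:
    """Data length = total IP datagram length minus both header lengths."""
    return ip_total_length - ip_header_length - tcp_header_length
```

For example, a 60-byte IP datagram with a 20-byte IP header and a 20-byte TCP header carries 20 bytes of application data.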
:Some options may only be sent when SYN is set; they are indicated below as [SYN]. Option-Kind and standard lengths are given as (Option-Kind, Option-Length).
:The remaining Option-Kind values are historical, obsolete, experimental, not yet standardized, or unassigned. Option number assignments are maintained by the Internet Assigned Numbers Authority (IANA).
Protocol operation
TCP protocol operations may be divided into three phases. ''Connection establishment'' is a multi-step handshake process that establishes a connection before entering the ''data transfer'' phase. After data transfer is completed, the ''connection termination'' closes the connection and releases all allocated resources.
A TCP connection is managed by an operating system through a resource that represents the local end-point for communications, the ''Internet socket''. During the lifetime of a TCP connection, the local end-point undergoes a series of state changes.
Connection establishment
Before a client attempts to connect with a server, the server must first bind to and listen at a port to open it up for connections: this is called a passive open. Once the passive open is established, a client may establish a connection by initiating an active open using the three-way (or 3-step) handshake:
# SYN: The active open is performed by the client sending a SYN to the server. The client sets the segment's sequence number to a random value A.
# SYN-ACK: In response, the server replies with a SYN-ACK. The acknowledgment number is set to one more than the received sequence number i.e. A+1, and the sequence number that the server chooses for the packet is another random number, B.
# ACK: Finally, the client sends an ACK back to the server. The sequence number is set to the received acknowledgment value i.e. A+1, and the acknowledgment number is set to one more than the received sequence number i.e. B+1.
Steps 1 and 2 establish and acknowledge the sequence number for one direction (client to server). Steps 2 and 3 establish and acknowledge the sequence number for the other direction (server to client). Following the completion of these steps, both the client and server have received acknowledgments and a full-duplex communication is established.
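The sequence and acknowledgment numbers exchanged in the three steps above can be sketched as follows. The function name and the tuple layout `(flags, seq, ack)` are assumptions for illustration, and Python's `random` module stands in for whatever initial-sequence-number generator a real stack uses. Sequence numbers are 32 bits wide, so the arithmetic wraps modulo 2**32.

```python
import random

SEQ_MODULUS = 2**32  # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(rng=random):
    """Return the (flags, seq, ack) triples of a handshake sketch."""
    a = rng.randrange(SEQ_MODULUS)  # client's initial sequence number A
    b = rng.randrange(SEQ_MODULUS)  # server's initial sequence number B
    syn = ("SYN", a, None)
    syn_ack = ("SYN-ACK", b, (a + 1) % SEQ_MODULUS)
    ack = ("ACK", (a + 1) % SEQ_MODULUS, (b + 1) % SEQ_MODULUS)
    return [syn, syn_ack, ack]
```

Passing a seeded generator, for example `three_way_handshake(random.Random(7))`, makes the exchange reproducible for inspection.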
Connection termination
The connection termination phase uses a four-way handshake, with each side of the connection terminating independently. When an endpoint wishes to stop its half of the connection, it transmits a FIN packet, which the other end acknowledges with an ACK. Therefore, a typical tear-down requires a pair of FIN and ACK segments from each TCP endpoint. After the side that sent the first FIN has responded with the final ACK, it waits for a timeout before finally closing the connection, during which time the local port is unavailable for new connections; this state lets the TCP client resend the final acknowledgment to the server in case the ACK is lost in transit. The time duration is implementation-dependent, but some common values are 30 seconds, 1 minute, and 2 minutes. After the timeout, the client enters the CLOSED state and the local port becomes available for new connections.
It is also possible to terminate the connection by a 3-way handshake, when host A sends a FIN and host B replies with a FIN & ACK (combining two steps into one) and host A replies with an ACK.
Some operating systems, such as Linux, implement a half-duplex close sequence. If the host actively closes a connection while still having unread incoming data available, the host sends the signal RST (losing any received data) instead of FIN. This assures that a TCP application is aware there was a data loss.
A connection can be in a half-open state, in which case one side has terminated the connection, but the other has not. The side that has terminated can no longer send any data into the connection, but the other side can. The terminating side should continue reading the data until the other side terminates as well.
Resource usage
Most implementations allocate an entry in a table that maps a session to a running operating system process. Because TCP packets do not include a session identifier, both endpoints identify the session using the client's address and port. Whenever a packet is received, the TCP implementation must perform a lookup on this table to find the destination process. Each entry in the table is known as a Transmission Control Block or TCB. It contains information about the endpoints (IP and port), status of the connection, running data about the packets that are being exchanged and buffers for sending and receiving data.
The number of sessions in the server side is limited only by memory and can grow as new connections arrive, but the client must allocate an ephemeral port before sending the first SYN to the server. This port remains allocated during the whole conversation and effectively limits the number of outgoing connections from each of the client's IP addresses. If an application fails to properly close unrequired connections, a client can run out of resources and become unable to establish new TCP connections, even from other applications.
Both endpoints must also allocate space for unacknowledged packets and received (but unread) data.
Data transfer
The Transmission Control Protocol differs in several key features compared to the User Datagram Protocol:
* Ordered data transfer: the destination host rearranges segments according to a sequence number
* Retransmission of lost packets: any cumulative stream not acknowledged is retransmitted
* Error-free data transfer: corrupted packets are treated as lost and are retransmitted
* Flow control: limits the rate a sender transfers data to guarantee reliable delivery. The receiver continually hints the sender on how much data can be received. When the receiving host's buffer fills, the next acknowledgment suspends the transfer and allows the data in the buffer to be processed.
* Congestion control: lost packets (presumed due to congestion) trigger a reduction in data delivery rate
Reliable transmission
TCP uses a ''sequence number'' to identify each byte of data. The sequence number identifies the order of the bytes sent from each computer so that the data can be reconstructed in order, regardless of any out-of-order delivery that may occur. The sequence number of the first byte is chosen by the transmitter for the first packet, which is flagged SYN. This number can be arbitrary, and should, in fact, be unpredictable to defend against TCP sequence prediction attacks.
Acknowledgments (ACKs) are sent with a sequence number by the receiver of data to tell the sender that data has been received up to the specified byte. ACKs do not imply that the data has been delivered to the application; they merely signify that it is now the receiver's responsibility to deliver the data.
Reliability is achieved by the sender detecting lost data and retransmitting it. TCP uses two primary techniques to identify loss: retransmission timeout (RTO) and duplicate cumulative acknowledgments (DupAcks).
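How a receiver might derive a cumulative acknowledgment from the byte ranges it has received can be sketched as follows. The function name and the `(seq, length)` representation are assumptions, and sequence-number wraparound is ignored to keep the sketch short. The value returned is the next byte the receiver expects, which is what the acknowledgment number carries.

```python
def cumulative_ack(initial_seq, received):
    """Return the next expected byte given received (seq, length) ranges."""
    expected = initial_seq
    for seq, length in sorted(received):
        if seq > expected:
            break  # a gap: nothing beyond it can be acknowledged yet
        expected = max(expected, seq + length)
    return expected
```

If bytes 1000 to 1099 and 1200 to 1299 have arrived, the acknowledgment stays at 1100 until the missing range shows up, at which point it jumps to 1300.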
When a TCP segment is retransmitted, it retains the same sequence number as the original delivery attempt. This conflation of delivery and logical data ordering means that, when acknowledgment is received after a retransmission, the sender cannot tell whether the original transmission or the retransmission is being acknowledged, the so-called ''retransmission ambiguity''. TCP incurs complexity due to retransmission ambiguity.
Duplicate-ACK-based retransmission
If a single segment (say segment number 100) in a stream is lost, then the receiver cannot acknowledge packets above that segment number (100) because it uses cumulative ACKs. Hence the receiver acknowledges packet 99 again on the receipt of another data packet. This duplicate acknowledgement is used as a signal for packet loss. That is, if the sender receives three duplicate acknowledgments, it retransmits the last unacknowledged packet. A threshold of three is used because the network may reorder segments causing duplicate acknowledgements. This threshold has been demonstrated to avoid spurious retransmissions due to reordering. Some TCP implementations use selective acknowledgements (SACKs) to provide explicit feedback about the segments that have been received. This greatly improves TCP's ability to retransmit the right segments.
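The duplicate-acknowledgment rule can be sketched as a scan over the stream of acknowledgment numbers seen by the sender. The function name is hypothetical and the sketch ignores SACK, window updates, and the other conditions a real stack checks before treating an ACK as a duplicate.

```python
def fast_retransmit_points(acks, threshold=3):
    """Return ACK numbers for which `threshold` duplicate ACKs were seen."""
    retransmit = []
    last_ack = None
    duplicates = 0
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == threshold:
                # Third duplicate: resend the segment starting at `ack`.
                retransmit.append(ack)
        else:
            last_ack = ack
            duplicates = 0
    return retransmit
```

A single repeated ACK, as mild reordering might produce, triggers nothing; only the third duplicate of the same acknowledgment number is reported.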
Retransmission ambiguity can cause spurious fast retransmissions and congestion avoidance if there is reordering beyond the duplicate acknowledgment threshold. In the last two decades, more packet reordering has been observed over the Internet, which led TCP implementations, such as the one in the Linux kernel, to adopt heuristic methods to scale the duplicate acknowledgment threshold. Recently, there have been efforts to completely phase out duplicate-ACK-based fast retransmissions and replace them with timer-based ones (not to be confused with the classic RTO discussed below). The time-based loss detection algorithm called Recent Acknowledgment (RACK) has been adopted as the default algorithm in Linux and Windows.
Timeout-based retransmission
When a sender transmits a segment, it initializes a timer with a conservative estimate of the arrival time of the acknowledgment. The segment is retransmitted if the timer expires, with a new timeout threshold of twice the previous value, resulting in exponential backoff behavior. Typically, the initial timer value is smoothed RTT + max(G, 4 × RTT variation), where G is the clock granularity. This guards against excessive transmission traffic due to faulty or malicious actors, such as man-in-the-middle denial of service attackers.
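The doubling of the timeout on each expiry can be sketched as follows. The function name is hypothetical, and the `max_rto` cap of 60 seconds is an assumption added for illustration; implementations choose their own upper bound.

```python
def rto_schedule(initial_rto, retries, max_rto=60.0):
    """Timeouts used for the first send and each retransmission."""
    timeouts = []
    rto = initial_rto
    for _ in range(retries + 1):
        timeouts.append(rto)
        rto = min(rto * 2, max_rto)  # exponential backoff, capped
    return timeouts
```

Starting from a one-second timeout, four retransmissions wait 1, 2, 4, 8 and 16 seconds respectively, so a persistently failing path quickly sees less and less traffic from this sender.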
Accurate RTT estimates are important for loss recovery, as they allow a sender to assume an unacknowledged packet to be lost after sufficient time elapses (i.e., determining the RTO time). Retransmission ambiguity can lead a sender's estimate of RTT to be imprecise. In an environment with variable RTTs, spurious timeouts can occur: if the RTT is underestimated, then the RTO fires and triggers a needless retransmit and slow start. After a spurious retransmission, when the acknowledgments for the original transmissions arrive, the sender may believe them to be acknowledging the retransmission and conclude, incorrectly, that segments sent between the original transmission and retransmission have been lost, causing further needless retransmissions to the extent that the link truly becomes congested; selective acknowledgment can reduce this effect. Implementations must not use retransmitted segments when estimating RTT. Karn's algorithm ensures that a good RTT estimate will eventually be produced by waiting until there is an unambiguous acknowledgment before adjusting the RTO. After spurious retransmissions, however, it may take significant time before such an unambiguous acknowledgment arrives, degrading performance in the interim. TCP timestamps also resolve the retransmission ambiguity problem in setting the RTO, though they do not necessarily improve the RTT estimate.
Error detection
Sequence numbers allow receivers to discard duplicate packets and properly sequence out-of-order packets. Acknowledgments allow senders to determine when to retransmit lost packets.
To assure correctness, a checksum field is included in each segment. The TCP checksum is a weak check by modern standards and is normally paired with a CRC integrity check at layer 2, below both TCP and IP, such as is used in PPP or the Ethernet frame. However, introduction of errors in packets between CRC-protected hops is common and the 16-bit TCP checksum catches most of these.
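The 16-bit checksum is the one's-complement Internet checksum. A minimal sketch of the arithmetic is shown below; the function name is hypothetical, and the sketch sums only the bytes it is given, whereas the actual TCP checksum also covers a pseudo-header built from IP-layer fields.

```python
def internet_checksum(data: bytes) -> int:
    """One's complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        # Fold any carry out of the low 16 bits back in.
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

A receiver can verify a block by running the same function over the data with the checksum appended: an undamaged block yields zero, while a single flipped bit yields a non-zero result.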
Flow control
TCP uses an end-to-end flow control protocol to avoid having the sender send data too fast for the TCP receiver to receive and process it reliably. Having a mechanism for flow control is essential in an environment where machines of diverse network speeds communicate. For example, if a PC sends data to a smartphone that is slowly processing received data, the smartphone must be able to regulate the data flow so as not to be overwhelmed.
TCP uses a sliding window flow control protocol. In each TCP segment, the receiver specifies in the ''receive window'' field the amount of additionally received data (in bytes) that it is willing to buffer for the connection. The sending host can send only up to that amount of data before it must wait for an acknowledgment and receive window update from the receiving host.
When a receiver advertises a window size of 0, the sender stops sending data and starts its ''persist timer''. The persist timer is used to protect TCP from a deadlock situation that could arise if a subsequent window size update from the receiver is lost, and the sender cannot send more data until receiving a new window size update from the receiver. When the persist timer expires, the TCP sender attempts recovery by sending a small packet so that the receiver responds by sending another acknowledgment containing the new window size.
If a receiver is processing incoming data in small increments, it may repeatedly advertise a small receive window. This is referred to as the silly window syndrome, since it is inefficient to send only a few bytes of data in a TCP segment, given the relatively large overhead of the TCP header.
Congestion control
The final main aspect of TCP is congestion control. TCP uses a number of mechanisms to achieve high performance and avoid congestive collapse, a gridlock situation where network performance is severely degraded. These mechanisms control the rate of data entering the network, keeping the data flow below a rate that would trigger collapse. They also yield an approximately max-min fair allocation between flows.
Acknowledgments for data sent, or the lack of acknowledgments, are used by senders to infer network conditions between the TCP sender and receiver. Coupled with timers, TCP senders and receivers can alter the behavior of the flow of data. This is more generally referred to as congestion control or congestion avoidance.
Modern implementations of TCP contain four intertwined algorithms: slow start, congestion avoidance, fast retransmit, and fast recovery.
In addition, senders employ a ''retransmission timeout'' (RTO) that is based on the estimated round-trip time
(RTT) between the sender and receiver, as well as the variance in this round-trip time. There are subtleties in the estimation of RTT. For example, senders must be careful when calculating RTT samples for retransmitted packets; typically they use Karn's Algorithm or TCP timestamps. These individual RTT samples are then averaged over time to create a smoothed round trip time (SRTT) using Jacobson's algorithm. This SRTT value is what is used as the round-trip time estimate.
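The smoothing step can be sketched as below. The gains of 1/8 and 1/4, the factor of 4 on the variance term, and the 1-second lower bound follow the commonly standardized form of the algorithm (RFC 6298); they are not stated in the text above, and the class name is hypothetical. Clock granularity and Karn's rule for retransmitted segments are left out.

```python
class RttEstimator:
    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variance

    def __init__(self, min_rto: float = 1.0):
        self.srtt = None
        self.rttvar = None
        self.min_rto = min_rto

    def update(self, sample: float) -> float:
        """Feed one RTT sample (seconds) and return the resulting RTO."""
        if self.srtt is None:
            # The first measurement initializes both estimators.
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # The variance is updated first, using the previous SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - sample)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample
        return max(self.min_rto, self.srtt + 4 * self.rttvar)


estimator = RttEstimator(min_rto=0.0)  # lower bound disabled to show the raw values
for rtt in (0.100, 0.200, 0.110):
    print(round(estimator.update(rtt), 4))
```

A single slow sample raises the RTO sharply through the variance term, while the smoothed RTT itself moves only by one eighth of the difference.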
Enhancing TCP to reliably handle loss, minimize errors, manage congestion and go fast in very high-speed environments are ongoing areas of research and standards development. As a result, there are a number of TCP congestion avoidance algorithm
variations.
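As a rough illustration of how slow start and congestion avoidance interact, the sketch below tracks a congestion window in units of segments, one step per round trip. It is deliberately simplified: loss is reduced to "a loss was detected in this round trip", the window restarts from one segment after every loss (Tahoe-style behavior), and fast retransmit and fast recovery are not modeled. All names are hypothetical.

```python
def simulate_cwnd(round_trips: int, ssthresh: int, loss_at: set[int]) -> list[int]:
    """Return the congestion window (in segments) at the start of each round trip."""
    cwnd = 1
    history = []
    for rtt in range(round_trips):
        history.append(cwnd)
        if rtt in loss_at:
            ssthresh = max(cwnd // 2, 2)  # multiplicative decrease of the threshold
            cwnd = 1                      # restart from slow start
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start: exponential growth
        else:
            cwnd += 1                       # congestion avoidance: additive increase
    return history


print(simulate_cwnd(8, ssthresh=8, loss_at={5}))
# → [1, 2, 4, 8, 9, 10, 1, 2]
```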
Maximum segment size
The maximum segment size (MSS) is the largest amount of data, specified in bytes, that TCP is willing to receive in a single segment. For best performance, the MSS should be set small enough to avoid IP fragmentation, which can lead to packet loss and excessive retransmissions. To accomplish this, typically the MSS is announced by each side using the MSS option when the TCP connection is established. The option value is derived from the maximum transmission unit (MTU) size of the data link layer of the networks to which the sender and receiver are directly attached. TCP senders can use path MTU discovery to infer the minimum MTU along the network path between the sender and receiver, and use this to dynamically adjust the MSS to avoid IP fragmentation within the network.
MSS announcement may also be called ''MSS negotiation'' but, strictly speaking, the MSS is not ''negotiated''. Two completely independent values of MSS are permitted for the two directions of data flow in a TCP connection, so there is no need to agree on a common MSS configuration for a bidirectional connection.
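The derivation of an MSS from an MTU is simple arithmetic. The sketch below assumes minimal headers without options (20 bytes for IPv4, 40 bytes for IPv6, 20 bytes for TCP); the helper names are hypothetical.

```python
def mss_from_mtu(mtu: int, ip_header: int = 20, tcp_header: int = 20) -> int:
    """Largest TCP payload that fits in one unfragmented IP packet."""
    return mtu - ip_header - tcp_header


def path_mtu(link_mtus: list[int]) -> int:
    """The usable MTU of a path is limited by its smallest link."""
    return min(link_mtus)


print(mss_from_mtu(1500))                          # Ethernet, IPv4 → 1460
print(mss_from_mtu(1500, ip_header=40))            # Ethernet, IPv6 → 1440
print(mss_from_mtu(path_mtu([1500, 1400, 9000])))  # narrowest hop wins → 1360
```

Each direction performs this calculation on its own, which is why the two announced values need not match.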
Selective acknowledgments
Relying purely on the cumulative acknowledgment scheme employed by the original TCP can lead to inefficiencies when packets are lost. For example, suppose bytes with sequence number 1,000 to 10,999 are sent in 10 different TCP segments of equal size, and the second segment (sequence numbers 2,000 to 2,999) is lost during transmission. In a pure cumulative acknowledgment protocol, the receiver can only send a cumulative ACK value of 2,000 (the sequence number immediately following the last sequence number of the received data) and cannot say that it received bytes 3,000 to 10,999 successfully. Thus the sender may then have to resend all data starting with sequence number 2,000.
To alleviate this issue, TCP employs the ''selective acknowledgment (SACK)'' option, defined in 1996, which allows the receiver to acknowledge discontinuous blocks of packets that were received correctly, in addition to the sequence number immediately following the last contiguous byte received, as in the basic TCP acknowledgment. The acknowledgment can include a number of ''SACK blocks'', where each SACK block is conveyed by the ''Left Edge of Block'' (the first sequence number of the block) and the ''Right Edge of Block'' (the sequence number immediately following the last sequence number of the block), with a ''Block'' being a contiguous range that the receiver correctly received. In the example above, the receiver would send an ACK segment with a cumulative ACK value of 2,000 and a SACK option header with sequence numbers 3,000 and 11,000. The sender would accordingly retransmit only the second segment with sequence numbers 2,000 to 2,999.
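The worked example can be reproduced with a short sketch that derives the cumulative ACK and the SACK blocks from the ranges a receiver holds. Ranges are half-open pairs of (left edge, right edge); the function name is hypothetical, and the limit on how many blocks fit in one option is ignored.

```python
def ack_and_sack(received: list[tuple[int, int]], initial_seq: int):
    """Return (cumulative_ack, sack_blocks) for the received byte ranges."""
    merged: list[list[int]] = []
    for left, right in sorted(received):
        if merged and left <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], right)  # extend an adjacent block
        else:
            merged.append([left, right])

    cumulative = initial_seq
    blocks = []
    for left, right in merged:
        if left <= cumulative:
            cumulative = max(cumulative, right)  # contiguous with data already acked
        else:
            blocks.append((left, right))         # isolated block beyond a hole
    return cumulative, blocks


# Ten 1,000-byte segments starting at 1,000; the second one (2,000 to 2,999) is lost.
segments = [(1000 + 1000 * i, 2000 + 1000 * i) for i in range(10) if i != 1]
print(ack_and_sack(segments, initial_seq=1000))
# → (2000, [(3000, 11000)])
```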
A TCP sender may interpret an out-of-order segment delivery as a lost segment. If it does so, the TCP sender will retransmit the segment previous to the out-of-order packet and slow its data delivery rate for that connection. The duplicate-SACK option, an extension to the SACK option that was defined in May 2000, solves this problem. Once the TCP receiver detects a second duplicate packet, it sends a D-ACK to indicate that no segments were lost, allowing the TCP sender to reinstate the higher transmission rate.
The SACK option is not mandatory and comes into operation only if both parties support it. This is negotiated when a connection is established. SACK uses a TCP header option. The use of SACK has become widespread; all popular TCP stacks support it. Selective acknowledgment is also used in Stream Control Transmission Protocol (SCTP).
Selective acknowledgements can be 'reneged', where the receiver unilaterally discards the selectively acknowledged data. The SACK specification discouraged such behavior, but did not prohibit it, so that receivers keep the option of reneging if they, for example, run out of buffer space. The possibility of reneging leads to implementation complexity for both senders and receivers, and also imposes memory costs on the sender.
Window scaling
For more efficient use of high-bandwidth networks, a larger TCP window size may be used. A 16-bit TCP window size field controls the flow of data and its value is limited to 65,535 bytes. Since the size field cannot be expanded beyond this limit, a scaling factor is used. The TCP window scale option is an option used to increase the maximum window size to 1 gigabyte. Scaling up to these larger window sizes is necessary for TCP tuning.
The window scale option is used only during the TCP 3-way handshake. The window scale value represents the number of bits to left-shift the 16-bit window size field when interpreting it. The window scale value can be set from 0 (no shift) to 14 for each direction independently. Both sides must send the option in their SYN segments to enable window scaling in either direction.
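The effect of the shift count can be checked with a few lines of arithmetic. The sketch below applies the left shift described above and also finds the smallest shift that can cover a target window, for example a bandwidth-delay product; the helper names and the example link figures are assumptions for illustration.

```python
MAX_SHIFT = 14  # largest shift count permitted for the option


def effective_window(window_field: int, shift: int) -> int:
    """Interpret a 16-bit window field under a given window scale shift."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift must be between 0 and 14")
    return window_field << shift


def smallest_shift(target_bytes: int) -> int:
    """Smallest shift whose maximum window is at least target_bytes."""
    for shift in range(MAX_SHIFT + 1):
        if effective_window(0xFFFF, shift) >= target_bytes:
            return shift
    raise ValueError("target exceeds the largest scalable window")


print(effective_window(0xFFFF, 14))  # → 1073725440, just under 1 GiB
# A 100 Mbit/s path with a 100 ms round trip holds 1,250,000 bytes in flight.
print(smallest_shift(1_250_000))     # → 5
```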
Some routers and packet firewalls rewrite the window scaling factor during a transmission. This causes sending and receiving sides to assume different TCP window sizes. The result is non-stable traffic that may be very slow. The problem is visible on some sites behind a defective router.
TCP timestamps
TCP timestamps, defined in 1992, can help TCP determine in which order packets were sent. TCP timestamps are not normally aligned to the system clock and start at some random value. Many operating systems will increment the timestamp for every elapsed millisecond; however, the RFC only states that the ticks should be proportional.
There are two timestamp fields:
* a 4-byte sender timestamp value (my timestamp)
* a 4-byte echo reply timestamp value (the most recent timestamp received from you).
TCP timestamps are used in an algorithm known as ''Protection Against Wrapped Sequence'' numbers, or ''PAWS''. PAWS is used when the receive window crosses the sequence number wraparound boundary. In the case where a packet was potentially retransmitted, it answers the question: "Is this sequence number in the first 4 GB or the second?" And the timestamp is used to break the tie.
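The tie-breaking comparison can be sketched as follows, assuming 32-bit timestamp values compared with wraparound-aware arithmetic. This shows only the core test; real implementations add further conditions (for example, for long-idle connections) that are not modeled here, and the function names are hypothetical.

```python
def timestamp_before(a: int, b: int) -> bool:
    """True if 32-bit timestamp a is older than b, allowing for wraparound."""
    return ((a - b) & 0xFFFFFFFF) >= 0x80000000


def paws_accepts(segment_tsval: int, ts_recent: int) -> bool:
    """Reject a segment whose timestamp is older than the most recent one seen."""
    return not timestamp_before(segment_tsval, ts_recent)


print(paws_accepts(1000, 900))       # newer timestamp → True
print(paws_accepts(900, 1000))       # stale segment → False
print(paws_accepts(5, 0xFFFFFFF0))   # timestamp clock wrapped, still newer → True
```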
Also, the Eifel detection algorithm uses TCP timestamps to determine if retransmissions are occurring because packets are lost or simply out of order.
TCP timestamps are enabled by default in Linux, and disabled by default in Windows Server 2008, 2012 and 2016.
Recent statistics show that the level of TCP timestamp adoption has stagnated at about 40%, owing to Windows Server dropping support since Windows Server 2008.
Out-of-band data
It is possible to interrupt or abort the queued stream instead of waiting for the stream to finish. This is done by specifying the data as ''urgent''. This marks the transmission as out-of-band data (OOB) and tells the receiving program to process it immediately. When finished, TCP informs the application and resumes the stream queue. An example is when TCP is used for a remote login session where the user can send a keyboard sequence that interrupts or aborts the remotely running program without waiting for the program to finish its current transfer.
The ''urgent'' pointer only alters the processing on the remote host and doesn't expedite any processing on the network itself. The capability is implemented differently or poorly on different systems or may not be supported. Where it is available, it is prudent to assume only single bytes of OOB data will be reliably handled. Since the feature is not frequently used, it is not well tested on some platforms and has been associated with vulnerabilities, WinNuke for instance.
Forcing data delivery
Normally, TCP waits for 200 ms for a full packet of data to send (Nagle's algorithm tries to group small messages into a single packet). This wait creates small, but potentially serious, delays if repeated constantly during a file transfer. For example, a typical send block would be 4 KB and a typical MSS is 1460 bytes, so 2 packets go out on a 10 Mbit/s Ethernet taking ~1.2 ms each, followed by a third carrying the remaining 1176 bytes after a 197 ms pause because TCP is waiting for a full buffer. In the case of telnet, each user keystroke is echoed back by the server before the user can see it on the screen. This delay would become very annoying.
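The figures in this example can be verified with a short calculation. The helper names are hypothetical, and framing overhead on the wire is ignored.

```python
def split_block(block_bytes: int, mss: int) -> tuple[int, int]:
    """Return (number of full-sized segments, bytes left over)."""
    return divmod(block_bytes, mss)


def transmit_time_ms(nbytes: int, link_mbit_per_s: float) -> float:
    """Serialization time of nbytes on a link of the given speed."""
    return nbytes * 8 / (link_mbit_per_s * 1_000_000) * 1000


full, remainder = split_block(4096, 1460)
print(full, remainder)                         # → 2 1176
print(round(transmit_time_ms(1460, 10), 3))    # → 1.168
```

Two full segments leave after roughly 2.3 ms in total, while the 1176-byte remainder is held back for the rest of the delay.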
Setting the socket option TCP_NODELAY overrides the default 200 ms send delay. Application programs use this socket option to force output to be sent after writing a character or line of characters.
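In Python's standard `socket` module the option is set with `setsockopt`, as in the minimal sketch below. The helper name `make_nodelay_socket` is hypothetical, and the socket is created but not connected.

```python
import socket


def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with small-write coalescing disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A non-zero value turns the option on for this socket only.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_nodelay_socket()
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
sock.close()
```

The option trades efficiency for latency, so it is mainly of interest to interactive applications that write small messages and wait for replies.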
The PSH push bit is defined as "a message to the receiving TCP stack to send this data immediately up to the receiving application". There is no way to indicate or control it in user space using Berkeley sockets; it is controlled by the protocol stack only.
Vulnerabilities
TCP may be attacked in a variety of ways. The results of a thorough security assessment of TCP, along with possible mitigations for the identified issues, were published in 2009, and the work was pursued within the IETF through 2012. Notable vulnerabilities include denial of service, connection hijacking, TCP veto and TCP reset attack.
Denial of service
By using a spoofed IP address and repeatedly sending purposely assembled SYN packets, followed by many ACK packets, attackers can cause the server to consume large amounts of resources keeping track of the bogus connections. This is known as a SYN flood attack. Proposed solutions to this problem include SYN cookies and cryptographic puzzles, though SYN cookies come with their own set of vulnerabilities. Sockstress is a similar attack that might be mitigated with system resource management. An advanced DoS attack involving the exploitation of the TCP ''persist timer'' was analyzed in ''Phrack'' No. 66. PUSH and ACK floods are other variants.
Connection hijacking
An attacker who is able to eavesdrop on a TCP session and redirect packets can hijack a TCP connection. To do so, the attacker learns the sequence number from the ongoing communication and forges a false segment that looks like the next segment in the stream. A simple hijack can result in one packet being erroneously accepted at one end. When the receiving host acknowledges the false segment, synchronization is lost. Hijacking may be combined with ARP spoofing
or other routing attacks that allow an attacker to take permanent control of the TCP connection.
Impersonating a different IP address was not difficult when the initial ''sequence number'' was easily guessable. Earlier implementations allowed an attacker to blindly send a sequence of packets that the receiver would believe came from a different IP address, without the need to intercept communication through ARP or routing attacks: it was enough to ensure that the legitimate host of the impersonated IP address was down, or to bring it to that condition using denial-of-service attacks. This is why the initial sequence number is now chosen at random.
TCP veto
An attacker who can eavesdrop and predict the size of the next packet to be sent can cause the receiver to accept a malicious payload without disrupting the existing connection. The attacker injects a malicious packet with the sequence number and a payload size of the next expected packet. When the legitimate packet is ultimately received, it is found to have the same sequence number and length as a packet already received and is silently dropped as a normal duplicate packet—the legitimate packet is ''vetoed'' by the malicious packet. Unlike in connection hijacking, the connection is never desynchronized and communication continues as normal after the malicious payload is accepted. TCP veto gives the attacker less control over the communication but makes the attack particularly resistant to detection. The only evidence to the receiver that something is amiss is a single duplicate packet, a normal occurrence in an IP network. The sender of the vetoed packet never sees any evidence of an attack.
TCP ports
A TCP connection is identified by a four-tuple of the source address, source port, destination address, and destination port. Port numbers are used to identify different services, and to allow multiple connections between hosts. TCP uses 16-bit port numbers, providing 65,536 possible values for each of the source and destination ports. The dependency of connection identity on addresses means that TCP connections are bound to a single network path; TCP cannot use other routes that multihomed hosts have available, and connections break if an endpoint's address changes.
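A connection table keyed by the four-tuple can be sketched as below. The type name, the stored values, and the documentation-range addresses are illustrative only.

```python
from typing import NamedTuple


class ConnectionKey(NamedTuple):
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int


connections: dict[ConnectionKey, str] = {}

# Two connections from the same client to the same server port stay distinct
# because their source ports differ.
connections[ConnectionKey("192.0.2.10", 50000, "198.51.100.5", 443)] = "first connection"
connections[ConnectionKey("192.0.2.10", 50001, "198.51.100.5", 443)] = "second connection"

print(len(connections))
print(connections[ConnectionKey("192.0.2.10", 50001, "198.51.100.5", 443)])
```

If the client's address changed, every lookup for its segments would miss the table, which is why such connections break.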
Port numbers are categorized into three basic categories: well-known, registered, and dynamic or private. The well-known ports are assigned by the Internet Assigned Numbers Authority (IANA) and are typically used by system-level processes. Well-known applications running as servers and passively listening for connections typically use these ports. Some examples include: FTP (20 and 21), SSH (22), TELNET (23), SMTP (25), HTTP over SSL/TLS (443), and HTTP (80). Registered ports are typically used by end-user applications as ephemeral source ports when contacting servers, but they can also identify named services that have been registered by a third party. Dynamic or private ports can also be used by end-user applications; however, these ports typically do not contain any meaning outside a particular TCP connection.
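The three categories correspond to fixed numeric ranges in the IANA registry: 0 to 1023, 1024 to 49151, and 49152 to 65535. These boundaries are not given in the text above and are supplied here from the registry. A minimal classifier, with a hypothetical function name, might look like this:

```python
def port_category(port: int) -> str:
    """Classify a TCP port number by its IANA range."""
    if not 0 <= port <= 65535:
        raise ValueError("port numbers are 16-bit values")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic or private"


for port in (22, 443, 8080, 50000):
    print(port, port_category(port))
```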
Network Address Translation (NAT) typically uses dynamic port numbers on the public-facing side to disambiguate the flow of traffic that is passing between a public network and a private subnetwork, thereby allowing many IP addresses (and their ports) on the subnet to be serviced by a single public-facing address.
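The port-based disambiguation can be sketched with a toy translation table. This is an illustration only: real NAT devices also track the remote endpoint, the protocol, and timeouts, and the class and method names here are hypothetical.

```python
class PortTranslator:
    """Toy NAT mapping (private address, private port) to a public port."""

    def __init__(self, public_addr: str, first_port: int = 49152):
        self.public_addr = public_addr
        self.next_port = first_port
        self.outbound: dict[tuple[str, int], int] = {}
        self.inbound: dict[int, tuple[str, int]] = {}

    def translate_out(self, private_addr: str, private_port: int) -> tuple[str, int]:
        key = (private_addr, private_port)
        if key not in self.outbound:
            # Allocate the next free public-side port for a new flow.
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return self.public_addr, self.outbound[key]

    def translate_in(self, public_port: int):
        """Return the private endpoint for a public port, or None if unmapped."""
        return self.inbound.get(public_port)


nat = PortTranslator("203.0.113.1")
print(nat.translate_out("10.0.0.2", 40000))
print(nat.translate_out("10.0.0.3", 40000))  # same private port, different host
print(nat.translate_in(49153))
```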
Development
TCP is a complex protocol. However, while significant enhancements have been made and proposed over the years, its most basic operation has not changed significantly since its first specification in 1974 and the v4 specification published in September 1981. A further document, published in October 1989, clarified a number of TCP protocol implementation requirements. A list of the 8 required specifications and over 20 strongly encouraged enhancements has also been published. Among this list is TCP Congestion Control, one of the most important TCP-related RFCs in recent years, which describes updated algorithms that avoid undue congestion. In 2001, an RFC was written to describe Explicit Congestion Notification (ECN), a congestion avoidance signaling mechanism.
The original TCP congestion avoidance algorithm was known as ''TCP Tahoe'', but many alternative algorithms have since been proposed (including TCP Reno, TCP Vegas, FAST TCP, TCP New Reno, and TCP Hybla).
Multipath TCP (MPTCP) is an ongoing effort within the IETF that aims at allowing a TCP connection to use multiple paths to maximize resource usage and increase redundancy. The redundancy offered by Multipath TCP in the context of wireless networks enables the simultaneous use of different networks, which brings higher throughput and better handover capabilities. Multipath TCP also brings performance benefits in datacenter environments. The reference implementation of Multipath TCP was developed in the Linux kernel. Multipath TCP is used to support the Siri voice recognition application on iPhones, iPads and Macs.
tcpcrypt is an extension proposed in July 2010 to provide transport-level encryption directly in TCP itself. It is designed to work transparently and not require any configuration. Unlike TLS (SSL), tcpcrypt itself does not provide authentication, but provides simple primitives down to the application to do that. The tcpcrypt RFC was published by the IETF in May 2019.
TCP Fast Open is an extension to speed up the opening of successive TCP connections between two endpoints. It works by skipping the three-way handshake using a cryptographic ''cookie''. It is similar to an earlier proposal called T/TCP, which was not widely adopted due to security issues. TCP Fast Open was published as an RFC in 2014.
Proposed in May 2013, Proportional Rate Reduction (PRR) is a TCP extension developed by Google engineers. PRR ensures that the TCP window size after recovery is as close to the slow start threshold as possible. The algorithm is designed to improve the speed of recovery and is the default congestion control algorithm in Linux 3.2+ kernels.
Deprecated proposals
TCP Cookie Transactions (TCPCT) is an extension proposed in December 2009 to secure servers against denial-of-service attacks. Unlike SYN cookies, TCPCT does not conflict with other TCP extensions such as window scaling. TCPCT was designed due to necessities of DNSSEC, where servers have to handle large numbers of short-lived TCP connections. In 2016, TCPCT was deprecated in favor of TCP Fast Open. The status of the original RFC was changed to ''historic''.
Hardware implementations
One way to overcome the processing power requirements of TCP is to build hardware implementations of it, widely known as TCP offload engines (TOE). The main problem of TOEs is that they are hard to integrate into computing systems, requiring extensive changes in the operating system of the computer or device.
Wire image and ossification
The wire data of TCP provides significant information-gathering and modification opportunities to on-path observers, as the protocol metadata is transmitted in cleartext. While this transparency is useful to network operators and researchers, information gathered from protocol metadata may reduce the end-user's privacy. This visibility and malleability of metadata has led to TCP being difficult to extend (a case of protocol ossification), as any intermediate node (a 'middlebox') can make decisions based on that metadata or even modify it, breaking the end-to-end principle. One measurement found that a third of paths across the Internet encounter at least one intermediary that modifies TCP metadata, and 6.5% of paths encounter harmful ossifying effects from intermediaries. Avoiding extensibility hazards from intermediaries placed significant constraints on the design of MPTCP, and difficulties caused by intermediaries have hindered the deployment of TCP Fast Open in web browsers. Another source of ossification is the difficulty of modification of TCP functions at the endpoints, typically in the operating system kernel or in hardware with a TCP offload engine.
Performance
As TCP provides applications with the abstraction of a reliable byte stream, it can suffer from head-of-line blocking: if packets are reordered or lost and need to be retransmitted (and thus are reordered), data from sequentially later parts of the stream may be received before sequentially earlier parts of the stream; however, the later data cannot typically be used until the earlier data has been received, incurring network latency. If multiple independent higher-level messages are encapsulated and multiplexed onto a single TCP connection, then head-of-line blocking can cause processing of a fully-received message that was sent later to wait for delivery of a message that was sent earlier. Web browsers attempt to mitigate head-of-line blocking by opening multiple parallel connections. This incurs the cost of connection establishment repeatedly, as well as multiplying the resources needed to track those connections at the endpoints. Parallel connections also have congestion control operating independently of each other, rather than being able to pool information together and respond more promptly to observed network conditions; TCP's aggressive initial sending patterns can cause congestion if multiple parallel connections are opened; and the per-connection fairness model leads to a monopolization of resources by applications that take this approach.
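Head-of-line blocking can be made concrete with a toy receiver that releases data to the application only in order. Segments are represented by consecutive integers rather than byte ranges, and the function name is hypothetical.

```python
def deliver_in_order(arrivals: list[int], first_seq: int = 0) -> list[list[int]]:
    """For each arriving segment, list what becomes deliverable to the application."""
    buffered: set[int] = set()
    next_seq = first_seq
    deliveries = []
    for seq in arrivals:
        buffered.add(seq)
        released = []
        # Only a contiguous run starting at next_seq can be handed over.
        while next_seq in buffered:
            buffered.remove(next_seq)
            released.append(next_seq)
            next_seq += 1
        deliveries.append(released)
    return deliveries


# Segment 0 is delayed: segments 1 and 2 arrive first but cannot be used yet.
print(deliver_in_order([1, 2, 0, 3]))
# → [[], [], [0, 1, 2], [3]]
```

Segments 1 and 2 are fully received yet sit in the buffer until segment 0 arrives, which is the delay the paragraph above describes.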
Connection establishment is a major contributor to latency as experienced by web users. TCP's three-way handshake introduces one RTT of latency during connection establishment before data can be sent. For short flows, these delays are very significant. Transport Layer Security
Transport Layer Security (TLS) is a cryptographic protocol designed to provide communications security over a computer network, such as the Internet. The protocol is widely used in applications such as email, instant messaging, and voice over ...
(TLS) requires a handshake of its own for key exchange at connection establishment. Because of the layered design, the TCP handshake and the TLS handshake proceed serially; the TLS handshake cannot begin until the TCP handshake has concluded. Two RTTs are required for connection establishment with TLS 1.3 over TCP. TLS 1.3 allows for zero-RTT connection resumption in some circumstances, but, when layered over TCP, one RTT is still required for the TCP handshake, and this cannot assist the initial connection; zero-RTT handshakes also present cryptographic challenges, as efficient, replay-safe and forward-secure non-interactive key exchange is an open research topic. TCP Fast Open allows the transmission of data in the initial (i.e., SYN and SYN-ACK) packets, removing one RTT of latency during connection establishment. However, TCP Fast Open has been difficult to deploy due to protocol ossification; no Web browsers used it by default.
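The round-trip accounting for serial handshakes can be sketched as simple arithmetic. The function name, its parameters, and the 80 ms round-trip time are hypothetical, and the model ignores processing time, packet loss, and the details of any particular TLS version.

```python
def rtts_before_request(tls_handshake_rtts, tcp_fast_open=False):
    """Round trips spent on handshakes before the first application
    request can be sent, assuming TCP and TLS handshakes run serially."""
    # The three-way handshake costs one RTT unless data rides in the SYN.
    tcp_rtts = 0 if tcp_fast_open else 1
    return tcp_rtts + tls_handshake_rtts


rtt_ms = 80  # assumed round-trip time, for illustration only

# A one-round-trip TLS handshake layered over a regular TCP handshake.
print(rtts_before_request(1) * rtt_ms, "ms")

# Zero-RTT TLS resumption: the TCP handshake still costs one round trip.
print(rtts_before_request(0) * rtt_ms, "ms")

# The same resumption combined with TCP Fast Open.
print(rtts_before_request(0, tcp_fast_open=True) * rtt_ms, "ms")
```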
TCP throughput is affected by packet reordering. Reordered packets can cause duplicate acknowledgments to be sent, which, if they cross a threshold, will then trigger a spurious retransmission and congestion control. Transmission behavior can also become bursty, as large ranges are acknowledged all at once when a reordered packet at the range's start is received (in a manner similar to how head-of-line blocking affects applications). Throughput has been found to be inversely related to the amount of reordering, up to a threshold where all reordering triggers spurious retransmission. Mitigating reordering depends on a sender's ability to determine that it has sent a spurious retransmission, and hence on resolving retransmission ambiguity. Reducing reordering-induced spurious retransmissions may slow recovery from genuine loss.
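The interaction between reordering and duplicate acknowledgments can be sketched as follows. The function names are hypothetical, segments are numbered as whole units rather than bytes, and the threshold of three duplicate acknowledgments is the conventional fast-retransmit value, assumed here for illustration.

```python
DUP_ACK_THRESHOLD = 3  # assumed: conventional fast-retransmit threshold


def cumulative_acks(arrivals):
    """Return the cumulative ACK sent after each arrival.

    Each ACK names the next segment number the receiver still expects.
    """
    received = set()
    next_expected = 0
    acks = []
    for seq in arrivals:
        received.add(seq)
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected)
    return acks


def fast_retransmits(acks, threshold=DUP_ACK_THRESHOLD):
    """Return segments the sender would retransmit after `threshold` duplicate ACKs."""
    retransmitted = []
    last_ack, duplicates = None, 0
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == threshold:
                retransmitted.append(ack)
        else:
            last_ack, duplicates = ack, 0
    return retransmitted


# Nothing is lost here: segment 1 is merely overtaken by segments 2, 3 and 4.
acks = cumulative_acks([0, 2, 3, 4, 1])
print(acks)                    # [1, 1, 1, 1, 5]
print(fast_retransmits(acks))  # [1]  (a spurious retransmission)
```

With milder reordering, such as the arrival order 0, 2, 1, 3, 4, only one duplicate acknowledgment is produced and no retransmission is triggered in this sketch.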
Selective acknowledgment can provide a significant benefit to throughput, with measured gains of up to 45%. An important factor in the improvement is that selective acknowledgment can more often avoid going into slow start after a loss and can hence better use available bandwidth. However, TCP can only selectively acknowledge a maximum of three blocks of sequence numbers. This can limit the retransmission rate and hence loss recovery or cause needless retransmissions, especially in high-loss environments.
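The effect of the block limit can be sketched by grouping out-of-order data into contiguous ranges and truncating the result. The function name is hypothetical, segments are numbered as whole units, and blocks are reported in ascending order for simplicity, which is an assumption rather than the ordering a real receiver would use.

```python
MAX_SACK_BLOCKS = 3  # the limit described in the text


def sack_blocks(received, cumulative_ack, limit=MAX_SACK_BLOCKS):
    """Group segments above `cumulative_ack` into (first, last + 1) blocks.

    Returns (reported, unreported): the blocks that fit within `limit`
    and the blocks that the receiver cannot describe in this ACK.
    """
    blocks = []
    for seq in sorted(s for s in received if s > cumulative_ack):
        if blocks and blocks[-1][1] == seq:
            blocks[-1] = (blocks[-1][0], seq + 1)   # extend the current block
        else:
            blocks.append((seq, seq + 1))           # start a new block
    return blocks[:limit], blocks[limit:]


# Segments 1, 4, 6 and 8 are missing, leaving four separate islands of data.
reported, unreported = sack_blocks({2, 3, 5, 7, 9, 10}, cumulative_ack=1)
print(reported)    # [(2, 4), (5, 6), (7, 8)]
print(unreported)  # [(9, 11)]
```

The sender in this sketch learns nothing about the last block from this acknowledgment, so it may retransmit data that has already arrived.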
TCP was originally designed for wired networks where packet loss is considered to be the result of network congestion
and the congestion window size is reduced dramatically as a precaution. However, wireless links are known to experience sporadic and usually temporary losses due to fading
, shadowing, hand off, interference
, and other radio effects, that are not strictly congestion. After the (erroneous) back-off of the congestion window size, due to wireless packet loss, there may be a congestion avoidance phase with a conservative decrease in window size. This causes the radio link to be underused. Extensive research on combating these harmful effects has been conducted. Suggested solutions can be categorized as end-to-end solutions, which require modifications at the client or server, link layer solutions, such as Radio Link Protocol in cellular networks, or proxy-based solutions which require some changes in the network without modifying end nodes. A number of alternative congestion control algorithms, such as Vegas, Westwood, Veno, and Santa Cruz, have been proposed to help solve the wireless problem.
Acceleration
The idea of a TCP accelerator is to terminate TCP connections inside the network processor and then relay the data to a second connection toward the end system. The data packets that originate from the sender are buffered at the accelerator node, which is responsible for performing local retransmissions in the event of packet loss. Thus, in case of losses, the feedback loop between the sender and the receiver is shortened to the one between the acceleration node and the receiver which guarantees a faster delivery of data to the receiver.
Since TCP is a rate-adaptive protocol, the rate at which the TCP sender injects packets into the network adapts to the prevailing load condition within the network as well as the processing capacity of the receiver. The prevalent conditions within the network are judged by the sender on the basis of the acknowledgments received by it. The acceleration node splits the feedback loop between the sender and the receiver and thus guarantees a shorter round trip time (RTT) per packet. A shorter RTT is beneficial as it ensures a quicker response time to any changes in the network and a faster adaptation by the sender to combat these changes.
Disadvantages of the method include the fact that the TCP session has to be directed through the accelerator; this means that if routing changes so that the accelerator is no longer in the path, the connection will be broken. It also destroys the end-to-end property of the TCP ACK mechanism; when the ACK is received by the sender, the packet has been stored by the accelerator, not delivered to the receiver.
Debugging
A packet sniffer
, which taps TCP traffic on a network link, can be useful in debugging networks, network stacks, and applications that use TCP by showing an engineer what packets are passing through a link. Some networking stacks support the SO_DEBUG socket option, which can be enabled on the socket using setsockopt. That option dumps all the packets, TCP states, and events on that socket, which is helpful in debugging. Netstat
is another utility that can be used for debugging.
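Enabling SO_DEBUG through setsockopt might look like the following minimal sketch. The helper name `enable_socket_debug` is hypothetical. Whether the option is accepted, and where any resulting trace appears, depends on the operating system; some systems restrict it to privileged processes, so the sketch treats failure as a normal outcome.

```python
import socket


def enable_socket_debug(sock):
    """Try to set SO_DEBUG on `sock`; return True on success, False otherwise."""
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_DEBUG, 1)
        return True
    except OSError:
        # The stack refused the option (for example, insufficient privileges).
        return False


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    print("SO_DEBUG enabled:", enable_socket_debug(s))
```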
Alternatives
For many applications TCP is not appropriate. The application cannot normally access the packets coming after a lost packet until the retransmitted copy of the lost packet is received. This causes problems for real-time applications such as streaming media, real-time multiplayer games and voice over IP
(VoIP) where it is generally more useful to get most of the data in a timely fashion than it is to get all of the data in order.
For historical and performance reasons, most storage area networks (SANs) use Fibre Channel Protocol (FCP) over Fibre Channel connections. For embedded systems, network booting, and servers that serve simple requests from huge numbers of clients (e.g. DNS servers), the complexity of TCP can be a problem. Tricks such as transmitting data between two hosts that are both behind NAT (using STUN or similar systems) are far simpler without a relatively complex protocol like TCP in the way.
Generally, where TCP is unsuitable, the User Datagram Protocol (UDP) is used. This provides the same application multiplexing and checksums that TCP does, but does not handle streams or retransmission, giving the application developer the ability to code them in a way suitable for the situation, or to replace them with other methods such as forward error correction or error concealment.
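A minimal sketch of datagram exchange over UDP on the loopback interface is shown below. The function name `udp_send_receive` is hypothetical. There is no connection establishment and no retransmission: if the datagram were lost, the receive call would simply time out, and any recovery would be up to the application.

```python
import socket


def udp_send_receive(payload: bytes) -> bytes:
    """Send one datagram to a local UDP socket and return what arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        receiver.settimeout(2.0)          # raises socket.timeout if nothing arrives
        sender.sendto(payload, receiver.getsockname())
        data, _addr = receiver.recvfrom(2048)
        return data


print(udp_send_receive(b"hello"))
```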
Stream Control Transmission Protocol
(SCTP) is another protocol that provides reliable stream-oriented services similar to TCP. It is newer and considerably more complex than TCP, and has not yet seen widespread deployment. However, it is especially designed to be used in situations where reliability and near-real-time considerations are important.
Venturi Transport Protocol (VTP) is a patented proprietary protocol
that is designed to replace TCP transparently to overcome perceived inefficiencies related to wireless data transport.
The TCP congestion avoidance algorithm
works very well for ad-hoc environments where the data sender is not known in advance. If the environment is predictable, a timing-based protocol such as Asynchronous Transfer Mode
(ATM) can avoid TCP's retransmission overhead.
UDP-based Data Transfer Protocol (UDT) has better efficiency and fairness than TCP in networks that have a high bandwidth-delay product.
Multipurpose Transaction Protocol (MTP/IP) is patented proprietary software that is designed to adaptively achieve high throughput and transaction performance in a wide variety of network conditions, particularly those where TCP is perceived to be inefficient.
Checksum computation
TCP checksum for IPv4
When TCP runs over IPv4
, the method used to compute the checksum is defined as follows:
''The checksum field is the 16-bit ones' complement of the ones' complement sum of all 16-bit words in the header and text. The checksum computation needs to ensure the 16-bit alignment of the data being summed. If a segment contains an odd number of header and text octets, alignment can be achieved by padding the last octet with zeros on its right to form a 16-bit word for checksum purposes. The pad is not transmitted as part of the segment. While computing the checksum, the checksum field itself is replaced with zeros.''
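In other words, after appropriate padding, all 16-bit words are added using ones' complement arithmetic. The sum is then bitwise complemented and inserted as the checksum field. A pseudo-header that mimics the IPv4 packet header is prepended to the segment for the checksum computation; it is not transmitted.
The pseudo-header consists of the following fields:
* Source address: the 32-bit source address from the IPv4 header
* Destination address: the 32-bit destination address from the IPv4 header
* Zeroes: 8 bits, all zero
* Protocol: 8 bits, the protocol value for TCP (6)
* TCP length: 16 bits, the length of the TCP header and data in octets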
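The IPv4 computation can be sketched as follows. The function names are hypothetical, the pseudo-header layout assumed is source address, destination address, a zero octet, the protocol number, and the TCP length, and the caller is assumed to pass a segment whose checksum field (octets 16 and 17 of the TCP header) is already zero.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """Ones' complement sum of all 16-bit big-endian words in `data`."""
    if len(data) % 2:
        data += b"\x00"                  # pad on the right to a 16-bit boundary
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:                   # fold carries back in (end-around carry)
        total = (total & 0xFFFF) + (total >> 16)
    return total


def tcp_checksum_ipv4(src: str, dst: str, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header followed by the TCP segment."""
    pseudo_header = struct.pack(
        "!4s4sBBH",
        socket.inet_aton(src),           # source address
        socket.inet_aton(dst),           # destination address
        0,                               # zeroes
        socket.IPPROTO_TCP,              # protocol (6)
        len(segment),                    # TCP length: header plus data
    )
    return ~ones_complement_sum(pseudo_header + segment) & 0xFFFF


# A 20-octet header (checksum field zeroed) followed by three octets of data.
header = struct.pack("!HHIIBBHHH", 12345, 80, 1, 0, 5 << 4, 0x02, 65535, 0, 0)
segment = header + b"abc"
checksum = tcp_checksum_ipv4("192.0.2.1", "198.51.100.2", segment)
print(hex(checksum))
```

A receiver can validate a segment by running the same computation with the transmitted checksum left in place; for an undamaged segment the result is zero.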
TCP checksum for IPv6
When TCP runs over IPv6
, the method used to compute the checksum is changed:
''Any transport or other upper-layer protocol that includes the addresses from the IP header in its checksum computation must be modified for use over IPv6, to include the 128-bit IPv6 addresses instead of 32-bit IPv4 addresses.''
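A pseudo-header that mimics the IPv6 header is used for computation of the checksum.
The pseudo-header consists of the following fields:
* Source address: the 128-bit source address from the IPv6 header
* Destination address: the 128-bit final destination; if the packet has no Routing header, this is the destination address in the IPv6 header
* TCP length: 32 bits, the length of the TCP header and data in octets
* Zeroes: 24 bits, all zero
* Next header: 8 bits, the protocol value for TCP (6)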
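A corresponding sketch for IPv6 differs only in the pseudo-header. The function name is hypothetical, the layout assumed is the two 128-bit addresses, a 32-bit TCP length, 24 zero bits, and the next-header value, and the segment is again assumed to carry a zeroed checksum field.

```python
import socket
import struct


def tcp_checksum_ipv6(src: str, dst: str, segment: bytes) -> int:
    """Checksum over the IPv6 pseudo-header followed by the TCP segment."""
    pseudo_header = struct.pack(
        "!16s16sI3xB",                            # 3x emits three zero octets
        socket.inet_pton(socket.AF_INET6, src),   # source address
        socket.inet_pton(socket.AF_INET6, dst),   # destination address
        len(segment),                             # TCP length (32 bits)
        socket.IPPROTO_TCP,                       # next header (6)
    )
    data = pseudo_header + segment
    if len(data) % 2:
        data += b"\x00"                           # pad to a 16-bit boundary
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:                            # end-around carry
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# A 20-octet header (checksum field zeroed) followed by three octets of data.
header = struct.pack("!HHIIBBHHH", 12345, 80, 1, 0, 5 << 4, 0x02, 65535, 0, 0)
print(hex(tcp_checksum_ipv6("2001:db8::1", "2001:db8::2", header + b"abc")))
```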
Checksum offload
Many TCP/IP software stack implementations provide options to use hardware assistance to automatically compute the checksum in the network adapter
prior to transmission onto the network or upon reception from the network for validation. This may reduce CPU load associated with calculating the checksum, potentially increasing overall network performance.
This feature may cause packet analyzers that are unaware or uncertain about the use of checksum offload to report invalid checksums in outbound packets that have not yet reached the network adapter. This will only occur for packets that are intercepted before being transmitted by the network adapter; all packets transmitted by the network adapter on the wire will have valid checksums. This issue can also occur when monitoring packets being transmitted between virtual machines on the same host, where a virtual device driver may omit the checksum calculation (as an optimization), knowing that the checksum will be calculated later by the VM host kernel or its physical hardware.
See also
* Fault-tolerant messaging
* Micro-bursting (networking)
* TCP global synchronization
* TCP fusion
* TCP pacing
* TCP Stealth
* WTCP, a proxy-based modification of TCP for wireless networks
External links
* Oral history interview with Robert E. Kahn
* IANA Port Assignments
* IANA TCP Parameters
* Checksum example: http://mathforum.org/library/drmath/view/54379.html