Transmit Packet Steering

Network packet steering of transmitted and received traffic for multi-core architectures is needed in modern network computing environments, especially in data centers, where high bandwidth and heavy loads can easily congest a single core's queue. For this reason, many techniques, both in hardware and in software, are used to distribute the incoming load of packets across the cores of the processor. On the receiving side, the most notable techniques presented in this article are RSS, aRFS, RPS and RFS. For transmission, the focus is on XPS.
As shown in the accompanying figure, packets coming into the network interface card (NIC) are processed and loaded into the receiving queues managed by the cores (which are usually implemented as ring buffers within the kernel space). The main objective is to leverage all the cores available within the CPU to process incoming packets, while also improving performance metrics such as latency and throughput.


Hardware techniques

Hardware-accelerated techniques like RSS and aRFS are used to route and load balance incoming packets across the queues of the multiple cores of a processor.
These hardware-supported methods achieve extremely low latencies and reduce the load on the CPU compared to software-based ones. However, they require specialized hardware integrated within the network interface controller (which is usually available on more advanced cards, like the SmartNIC).


RSS

Receive Side Scaling (RSS) is a hardware-supported technique that leverages an indirection table indexed by the low-order bits of the result of a hash function, which takes the header fields of the packets as input. The hash function input is usually customizable, and the header fields used can vary between use cases and implementations. Some notable examples of header fields chosen as keys for the hash are the layer 3 IP source and destination addresses, the protocol, and the layer 4 source and destination ports. In this way, packets belonging to the same flow are directed to the same receiving queue and keep their original order, avoiding out-of-order delivery. Moreover, incoming flows are load balanced across all the available cores thanks to the properties of the hash function.
Another important feature introduced by the indirection table is the ability to change the mapping of flows to cores without changing the hash function, simply by updating the table entries.
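The lookup described above can be sketched in a few lines of Python. This is only an illustrative model, not how any particular NIC implements it: the table size, the number of queues and the use of CRC32 as the hash are assumptions made for the example (real hardware commonly uses a Toeplitz hash).

```python
import zlib

NUM_QUEUES = 4    # assumed number of hardware receive queues
TABLE_SIZE = 128  # assumed indirection table size (a power of two)

# Indirection table: each entry names a receive queue; filled round-robin here.
indirection_table = [i % NUM_QUEUES for i in range(TABLE_SIZE)]

def flow_hash(src_ip, dst_ip, proto, src_port, dst_port):
    """Stand-in hash over the header fields (CRC32, not a NIC's real hash)."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key)

def rss_queue(flow):
    """Index the table with the low-order bits of the hash."""
    return indirection_table[flow_hash(*flow) & (TABLE_SIZE - 1)]

flow = ("10.0.0.1", "10.0.0.2", "tcp", 40000, 443)
# Every packet of one flow maps to the same queue, so its order is preserved.
assert rss_queue(flow) == rss_queue(flow)

# Remapping: move all traffic off queue 3 by editing table entries only;
# the hash function itself is left untouched.
for i, queue in enumerate(indirection_table):
    if queue == 3:
        indirection_table[i] = 0
```

After the loop, flows that used to land on queue 3 are delivered to queue 0, while all other flows keep their previous queue.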


aRFS

Accelerated Receive Flow Steering (aRFS) is another hardware-supported technique, born with the idea of leveraging cache locality to improve performance by routing incoming packet flows to specific cores. Unlike RSS, which is a fully independent hardware implementation, aRFS needs to interface with software (the kernel) to function properly.
RSS simply load balances incoming traffic across the cores. However, if a packet flow is directed to ''core i'' (as a result of the hash function) while the application that needs the received packets is running on ''core j'', many cache misses could be avoided by simply forcing ''i=j'', so that packets are received exactly where they are needed and consumed.
To do this, aRFS does not forward packets directly according to the result of the hash function. Instead, it uses a configurable routing table (which can be filled and updated, for instance, by the scheduler through an API) to steer packet flows to the specific consuming core.
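As a rough illustration of forcing ''i=j'', the sketch below models a NIC whose steering rules take precedence over the hash result. The class `SteeringNic`, its method `install_rule` and the CRC32 fallback are hypothetical names and choices made for this example; they do not correspond to any real driver interface.

```python
import zlib

NUM_QUEUES = 4  # assumed: one receive queue per core

def rss_fallback(flow):
    """Hash-based choice used when no steering rule exists (CRC32 stand-in)."""
    return zlib.crc32(repr(flow).encode()) % NUM_QUEUES

class SteeringNic:
    """Toy NIC holding flow rules that the software side installs."""

    def __init__(self):
        self.rules = {}  # flow tuple -> core where the consumer runs

    def install_rule(self, flow, core):
        # Hypothetical API: called by software once it knows the consumer's core.
        self.rules[flow] = core

    def select_core(self, flow):
        # A steering rule wins over the hash result, forcing i = j.
        return self.rules.get(flow, rss_fallback(flow))

nic = SteeringNic()
flow = ("10.0.0.1", "10.0.0.2", "tcp", 40000, 443)
before = nic.select_core(flow)  # no rule yet: the hash decides
nic.install_rule(flow, core=2)  # the consuming application runs on core 2
after = nic.select_core(flow)   # the rule decides
```

Flows without a rule keep being distributed by the hash, so the steering table only needs entries for flows whose consumer is known.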


Software techniques

Software techniques like RPS and RFS employ one of the CPU cores to steer incoming packets across the other cores of the processor. This comes at the cost of introducing additional inter-processor interrupts (IPIs); however, the number of hardware interrupts does not increase and, by employing an interrupt aggregation technique, it could even be reduced.
The benefit of a software solution is its ease of implementation: no component of the current architecture (like the NIC) has to be changed, and it is enough to deploy the proper kernel module. This benefit can be crucial in cases where the server machine cannot be customized or accessed (as in a cloud computing environment), even if network performance could be lower than with the hardware-supported techniques.


RPS

Receive Packet Steering (RPS) is the software counterpart of RSS. All packets received by the NIC are load balanced between the cores' queues by a hash function that uses configurable header fields as its key (like the layer 3 source and destination IP addresses and the layer 4 source and destination ports), in the same fashion as RSS. Moreover, thanks to the properties of the hash, packets belonging to the same flow are always steered to the same core.
This is usually done in the kernel, right after the NIC driver. Once the network interrupt has been handled and before the packet is processed, the packet is sent to the receiving queue of a core, which is then notified by an inter-processor interrupt.
RPS can be used in conjunction with RSS when the number of queues managed by the hardware is lower than the number of cores. In this case, after the incoming packets have been distributed across the RSS queues, a pool of cores can be assigned to each queue, and RPS is used to spread the incoming flows again across the specified pool.
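The combination of RSS and RPS can be pictured as a two-stage selection. The sketch below assumes a machine with two hardware queues and eight cores, and again uses CRC32 as a stand-in hash; the pool layout in `RPS_POOLS` is invented for the example.

```python
import zlib

# Assumed layout: 2 hardware (RSS) queues, 8 cores, one pool of 4 cores per queue.
RPS_POOLS = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}

def flow_hash(flow):
    """Stand-in for the hash over the configured header fields."""
    return zlib.crc32(repr(flow).encode())

def hw_queue(flow):
    """Stage 1 (RSS, in hardware): choose one of the NIC queues."""
    return flow_hash(flow) % len(RPS_POOLS)

def rps_core(flow):
    """Stage 2 (RPS, in software): spread the queue's flows over its pool."""
    pool = RPS_POOLS[hw_queue(flow)]
    # Use different hash bits than stage 1 so the two choices do not correlate.
    return pool[(flow_hash(flow) >> 8) % len(pool)]

flows = [("10.0.0.1", "10.0.0.2", "udp", 5000 + i, 53) for i in range(1000)]
cores_used = {rps_core(f) for f in flows}
```

Each flow always ends up on a core from the pool of the hardware queue it arrived on, so per-flow ordering is still preserved after the second stage.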


RFS

Receive Flow Steering (RFS) extends RPS in the same direction as the aRFS hardware solution extends RSS. By routing packet flows to the CPU core running the consuming application, cache locality can be improved, avoiding many misses and reducing the latency introduced by retrieving data from main memory.
To do this, after the hash of the header fields has been computed for the current packet, the result is used to index a lookup table. This table is managed by the scheduler, which updates its entries when application processes are moved between cores.
The overall CPU load distribution is balanced as long as the applications in user space are evenly distributed across the multiple cores.
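A minimal model of the hash-indexed lookup table might look as follows. The table size, the `record_consumer` hook and the fallback to an RPS-style choice when no entry exists are assumptions of this sketch, not details given above.

```python
import zlib

TABLE_SIZE = 256  # assumed number of flow-table entries (a power of two)
NO_ENTRY = -1

flow_table = [NO_ENTRY] * TABLE_SIZE  # hash index -> core running the consumer

def flow_index(flow):
    """Low-order bits of a stand-in hash (CRC32) select the table entry."""
    return zlib.crc32(repr(flow).encode()) & (TABLE_SIZE - 1)

def record_consumer(flow, core):
    """Hypothetical hook: called when the consuming process runs on or moves to `core`."""
    flow_table[flow_index(flow)] = core

def rfs_core(flow, rps_choice):
    """Return the consumer's core if known, else fall back to the RPS choice."""
    core = flow_table[flow_index(flow)]
    return rps_choice if core == NO_ENTRY else core

flow = ("10.0.0.1", "10.0.0.2", "tcp", 40000, 443)
first = rfs_core(flow, rps_choice=0)   # no entry yet: the RPS choice is used
record_consumer(flow, core=3)          # application scheduled on core 3
second = rfs_core(flow, rps_choice=0)
record_consumer(flow, core=5)          # application migrated to core 5
third = rfs_core(flow, rps_choice=0)
```

Because the table is indexed by hash bits rather than by the full flow tuple, two flows can share an entry; a larger table makes such collisions less likely.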


XPS (in transmission)

Transmit Packet Steering (XPS) operates on the transmit side, as opposed to the techniques mentioned so far, which operate on reception. When packets need to be loaded onto one of the transmission queues exposed by the NIC, there are again several possible optimizations.
For instance, if multiple transmission queues are assigned to a single core, a hash function could be used to load balance outgoing packets across the queues (similarly to how RPS does in reception). Moreover, in order to improve cache locality and hit rate (similarly to how RFS does), XPS ensures that applications producing the outgoing traffic and running on ''core i'' favor the transmitting queues associated with the same ''core i''. This reduces inter-core communication and cache coherency protocol overhead, resulting in better performance in heavy-load environments.
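The queue selection on the transmit side can be sketched in the same style. The core-to-queue mapping in `XPS_MAP` and the CRC32 hash are assumptions chosen for illustration only.

```python
import zlib

# Assumed mapping: core -> transmit queues associated with that core.
XPS_MAP = {0: [0, 1], 1: [2, 3], 2: [4], 3: [5]}

def xps_queue(sending_core, flow):
    """Pick a transmit queue among those associated with the sending core."""
    queues = XPS_MAP[sending_core]
    # With several queues per core, a hash keeps each flow on one queue.
    return queues[zlib.crc32(repr(flow).encode()) % len(queues)]

flow = ("10.0.0.2", "10.0.0.1", "tcp", 443, 40000)
q = xps_queue(0, flow)  # an application on core 0 only uses queue 0 or 1
```

A core that owns a single queue, such as core 2 in this mapping, always transmits on that queue, so no hashing is needed in that case.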


See also

* Cloud computing
* Load balancing
* Multi-core architectures
* Network packets
* NIC
* Packet processing
* SmartNIC


External links

* Packet Steering for Multicore Virtual Network Applications over DPDK