NVLink

NVLink is a wire-based serial multi-lane near-range communications link developed by Nvidia. Unlike PCI Express, a device can consist of multiple NVLinks, and devices use mesh networking to communicate instead of a central hub. The protocol was first announced in March 2014 and uses a proprietary high-speed signaling interconnect (NVHS).


Principle

NVLink is a wire-based communications protocol for near-range semiconductor communications developed by Nvidia that can be used for data and control code transfers in processor systems between CPUs and GPUs and solely between GPUs. NVLink specifies a point-to-point connection with data rates of 20, 25 and 50 Gbit/s per differential pair (v1.0, v2.0 and v3.0 respectively). Eight differential pairs form a "sub-link" and two "sub-links", one for each direction, form a "link". For NVLink 2.0, the total data rate for a sub-link is 25 GByte/s and the total data rate for a link is 50 GByte/s. Each V100 GPU supports up to six links. Thus, each GPU is capable of supporting up to 300 GByte/s in total bi-directional bandwidth. NVLink products introduced to date focus on the high-performance application space.

Announced May 14, 2020, NVLink 3.0 increases the data rate per differential pair from 25 Gbit/s to 50 Gbit/s while halving the number of pairs per sub-link from 8 to 4. With 12 links for an Ampere-based A100 GPU this brings the total bandwidth to 600 GB/s. Hopper has 18 NVLink 4.0 links enabling a total of 900 GB/s bandwidth.
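
The bandwidth figures above follow from simple arithmetic, which the following minimal Python sketch reproduces. The table and function names are illustrative and not part of any Nvidia API. The entry for v1.0 assumes eight pairs per sub-link, as the text states generally. NVLink 4.0 is left out because the text gives no per-pair rate for it.

```python
# Per-pair signalling rate (Gbit/s) and pairs per sub-link, as quoted in the text.
# Assumption: v1.0 uses eight pairs per sub-link, like v2.0.
NVLINK_GENERATIONS = {
    "1.0": {"gbit_per_pair": 20, "pairs_per_sublink": 8},
    "2.0": {"gbit_per_pair": 25, "pairs_per_sublink": 8},
    "3.0": {"gbit_per_pair": 50, "pairs_per_sublink": 4},
}


def sublink_gbyte_per_s(version: str) -> float:
    """One-direction data rate of a sub-link in GByte/s (8 bits per byte)."""
    gen = NVLINK_GENERATIONS[version]
    return gen["gbit_per_pair"] * gen["pairs_per_sublink"] / 8


def link_gbyte_per_s(version: str) -> float:
    """Bidirectional data rate of a link, i.e. two sub-links, in GByte/s."""
    return 2 * sublink_gbyte_per_s(version)


def gpu_total_gbyte_per_s(version: str, links: int) -> float:
    """Total bidirectional bandwidth of a GPU with the given number of links."""
    return links * link_gbyte_per_s(version)


if __name__ == "__main__":
    print("V100, 6 links of v2.0:", gpu_total_gbyte_per_s("2.0", 6), "GByte/s")
    print("A100, 12 links of v3.0:", gpu_total_gbyte_per_s("3.0", 12), "GByte/s")
```

Halving the pair count while doubling the per-pair rate leaves the per-link rate unchanged between v2.0 and v3.0, so the A100's higher total comes entirely from its larger number of links.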


Performance

The following table shows a basic metrics comparison based upon standard specifications:

The following table shows a comparison of relevant bus parameters for real world semiconductors that all offer NVLink as one of their options:

Note: Data rate columns were rounded by being approximated by transmission rate, see the real world performance paragraph.

:Ⓐ: sample value; NVLink sub-link bundling should be possible
:Ⓑ: sample value; other fractions for the PCIe lane usage should be possible
:Ⓒ: a single (no! 16) PCIe lane transfers data over a differential pair
:Ⓓ: various limitations of finally possible combinations might apply due to chip pin muxing and board design
:dual: interface unit can either be configured as a root hub or an end point
:generic: bare semiconductor without any board design specific restrictions applied

Real world performance can be estimated by applying different encapsulation taxes as well as the usage rate. Those come from various sources:
* 128b/130b line code (see e.g. PCI Express data transmission for versions 3.0 and higher)
* Link control characters
* Transaction header
* Buffering capabilities (depends on device)
* DMA usage on computer side (depends on other software, usually negligible on benchmarks)

Those physical limitations usually reduce the data rate to between 90 and 95% of the transfer rate. NVLink benchmarks show an achievable transfer rate of about 35.3 Gbit/s (host to device) for a 40 Gbit/s (2 sub-lanes uplink) NVLink connection towards a P100 GPU in a system that is driven by a set of IBM Power8 CPUs.
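
As a rough illustration of these overheads, the sketch below computes the payload fraction of a 128b/130b line code and the ratio of the measured to the nominal rate in the benchmark quoted above. The function names are hypothetical. Only the line-code overhead is modelled, because the text gives no figures for the other sources.

```python
def line_code_efficiency(payload_bits: int = 128, total_bits: int = 130) -> float:
    """Fraction of transmitted bits that carry payload under an Nb/Mb line code."""
    return payload_bits / total_bits


def measured_efficiency(measured_rate: float, nominal_rate: float) -> float:
    """Ratio of an observed data rate to the nominal transfer rate (same units)."""
    return measured_rate / nominal_rate


if __name__ == "__main__":
    # Figures taken from the text: 35.3 measured against 40 nominal.
    print(f"128b/130b payload fraction: {line_code_efficiency():.4f}")
    print(f"benchmark efficiency:       {measured_efficiency(35.3, 40.0):.4f}")
```

The line code alone costs about 1.5%. The benchmark ratio works out to roughly 88%, slightly under the 90 to 95% range quoted above, which suggests the remaining overheads in the list account for most of the loss.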


Usage with plug-in boards

For the various versions of plug-in boards (so far only a small number of high-end gaming and professional graphics GPU boards offer this feature) that expose extra connectors for joining them into an NVLink group, a similar number of slightly varying, relatively compact, PCB-based interconnection plugs exist. Typically only boards of the same type will mate together due to their physical and logical design. For some setups two identical plugs need to be applied to achieve the full data rate. As of now the typical plug is U-shaped with a fine-grid edge connector on each of the end strokes of the shape facing away from the viewer. The width of the plug determines how far apart the plug-in cards need to be seated on the main board of the hosting computer system; the spacing of the cards is commonly determined by the matching plug (known available plug widths are 3 to 5 slots and also depend on board type). The interconnect is often referred to as Scalable Link Interface (SLI), a name dating from 2004, because of its structural design and appearance, even though the modern NVLink-based design is of a quite different technical nature, with different features at its basic levels compared to the former design.

Reported real world devices are:
* Quadro GP100 (a pair of cards will make use of up to 2 bridges; the setup realizes either 2 or 4 NVLink connections with up to 160 GB/s - this might resemble NVLink 1.0 with 20 GT/s)
* Quadro GV100 (a pair of cards will need up to 2 bridges and realize up to 200 GB/s - this might resemble NVLink 2.0 with 25 GT/s and 4 links)
* GeForce RTX 2080 based on TU104 (with single bridge "GeForce RTX NVLink-Bridge")
* GeForce RTX 2080 Ti based on TU102 (with single bridge "GeForce RTX NVLink-Bridge")
* Quadro RTX 5000 based on TU104 (with single bridge "NVLink" up to 50 GB/s - this might resemble NVLink 2.0 with 25 GT/s and 1 link)
* Quadro RTX 6000 based on TU102 (with single bridge "NVLink HB" up to 100 GB/s - this might resemble NVLink 2.0 with 25 GT/s and 2 links)
* Quadro RTX 8000 based on TU102 (with single bridge "NVLink HB" up to 100 GB/s - this might resemble NVLink 2.0 with 25 GT/s and 2 links)
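
The "might resemble" guesses in the list can be checked by dividing each quoted aggregate bandwidth by the per-link rate of the assumed NVLink generation. The sketch below does this under the assumption that a link carries 40 GB/s bidirectionally at 20 GT/s and 50 GB/s at 25 GT/s (eight pairs per direction). The table and function names are illustrative only.

```python
# Assumed bidirectional rate of one link in GB/s: 8 pairs x rate / 8 bits x 2 directions.
LINK_GBYTE_PER_S = {"1.0": 40, "2.0": 50}

# (board, assumed NVLink generation, quoted aggregate GB/s) as listed in the text.
BRIDGED_BOARDS = [
    ("Quadro GP100", "1.0", 160),
    ("Quadro GV100", "2.0", 200),
    ("Quadro RTX 5000", "2.0", 50),
    ("Quadro RTX 6000", "2.0", 100),
    ("Quadro RTX 8000", "2.0", 100),
]


def implied_links(version: str, aggregate_gbyte_per_s: int) -> int:
    """Number of links needed to reach the quoted aggregate bandwidth."""
    links, remainder = divmod(aggregate_gbyte_per_s, LINK_GBYTE_PER_S[version])
    if remainder:
        raise ValueError("aggregate is not a whole multiple of the per-link rate")
    return links


if __name__ == "__main__":
    for board, version, aggregate in BRIDGED_BOARDS:
        print(f"{board}: {implied_links(version, aggregate)} link(s) of NVLink {version}")
```

Under these assumptions the quoted bandwidths divide evenly and give the link counts mentioned in the list (4, 4, 1, 2 and 2).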


Service software and programming

For the Tesla, Quadro and Grid product lines, the NVML API (Nvidia Management Library API) offers a set of functions for programmatically controlling some aspects of NVLink interconnects on Windows and Linux systems, such as component evaluation and versions along with status/error querying and performance monitoring. Further, the NCCL library (Nvidia Collective Communications Library) is provided so that developers can build, for example, powerful implementations for artificial intelligence and similarly computation-hungry workloads atop NVLink. The page "3D Settings" » "Configure SLI, Surround, PhysX" in the Nvidia Control Panel and the CUDA sample application "simpleP2P" use such APIs to realize their services with respect to their NVLink features. On the Linux platform, the command line application with sub-command "nvidia-smi nvlink" provides a similar set of advanced information and control.
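
As a minimal sketch of scripting the command line tool, the helper below runs "nvidia-smi nvlink --status" and returns its text output. The helper name and its return convention are assumptions made for illustration. The output format varies by driver version and is therefore not parsed. On a machine without the Nvidia driver, the function simply returns None.

```python
import shutil
import subprocess
from typing import Optional


def nvlink_status(timeout_s: float = 10.0) -> Optional[str]:
    """Return the raw output of `nvidia-smi nvlink --status`, or None.

    None is returned when nvidia-smi is not installed, cannot be run,
    times out, or exits with a non-zero status.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe, "nvlink", "--status"],
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout if result.returncode == 0 else None


if __name__ == "__main__":
    report = nvlink_status()
    print(report if report is not None else "NVLink status unavailable")
```

Calling the tool through a subprocess keeps the example free of third-party dependencies. A program that needs structured data would more likely use the NVML API mentioned above.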


History

On 5 April 2016, Nvidia announced that NVLink would be implemented in the Pascal-microarchitecture-based GP100 GPU, as used in, for example, Nvidia Tesla P100 products. With the introduction of the DGX-1 high performance computer base it was possible to have up to eight P100 modules in a single rack system connected to up to two host CPUs. "The carrier board (...) allows for a dedicated board for routing the NVLink connections – each P100 requires 800 pins, 400 for PCIe + power, and another 400 for the NVLinks, adding up to nearly 1600 board traces for NVLinks alone (...)." Each CPU has a direct connection to 4 units of P100 via PCIe and each P100 has one NVLink each to the 3 other P100s in the same CPU group plus one more NVLink to one P100 in the other CPU group. "Each NVLink (link interface) offers a bidirectional 20 GB/sec up 20 GB/sec down, with 4 links per GP100 GPU, for an aggregate bandwidth of 80 GB/sec up and another 80 GB/sec down." NVLink supports routing so that in the DGX-1 design for every P100 a total of 4 of the other 7 P100s are directly reachable and the remaining 3 are reachable with only one hop. According to depictions in Nvidia's blog-based publications from 2014, NVLink allows bundling of individual links for increased point to point performance so that, for example, a design with two P100s and all links established between the two units would allow the full NVLink bandwidth of 80 GB/s between them.
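
The reachability claim for the DGX-1 can be checked with a small graph model. The sketch below is a simplified illustration, not Nvidia's actual wiring: it assumes GPUs 0 to 3 form one CPU group and 4 to 7 the other, and that the cross-group link of GPU i goes to GPU i+4. The text only says each P100 links to "one P100 in the other CPU group", so that pairing is an assumption.

```python
from collections import deque
from typing import Dict, Set


def dgx1_topology(group_size: int = 4) -> Dict[int, Set[int]]:
    """Build the assumed NVLink adjacency for two groups of GPUs."""
    total = 2 * group_size
    links: Dict[int, Set[int]] = {gpu: set() for gpu in range(total)}
    for gpu in range(total):
        base = (gpu // group_size) * group_size
        # One link to each of the other GPUs in the same CPU group.
        links[gpu].update(p for p in range(base, base + group_size) if p != gpu)
        # One link to the counterpart in the other group (assumed pairing).
        links[gpu].add((gpu + group_size) % total)
    return links


def link_distances(links: Dict[int, Set[int]], start: int) -> Dict[int, int]:
    """Breadth-first search: number of NVLinks traversed to reach each GPU."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        gpu = queue.popleft()
        for peer in links[gpu]:
            if peer not in dist:
                dist[peer] = dist[gpu] + 1
                queue.append(peer)
    return dist


if __name__ == "__main__":
    topology = dgx1_topology()
    for gpu in sorted(topology):
        dist = link_distances(topology, gpu)
        direct = sorted(p for p, d in dist.items() if d == 1)
        routed = sorted(p for p, d in dist.items() if d == 2)
        print(f"P100 #{gpu}: direct {direct}, via one intermediate GPU {routed}")
```

In this model every GPU uses exactly four links, reaches four peers directly and the remaining three through a single intermediate GPU, which matches the description above.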
At GTC 2017, Nvidia presented its Volta generation of GPUs and indicated the integration of a revised version 2.0 of NVLink that would allow total I/O data rates of 300 GB/s for a single chip for this design. It further announced the option for pre-orders, with a delivery promise for Q3/2017, of the DGX-1 and DGX-Station high performance computers that would be equipped with GPU modules of type V100 and have NVLink 2.0 realized in either a networked fashion (two groups of four V100 modules with inter-group connectivity) or a fully interconnected fashion of one group of four V100 modules.

In 2017-2018, IBM and Nvidia delivered the Summit and Sierra supercomputers for the US Department of Energy, which combine IBM's POWER9 family of CPUs and Nvidia's Volta architecture, using NVLink 2.0 for the CPU-GPU and GPU-GPU interconnects and InfiniBand EDR for the system interconnects.

In 2020, Nvidia announced that it would no longer add new SLI driver profiles on RTX 2000 series and older from January 1, 2021.


See also

* Intel QuickPath Interconnect
* HyperTransport
* Message Passing Interface
* INK (operating system)
* Compute Node Linux
* Intel Xe Link

