A GPU cluster is a computer cluster in which each node is equipped with a graphics processing unit (GPU). By harnessing the computational power of modern GPUs via general-purpose computing on graphics processing units (GPGPU), a GPU cluster can perform very fast calculations.
Hardware (GPU)
GPU clusters fall into two hardware categories: heterogeneous and homogeneous.
Heterogeneous
Hardware from both major independent hardware vendors (IHVs), AMD and NVIDIA, can be used. Even if different models of the same GPU series are used (e.g. 8800GT mixed with 8800GTX), the GPU cluster is considered heterogeneous.
Homogeneous
Each GPU is of the same hardware class, make, and model. For example, it could be a homogeneous cluster of 100 8800GTs, all with the same amount of memory.
Classifying a GPU cluster in this way largely directs software development on the cluster, because different GPUs have different capabilities that software can exploit.
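The two categories can be expressed as a simple check. The following is a minimal sketch, not part of any cluster toolkit: the `Gpu` record and the `classify_cluster` function are hypothetical names chosen for illustration, the memory sizes are placeholder values, and the sketch assumes each GPU can be described by vendor, model, and memory size alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    """Hypothetical description of one GPU in the cluster."""
    vendor: str
    model: str
    memory_mb: int


def classify_cluster(gpus):
    """Return 'homogeneous' if every GPU shares vendor, model, and memory
    size, otherwise 'heterogeneous'."""
    if not gpus:
        raise ValueError("a cluster needs at least one GPU")
    # A frozen dataclass is hashable, so identical GPUs collapse to one set entry.
    return "homogeneous" if len(set(gpus)) == 1 else "heterogeneous"


# Illustrative inventories (memory sizes are assumed values).
uniform = [Gpu("NVIDIA", "8800GT", 512)] * 100
mixed = [Gpu("NVIDIA", "8800GT", 512), Gpu("NVIDIA", "8800GTX", 768)]

print(classify_cluster(uniform))
print(classify_cluster(mixed))
```

In practice the inventory would come from querying the driver on each node; here it is hard-coded to keep the sketch self-contained.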
Hardware (Other)
Interconnect
In addition to the compute nodes and their respective GPUs, a sufficiently fast interconnect is needed to shuttle data among the nodes. The type of interconnect largely depends on the number of nodes present. Examples of interconnects include Gigabit Ethernet and InfiniBand.
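Whether an interconnect is fast enough can be roughly judged with a back-of-the-envelope estimate. The sketch below assumes a simple model (latency plus size divided by bandwidth) that ignores protocol overhead and contention; the function name `transfer_seconds` and the 125 MB buffer size are illustrative choices, and only the nominal Gigabit Ethernet rate of one gigabit per second is taken as given.

```python
def transfer_seconds(num_bytes, bandwidth_bits_per_s, latency_s=0.0):
    """Estimate the time to move num_bytes over a link as
    latency + size / bandwidth (a simplified model)."""
    if bandwidth_bits_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return latency_s + (num_bytes * 8) / bandwidth_bits_per_s


GIGABIT_ETHERNET_BPS = 1e9  # nominal line rate of Gigabit Ethernet

# Moving a 125 MB buffer between two nodes at the nominal rate
# takes about one second under this model.
print(transfer_seconds(125_000_000, GIGABIT_ETHERNET_BPS))
```

Passing the rate of a faster link as `bandwidth_bits_per_s` shows how the same transfer shrinks, which is one way to compare candidate interconnects for a given node count.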
Vendors
NVIDIA provides a list of dedicated Tesla Preferred Partners (TPP) capable of building and delivering a fully configured GPU cluster using the Tesla 20-series GPGPUs. AMAX Information Technologies, Dell, Hewlett-Packard and Silicon Graphics are some of the few companies that provide a complete line of GPU clusters and systems. [http://www.nvidia.com/object/tesla_wtb.html]
Software
The software components that are required to make many GPU-equipped machines act as one include:
#Operating System
#GPU driver for each type of GPU present in each cluster node.
#Clustering API (such as the Message Passing Interface, MPI).
#VirtualCL (VCL) cluster platform, a wrapper for OpenCL™ that allows most unmodified applications to transparently utilize multiple OpenCL devices in a cluster as if all the devices were on the local computer.
Algorithm mapping
Mapping an algorithm to run on a GPU cluster is somewhat similar to mapping an algorithm to run on a traditional computer cluster. Example: rather than distributing pieces of an array from RAM, a texture is divided up among the nodes of the GPU cluster.
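The texture example can be sketched as a row-wise partition. This is a minimal illustration under stated assumptions: the texture is modelled as a plain list of rows, each node receives one contiguous band, and `split_rows` is a hypothetical helper rather than a function from any GPU or clustering library. Real code would hand each band to a node through the clustering API and upload it to that node's GPU.

```python
def split_rows(height, num_nodes):
    """Split `height` texture rows into contiguous (start, stop) bands,
    one per node."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    base, extra = divmod(height, num_nodes)
    bands = []
    start = 0
    for node in range(num_nodes):
        # The first `extra` nodes take one additional row, so band sizes
        # differ by at most one.
        stop = start + base + (1 if node < extra else 0)
        bands.append((start, stop))
        start = stop
    return bands


# A 10-row, 4-column stand-in for a texture, shared by 4 nodes.
texture = [[row * 10 + col for col in range(4)] for row in range(10)]
tiles = [texture[start:stop] for start, stop in split_rows(len(texture), 4)]

print(split_rows(10, 4))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Contiguous bands keep each node's data in one block, which is the simplest layout to transfer; algorithms that read neighbouring texels would also need to exchange the rows at band boundaries.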