HOME

TheInfoList



OR:

rCUDA, which stands for Remote CUDA, is a type of
middleware Middleware is a type of computer software that provides services to software applications beyond those available from the operating system. It can be described as "software glue". Middleware makes it easier for software developers to implement c ...
software framework for remote
GPU A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mob ...
virtualization. Fully compatible with the
CUDA CUDA (or Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) that allows software to use certain types of graphics processing units (GPUs) for general purpose processing, an approach ...
application programming interface ( API), it allows the allocation of one or more CUDA-enabled GPUs to a single application. Each GPU can be part of a
cluster may refer to: Science and technology Astronomy * Cluster (spacecraft), constellation of four European Space Agency spacecraft * Asteroid cluster, a small asteroid family * Cluster II (spacecraft), a European Space Agency mission to study th ...
or running inside of a
virtual machine In computing, a virtual machine (VM) is the virtualization/ emulation of a computer system. Virtual machines are based on computer architectures and provide functionality of a physical computer. Their implementations may involve specialized har ...
. The approach is aimed at improving performance in GPU clusters that are lacking full utilization. GPU virtualization reduces the number of GPUs needed in a cluster, and in turn, leads to a lower cost configuration – less energy, acquisition, and maintenance. The recommended distributed acceleration architecture is a
high performance computing High-performance computing (HPC) uses supercomputers and computer clusters to solve advanced computation problems. Overview HPC integrates systems administration (including network and security knowledge) and parallel programming into a multi ...
cluster with GPUs attached to only a few of the cluster nodes. When a node without a local
GPU A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mob ...
executes an application needing GPU resources, remote execution of the
kernel Kernel may refer to: Computing * Kernel (operating system), the central component of most operating systems * Kernel (image processing), a matrix used for image convolution * Compute kernel, in GPGPU programming * Kernel method, in machine lea ...
is supported by data and code transfers between local system memory and remote GPU memory. rCUDA is designed to accommodate this client-server architecture. On one end, clients employ a library of wrappers to the high-level CUDA Runtime API, and on the other end, there is a network listening service that receives requests on a
TCP port In computer networking, a port is a number assigned to uniquely identify a connection endpoint and to direct data to a specific service. At the software level, within an operating system, a port is a logical construct that identifies a specific ...
. Several nodes running different GPU-accelerated applications can concurrently make use of the whole set of accelerators installed in the cluster. The client forwards the request to one of the servers, which accesses the GPU installed in that computer and executes the request in it.
Time-multiplexing Time-division multiplexing (TDM) is a method of transmitting and receiving independent signals over a common signal path by means of synchronized switches at each end of the transmission line so that each signal appears on the line only a fracti ...
the GPU, or in other words ''sharing'' it, is accomplished by spawning different server processes for each remote GPU execution request.


rCUDA v20.07

The rCUDA middleware enables the concurrent usage of CUDA-compatible devices remotely. rCUDA employs either the InfiniBand network or the socket API for the communication between clients and servers. rCUDA can be useful in three different environments: * Clusters. To reduce the number of GPUs installed in High Performance Clusters. This leads to energy savings, as well as other related savings like acquisition costs, maintenance, space, cooling, etc. * Academia. In commodity networks, to offer access to a few high performance GPUs concurrently to many students. * Virtual Machines. To enable the access to the CUDA facilities on the physical machine. The current version of rCUDA (v20.07) supports CUDA version 9.0, excluding graphics interoperability. rCUDA v20.07 targets the Linux OS (for 64-bit architectures) on both client and server sides. CUDA applications do not need any change in their source code in order to be executed with rCUDA.


References


External links


Nvidia CUDA Official site
{{lowercase Software frameworks GPGPU Computer libraries Middleware System software Cloud platforms Cloud computing Distributed computing architecture Distributed computing Parallel computing