Simultaneous And Heterogeneous Multithreading

Simultaneous and heterogeneous multithreading (SHMT) is a software framework that takes advantage of heterogeneous computing systems that contain a mixture of central processing units (CPUs), graphics processing units (GPUs), and special-purpose machine learning hardware, for example Tensor Processing Units (TPUs). Each component processes information differently. Data often has to move among processors, which can create bottlenecks, with one processor starving while it waits for another to finish.


Architecture

The system defines virtual processors and virtual operations (VOPs). Each VOP decomposes into one or more high-level operations (HLOPs), which the system distributes across the processors. The runtime system dynamically maps virtual processors to physical processors, assessing resource availability in order to keep all the processors busy. The scheduler employs a lightweight, quality-aware work-stealing (QAWS) policy.

Conventional runtimes assign one processor (or a set of processors of one type) to each subtask, leaving the other types of processors idle. In other words, the CPU(s) run the first subtask (possibly in parallel); when it completes, the next subtask is handed to the GPU(s); and when they finish, the following subtask is handed to the TPU(s). Adding software pipelining allows the second subtask to start on partial results from the first, which improves resource utilization.

SHMT goes a step further: it identifies subtasks that can run independently of the others and dispatches them to the appropriate processor type, allowing even more parallelism. Some subtasks can be performed on more than one processor type, and SHMT can divide a single such subtask across those processor types. The fundamental advance is thus to keep more processors working simultaneously, reducing time and energy costs.
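The idea of splitting one VOP into HLOPs and letting idle processors steal work can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the SHMT runtime or its actual QAWS policy: the device speeds, the `critical` flag (standing in for "quality-aware"), the rule that a reduced-precision device may not run critical HLOPs, and all function names are hypothetical.

```python
import heapq
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    speed: float   # hypothetical throughput: work units per time unit
    exact: bool    # False models a reduced-precision accelerator (assumption)
    queue: deque = field(default_factory=deque)
    done: list = field(default_factory=list)


def split_vop(total_work, n_hlops, critical_every=4):
    """Split one VOP into equal HLOPs; every 4th is marked quality-critical (made-up rule)."""
    size = total_work / n_hlops
    return [{"id": i, "work": size, "critical": i % critical_every == 0}
            for i in range(n_hlops)]


def can_run(dev, hlop):
    """Assumed quality rule: critical HLOPs run only on exact devices."""
    return dev.exact or not hlop["critical"]


def next_hlop(dev, devices):
    """Take from the device's own queue, else steal a compatible HLOP from the fullest queue."""
    if dev.queue:
        return dev.queue.popleft()
    victims = sorted((d for d in devices if d is not dev),
                     key=lambda d: len(d.queue), reverse=True)
    for victim in victims:
        for i in reversed(range(len(victim.queue))):  # steal from the tail
            if can_run(dev, victim.queue[i]):
                hlop = victim.queue[i]
                del victim.queue[i]
                return hlop
    return None


def run(devices, hlops):
    """Simulate execution and return the makespan (time at which the last HLOP finishes)."""
    # Naive initial placement: round-robin over the devices allowed to run each HLOP.
    for n, hlop in enumerate(hlops):
        allowed = [d for d in devices if can_run(d, hlop)]
        allowed[n % len(allowed)].queue.append(hlop)
    events = [(0.0, i) for i in range(len(devices))]  # (time device is free, device index)
    heapq.heapify(events)
    makespan = 0.0
    while events:
        now, i = heapq.heappop(events)
        makespan = max(makespan, now)
        hlop = next_hlop(devices[i], devices)
        if hlop is None:
            continue  # nothing left that this device is allowed to run
        devices[i].done.append(hlop["id"])
        heapq.heappush(events, (now + hlop["work"] / devices[i].speed, i))
    return makespan


devices = [Device("cpu", 1.0, True), Device("gpu", 4.0, True), Device("tpu", 3.0, False)]
makespan = run(devices, split_vop(total_work=96.0, n_hlops=24))
gpu_only = 96.0 / 4.0  # same VOP run entirely on the fastest single device
print(f"all devices with stealing: {makespan:.2f}, GPU only: {gpu_only:.2f}")
```

With these invented speeds, the shared run finishes sooner than handing the whole VOP to the fastest device alone, because the slower devices contribute work instead of idling. The numbers carry no relation to measured SHMT results.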


Benchmark

Researchers tested the concept using a typical smartphone configuration tweaked so that it resembled a data center server. The hardware was Nvidia's Jetson Nano module, containing a quad-core ARM Cortex-A57 processor (CPU) and 128 Maxwell architecture GPU cores. A Google Edge TPU was connected via the M.2 Key E slot. The processors communicated via an onboard PCI Express (PCIe) interface. Shared data was hosted in 4 GB of 64-bit LPDDR4 memory; the Edge TPU adds 8 MB of device memory. The operating system was Ubuntu Linux 18.04. Compared to a conventional system, performance increased by a factor of 1.95 while energy consumption fell by 51% across a range of benchmarks: Black–Scholes, DCT8X8, DWT, FFT, Histogram, Hotspot, Laplacian, MF, Sobel, and SRAD, with GMEAN denoting the geometric mean across them.
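As a back-of-the-envelope reading of the two headline figures, the short calculation below derives relative runtime, average power, and energy-delay product from them. These derived values are not reported in the text above; they assume both figures describe the same runs and can be combined directly, which is only approximately true for averages taken over several benchmarks.

```python
speedup = 1.95           # reported performance gain over the conventional system
energy_reduction = 0.51  # reported reduction in energy consumption

time_ratio = 1 / speedup                 # runtime relative to the baseline
energy_ratio = 1 - energy_reduction      # energy relative to the baseline
power_ratio = energy_ratio / time_ratio  # implied average power relative to the baseline
edp_ratio = energy_ratio * time_ratio    # implied energy-delay product relative to the baseline

print(f"runtime ratio: {time_ratio:.2f}")
print(f"energy ratio:  {energy_ratio:.2f}")
print(f"power ratio:   {power_ratio:.2f}")
print(f"EDP ratio:     {edp_ratio:.2f}")
```

Under these assumptions, nearly all of the energy saving comes from finishing sooner rather than from drawing less power while running.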


See also

* Asymmetric multiprocessing
* Instruction-level parallelism (ILP)
* Parallel computing
* Simultaneous multithreading
* Superscalar processor
* Symmetric multiprocessing (SMP)
** Variable SMP
* Thread (computing)

