Simultaneous and heterogeneous multithreading (SHMT) is a

software framework In computer programming, a software framework is a software abstraction that provides generic functionality which developers can extend with custom code to create applications. It establishes a standard foundation for building and deploying soft ...

that takes advantage of

heterogeneous computing Heterogeneous computing refers to systems that use more than one kind of processor or core. These systems gain performance or energy efficiency not just by adding the same type of processors, but by adding dissimilar coprocessors, usually incor ...

systems that contain a mixture of

central processing units A central processing unit (CPU), also called a central processor, main processor, or just processor, is the primary processor in a given computer. Its electronic circuitry executes instructions of a computer program, such as arithmetic, log ...

(CPUs),

graphics processing units A graphics processing unit (GPU) is a specialized electronic circuit designed for digital image processing and to accelerate computer graphics, being present either as a discrete video card or embedded on motherboards, mobile phones, personal co ...

(GPUs), and special purpose

machine learning Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...

hardware, for example Tensor Processing Units (TPUs). Each component processes information differently. Often data has to move among processors, which can create bottlenecks, with one processor starving while waiting on another to finish.

Architecture

The system defines virtual processors and virtual operations (VOPs). VOPs decompose into one or more high-level operations (HLOPs). It then distributes the operations across the processors. The runtime system then dynamically maps virtual processors to physical processors, assessing resource availability in order to keep all the processors busy. The scheduler employs a light-weight, quality-aware work-stealing (QAWS) policy. Conventional runtimes use assign one processor (set) to each subtask, leaving other types of processors idle. In other words, the CPU(s) run (possibly in parallel), then when that subtask completes, the next subtask is handed to the GPU(s). When they finish the next subtask is handed to the TPU(s). Adding software pipelining allows the second subtask to run using partial results from the first subtask, which improves resource utilization. SHMT takes things a step further, identifying subtasks that can run independently of others to the appropriate processor type, allow even better parallelism. Some subtasks can be performed on multiple processor types. SHMT can divide a single subtask across such processor types. Thus the fundamental breakthrough is to keep more processors working simultaneously, reducing time and energy costs.

Benchmark

Researchers tested the concept using a typical smartphone configuration tweaked so that it resembled a data center server. The hardware was Nvidia's Jetson Nano module containing a quad-core

ARM Cortex-A57 The ARM Cortex-A57 is a central processing unit implementing the ARMv8-A 64-bit instruction set designed by ARM Holdings. The Cortex-A57 is an out-of-order superscalar pipeline. It is available as SIP core to licensees, and its design makes i ...

processor (CPU) and 128 Maxwell architecture GPU cores. A Google Edge TPU was connected via its M.2 Key E slot. The processors communicated via an onboard

PCI Express PCI Express (Peripheral Component Interconnect Express), officially abbreviated as PCIe, is a high-speed standard used to connect hardware components inside computers. It is designed to replace older expansion bus standards such as Peripher ...

(PCIe) interface. Shared data was hosted in a 4 GB 64-bit LPDDR4. The Edge TPU adds an 8 MB device memory.

Ubuntu Linux Ubuntu ( ) is a Linux distribution based on Debian and composed primarily of free and open-source software. Developed by the British company Canonical and a community of contributors under a meritocratic governance model, Ubuntu is released ...

18.04 was the operating system. Compared to a conventional system performance increased by 1.95X boost, while energy consumption was reduced by 51%, on a range of benchmarks, including Black–Scholes, DCT8X8, DWT, FFT, Histogram, Hotspot,

Laplacian In mathematics, the Laplace operator or Laplacian is a differential operator given by the divergence of the gradient of a scalar function on Euclidean space. It is usually denoted by the symbols \nabla\cdot\nabla, \nabla^2 (where \nabla is th ...

, MF, Sobel, SRAD, and GMEAN.

References

{{Parallel computing Software architecture Parallel computing

Architecture

Benchmark

See also

References