Heterogeneous computing refers to systems that use more than one kind of processor or core. These systems gain performance or energy efficiency not just by adding more processors of the same type, but by adding dissimilar coprocessors, usually incorporating specialized processing capabilities to handle particular tasks.
Heterogeneity
In computing, heterogeneity usually refers to differing instruction-set architectures (ISAs): the main processor has one ISA and the other processors have another, usually very different, architecture (possibly more than one), rather than merely a different microarchitecture. (Floating-point processing is a special case of this and is not usually referred to as heterogeneous.)
In the past, heterogeneous computing meant that different ISAs had to be handled differently. In a modern example, Heterogeneous System Architecture (HSA) systems eliminate the difference (for the user) while using multiple processor types (typically CPUs and GPUs), usually on the same integrated circuit, to provide the best of both worlds: the GPU handles general-purpose processing (apart from its well-known 3D graphics rendering capabilities, it can also perform mathematically intensive computations on very large data sets), while the CPU runs the operating system and performs traditional serial tasks.
The level of heterogeneity in modern computing systems is gradually increasing as further scaling of fabrication technologies allows formerly discrete components to become integrated parts of a system-on-chip (SoC). For example, many new processors now include built-in logic for interfacing with other devices (SATA, PCI, Ethernet, USB, RFID, radios, UARTs, and memory controllers), as well as programmable functional units and hardware accelerators (GPUs, cryptography co-processors, programmable network processors, A/V encoders/decoders, etc.).
Recent findings show that a heterogeneous-ISA chip multiprocessor that exploits diversity offered by multiple ISAs can outperform the best same-ISA homogeneous architecture by as much as 21% with 23% energy savings and a reduction of 32% in
Energy Delay Product (EDP).
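As a rough illustration of the metric, the energy-delay product multiplies the energy a workload consumes by the time it takes, so a design improves its EDP by saving energy, by finishing sooner, or both. The sketch below uses made-up measurements (they are not taken from the findings quoted above), and the function names are hypothetical.

```python
def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """EDP = energy consumed x execution time (lower is better)."""
    return energy_joules * delay_seconds


def relative_reduction(baseline: float, candidate: float) -> float:
    """Fractional improvement of candidate over baseline (0.25 means 25% lower)."""
    return (baseline - candidate) / baseline


# Hypothetical measurements for one workload on two designs.
baseline_edp = energy_delay_product(energy_joules=100.0, delay_seconds=10.0)
candidate_edp = energy_delay_product(energy_joules=80.0, delay_seconds=8.5)
edp_reduction = relative_reduction(baseline_edp, candidate_edp)
print(f"EDP reduction: {edp_reduction:.0%}")
```

Because "as much as" figures for speed, energy, and EDP are typically best cases that may come from different workloads, multiplying the headline percentages together need not reproduce the headline EDP figure.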
AMD's 2014 announcement of its pin-compatible ARM and x86 SoCs, codenamed Project Skybridge, suggested a heterogeneous-ISA (ARM+x86) chip multiprocessor in the making.
Heterogeneous CPU topology
A system with heterogeneous CPU topology uses the same ISA on all cores, but the cores themselves differ in speed. The setup therefore resembles a symmetric multiprocessor. (Although such systems are technically asymmetric multiprocessors, the cores do not differ in roles or device access.) There are typically two types of cores: a higher-performance core, usually known as a "big" core or P-core, and a more power-efficient core, usually known as a "small" core or E-core. The terms P-core and E-core are usually used for Intel's implementation of heterogeneous computing, while the terms big and little cores are usually used for the ARM architecture. Some processors have three categories of core (prime, performance, and efficiency), with prime cores outperforming performance cores; in that scheme a prime core is known as "big", a performance core as "medium", and an efficiency core as "small".
A common use of such topology is to provide better power efficiency, especially in mobile SoCs.
* ARM big.LITTLE (succeeded by DynamIQ) is the prototypical case, where faster high-power cores are combined with slower low-power cores.
* Apple has produced Apple silicon SoCs with similar organization.
* Intel has also produced hybrid x86-64 chips codenamed Lakefield, although not without major limitations in instruction set support. The newer Alder Lake reduces the sacrifice by adding more instruction set support to the "small" core.
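The power-efficiency argument for mixing fast and slow cores can be sketched with a toy placement rule: run each task on whichever core type uses the least energy while still meeting the task's deadline. Everything below is an assumption made for illustration, including the `CoreType` fields, the throughput and power numbers, and the `pick_core` policy; it does not describe how any real operating system scheduler works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreType:
    name: str
    ops_per_second: float  # throughput (hypothetical value)
    watts: float           # active power draw (hypothetical value)


# Illustrative numbers only; real big/small or P/E cores differ per product.
BIG = CoreType("big", ops_per_second=4.0e9, watts=4.0)
SMALL = CoreType("small", ops_per_second=1.5e9, watts=0.6)


def runtime_seconds(core: CoreType, ops: float) -> float:
    return ops / core.ops_per_second


def energy_joules(core: CoreType, ops: float) -> float:
    return core.watts * runtime_seconds(core, ops)


def pick_core(ops: float, deadline_seconds: float) -> CoreType:
    """Choose the lowest-energy core that still meets the deadline.

    Falls back to the fastest core when no core can meet it.
    """
    cores = (BIG, SMALL)
    feasible = [c for c in cores if runtime_seconds(c, ops) <= deadline_seconds]
    if not feasible:
        return max(cores, key=lambda c: c.ops_per_second)
    return min(feasible, key=lambda c: energy_joules(c, ops))
```

With these numbers, a light task with a relaxed deadline lands on the small core because it finishes in time for less energy, while a task the small core cannot finish in time goes to the big core.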
Challenges
Heterogeneous computing systems present new challenges not found in typical homogeneous systems.
The presence of multiple processing elements raises all of the issues involved with homogeneous parallel processing systems, while the level of heterogeneity in the system can introduce non-uniformity in system development, programming practices, and overall system capability. Areas of heterogeneity can include:
; ISA or instruction-set architecture
: Compute elements may have different instruction set architectures, leading to binary incompatibility.
; ABI or application binary interface
: Compute elements may interpret memory in different ways. This may include endianness, calling convention, and memory layout, and depends on both the architecture and the compiler being used.
; API or application programming interface
: Library and OS services may not be uniformly available to all compute elements.
; Low-Level Implementation of Language Features
: Language features such as functions and threads are often implemented using function pointers, a mechanism which requires additional translation or abstraction when used in heterogeneous environments.
; Memory Interface and Hierarchy
: Compute elements may have different cache structures and cache coherency protocols, and memory access may be uniform or non-uniform (NUMA). Differences can also be found in the ability to read arbitrary data lengths, as some processors/units can only perform byte-, word-, or burst accesses.
; Interconnect
: Compute elements may have differing types of interconnect aside from basic memory/bus interfaces. This may include dedicated network interfaces, direct memory access (DMA) devices, mailboxes, FIFOs, and scratchpad memories. Furthermore, certain portions of a heterogeneous system may be cache-coherent, whereas others may require explicit software involvement to maintain consistency and coherency.
; Performance
: A heterogeneous system may have CPUs that are identical in terms of architecture, but have underlying micro-architectural differences that lead to various levels of performance and power consumption. Asymmetries in capabilities paired with opaque programming models and operating system abstractions can sometimes lead to performance predictability problems, especially with mixed workloads.
; Development tools
: Different types of processors typically require different tools (editors, compilers, ...) for software developers, which introduces complexity when partitioning the application across them.
; Data Partitioning
: While partitioning data on homogeneous platforms is often trivial, it has been shown that for the general heterogeneous case the problem is NP-complete. For small numbers of partitions, optimal partitionings that perfectly balance load and minimize communication volume have been shown to exist.
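To make the data-partitioning item concrete, here is a minimal sketch of a common load-balancing heuristic: give each device a share of the data proportional to its relative speed. It is not one of the optimal partitionings mentioned above, since it ignores communication volume entirely, and the function name and the idea of a single scalar "speed" per device are assumptions made for the example.

```python
def partition_by_speed(n_items: int, speeds: list[float]) -> list[int]:
    """Split n_items across devices in proportion to their relative speeds.

    Uses the largest-remainder method so the shares are integers that
    sum to n_items.
    """
    if n_items < 0 or not speeds or any(s <= 0 for s in speeds):
        raise ValueError("need n_items >= 0 and strictly positive speeds")
    total = sum(speeds)
    exact = [n_items * s / total for s in speeds]  # ideal fractional shares
    shares = [int(x) for x in exact]               # round each share down
    leftover = n_items - sum(shares)
    # Hand the remaining items to the devices with the largest fractional parts.
    order = sorted(range(len(speeds)),
                   key=lambda i: exact[i] - shares[i],
                   reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares
```

For instance, a device that is three times as fast as its peer would receive three quarters of the items; on a homogeneous platform the same function degenerates to an even split, which reflects why the homogeneous case is described as trivial.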
Example hardware
Heterogeneous computing hardware can be found in every domain of computing—from high-end servers and high-performance computing machines all the way down to low-power embedded devices including mobile phones and tablets.
* High Performance Computing
** Cydra-5 (Numeric coprocessor)
** Cray XD1 (FPGA)
** SRC Computers SRC-6 and SRC-7 (FPGA)
* Embedded Systems (DSP and Mobile Platforms)
** Texas Instruments OMAP (Media coprocessor)
** Analog Devices Blackfin (DSP and media coprocessors)
** Qualcomm Snapdragon (GPU, DSP, image, sometimes AI coprocessor; Modem, Sensors)
** Nvidia Tegra (GPU; Modem, Sensors)
** Samsung Exynos (GPU; Modem, Sensors)
** Apple "A" series (CPU, GPU; Modem)
** Movidius Myriad vision processing units, which include several symmetric processors complemented by fixed-function units and a pair of SPARC-based controllers
** HiSilicon Kirin SoCs (GPU; Modem, Sensors)
** MediaTek SoCs (GPU; Modem, Sensors)
** Cadence Design Systems Tensilica DSPs
* Reconfigurable Computing
** Xilinx field-programmable gate arrays (FPGAs; e.g., Virtex-II Pro, Virtex 4 FX, Virtex 5 FXT) and Zynq and Versal platforms
** Intel "Stellarton" (Atom + Altera FPGA)
* Networking
** Intel IXP Network Processors
** Netronome NFP Network Processors
* General Purpose Computing, Gaming, and Entertainment Devices
** Intel Sandy Bridge, Ivy Bridge, and Haswell CPUs (Integrated GPU, OpenCL-capable since Ivy Bridge)
** AMD Excavator and Ryzen APUs (Integrated GPU, OpenCL-capable)
** IBM Cell, found in the PlayStation 3 (Vector coprocessor)
*** SpursEngine, a variant of the IBM Cell processor
** Emotion Engine, found in the PlayStation 2 (Vector and media coprocessors)
** ARM big.LITTLE/DynamIQ CPU architecture (heterogeneous topology)
*** Nearly all ARM vendors offer heterogeneous solutions: ARM, Qualcomm, Nvidia, Apple, Samsung, HiSilicon, MediaTek, etc.
See also
* GPGPU
* MPSoC
* big.LITTLE/DynamIQ
* Simultaneous and heterogeneous multithreading