Heterogeneous System Architecture (HSA) is a cross-vendor set of specifications that allow for the integration of

central processing unit A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, an ...

s and

graphics processors A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, m ...

on the same bus, with shared

memory Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembered ...

and tasks. The HSA is being developed by the

HSA Foundation The HSA Foundation is a not-for-profit engineering organization of industry and academia that works on the development of the Heterogeneous System Architecture (HSA), a set of royalty-free computer hardware specifications, as well as open source so ...

, which includes (among many others) AMD and ARM. The platform's stated aim is to reduce communication latency between CPUs, GPUs and other compute devices, and make these various devices more compatible from a programmer's perspective, relieving the programmer of the task of planning the moving of data between devices' disjoint memories (as must currently be done with

OpenCL OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-prog ...

CUDA CUDA (or Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) that allows software to use certain types of graphics processing units (GPUs) for general purpose processing, an approach ...

). CUDA and OpenCL as well as most other fairly advanced programming languages can use HSA to increase their execution performance.

Heterogeneous computing Heterogeneous computing refers to systems that use more than one kind of processor or cores. These systems gain performance or energy efficiency not just by adding the same type of processors, but by adding dissimilar coprocessors, usually inco ...

is widely used in

system-on-chip A system on a chip or system-on-chip (SoC ; pl. ''SoCs'' ) is an integrated circuit that integrates most or all components of a computer or other electronic system. These components almost always include a central processing unit (CPU), memo ...

devices such as tablets,

smartphone A smartphone is a portable computer device that combines mobile telephone and computing functions into one unit. They are distinguished from feature phones by their stronger hardware capabilities and extensive mobile operating systems, whic ...

s, other mobile devices, and

video game console A video game console is an electronic device that outputs a video signal or image to display a video game that can be played with a game controller. These may be home consoles, which are generally placed in a permanent location connected to ...

s. HSA allows programs to use the graphics processor for

floating point In computing, floating-point arithmetic (FP) is arithmetic that represents real numbers approximately, using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. For example, 12.345 can be r ...

calculations without separate memory or scheduling.

Rationale

The rationale behind HSA is to ease the burden on programmers when offloading calculations to the GPU. Originally driven solely by AMD and called the FSA, the idea was extended to encompass processing units other than GPUs, such as other manufacturers' DSPs, as well. Modern GPUs are very well suited to perform

single instruction, multiple data Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it should ...

(SIMD) and single instruction, multiple threads (SIMT), while modern CPUs are still being optimized for branching. etc.

Overview

Originally introduced by

embedded system An embedded system is a computer system—a combination of a computer processor, computer memory, and input/output peripheral devices—that has a dedicated function within a larger mechanical or electronic system. It is ''embedded'' ...

s such as the

Cell Broadband Engine Cell is a multi-core microprocessor microarchitecture that combines a general-purpose PowerPC core of modest performance with streamlined coprocessing elements which greatly accelerate multimedia and vector processing applications, as well as ma ...

, sharing system memory directly between multiple system actors makes heterogeneous computing more mainstream. Heterogeneous computing itself refers to systems that contain multiple processing units

s (CPUs),

graphics processing unit A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mo ...

s (GPUs), digital signal processors (DSPs), or any type of

application-specific integrated circuit An application-specific integrated circuit (ASIC ) is an integrated circuit (IC) chip customized for a particular use, rather than intended for general-purpose use, such as a chip designed to run in a digital voice recorder or a high-effici ...

s (ASICs). The system architecture allows any accelerator, for instance a graphics processor, to operate at the same processing level as the system's CPU. Among its main features, HSA defines a unified

virtual address space In computing, a virtual address space (VAS) or address space is the set of ranges of virtual addresses that an operating system makes available to a process. The range of virtual addresses usually starts at a low address and can extend to the hig ...

for compute devices: where GPUs traditionally have their own memory, separate from the main (CPU) memory, HSA requires these devices to share

page tables A page table is the data structure used by a virtual memory system in a computer operating system to store the mapping between virtual addresses and physical addresses. Virtual addresses are used by the program executed by the accessing process, ...

so that devices can exchange data by sharing pointers. This is to be supported by custom

memory management unit A memory management unit (MMU), sometimes called paged memory management unit (PMMU), is a computer hardware unit having all memory references passed through itself, primarily performing the translation of virtual memory addresses to physical ...

s. To render interoperability possible and also to ease various aspects of programming, HSA is intended to be

ISA Isa or ISA may refer to: Places * Isa, Amur Oblast, Russia * Isa, Kagoshima, Japan * Isa, Nigeria * Isa District, Kagoshima, former district in Japan * Isa Town, middle class town located in Bahrain * Mount Isa, Queensland, Australia * Mount Is ...

-agnostic for both CPUs and accelerators, and to support high-level programming languages. So far, the HSA specifications cover:

HSA Intermediate Layer

HSAIL (Heterogeneous System Architecture Intermediate Language), a virtual instruction set for parallel programs * similar to LLVM Intermediate Representation and SPIR (used by

and

Vulkan Vulkan is a low- overhead, cross-platform API, open standard for 3D graphics and computing. Vulkan targets high-performance real-time 3D graphics applications, such as video games and interactive media. Vulkan is intended to offer higher perform ...

) * finalized to a specific instruction set by a JIT compiler * make late decisions on which core(s) should run a task * explicitly parallel * supports exceptions, virtual functions and other high-level features * debugging support

HSA memory model

* compatible with

C++11 C++11 is a version of the ISO/ IEC 14882 standard for the C++ programming language. C++11 replaced the prior version of the C++ standard, called C++03, and was later replaced by C++14. The name follows the tradition of naming language versio ...

, OpenCL,

Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's mo ...

and .NET memory models * relaxed consistency * designed to support both managed languages (e.g. Java) and unmanaged languages (e.g. C) * will make it much easier to develop 3rd-party compilers for a wide range of heterogeneous products programmed in Fortran, C++,

C++ AMP C, or c, is the third letter in the Latin alphabet, used in the modern English alphabet, the alphabets of other western European languages and others worldwide. Its name in English is ''cee'' (pronounced ), plural ''cees''. History "C" ...

, Java, et al.

HSA dispatcher and run-time

* designed to enable heterogeneous task queueing: a work queue per core, distribution of work into queues, load balancing by work stealing * any core can schedule work for any other, including itself * significant reduction of overhead of scheduling work for a core Mobile devices are one of the HSA's application areas, in which it yields improved power efficiency.

Block diagrams

The illustrations below compare CPU-GPU coordination under HSA versus under traditional architectures.

Software support

Some of the HSA-specific features implemented in the hardware need to be supported by the

operating system kernel The kernel is a computer program at the core of a computer's operating system and generally has complete control over everything in the system. It is the portion of the operating system code that is always resident in memory and facilitates in ...

and specific device drivers. For example, support for AMD

Radeon Radeon () is a brand of computer products, including graphics processing units, random-access memory, RAM disk software, and solid-state drives, produced by Radeon Technologies Group, a division of AMD. The brand was launched in 2000 by ATI Tec ...

and AMD FirePro graphics cards, and

APUs Apus is a small constellation in the southern sky. It represents a bird-of-paradise, and its name means "without feet" in Greek because the bird-of-paradise was once wrongly believed to lack feet. First depicted on a celestial globe by Petrus ...

based on Graphics Core Next (GCN), was merged into version 3.19 of the Linux kernel mainline, released on 8 February 2015. Programs do not interact directly with , but queue their jobs utilizing the HSA runtime. This very first implementation, known as , focuses on "Kaveri" or "Berlin" APUs and works alongside the exist