ROCm is an Advanced Micro Devices (AMD) software stack for graphics processing unit (GPU) programming. ROCm spans several domains, including general-purpose computing on graphics processing units (GPGPU), high-performance computing (HPC), and heterogeneous computing. It offers several programming models: HIP (GPU-kernel-based programming), OpenMP (directive-based programming), and OpenCL.
ROCm is free, libre and open-source software (except the GPU firmware blobs), and it is distributed under various licenses. ROCm initially stood for Radeon Open Compute platform; however, because Open Compute is a registered trademark, ROCm is no longer an acronym; it is simply AMD's open-source stack designed for GPU compute.
Background
The first GPGPU software stack from ATI/AMD was Close to Metal, which became Stream.
ROCm was launched around 2016 with the Boltzmann Initiative. The ROCm stack builds upon previous AMD GPU stacks; some tools trace back to GPUOpen and others to the Heterogeneous System Architecture (HSA).
Heterogeneous System Architecture Intermediate Language
HSAIL was aimed at producing a middle-level, hardware-agnostic intermediate representation that could be JIT-compiled to the eventual hardware (GPU, FPGA, ...) using the appropriate finalizer. This approach was dropped for ROCm: it now builds only GPU code, using LLVM and its upstreamed AMDGPU backend, although there is still research on such enhanced modularity with LLVM MLIR.
Programming abilities
ROCm as a stack ranges from the kernel driver to end-user applications.
AMD provides introductory videos about AMD GCN hardware and ROCm programming via its learning portal.
One of the best technical introductions to the stack and ROCm/HIP programming is, to date, to be found on Reddit.
Hardware support
ROCm is primarily targeted at discrete professional GPUs, but unofficial support includes the Vega family and RDNA 2 consumer GPUs.
Accelerated Processing Units (APUs) are "enabled" but not officially supported; getting ROCm functional on them is involved.
Professional-grade GPUs
AMD Instinct accelerators are the first-class ROCm citizens, alongside the prosumer Radeon Pro GPU series: they mostly see full support.
The only consumer-grade GPU with relatively equal support is, as of January 2022, the Radeon VII (GCN 5, Vega).
Consumer-grade GPUs
Software ecosystem
Learning resources
AMD ROCm product manager Terry Deem gave a tour of the stack.
Third-party integration
The main consumers of the stack are machine learning and high-performance computing/GPGPU applications.
Machine learning
Various deep learning frameworks have a ROCm backend:
* PyTorch
* TensorFlow
* ONNX
* MXNet
* CuPy
* MIOpen
* Caffe
* IREE (which uses LLVM Multi-Level Intermediate Representation (MLIR))
* llama.cpp
Supercomputing
ROCm is gaining significant traction in the TOP500.
ROCm is used with the exascale supercomputers El Capitan and Frontier.
Some related software is to be found at the AMD Infinity Hub.
Other acceleration & graphics interoperation
As of version 3.0, Blender can use HIP compute kernels for its Cycles renderer.
Other languages
Julia
Julia has the AMDGPU.jl package, which integrates with LLVM and selects components of the ROCm stack. Instead of compiling code through HIP, AMDGPU.jl uses Julia's compiler to generate LLVM IR directly, which is later consumed by LLVM to generate native device code. AMDGPU.jl uses ROCr's HSA implementation to upload native code onto the device and execute it, similar to how HIP loads its own generated device code.
AMDGPU.jl also supports integration with ROCm's rocBLAS (for BLAS), rocRAND (for random number generation), and rocFFT (for FFTs). Future integration with rocALUTION, rocSOLVER, MIOpen, and certain other ROCm libraries is planned.
Software distribution
Official
Installation instructions for Linux and Windows are provided in the official AMD ROCm documentation. ROCm software is currently spread across several public GitHub repositories. Within the main public meta-repository there is an XML manifest for each official release: using git-repo, a version control tool built on top of Git, is the recommended way to synchronize the stack locally.
AMD has started distributing containerized applications for ROCm, notably scientific research applications gathered under the AMD Infinity Hub.
AMD itself distributes packages tailored to various Linux distributions.
Third-party
There is a growing third-party ecosystem packaging ROCm.
Linux distributions are officially packaging ROCm natively, with various degrees of advancement:
Arch Linux, Gentoo, Debian, Fedora, GNU Guix, and NixOS.
There are Spack packages.
Components
There is one kernel-space component, ROCk; the rest of the stack (roughly a hundred components) is made of user-space modules.
The unofficial typographic policy is uppercase ROC followed by lowercase for low-level libraries (e.g. ROCt), and the reverse for user-facing libraries (e.g. rocBLAS).
AMD actively develops with the LLVM community, but upstreaming is not instantaneous and, as of January 2022, is still lagging. AMD still officially packages various LLVM forks for parts that are not yet upstreamed: compiler optimizations destined to remain proprietary, debug support, OpenMP offloading, etc.
Low-level
ROCk Kernel driver
ROCm Device libraries
Support libraries implemented as LLVM bitcode. These provide various utilities and functions for math operations, atomics, queries for launch parameters, on-device kernel launch, etc.
ROCt Thunk
The thunk is responsible for all the thinking and queuing that goes into the stack.
ROCr Runtime
The ROC runtime is a set of APIs/libraries that allows the launch of compute kernels by host applications. It is AMD's implementation of the HSA runtime API. It is different from the ROC Common Language Runtime.
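As an illustrative sketch only (not an excerpt from ROCm documentation), the following minimal C++ program uses the standard HSA runtime API that ROCr implements to enumerate the agents (devices) visible to the runtime; the header path and the lack of error handling are simplifying assumptions.

```cpp
// Minimal sketch: enumerate HSA agents (CPUs and GPUs) through ROCr's
// implementation of the HSA runtime API. Error handling is omitted for brevity.
#include <hsa/hsa.h>   // typically under /opt/rocm/include on ROCm systems
#include <cstdio>

static hsa_status_t print_agent(hsa_agent_t agent, void*) {
    char name[64] = {0};
    hsa_agent_get_info(agent, HSA_AGENT_INFO_NAME, name);
    hsa_device_type_t type;
    hsa_agent_get_info(agent, HSA_AGENT_INFO_DEVICE, &type);
    std::printf("agent: %s (%s)\n", name,
                type == HSA_DEVICE_TYPE_GPU ? "GPU" : "CPU/other");
    return HSA_STATUS_SUCCESS;
}

int main() {
    hsa_init();                               // bring up the runtime
    hsa_iterate_agents(print_agent, nullptr); // visit every agent (device)
    hsa_shut_down();
    return 0;
}
```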
ROCm CompilerSupport
The ROCm code object manager is in charge of interacting with LLVM intermediate representation.
Mid-level
ROCclr Common Language Runtime
The common language runtime is an indirection layer adapting calls to ROCr on Linux and PAL on Windows.
It used to be able to route between different compilers, like the HSAIL compiler. It is now being absorbed by the upper indirection layers (HIP and OpenCL).
OpenCL
ROCm ships its installable client driver (ICD) loader and an OpenCL implementation bundled together. As of January 2022, ROCm 4.5.2 ships OpenCL 2.2 and is lagging behind the competition.
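For illustration, a minimal sketch of host code that enumerates OpenCL platforms through the ICD loader; it uses only standard OpenCL entry points, nothing ROCm-specific, and error checking is omitted.

```cpp
// Minimal sketch: enumerate OpenCL platforms via the ICD loader.
// With ROCm installed, the AMD platform should appear in the list.
#define CL_TARGET_OPENCL_VERSION 220
#include <CL/cl.h>
#include <cstdio>
#include <vector>

int main() {
    cl_uint count = 0;
    clGetPlatformIDs(0, nullptr, &count);           // query platform count
    std::vector<cl_platform_id> platforms(count);
    clGetPlatformIDs(count, platforms.data(), nullptr);
    for (cl_platform_id p : platforms) {
        char name[256] = {0};
        clGetPlatformInfo(p, CL_PLATFORM_NAME, sizeof(name), name, nullptr);
        std::printf("platform: %s\n", name);
    }
    return 0;
}
```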
HIP
Heterogeneous-computing Interface for Portability
The AMD implementation for its GPUs is called HIPAMD. There is also a CPU implementation, mostly for demonstration purposes.
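A minimal sketch of what HIP GPU-kernel-based programming looks like (a vector addition); the kernel syntax and runtime calls mirror CUDA, and error checking is omitted for brevity.

```cpp
// Minimal HIP sketch: vector addition. On a ROCm system this would be built
// with something like: hipcc vector_add.hip -o vector_add
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n);

    float *da, *db, *dc;
    hipMalloc((void**)&da, bytes);
    hipMalloc((void**)&db, bytes);
    hipMalloc((void**)&dc, bytes);
    hipMemcpy(da, ha.data(), bytes, hipMemcpyHostToDevice);
    hipMemcpy(db, hb.data(), bytes, hipMemcpyHostToDevice);

    const int block = 256;
    const int grid = (n + block - 1) / block;
    vector_add<<<grid, block>>>(da, db, dc, n);     // CUDA-style launch syntax
    hipMemcpy(hc.data(), dc, bytes, hipMemcpyDeviceToHost);

    std::printf("c[0] = %f\n", hc[0]);              // expect 3.0
    hipFree(da); hipFree(db); hipFree(dc);
    return 0;
}
```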
HIPCC
HIP builds a `HIPCC` compiler that either wraps Clang and compiles with the open LLVM AMDGPU backend, or redirects to the Nvidia compiler.
HIPIFY
HIPIFY is a source-to-source compiling tool. It translates CUDA to HIP and vice versa, either using a Clang-based tool or a sed-like Perl script.
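To illustrate the kind of one-to-one mapping such a translation performs, here is a hand-written sketch (not actual HIPIFY output) of a few CUDA runtime calls and their HIP equivalents:

```cpp
// Illustrative sketch of the CUDA -> HIP translation HIPIFY performs
// (hand-written here, not tool output).

// CUDA original:
//   #include <cuda_runtime.h>
//   cudaMalloc(&ptr, bytes);
//   cudaMemcpy(ptr, host, bytes, cudaMemcpyHostToDevice);
//   kernel<<<grid, block>>>(ptr);
//   cudaFree(ptr);

// HIP result:
#include <hip/hip_runtime.h>
#include <cstddef>

void copy_and_launch_stub(const void* host, size_t bytes) {
    void* ptr = nullptr;
    hipMalloc(&ptr, bytes);                              // was cudaMalloc
    hipMemcpy(ptr, host, bytes, hipMemcpyHostToDevice);  // was cudaMemcpy
    // kernel<<<grid, block>>>(ptr);                     // launch syntax unchanged
    hipFree(ptr);                                        // was cudaFree
}
```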
GPUFORT
Like HIPIFY, GPUFORT is a tool compiling source code into other third-generation-language sources, allowing users to migrate from CUDA Fortran to HIP Fortran. Even more than HIPIFY, it remains a research project.
High-level
ROCm high-level libraries are usually consumed directly by application software, such as machine learning frameworks. Most of the following libraries are in the General Matrix Multiply (GEMM) category, at which GPU architectures excel.
The majority of these user-facing libraries come in dual form: ''hip'' for the indirection layer that can route to Nvidia hardware, and ''roc'' for the AMD implementation.
rocBLAS / hipBLAS
rocBLAS and hipBLAS are central among the high-level libraries: they are the AMD implementation of Basic Linear Algebra Subprograms (BLAS). rocBLAS uses the library Tensile privately.
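A hedged sketch of calling single-precision GEMM through the hip layer (hipBLAS), which routes to rocBLAS on AMD hardware; matrices are column-major, error checking is omitted, and the header path may differ between ROCm releases.

```cpp
// Minimal hipBLAS sketch: C = alpha * A * B + beta * C for small
// column-major matrices. Error handling omitted for brevity.
#include <hipblas/hipblas.h>   // older ROCm releases use <hipblas.h>
#include <hip/hip_runtime.h>
#include <vector>
#include <cstdio>

int main() {
    const int n = 2;                            // 2x2 matrices
    std::vector<float> a = {1, 2, 3, 4};        // column-major storage
    std::vector<float> b = {5, 6, 7, 8};
    std::vector<float> c(n * n, 0.0f);
    const size_t bytes = n * n * sizeof(float);

    float *da, *db, *dc;
    hipMalloc((void**)&da, bytes);
    hipMalloc((void**)&db, bytes);
    hipMalloc((void**)&dc, bytes);
    hipMemcpy(da, a.data(), bytes, hipMemcpyHostToDevice);
    hipMemcpy(db, b.data(), bytes, hipMemcpyHostToDevice);

    hipblasHandle_t handle;
    hipblasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;
    hipblasSgemm(handle, HIPBLAS_OP_N, HIPBLAS_OP_N,
                 n, n, n, &alpha, da, n, db, n, &beta, dc, n);
    hipMemcpy(c.data(), dc, bytes, hipMemcpyDeviceToHost);
    std::printf("C[0][0] = %f\n", c[0]);

    hipblasDestroy(handle);
    hipFree(da); hipFree(db); hipFree(dc);
    return 0;
}
```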
rocSOLVER / hipSOLVER
This pair of libraries constitutes the LAPACK implementation for ROCm and is strongly coupled to rocBLAS.
Utilities
* ROCm developer tools: debugger, tracer, profiler, System Management Interface, validation suite, cluster management.
* GPUOpen tools: GPU analyzer, memory visualizer, etc.
* External tools: radeontop (TUI overview)
Comparison with competitors
ROCm competes with other GPU computing stacks: Nvidia CUDA and Intel oneAPI.
Nvidia CUDA
Nvidia's CUDA is closed-source, whereas AMD ROCm is open source. There is open-source software built on top of the closed-source CUDA, for instance RAPIDS.
CUDA is able to run on consumer GPUs, whereas ROCm support is mostly offered for professional hardware such as AMD Instinct and AMD Radeon Pro.
Nvidia provides a C/C++-centered frontend and its
Parallel Thread Execution (PTX) LLVM GPU backend as the
Nvidia CUDA Compiler (NVCC).
Intel oneAPI
Like ROCm, oneAPI is open source, and all the corresponding libraries are published on its GitHub page.
Unified Acceleration Foundation (UXL)
Unified Acceleration Foundation (UXL) is a technology consortium working on the continuation of the oneAPI initiative, with the goal of creating a new open-standard accelerator software ecosystem and related open standards and specification projects through working groups and special interest groups (SIGs). It aims to compete with Nvidia's CUDA. The main companies behind it are Intel, Google, Arm, Qualcomm, Samsung, Imagination, and VMware.
See also
* AMD Software – a general overview of AMD's drivers, APIs, and development endeavors.
* GPUOpen – AMD's complementary graphics stack
* AMD Radeon Software – AMD's software distribution channel
References
External links
* Docker containers for scientific applications.
AMD software
Application programming interfaces
Concurrent computing
GPGPU
GPGPU libraries
Graphics cards
Graphics hardware
Heterogeneous computing
Machine learning
Parallel computing
Supercomputers