
OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-programmable gate arrays (FPGAs) and other processors or hardware accelerators. OpenCL specifies a programming language (based on C99) for programming these devices and application programming interfaces (APIs) to control the platform and execute programs on the compute devices. OpenCL provides a standard interface for parallel computing using task- and data-based parallelism. OpenCL is an open standard maintained by the Khronos Group, a non-profit, open standards organisation. Conformant implementations (those that have passed the Conformance Test Suite) are available from a range of companies including AMD, Arm, Cadence, Google, Imagination, Intel, Nvidia, Qualcomm, Samsung, SPI and Verisilicon.


Overview

OpenCL views a computing system as consisting of a number of compute devices, which might be central processing units (CPUs) or "accelerators" such as graphics processing units (GPUs), attached to a host processor (a CPU). It defines a C-like language for writing programs. Functions executed on an OpenCL device are called "kernels". A single compute device typically consists of several compute units, which in turn comprise multiple processing elements (PEs). A single kernel execution can run on all or many of the PEs in parallel. How a compute device is subdivided into compute units and PEs is up to the vendor; a compute unit can be thought of as a "core", but the notion of core is hard to define across all the types of devices supported by OpenCL (or even within the category of "CPUs"), and the number of compute units may not correspond to the number of cores claimed in vendors' marketing literature (which may actually be counting SIMD lanes).

In addition to its C-like programming language, OpenCL defines an application programming interface (API) that allows programs running on the host to launch kernels on the compute devices and manage device memory, which is (at least conceptually) separate from host memory. Programs in the OpenCL language are intended to be compiled at run time, so that OpenCL-using applications are portable between implementations for various host devices. The OpenCL standard defines host APIs for C and C++; third-party APIs exist for other programming languages and platforms such as Python, Java, Perl, D and .NET. An implementation of the OpenCL standard consists of a library that implements the API for C and C++, and an OpenCL C compiler for the targeted compute devices.

In order to open the OpenCL programming model to other languages or to protect the kernel source from inspection, the Standard Portable Intermediate Representation (SPIR) can be used as a target-independent way to ship kernels between a front-end compiler and the OpenCL back-end. More recently, the Khronos Group has ratified SYCL, a higher-level programming model for OpenCL as a single-source eDSL based on pure C++17, to improve programming productivity. Developers who want C++ in their kernels but not the single-source programming style of SYCL can use C++ features in compute kernel sources written in the "C++ for OpenCL" language.
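How a kernel launch is divided into work-items and work-groups can be sketched in plain C. The sketch below does not call the OpenCL API; it only emulates, sequentially, a one-dimensional index space of global_size work-items split into work-groups of local_size items, where each work-item's global ID equals its group ID times the local size plus its local ID (assuming a zero global offset). The names kernel_fn, emulate_ndrange_1d and record_group are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signature for an emulated 1-D kernel: it receives the IDs a
   work-item would see on a device, plus an opaque argument pointer. */
typedef void (*kernel_fn)(size_t global_id, size_t group_id,
                          size_t local_id, void *args);

/* Runs `kernel` once per work-item, one work-group after another.
   A device would instead spread work-groups over its compute units and
   run the work-items of a group on processing elements in parallel. */
static void emulate_ndrange_1d(kernel_fn kernel, size_t global_size,
                               size_t local_size, void *args)
{
    /* This sketch requires whole work-groups, as OpenCL 1.x does. */
    assert(local_size > 0 && global_size % local_size == 0);
    size_t num_groups = global_size / local_size;
    for (size_t group = 0; group < num_groups; group++)
        for (size_t local = 0; local < local_size; local++)
            kernel(group * local_size + local, group, local, args);
}

/* Example kernel: records which work-group handled each element. */
static void record_group(size_t gid, size_t grp, size_t lid, void *args)
{
    (void)lid;                 /* unused in this example */
    size_t *owner = args;
    owner[gid] = grp;
}
```

Launching record_group over 8 work-items with a local size of 4 marks elements 0 to 3 as handled by group 0 and elements 4 to 7 by group 1.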


Memory hierarchy

OpenCL defines a four-level memory hierarchy for the compute device:
* global memory: shared by all processing elements, but has high access latency;
* read-only memory: smaller, low latency, writable by the host CPU but not the compute devices;
* local memory: shared by a group of processing elements;
* per-element private memory (registers).

Not every device needs to implement each level of this hierarchy in hardware. Consistency between the various levels in the hierarchy is relaxed, and only enforced by explicit synchronization constructs, notably barriers. Devices may or may not share memory with the host CPU. The host API provides handles on device memory buffers and functions to transfer data back and forth between host and devices.
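Why barriers matter for local memory can be illustrated with a work-group sum reduction, emulated sequentially in C. Each pass of the outer loop stands for one step that every work-item in the group completes before reaching a barrier; finishing all work-items of a step before starting the next is exactly what the barrier guarantees on a device. The array scratch plays the role of the group's local memory. The function name and MAX_LOCAL are hypothetical, and the sketch assumes a power-of-two work-group size.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LOCAL 256  /* hypothetical upper bound on the work-group size */

/* Sums in[0..local_size) the way one work-group would, using local memory. */
static float workgroup_reduce_sum(const float *in, size_t local_size)
{
    /* A power-of-two size lets every step halve the active work-items. */
    assert(local_size > 0 && local_size <= MAX_LOCAL &&
           (local_size & (local_size - 1)) == 0);
    float scratch[MAX_LOCAL];             /* stands in for local memory */

    /* Each work-item copies its element from global into local memory. */
    for (size_t lid = 0; lid < local_size; lid++)
        scratch[lid] = in[lid];
    /* barrier: all copies finish before any work-item reads scratch */

    for (size_t stride = local_size / 2; stride > 0; stride /= 2) {
        /* Work-items with lid < stride add their partner's partial sum.
           They only write indices below stride and read indices at or
           above it, so the sequential order gives the parallel result. */
        for (size_t lid = 0; lid < stride; lid++)
            scratch[lid] += scratch[lid + stride];
        /* barrier: this step's writes are visible before the next step */
    }
    return scratch[0];  /* work-item 0 would write this to global memory */
}
```

Without the barrier between steps, a work-item on a real device could read a partner's slot before the partner had updated it, which is the kind of inconsistency the relaxed memory model permits.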


OpenCL kernel language

The programming language that is used to write compute kernels is called the kernel language. OpenCL adopts C/C++-based languages to specify the kernel computations performed on the device, with some restrictions and additions to facilitate efficient mapping to the heterogeneous hardware resources of accelerators. Traditionally, OpenCL C was used to program the accelerators in the OpenCL standard; later, the C++ for OpenCL kernel language was developed, which inherits all functionality from OpenCL C but allows C++ features to be used in kernel sources.


OpenCL C language

OpenCL C is a C99-based language dialect adapted to fit the device model in OpenCL. Memory buffers reside in specific levels of the memory hierarchy, and pointers are annotated with region qualifiers such as __global and __local, reflecting this. Instead of a device program having a main function, OpenCL C functions are marked __kernel to signal that they are entry points into the program to be called from the host program. Function pointers, bit fields and variable-length arrays are omitted, and recursion is forbidden. The C standard library is replaced by a custom set of standard functions, geared toward math programming.

OpenCL C is extended to facilitate use of parallelism with vector types and operations, synchronization, and functions to work with work-items and work-groups. In particular, besides scalar types, which behave similarly to the corresponding types in C, OpenCL provides fixed-length vector types, such as a 4-vector of single-precision floats; such vector types are available in lengths two, three, four, eight and sixteen for various base types. Vectorized operations on these types are intended to map onto SIMD instruction sets, e.g., SSE or VMX, when running OpenCL programs on CPUs. Other specialized types include 2-D and 3-D image types.
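The component-wise semantics of OpenCL C vector types can be emulated in portable C with a four-element struct. This is only an emulation: in OpenCL C the same computation is written directly with the built-in vector type and the ordinary + and * operators, and an implementation may compile each operation to a single SIMD instruction. The type vec4f and the functions vec4f_add and vec4f_scale are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-in for a 4-component single-precision vector. */
typedef struct { float s[4]; } vec4f;

/* Component-wise addition: r.s[k] = a.s[k] + b.s[k] for every lane k. */
static vec4f vec4f_add(vec4f a, vec4f b)
{
    vec4f r;
    for (int k = 0; k < 4; k++)
        r.s[k] = a.s[k] + b.s[k];
    return r;
}

/* Scalar times vector: the scalar is applied to every lane, mirroring how
   OpenCL C widens a scalar operand to the length of the vector operand. */
static vec4f vec4f_scale(float alpha, vec4f v)
{
    vec4f r;
    for (int k = 0; k < 4; k++)
        r.s[k] = alpha * v.s[k];
    return r;
}
```

For example, vec4f_add(vec4f_scale(2.0f, x), y) with x = (1, 2, 3, 4) and y = (10, 20, 30, 40) yields (12, 24, 36, 48), one lane at a time here but potentially in one instruction on SIMD hardware.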


Example: matrix–vector multiplication

The following is a matrix–vector multiplication algorithm in OpenCL C.

```c
// Multiplies A*x, leaving the result in y.
// A is a row-major matrix, meaning the (i,j) element is at A[i*ncols+j].
__kernel void matvec(__global const float *A, __global const float *x,
                     uint ncols, __global float *y)
{
    size_t i = get_global_id(0);   // Global id, used as the row index
    float sum = 0.0f;              // Accumulator for the dot product
    for (uint j = 0; j < ncols; j++)
        sum += A[i*ncols + j] * x[j];
    y[i] = sum;
}
```

The kernel function computes, in each invocation, the dot product of a single row of the matrix A and the vector x: $y_i = a_{i,:} \cdot x = \sum_j a_{i,j} x_j$. To extend this into a full matrix–vector multiplication, the OpenCL runtime maps the kernel over the rows of the matrix. On the host side, a host API function does this; it takes as arguments the kernel to execute, its arguments, and a number of work-items, corresponding to the number of rows in the matrix.
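A plain C reference for the same computation is useful for checking the device result after it has been read back. The sketch below mirrors the structure described for the matvec kernel: the body of the outer loop is what one work-item does for row i, and the loop over i stands in for the runtime launching one work-item per row. The function name matvec_reference is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Computes y = A*x on the CPU for a row-major nrows x ncols matrix A. */
static void matvec_reference(const float *A, const float *x,
                             size_t nrows, size_t ncols, float *y)
{
    assert(A != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < nrows; i++) {   /* one "work-item" per row */
        float sum = 0.0f;                  /* dot product of row i and x */
        for (size_t j = 0; j < ncols; j++)
            sum += A[i * ncols + j] * x[j];
        y[i] = sum;
    }
}
```

When comparing against results from a device, a small tolerance is usually safer than exact equality, since the device may accumulate in a different order or contract the multiply and add into a fused operation.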


Example: computing the FFT

This example will load a fast Fourier transform (FFT) implementation and execute it. The implementation is shown below. The code asks the OpenCL library for the first available graphics card, creates memory buffers for reading and writing (from the perspective of the graphics card), JIT-compiles the FFT kernel and then finally runs the kernel asynchronously. The result from the transform is not read in this example.

```c
#include
#include
#include "CL/opencl.h"

#define NUM_ENTRIES 1024

int main() // (int argc, const char* argv[])
```

The actual calculation inside file "fft1D_1024_kernel_src.cl" (based on "Fitting FFT onto the G80 Architecture"):

```c
R"(
// This kernel computes FFT of length 1024. The 1024 length FFT is decomposed into
// calls to a radix 16 function, another radix 16 function and then a radix 4 function
__kernel void fft1D_1024 (__global float2 *in, __global float2 *out,
                          __local float *sMemx, __local float *sMemy)
)"
```

A full, open source implementation of an OpenCL FFT can be found on Apple's website.
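The kernel comment splits the 1024-point transform into radix-16, radix-16 and radix-4 stages, which works because 1024 = 16 × 16 × 4. Under such a decomposition each index n can be written with three mixed-radix digits, n = n0 + 16·n1 + 256·n2 with n0, n1 < 16 and n2 < 4, and FFT formulations that are not self-sorting leave their data in the order obtained by reversing those digits. The C sketch below computes that mixed-radix digit reversal. It is a generic illustration of the decomposition, not code taken from the kernel, and the names digit_reverse, is_digit_reverse_permutation and fft1024_radices are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Radix sequence named in the kernel comment: 16 * 16 * 4 = 1024. */
static const size_t fft1024_radices[3] = {16, 16, 4};

/* Reverses the mixed-radix digits of n. Digits are peeled off least
   significant first (radix[0], radix[1], ...) and pushed back in the
   opposite order, so the first digit becomes the most significant one. */
static size_t digit_reverse(size_t n, const size_t *radix, size_t count)
{
    size_t rev = 0;
    for (size_t k = 0; k < count; k++) {
        assert(radix[k] > 0);
        rev = rev * radix[k] + n % radix[k];
        n /= radix[k];
    }
    return rev;
}

/* Returns 1 if digit_reverse maps [0, N) one-to-one onto itself,
   where N is the product of the radices. */
static int is_digit_reverse_permutation(const size_t *radix, size_t count)
{
    size_t total = 1;
    for (size_t k = 0; k < count; k++)
        total *= radix[k];
    unsigned char *seen = calloc(total, 1);
    if (seen == NULL)
        return 0;
    int ok = 1;
    for (size_t n = 0; n < total && ok; n++) {
        size_t r = digit_reverse(n, radix, count);
        if (r >= total || seen[r])
            ok = 0;
        else
            seen[r] = 1;
    }
    free(seen);
    return ok;
}
```

With the radices (16, 16, 4), index 1 maps to 64, index 16 to 4 and index 256 to 1, so the reversed index is n2 + 4·n1 + 64·n0.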


C++ for OpenCL language

In 2020, Khronos announced the transition to the community-driven C++ for OpenCL programming language, which provides features from C++17 in combination with the traditional OpenCL C features. This language makes it possible to leverage a rich variety of language features from standard C++ while preserving backward compatibility with OpenCL C. This opens up a smooth transition path to C++ functionality for OpenCL kernel code developers, as they can continue using a familiar programming flow and even tools, as well as leverage existing extensions and libraries available for OpenCL C.

The language semantics are described in the documentation published in the releases of the OpenCL-Docs repository hosted by the Khronos Group, but the language is currently not ratified by the Khronos Group. C++ for OpenCL is not documented in a stand-alone document; its description is based on the specifications of C++ and OpenCL C. C++ for OpenCL was originally developed as an extension of the open source Clang compiler and has been supported since Clang release 9. As it was tightly coupled with OpenCL C and did not contain any Clang-specific functionality, its documentation has been re-hosted to the OpenCL-Docs repository from the Khronos Group, along with the sources of other specifications and reference cards. The first official release of this document, describing C++ for OpenCL version 1.0, was published in December 2020. C++ for OpenCL 1.0 contains features from C++17 and is backward compatible with OpenCL C 2.0. In December 2021, a new provisional C++ for OpenCL version 2021 was released, which is fully compatible with the OpenCL 3.0 standard. A work-in-progress draft of the latest C++ for OpenCL documentation can be found on the Khronos website.


Features

C++ for OpenCL supports most of the features (syntactically and semantically) from OpenCL C except for nested parallelism and blocks. However, there are minor differences in some supported features, mainly related to differences in semantics between C++ and C. For example, C++ is more strict with implicit type conversions, and it does not support the type qualifier. The following C++ features are not supported by C++ for OpenCL: virtual functions, operator, non-placement / operators, exceptions, pointers to member functions, references to functions, and the C++ standard libraries. C++ for OpenCL extends the concept of separate memory regions (address spaces) from OpenCL C to C++ features: functional casts, templates, class members, references, lambda functions, and operators. Most C++ features are not available for the kernel functions themselves, e.g. overloading or templating, or arbitrary class layout in parameter types.


Example: complex-number arithmetic

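The following code snippet illustrates how kernels with complex-number arithmetic can be implemented in the C++ for OpenCL language with convenient use of C++ features.

```cpp
// Define a class Complex, that can perform complex-number computations with
// various precision when different types for T are used - double, float, half.
template<typename T>
class complex_t {
  T m_re, m_im; // Real and imaginary components.
public:
  complex_t(T re, T im) : m_re{re}, m_im{im} {}
  complex_t operator*(const complex_t &o) const { // (a+bi)(c+di) = (ac-bd) + (ad+bc)i
    return {m_re * o.m_re - m_im * o.m_im, m_re * o.m_im + m_im * o.m_re};
  }
  T get_re() const { return m_re; }
  T get_im() const { return m_im; }
};

// A helper function to compute multiplication over complex numbers read from
// the input buffer and to store the computed result into the output buffer.
template<typename T>
void compute_helper(__global T *in, __global T *out) {
  size_t i = get_global_id(0); // Assumed layout: 4 input values, 2 output values per work-item.
  complex_t<T> res = complex_t<T>{in[4*i], in[4*i+1]} * complex_t<T>{in[4*i+2], in[4*i+3]};
  out[2*i] = res.get_re();
  out[2*i+1] = res.get_im();
}

// This kernel is used for complex-number multiplication in single precision.
__kernel void compute_sp(__global float *in, __global float *out) { compute_helper(in, out); }

#ifdef cl_khr_fp16
// This kernel is used for complex-number multiplication in half precision when
// it is supported by the device.
#pragma OPENCL EXTENSION cl_khr_fp16: enable
__kernel void compute_hp(__global half *in, __global half *out) { compute_helper(in, out); }
#endif
```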
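On the host, the output of such a kernel can be checked against a plain C reference. The sketch below assumes a buffer layout chosen for illustration: each product is formed from four consecutive input values (re1, im1, re2, im2) and written as two consecutive output values (re, im). The function name complex_mul_reference is hypothetical, and single precision is used to match the compute_sp kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Multiplies `count` pairs of complex numbers using (a+bi)(c+di) = (ac-bd) + (ad+bc)i.
   Assumed layout: in[4k..4k+3] = (re1, im1, re2, im2), out[2k..2k+1] = (re, im). */
static void complex_mul_reference(const float *in, float *out, size_t count)
{
    assert(in != NULL && out != NULL);
    for (size_t k = 0; k < count; k++) {
        float a = in[4 * k],     b = in[4 * k + 1];   /* first operand a+bi */
        float c = in[4 * k + 2], d = in[4 * k + 3];   /* second operand c+di */
        out[2 * k]     = a * c - b * d;               /* real part */
        out[2 * k + 1] = a * d + b * c;               /* imaginary part */
    }
}
```

For example, the input (1, 2, 3, 4) represents (1+2i)(3+4i), which produces (-5, 10).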


Tooling and execution environment

The C++ for OpenCL language can be used for the same applications or libraries, and in the same way, as the OpenCL C language. Due to the rich variety of C++ language features, applications written in C++ for OpenCL can express complex functionality more conveniently than applications written in OpenCL C; in particular, the generic programming paradigm from C++ is very attractive to library developers. C++ for OpenCL sources can be compiled by OpenCL drivers that support the cl_ext_cxx_for_opencl extension. Arm announced support for this extension in December 2020. However, due to the increasing complexity of the algorithms accelerated on OpenCL devices, it is expected that more applications will compile C++ for OpenCL kernels offline, using stand-alone compilers such as Clang, into an executable binary format or a portable binary format such as SPIR-V. Such an executable can be loaded during the OpenCL application's execution using a dedicated OpenCL API. Binaries compiled from sources in C++ for OpenCL 1.0 can be executed on OpenCL 2.0 conformant devices. Depending on the language features used in such kernel sources, they can also be executed on devices supporting earlier OpenCL versions or OpenCL 3.0. Aside from OpenCL drivers, kernels written in C++ for OpenCL can be compiled for execution on Vulkan devices using the clspv compiler and the clvk runtime layer, in the same way as OpenCL C kernels.


Contributions

C++ for OpenCL is an open language developed by the community of contributors listed in its documentation. New contributions to the language semantic definition or open source tooling support are accepted from anyone interested, as long as they are aligned with the main design philosophy and are reviewed and approved by experienced contributors.


History

OpenCL was initially developed by Apple Inc., which holds trademark rights, and refined into an initial proposal in collaboration with technical teams at AMD, IBM, Qualcomm, Intel, and Nvidia. Apple submitted this initial proposal to the Khronos Group. On June 16, 2008, the Khronos Compute Working Group was formed with representatives from CPU, GPU, embedded-processor, and software companies. This group worked for five months to finish the technical details of the specification for OpenCL 1.0 by November 18, 2008. This technical specification was reviewed by the Khronos members and approved for public release on December 8, 2008.


OpenCL 1.0

OpenCL 1.0 was released with Mac OS X Snow Leopard on August 28, 2009. According to an Apple press release:
Snow Leopard further extends support for modern hardware with Open Computing Language (OpenCL), which lets any application tap into the vast gigaflops of GPU computing power previously available only to graphics applications. OpenCL is based on the C programming language and has been proposed as an open standard.
AMD decided to support OpenCL instead of the now deprecated Close to Metal in its Stream framework. RapidMind announced their adoption of OpenCL underneath their development platform to support GPUs from multiple vendors with one interface. On December 9, 2008, Nvidia announced its intention to add full support for the OpenCL 1.0 specification to its GPU Computing Toolkit. On October 30, 2009, IBM released its first OpenCL implementation as a part of the XL compilers. Calculations can be accelerated by factors of up to 1000 with OpenCL on graphics cards compared with a normal CPU. Some important features of the next version of OpenCL, such as double- and half-precision operations, are optional in 1.0.


OpenCL 1.1

OpenCL 1.1 was ratified by the Khronos Group on June 14, 2010, and adds significant functionality for enhanced parallel programming flexibility, functionality, and performance, including:
* New data types including 3-component vectors and additional image formats;
* Handling commands from multiple host threads and processing buffers across multiple devices;
* Operations on regions of a buffer including read, write and copy of 1D, 2D, or 3D rectangular regions;
* Enhanced use of events to drive and control command execution;
* Additional OpenCL built-in C functions such as integer clamp, shuffle, and asynchronous strided copies;
* Improved OpenGL interoperability through efficient sharing of images and buffers by linking OpenCL and OpenGL events.


OpenCL 1.2

On November 15, 2011, the Khronos Group announced the OpenCL 1.2 specification, which added significant functionality over the previous versions in terms of performance and features for parallel programming. Most notable features include:
* Device partitioning: the ability to partition a device into sub-devices so that work assignments can be allocated to individual compute units. This is useful for reserving areas of the device to reduce latency for time-critical tasks.
* Separate compilation and linking of objects: the functionality to compile OpenCL into external libraries for inclusion into other programs.
* Enhanced image support (optional): 1.2 adds support for 1D images and 1D/2D image arrays. Furthermore, the OpenGL sharing extensions now allow OpenGL 1D textures and 1D/2D texture arrays to be used to create OpenCL images.
* Built-in kernels: custom devices that contain specific unique functionality are now integrated more closely into the OpenCL framework. Kernels can be called to use specialised or non-programmable aspects of underlying hardware. Examples include video encoding/decoding and digital signal processors.
* DirectX functionality: DX9 media surface sharing allows for efficient sharing between OpenCL and DX9 or DXVA media surfaces. Equally, for DX11, seamless sharing between OpenCL and DX11 surfaces is enabled.
* The ability to force IEEE 754 compliance for single-precision floating-point math: OpenCL by default allows the single-precision versions of the division, reciprocal, and square root operations to be less accurate than the correctly rounded values that IEEE 754 requires. If the programmer passes the "-cl-fp32-correctly-rounded-divide-sqrt" command line argument to the compiler, these three operations will be computed to IEEE 754 requirements if the OpenCL implementation supports this, and will fail to compile if the OpenCL implementation does not support computing these operations to their correctly rounded values as defined by the IEEE 754 specification. This ability is supplemented by the ability to query the OpenCL implementation to determine whether it can perform these operations to IEEE 754 accuracy.
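Accuracy requirements like these are usually expressed in units in the last place (ULP). The C sketch below measures the ULP distance between two single-precision values and uses a double-precision quotient rounded to float as the correctly rounded reference for division; rounding twice is harmless here because double carries enough extra precision (53 significand bits versus 24). It assumes IEEE 754 binary32 floats and non-NaN inputs, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Maps a float onto an integer line on which adjacent floats differ by 1
   and -0.0f and +0.0f coincide. Assumes IEEE 754 binary32 and no NaNs. */
static int64_t float_to_ordered(float f)
{
    int32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* reinterpret without aliasing issues */
    return bits < 0 ? (int64_t)INT32_MIN - bits : (int64_t)bits;
}

/* Number of representable float steps between a and b (0 if equal). */
static int64_t ulp_distance(float a, float b)
{
    int64_t d = float_to_ordered(a) - float_to_ordered(b);
    return d < 0 ? -d : d;
}

/* Correctly rounded single-precision quotient, computed through double. */
static float div_correctly_rounded(float a, float b)
{
    assert(b != 0.0f);
    return (float)((double)a / (double)b);
}
```

To estimate how far a faster, relaxed division strays, one could compare its results with div_correctly_rounded over many inputs and record the largest ulp_distance observed.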


OpenCL 2.0

On November 18, 2013, the Khronos Group announced the ratification and public release of the finalized OpenCL 2.0 specification. Updates and additions to OpenCL 2.0 include:
* Shared virtual memory
* Nested parallelism
* Generic address space
* Images (optional, including 3D images)
* C11 atomics
* Pipes
* Android installable client driver extension
* Half precision, extended with the optional cl_khr_fp16 extension
* cl_double: double precision IEEE 754 (optional)


OpenCL 2.1

The ratification and release of the OpenCL 2.1 provisional specification was announced on March 3, 2015, at the Game Developers Conference in San Francisco. It was released on November 16, 2015. It introduced the OpenCL C++ kernel language, based on a subset of C++14, while maintaining support for the preexisting OpenCL C kernel language. Vulkan and OpenCL 2.1 share SPIR-V as an intermediate representation, allowing high-level language front-ends to share a common compilation target. Updates to the OpenCL API include:
* Additional subgroup functionality
* Copying of kernel objects and states
* Low-latency device timer queries
* Ingestion of SPIR-V code by the runtime
* Execution priority hints for queues
* Zero-sized dispatches from the host
AMD, ARM, Intel, HPC, and YetiWare have declared support for OpenCL 2.1.


OpenCL 2.2

OpenCL 2.2 brings the OpenCL C++ kernel language into the core specification for significantly enhanced parallel programming productivity. It was released on May 16, 2017. A maintenance update with bug fixes was released in May 2018.
* The OpenCL C++ kernel language is a static subset of the C++14 standard and includes classes, templates, lambda expressions, function overloads and many other constructs for generic and meta-programming.
* It uses the new Khronos SPIR-V 1.1 intermediate language, which fully supports the OpenCL C++ kernel language.
* OpenCL library functions can now use the C++ language to provide increased safety and reduced undefined behavior while accessing features such as atomics, iterators, images, samplers, pipes, and device queue built-in types and address spaces.
* Pipe storage is a new device-side type in OpenCL 2.2 that is useful for FPGA implementations because it makes connectivity size and type known at compile time, enabling efficient device-scope communication between kernels.
* OpenCL 2.2 also includes features for enhanced optimization of generated code: applications can provide the values of specialization constants at SPIR-V compilation time, a new query can detect non-trivial constructors and destructors of program-scope global objects, and user callbacks can be set at program release time.
* It runs on any OpenCL 2.0-capable hardware (only a driver update is required).


OpenCL 3.0

The OpenCL 3.0 specification was released on September 30, 2020, after being in preview since April 2020. OpenCL 1.2 functionality has become a mandatory baseline, while all OpenCL 2.x and OpenCL 3.0 features were made optional. The specification retains the OpenCL C language and deprecates the OpenCL C++ kernel language, replacing it with the C++ for OpenCL language, which is based on a Clang/LLVM compiler that implements a subset of C++17 and SPIR-V intermediate code. Version 3.0.7 of C++ for OpenCL, together with some Khronos OpenCL extensions, was presented at IWOCL 21. Version 3.0.11 added some new extensions and corrections. NVIDIA, working closely with the Khronos OpenCL Working Group, improved Vulkan interoperability with semaphores and memory sharing. A later minor update, 3.0.14, brought bug fixes and a new extension for multiple devices.


Roadmap

When releasing OpenCL 2.2, the Khronos Group announced that OpenCL would converge where possible with Vulkan to enable OpenCL software deployment flexibility over both APIs. This has since been demonstrated by Adobe's Premiere Rush, which uses the clspv open source compiler to compile significant amounts of OpenCL C kernel code to run on a Vulkan runtime for deployment on Android. OpenCL has a forward-looking roadmap independent of Vulkan, with 'OpenCL Next' under development and targeting release in 2020. OpenCL Next may integrate extensions such as Vulkan/OpenCL interop, scratch-pad memory management, extended subgroups, SPIR-V 1.4 ingestion and SPIR-V extended debug info. OpenCL is also considering a Vulkan-like loader and layers and a "flexible profile" for deployment flexibility on multiple accelerator types.


Open source implementations

OpenCL consists of a set of headers and a shared object that is loaded at runtime. An installable client driver (ICD) must be installed on the platform for every vendor whose devices the runtime needs to support. For example, to support Nvidia devices on a Linux platform, the Nvidia ICD would need to be installed so that the OpenCL runtime (the ICD loader) can locate the ICD for the vendor and redirect the calls appropriately. The consumer application uses the standard OpenCL header; calls to each function are then proxied by the OpenCL runtime to the appropriate driver using the ICD. Each vendor must implement each OpenCL call in its driver. The Apple, Nvidia, ROCm, RapidMind and Gallium3D implementations of OpenCL are all based on the LLVM compiler technology and use the Clang compiler as their frontend.
; MESA Gallium Compute
: An implementation of OpenCL (currently 1.1 and incomplete, mostly complete for AMD Radeon GCN) for a number of platforms is maintained as part of the Gallium Compute Project, which builds on the work of the Mesa project to support multiple platforms. It was formerly known as Clover. Development has mostly focused on keeping the incomplete framework running with current LLVM and Clang, plus some new features such as fp16 in Mesa 17.3; the target is complete OpenCL 1.0, 1.1 and 1.2 support for AMD and Nvidia. New basic development, including SPIR-V support for Clover, is done by Red Hat. The new target is a modular OpenCL 3.0 with full support of OpenCL 1.2. The current state is tracked on Mesamatrix. Image support is a current focus of development.
: RustiCL is a new Gallium compute implementation written in Rust instead of C. Mesa 22.2 includes an experimental implementation with OpenCL 3.0 support and an implementation of the image extensions for programs such as Darktable. Intel Xe (Arc) and AMD GCN+ are supported in Mesa 22.3+. AMD R600 and Nvidia Kepler+ are also targets for hardware support. In the LuxMark benchmark, RustiCL outperformed AMD ROCm on Radeon RX 6700 XT hardware. Mesa 23.1 officially supports RustiCL. In Mesa 23.2, support for the important fp64 type is experimental.
: Microsoft's Windows 11 on Arm added support for OpenCL 1.2 via CLon12, an open source OpenCL implementation on top of DirectX 12 via Mesa Gallium.
; BEIGNET
: An implementation by Intel for its Ivy Bridge and later hardware was released in 2013. This software from Intel's China team has attracted criticism from developers at AMD and Red Hat, as well as from Michael Larabel of Phoronix. The current version, 1.3.2, supports OpenCL 1.2 completely (Ivy Bridge and later) and OpenCL 2.0 optionally for Skylake and newer. Support for Android has been added to Beignet. Current development targets only OpenCL 1.2 and 2.0; work toward OpenCL 2.1, 2.2 and 3.0 has moved to NEO.
; NEO
: An implementation by Intel for Gen 8 (Broadwell) and Gen 9 hardware, released in 2018. This driver replaces the Beignet implementation on supported platforms (but not on older Gen 6 to Haswell hardware). NEO provides OpenCL 2.1 support on Core platforms and OpenCL 1.2 on Atom platforms. As of 2020, Gen 11 (Ice Lake) and Gen 12 (Tiger Lake) graphics are also supported. OpenCL 3.0 is available for Alder Lake and for Tiger Lake back to Broadwell with version 20.41+. It now includes the optional OpenCL 2.0 and 2.1 features in full and some of 2.2.
; ROCm
: Created as part of AMD's GPUOpen, ROCm (Radeon Open Compute) is an open source Linux project built on OpenCL 1.2 with language support for 2.0. The system is compatible with all modern AMD CPUs and APUs (currently partly GFX 7, GFX 8 and GFX 9), as well as Intel Gen7.5+ CPUs (only with PCIe 3.0). With version 1.9, support was extended experimentally in some respects to hardware with PCIe 2.0 and without atomics. An overview of current work was given at XDC2018. ROCm version 2.0 supports full OpenCL 2.0, but some errors and limitations remain on the to-do list. Version 3.3 brings improvements in the details. Version 3.5 supports OpenCL 2.2. Version 3.10 brought improvements and new APIs. ROCm 4.0, announced at SC20, supports the AMD Instinct MI100 compute card. Current documentation for 5.5.1 and earlier is available on GitHub. OpenCL 3.0 is available. ROCm 5.5.x+ supports only GFX 9 (Vega) and later, so the alternatives for older hardware are older ROCm releases or, in future, RustiCL.
; POCL
: A portable implementation supporting CPUs and some GPUs (via CUDA and HSA), building on Clang and LLVM. With version 1.0, OpenCL 1.2 was nearly fully implemented along with some 2.x features. Version 1.2 works with LLVM/Clang 6.0 and 7.0 and has full OpenCL 1.2 support, with all tickets in Milestone 1.2 closed; OpenCL 2.0 is nearly fully implemented. Version 1.3 supports Mac OS X. Version 1.4 includes support for LLVM 8.0 and 9.0. Version 1.5 implements LLVM/Clang 10 support. Version 1.6 implements LLVM/Clang 11 support and CUDA acceleration. Current targets are complete OpenCL 2.x, OpenCL 3.0 and improved performance. With manual optimization, POCL 1.6 performs at the same level as the Intel compute runtime. Version 1.7 implements LLVM/Clang 12 support and some new OpenCL 3.0 features. Version 1.8 implements LLVM/Clang 13 support. Version 3.0 implements OpenCL 3.0 at the minimum level and LLVM/Clang 14. Version 3.1 works with LLVM/Clang 15 and has improved SPIR-V support.
; Shamrock
: A port of Mesa Clover for ARM with full support of OpenCL 1.2; there is no current development for 2.0.
; FreeOCL
: A CPU-focused implementation of OpenCL 1.2 that uses an external compiler to create a more reliable platform; it is no longer under active development.
; MOCL
: An OpenCL implementation based on POCL by NUDT researchers for the Matrix-2000 was released in 2018. The Matrix-2000 architecture is designed to replace the Intel Xeon Phi accelerators of the TianHe-2 supercomputer. This programming framework is built on top of LLVM v5.0 and reuses some code from POCL as well. To unlock the hardware potential, the device runtime uses a push-based task dispatching strategy, and the performance of kernel atomics is improved significantly. This framework has been deployed on the TH-2A system and is readily available to the public. Some of the software will next be ported to improve POCL.
; VC4CL
: An OpenCL 1.2 implementation for the VideoCore IV (BCM2763) processor used in Raspberry Pi models before the Raspberry Pi 4.


Vendor implementations


Timeline of vendor implementations

* June 2008: During Apple's WWDC conference, an early beta of Mac OS X Snow Leopard was made available to the participants. It included the first beta implementation of OpenCL, about six months before the final version 1.0 specification was ratified in late 2008. Apple also showed two demos. One was a grid of 8×8 screens, each displaying the screen of an emulated Apple II machine (64 independent instances in total, each running a famous karate game). This showed task parallelism on the CPU. The other demo was an ''N''-body simulation running on the GPU of a Mac Pro, a data-parallel task.
* December 10, 2008: AMD and Nvidia held the first public OpenCL demonstration, a 75-minute presentation at SIGGRAPH Asia 2008. AMD showed a CPU-accelerated OpenCL demo explaining the scalability of OpenCL on one or more cores, while Nvidia showed a GPU-accelerated demo.
* March 16, 2009: At the 4th Multicore Expo, Imagination Technologies announced the PowerVR SGX543MP, the first GPU of this company to feature OpenCL support.
* March 26, 2009: At GDC 2009, AMD and Havok demonstrated the first working implementation of OpenCL, accelerating Havok Cloth on an ATI Radeon HD 4000 series GPU.
* April 20, 2009: Nvidia announced the release of its OpenCL driver and SDK to developers participating in its OpenCL Early Access Program.
* August 5, 2009: AMD unveiled the first development tools for its OpenCL platform as part of its ATI Stream SDK v2.0 Beta Program.
* August 28, 2009: Apple released Mac OS X Snow Leopard, which contains a full implementation of OpenCL.
* September 28, 2009: Nvidia released its own OpenCL drivers and SDK implementation.
* October 13, 2009: AMD released the fourth beta of the ATI Stream SDK 2.0, which provides a complete OpenCL implementation on both R700/HD 5000 GPUs and SSE3-capable CPUs. The SDK is available for both Linux and Windows.
* October 27, 2009: S3 released its first product supporting native OpenCL 1.0, the Chrome 5400E embedded graphics processor.
* November 26, 2009: Nvidia released drivers for OpenCL 1.0 (rev 48).
* December 10, 2009: VIA released its first product supporting OpenCL 1.0, the ChromotionHD 2.0 video processor included in the VN1000 chipset.
* December 21, 2009: AMD released the production version of the ATI Stream SDK 2.0, which provides OpenCL 1.0 support for HD 5000 GPUs and beta support for R700 GPUs.
* June 1, 2010: ZiiLABS released details of its first OpenCL implementation for the ZMS processor for handheld, embedded and digital home products.
* June 30, 2010: IBM released a fully conformant version of OpenCL 1.0.
* September 13, 2010: Intel released details of its first OpenCL implementation for the Sandy Bridge chip architecture. Sandy Bridge would integrate Intel's newest graphics chip technology directly onto the central processing unit.
* November 15, 2010: Wolfram Research released Mathematica 8 with the OpenCLLink package.
* March 3, 2011: The Khronos Group announced the formation of the WebCL working group to explore defining a JavaScript binding to OpenCL. This creates the potential to harness GPU and multi-core CPU parallel processing from a web browser.
* March 31, 2011: IBM released a fully conformant version of OpenCL 1.1.
* April 25, 2011: IBM released OpenCL Common Runtime v0.1 for Linux on x86 architecture.
* May 4, 2011: Nokia Research released an open source WebCL extension for the Firefox web browser, providing a JavaScript binding to OpenCL.
* July 1, 2011: Samsung Electronics released an open source prototype implementation of WebCL for WebKit, providing a JavaScript binding to OpenCL.
* August 8, 2011: AMD released the OpenCL-driven AMD Accelerated Parallel Processing (APP) Software Development Kit (SDK) v2.5, replacing the ATI Stream SDK as technology and concept.
* December 12, 2011: AMD released AMD APP SDK v2.6, which contains a preview of OpenCL 1.2.
* February 27, 2012: The Portland Group released the PGI OpenCL compiler for multi-core ARM CPUs.
* April 17, 2012: Khronos released a WebCL working draft.
* May 6, 2013: Altera released the Altera SDK for OpenCL, version 13.0. It is conformant to OpenCL 1.0.
* November 18, 2013: Khronos announced that the specification for OpenCL 2.0 had been finalized.
* March 19, 2014: Khronos released the WebCL 1.0 specification.
* August 29, 2014: Intel released an HD Graphics 5300 driver that supports OpenCL 2.0.
* September 25, 2014: AMD released Catalyst 14.41 RC1, which includes an OpenCL 2.0 driver.
* January 14, 2015: Xilinx Inc. announced the SDAccel development environment for OpenCL, C, and C++, which achieved Khronos conformance.
* April 13, 2015: Nvidia released WHQL driver v350.12, which includes OpenCL 1.2 support for GPUs based on Kepler or later architectures. Drivers 340+ support OpenCL 1.1 for Tesla and Fermi.
* August 26, 2015: AMD released AMD APP SDK v3.0, which contains full support of OpenCL 2.0 and sample code.
* November 16, 2015: Khronos announced that the specification for OpenCL 2.1 had been finalized.
* April 18, 2016: Khronos announced that the specification for OpenCL 2.2 had been provisionally finalized.
* November 3, 2016: Intel added OpenCL 2.1 support for Gen7+ in SDK 2016 r3.
* February 17, 2017: Nvidia began evaluation support of OpenCL 2.0 with driver 378.66.
* May 16, 2017: Khronos announced that the specification for OpenCL 2.2 had been finalized with SPIR-V 1.2.
* May 14, 2018: Khronos announced a maintenance update for OpenCL 2.2 with bug fixes and unified headers.
* April 27, 2020: Khronos announced a provisional version of OpenCL 3.0.
* June 1, 2020: Intel's NEO runtime added OpenCL 3.0 support for the new Tiger Lake.
* June 3, 2020: AMD announced ROCm 3.5 with OpenCL 2.2 support.
* September 30, 2020: Khronos announced that the specifications for OpenCL 3.0 had been finalized (CTS also available).
* October 16, 2020: Intel announced OpenCL 3.0 support with NEO 20.41 (including most of the optional OpenCL 2.x features).
* April 6, 2021: Nvidia added OpenCL 3.0 support for Ampere. Maxwell and later GPUs also support OpenCL 3.0 with Nvidia driver 465+.
* August 20, 2022: Intel Arc Alchemist GPUs (Arc A380, A350M, A370M, A550M, A730M and A770M) are conformant with OpenCL 3.0.
* October 14, 2022: Arm Mali-G615 and Mali-G715-Immortalis are conformant with OpenCL 3.0.
* November 11, 2022: The Rusticl OpenCL Library is conformant with OpenCL 3.0.


Devices

As of 2016, OpenCL runs on graphics processing units (GPUs), CPUs with SIMD instructions, FPGAs, Movidius Myriad 2, Adapteva Epiphany and DSPs.


Khronos Conformance Test Suite

To be officially conformant, an implementation must pass the Khronos Conformance Test Suite (CTS), with results being submitted to the Khronos Adopters Program. The Khronos CTS code for all OpenCL versions has been available in open source since 2017.


Conformant products

The Khronos Group maintains an extended list of OpenCL-conformant products. All standard-conformant implementations can be queried using one of the clinfo tools (there are multiple tools with the same name and a similar feature set).


Version support

Products and their version of OpenCL support include:


OpenCL 3.0 support

OpenCL 3.0 is possible on all hardware with OpenCL 1.2+, since OpenCL 2.x features are only optional. The Khronos Test Suite has been available since October 2020.
* (2020) Intel NEO Compute: 20.41+ for Gen 12 Tiger Lake to Broadwell (includes full 2.0 and 2.1 support and parts of 2.2)
* (2020) Intel 6th, 7th, 8th, 9th, 10th, 11th gen processors (Skylake, Kaby Lake, Coffee Lake, Comet Lake, Ice Lake, Tiger Lake) with the latest Intel Windows graphics driver
* (2021) Intel 11th, 12th gen processors (Rocket Lake, Alder Lake) with the latest Intel Windows graphics driver
* (2021) Arm Mali-G78, Mali-G310, Mali-G510, Mali-G610, Mali-G710 and Mali-G78AE
* (2022) Intel 13th gen processors (Raptor Lake) with the latest Intel Windows graphics driver
* (2022) Intel Arc discrete graphics with the latest Intel Arc Windows graphics driver
* (2021) Nvidia Maxwell, Pascal, Volta, Turing and Ampere with Nvidia graphics driver 465+
* (2022) Nvidia Ada Lovelace with Nvidia graphics driver 525+
* (2022) Samsung Xclipse 920 GPU (based on AMD RDNA2)
* (2023) Intel 14th gen processors (Raptor Lake Refresh) with the latest Intel Windows graphics driver
* (2023) Intel Core Ultra Series 1 processors (Meteor Lake) with the latest Intel Windows graphics driver


OpenCL 2.2 support

''None yet'': the Khronos Test Suite is ready; with a driver update, all hardware with 2.0 and 2.1 support could gain it.
* Intel NEO Compute: work in progress for current products
* ROCm: version 3.5+, mostly


OpenCL 2.1 support

* (2018+) Support backported to Intel 5th and 6th gen processors (Broadwell, Skylake)
* (2017+) Intel 7th, 8th, 9th, 10th gen processors (Kaby Lake, Coffee Lake, Comet Lake, Ice Lake)
* (2017+) Intel Xeon Phi processors (Knights Landing) (experimental runtime)
* Khronos: with a driver update, all hardware with 2.0 support could gain it


OpenCL 2.0 support

* (2011+) AMD GCN GPUs (HD 7700+/HD 8000/Rx 200/Rx 300/Rx 400/Rx 500/Rx 5000 series); some 1st-gen GCN GPUs support only 1.2 with some extensions
* (2013+) AMD GCN APUs (Jaguar, Steamroller, Puma, Excavator & Zen-based)
* (2014+) Intel 5th & 6th gen processors (Broadwell, Skylake)
* (2015+) Qualcomm Adreno 5xx series
* (2018+) Qualcomm Adreno 6xx series
* (2017+) ARM Mali (Bifrost) G51 and G71 in Android 7.1 and Linux
* (2018+) ARM Mali (Bifrost) G31, G52, G72 and G76
* (2017+) incomplete evaluation support: Nvidia Kepler, Maxwell, Pascal, Volta and Turing GPUs (GeForce 600, 700, 800, 900 & 10 series, Quadro K-, M- & P-series, Tesla K-, M- & P-series) with driver version 378.66+


OpenCL 1.2 support

* (2011+) Some first-generation AMD GCN GPUs, on which some OpenCL 2.0 features are not available, but which support many more extensions than TeraScale
* (2009+) AMD TeraScale 2 and 3 GPUs (RV8xx, RV9xx in HD 5000, 6000 and 7000 series)
* (2011+) AMD TeraScale APUs (based on K10, Bobcat and Piledriver)
* (2012+) Nvidia Kepler, Maxwell, Pascal, Volta and Turing GPUs (GeForce 600, 700, 800, 900, 10, 16 and 20 series; Quadro K-, M- and P-series; Tesla K-, M- and P-series)
* (2012+) Intel 3rd and 4th gen processors (Ivy Bridge, Haswell)
* (2013+) Intel Xeon Phi coprocessors (Knights Corner)
* (2013+) Qualcomm Adreno 4xx series
* (2013+) ARM Mali Midgard 3rd gen (T760)
* (2015+) ARM Mali Midgard 4th gen (T8xx)


OpenCL 1.1 support

* (2008+) Some AMD TeraScale 1 GPUs (RV7xx in the HD 4000 series)
* (2008+) Nvidia Tesla and Fermi GPUs (GeForce 8, 9, 100, 200, 300, 400 and 500 series; Quadro or Tesla series with Tesla or Fermi GPUs)
* (2011+) Qualcomm Adreno 3xx series
* (2012+) ARM Mali Midgard 1st and 2nd gen (T6xx, T720)


OpenCL 1.0 support

* Most hardware was updated to OpenCL 1.1 and 1.2 after a first driver that supported only 1.0
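Because support levels differ so widely across this hardware, a host program typically checks each device's reported version before choosing a code path. The OpenCL specification defines the `CL_DEVICE_VERSION` query to return a string of the form `OpenCL <major>.<minor> <vendor-specific information>`. The following is a minimal sketch, assuming such strings have already been retrieved from the devices; the helper names `parse_cl_version` and `meets_version` and the sample strings are hypothetical.

```python
import re
from typing import Optional, Tuple

# Format of CL_DEVICE_VERSION defined by the OpenCL specification:
# "OpenCL<space><major>.<minor><space><vendor-specific information>"
_VERSION_RE = re.compile(r"^OpenCL (\d+)\.(\d+)")


def parse_cl_version(version: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) parsed from a CL_DEVICE_VERSION string, or None."""
    match = _VERSION_RE.match(version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def meets_version(version: str, required: Tuple[int, int]) -> bool:
    """True if the string reports at least the required (major, minor) version."""
    parsed = parse_cl_version(version)
    # Tuples compare element by element, so (1, 2) < (2, 0) < (2, 1) < (10, 0).
    return parsed is not None and parsed >= required


if __name__ == "__main__":
    # Hypothetical sample strings; a real host program would query each
    # device for CL_DEVICE_VERSION instead.
    samples = ["OpenCL 1.2 vendor-info", "OpenCL 2.0 vendor-info", "OpenCL 3.0 vendor-info"]
    for s in samples:
        path = "OpenCL 2.0 code path" if meets_version(s, (2, 0)) else "OpenCL 1.2 fallback"
        print(f"{s}: {path}")
```

Comparing `(major, minor)` as integer tuples, rather than comparing the raw strings, keeps a version such as 10.0 from sorting below 2.0.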


Portability, performance and alternatives

A key feature of OpenCL is portability, achieved through its abstracted memory and execution model. As a consequence, the programmer cannot directly use hardware-specific technologies such as inline Parallel Thread Execution (PTX) for Nvidia GPUs without giving up direct portability to other platforms. It is possible to run any OpenCL kernel on any conformant implementation. However, the performance of a kernel is not necessarily portable across platforms. Existing implementations have nevertheless been shown to be competitive when kernel code is properly tuned, and auto-tuning has been suggested as a solution to the performance portability problem, yielding "acceptable levels of performance" in experimental linear algebra kernels. The portability of an entire application containing multiple kernels with differing behaviors has also been studied, showing that portability required only limited tradeoffs. A study at
Delft University of Technology
from 2011 that compared
CUDA
programs and their straightforward translation into OpenCL C found CUDA to outperform OpenCL by at most 30% on the Nvidia implementation. The researchers noted that their comparison could be made fairer by applying manual optimizations to the OpenCL programs, in which case there was "no reason for OpenCL to obtain worse performance than CUDA". The performance differences could mostly be attributed to differences in the programming model (especially the memory model) and to Nvidia's compiler optimizations for CUDA compared to those for OpenCL. Another study at D-Wave Systems Inc. found that "The OpenCL kernel’s performance is between about 13% and 63% slower, and the end-to-end time is between about 16% and 67% slower" than CUDA.

Because OpenCL allows the CPU and GPU to share workloads by executing the same programs, programmers can exploit both by dividing work among the devices. This raises the problem of deciding how to partition the work, because the relative speeds of operations differ among the devices.
Machine learning
has been suggested to solve this problem: Grewe and O'Boyle describe a system of
support-vector machines trained on compile-time features of the program that can decide the device partitioning problem statically, without actually running the programs to measure their performance. A comparison of recent AMD RDNA 2 and Nvidia RTX series graphics cards using OpenCL tests gave no clear winner. Possible performance gains from using Nvidia CUDA or OptiX were not tested.
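The simplest static answer to the partitioning problem is to split the work in proportion to each device's relative speed, measured or estimated beforehand; the machine-learning approach of Grewe and O'Boyle instead predicts the split from compile-time features. A minimal sketch of the proportional rule follows, assuming the relative speeds are already known. The function name `partition_work` and the largest-remainder rounding are illustrative choices, not part of OpenCL.

```python
from typing import List, Sequence


def partition_work(total: int, speeds: Sequence[float]) -> List[int]:
    """Split `total` work items among devices in proportion to `speeds`.

    Fractional shares are rounded with the largest-remainder method, so the
    returned counts are non-negative integers that always sum to `total`.
    """
    if total < 0:
        raise ValueError("total must be non-negative")
    if not speeds or any(s <= 0 for s in speeds):
        raise ValueError("speeds must be a non-empty sequence of positive numbers")
    scale = sum(speeds)
    exact = [total * s / scale for s in speeds]
    shares = [int(x) for x in exact]  # truncation equals floor here, since x >= 0
    # Give the items lost to rounding to the devices with the largest remainders.
    leftover = total - sum(shares)
    by_remainder = sorted(range(len(speeds)), key=lambda i: exact[i] - shares[i], reverse=True)
    for i in by_remainder[:leftover]:
        shares[i] += 1
    return shares


if __name__ == "__main__":
    # Hypothetical example: a GPU assumed to be three times as fast as the CPU.
    print(partition_work(1000, [3.0, 1.0]))  # [750, 250]
```

In an OpenCL host program, each share could become the global work size of a kernel enqueued on the corresponding device, with the `global_work_offset` argument of `clEnqueueNDRangeKernel` marking where that device's range begins.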
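Auto-tuning, suggested earlier in this section as a remedy for performance portability, amounts to an empirical search: run the same kernel with each candidate value of a tuning parameter (for example a work-group or tile size) on the target device and keep the fastest. The sketch below assumes a pure-Python stand-in workload in place of an enqueued OpenCL kernel; the names `autotune` and `blocked_sum` are hypothetical.

```python
import time
from typing import Callable, Dict, Iterable, Tuple, TypeVar

P = TypeVar("P")


def autotune(run: Callable[[P], object], candidates: Iterable[P],
             repeats: int = 3) -> Tuple[P, Dict[P, float]]:
    """Time `run(p)` for every candidate parameter and return the fastest.

    Each candidate is timed `repeats` times and its best (minimum) wall-clock
    time is kept, which reduces noise from other activity on the machine.
    """
    timings: Dict[P, float] = {}
    for p in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run(p)
            best = min(best, time.perf_counter() - start)
        timings[p] = best
    if not timings:
        raise ValueError("no candidates given")
    return min(timings, key=timings.get), timings


def blocked_sum(data, block: int) -> float:
    """Stand-in workload: sum `data` in chunks of `block` elements."""
    total = 0.0
    for i in range(0, len(data), block):
        total += sum(data[i:i + block])
    return total


if __name__ == "__main__":
    data = [float(i) for i in range(200_000)]
    best, timings = autotune(lambda b: blocked_sum(data, b), [16, 64, 256, 1024, 4096])
    for block, t in sorted(timings.items()):
        print(f"block={block:5d}: {t * 1e3:.2f} ms")
    print("selected block size:", best)
```

A real tuner would time the kernel itself on each target device (OpenCL event profiling can report kernel execution times), because the best parameter value on one device need not be the best on another.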


See also

* Advanced Simulation Library
* AMD FireStream
* BrookGPU
* C++ AMP
* Close to Metal
* CUDA
* DirectCompute
* GPGPU
* HIP
* Larrabee
* Lib Sh
* List of OpenCL applications
* OpenACC
* OpenGL
* OpenHMPP
* OpenMP
* Metal (API)
* RenderScript
* SequenceL
* SIMD
* SYCL
* Vulkan
* WebCL



External links

* International Workshop on OpenCL (IWOCL), sponsored by the Khronos Group