SIMD
   HOME

TheInfoList



OR:

Single instruction, multiple data (SIMD) is a type of parallel processing in
Flynn's taxonomy Flynn's taxonomy is a classification of computer architectures, proposed by Michael J. Flynn in 1966 and extended in 1972. The classification system has stuck, and it has been used as a tool in design of modern processors and their functionalities ...
. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it should not be confused with an ISA. SIMD describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously. Such machines exploit data level parallelism, but not concurrency: there are simultaneous (parallel) computations, but each unit performs the exact same instruction at any given moment (just with different data). SIMD is particularly applicable to common tasks such as adjusting the contrast in a digital image or adjusting the volume of digital audio. Most modern CPU designs include SIMD instructions to improve the performance of
multimedia Multimedia is a form of communication that uses a combination of different content forms such as text, audio, images, animations, or video into a single interactive presentation, in contrast to tradit ...
use. SIMD has three different subcategories in Flynn's 1972 Taxonomy, one of which is SIMT. SIMT should not be confused with software threads or hardware threads, both of which are task time-sharing (time-slicing). SIMT is true simultaneous parallel hardware-level execution.


History

The first use of SIMD instructions was in the
ILLIAC IV The ILLIAC IV was the first massively parallel computer. The system was originally designed to have 256 64-bit floating point units (FPUs) and four central processing units (CPUs) able to process 1 billion operations per second. Due to budget cons ...
, which was completed in 1966. SIMD was the basis for vector supercomputers of the early 1970s such as the CDC Star-100 and the Texas Instruments ASC, which could operate on a "vector" of data with a single instruction. Vector processing was especially popularized by Cray in the 1970s and 1980s. Vector processing architectures are now considered separate from SIMD computers: Duncan's Taxonomy includes them where
Flynn's Taxonomy Flynn's taxonomy is a classification of computer architectures, proposed by Michael J. Flynn in 1966 and extended in 1972. The classification system has stuck, and it has been used as a tool in design of modern processors and their functionalities ...
does not, due to Flynn's work (1966, 1972) pre-dating the Cray-1 (1977). The first era of modern SIMD computers was characterized by massively parallel processing-style supercomputers such as the Thinking Machines CM-1 and CM-2. These computers had many limited-functionality processors that would work in parallel. For example, each of 65,536 single-bit processors in a Thinking Machines CM-2 would execute the same instruction at the same time, allowing, for instance, to logically combine 65,536 pairs of bits at a time, using a hypercube-connected network or processor-dedicated RAM to find its operands. Supercomputing moved away from the SIMD approach when inexpensive scalar MIMD approaches based on commodity processors such as the Intel i860 XP became more powerful, and interest in SIMD waned. The current era of SIMD processors grew out of the desktop-computer market rather than the supercomputer market. As desktop processors became powerful enough to support real-time gaming and audio/video processing during the 1990s, demand grew for this particular type of computing power, and microprocessor vendors turned to SIMD to meet the demand. Hewlett-Packard introduced MAX instructions into
PA-RISC PA-RISC is an instruction set architecture (ISA) developed by Hewlett-Packard. As the name implies, it is a reduced instruction set computer (RISC) architecture, where the PA stands for Precision Architecture. The design is also referred to as ...
1.1 desktops in 1994 to accelerate MPEG decoding. Sun Microsystems introduced SIMD integer instructions in its " VIS" instruction set extensions in 1995, in its UltraSPARC I microprocessor. MIPS followed suit with their similar MDMX system. The first widely deployed desktop SIMD was with Intel's MMX extensions to the
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was intr ...
architecture in 1996. This sparked the introduction of the much more powerful AltiVec system in the
Motorola Motorola, Inc. () was an American multinational telecommunications company based in Schaumburg, Illinois, United States. After having lost $4.3 billion from 2007 to 2009, the company split into two independent public companies, Motorol ...
PowerPC and IBM's
POWER Power most often refers to: * Power (physics), meaning "rate of doing work" ** Engine power, the power put out by an engine ** Electric power * Power (social and political), the ability to influence people or events ** Abusive power Power may a ...
systems. Intel responded in 1999 by introducing the all-new SSE system. Since then, there have been several extensions to the SIMD instruction sets for both architectures. Advanced vector extensions AVX, AVX2 and
AVX-512 AVX-512 are 512-bit extensions to the 256-bit Advanced Vector Extensions SIMD instructions for x86 instruction set architecture (ISA) proposed by Intel in July 2013, and implemented in Intel's Xeon Phi x200 (Knights Landing) and Skylake-X CPUs; t ...
are developed by Intel. AMD supports AVX and AVX2 in their current products. All of these developments have been oriented toward support for real-time graphics, and are therefore oriented toward processing in two, three, or four dimensions, usually with vector lengths of between two and sixteen words, depending on data type and architecture. When new SIMD architectures need to be distinguished from older ones, the newer architectures are then considered "short-vector" architectures, as earlier SIMD and vector supercomputers had vector lengths from 64 to 64,000. A modern supercomputer is almost always a cluster of MIMD computers, each of which implements (short-vector) SIMD instructions.


Advantages

An application that may take advantage of SIMD is one where the same value is being added to (or subtracted from) a large number of data points, a common operation in many
multimedia Multimedia is a form of communication that uses a combination of different content forms such as text, audio, images, animations, or video into a single interactive presentation, in contrast to tradit ...
applications. One example would be changing the brightness of an image. Each
pixel In digital imaging, a pixel (abbreviated px), pel, or picture element is the smallest addressable element in a raster image, or the smallest point in an all points addressable display device. In most digital display devices, pixels are the ...
of an image consists of three values for the brightness of the red (R), green (G) and blue (B) portions of the color. To change the brightness, the R, G and B values are read from memory, a value is added to (or subtracted from) them, and the resulting values are written back out to memory. Audio
DSP DSP may refer to: Computing * Digital signal processing, the mathematical manipulation of an information signal * Digital signal processor, a microprocessor designed for digital signal processing * Yamaha DSP-1, a proprietary digital signal ...
s would likewise, for volume control, multiply both Left and Right channels simultaneously. With a SIMD processor there are two improvements to this process. For one the data is understood to be in blocks, and a number of values can be loaded all at once. Instead of a series of instructions saying "retrieve this pixel, now retrieve the next pixel", a SIMD processor will have a single instruction that effectively says "retrieve n pixels" (where n is a number that varies from design to design). For a variety of reasons, this can take much less time than retrieving each pixel individually, as with a traditional CPU design. Another advantage is that the instruction operates on all loaded data in a single operation. In other words, if the SIMD system works by loading up eight data points at once, the add operation being applied to the data will happen to all eight values at the same time. This parallelism is separate from the parallelism provided by a superscalar processor; the eight values are processed in parallel even on a non-superscalar processor, and a superscalar processor may be able to perform multiple SIMD operations in parallel.


Disadvantages

* Not all algorithms can be vectorized easily. For example, a flow-control-heavy task like code
parsing Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from L ...
may not easily benefit from SIMD; however, it is theoretically possible to vectorize comparisons and ''"batch flow"'' to target maximal cache optimality, though this technique will require more intermediate state. Note: Batch-pipeline systems (example: GPUs or software rasterization pipelines) are most advantageous for cache control when implemented with SIMD intrinsics, but they are not exclusive to SIMD features. Further complexity may be apparent to avoid dependence within series such as code strings; while independence is required for vectorization. * Large register files which increases power consumption and required chip area. * Currently, implementing an algorithm with SIMD instructions usually requires human labor; most compilers don't generate SIMD instructions from a typical C program, for instance.
Automatic vectorization Automatic vectorization, in parallel computing, is a special case of automatic parallelization, where a computer program is converted from a scalar implementation, which processes a single pair of operands at a time, to a vector implementation, ...
in compilers is an active area of computer science research. (Compare vector processing.) * Programming with particular SIMD instruction sets can involve numerous low-level challenges. *# SIMD may have restrictions on
data alignment Data structure alignment is the way data is arranged and accessed in computer memory. It consists of three separate but related issues: data alignment, data structure padding, and packing. The CPU in modern computer hardware performs reads and ...
; programmers familiar with one particular architecture may not expect this. Worse: the alignment may change from one revision or "compatible" processor to another. *# Gathering data into SIMD registers and scattering it to the correct destination locations is tricky (sometimes requiring permute operations) and can be inefficient. *# Specific instructions like rotations or three-operand addition are not available in some SIMD instruction sets. *# Instruction sets are architecture-specific: some processors lack SIMD instructions entirely, so programmers must provide non-vectorized implementations (or different vectorized implementations) for them. *# Different architectures provide different register sizes (e.g. 64, 128, 256 and 512 bits) and instruction sets, meaning that programmers must provide multiple implementations of vectorized code to operate optimally on any given CPU. In addition, the possible set of SIMD instructions grows with each new register size. Unfortunately, for legacy support reasons, the older versions cannot be retired. *# The early MMX instruction set shared a register file with the floating-point stack, which caused inefficiencies when mixing floating-point and MMX code. However, SSE2 corrects this. To remedy problems 1 and 5,
RISC-V RISC-V (pronounced "risk-five" where five refers to the number of generations of RISC architecture that were developed at the University of California, Berkeley since 1981) is an open standard instruction set architecture (ISA) based on estab ...
's vector extension uses an alternative approach: instead of exposing the sub-register-level details to the programmer, the instruction set abstracts them out as a few "vector registers" that use the same interfaces across all CPUs with this instruction set. The hardware handles all alignment issues and "strip-mining" of loops. Machines with different vector sizes would be able to run the same code. LLVM calls this vector type "". An order of magnitude increase in code size is not uncommon, when compared to equivalent scalar or equivalent vector code, and an order of magnitude ''or greater'' effectiveness (work done per instruction) is achievable with Vector ISAs. ARM's
Scalable Vector Extension AArch64 or ARM64 is the 64-bit extension of the ARM architecture family. It was first introduced with the Armv8-A architecture. Arm releases a new extension every year. ARMv8.x and ARMv9.x extensions and features Announced in October 2011, AR ...
takes another approach, known in
Flynn's Taxonomy Flynn's taxonomy is a classification of computer architectures, proposed by Michael J. Flynn in 1966 and extended in 1972. The classification system has stuck, and it has been used as a tool in design of modern processors and their functionalities ...
as "Associative Processing", more commonly known today as "Predicated" (masked) SIMD. This approach is not as compact as Vector processing but is still far better than non-predicated SIMD. Detailed comparative examples are given in the Vector processing page.


Chronology


Hardware

Small-scale (64 or 128 bits) SIMD became popular on general-purpose CPUs in the early 1990s and continued through 1997 and later with Motion Video Instructions (MVI) for Alpha. SIMD instructions can be found, to one degree or another, on most CPUs, including IBM's AltiVec and
SPE SPE may refer to: Science and technology * Spe (planet), 14 Andromedae b * Serum protein electrophoresis * Solar particle event,generating very high energy protons * Solid phase extraction, from a mixture * Epitaxy#Solid-phase, Solid-phase epitax ...
for PowerPC, HP's
PA-RISC PA-RISC is an instruction set architecture (ISA) developed by Hewlett-Packard. As the name implies, it is a reduced instruction set computer (RISC) architecture, where the PA stands for Precision Architecture. The design is also referred to as ...
Multimedia Acceleration eXtensions The Multimedia Acceleration eXtensions or MAX are instruction set extensions to the Hewlett-Packard PA-RISC instruction set architecture (ISA). MAX was developed to improve the performance of multimedia applications that were becoming more prevalen ...
(MAX),
Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It is the world's largest semiconductor chip manufacturer by revenue, and is one of the developers of the x86 seri ...
's MMX and iwMMXt, SSE, SSE2,
SSE3 SSE3, Streaming SIMD Extensions 3, also known by its Intel code name Prescott New Instructions (PNI), is the third iteration of the SSE instruction set for the IA-32 (x86) architecture. Intel introduced SSE3 in early 2004 with the Prescott revis ...
SSSE3 Supplemental Streaming SIMD Extensions 3 (SSSE3 or SSE3S) is a SIMD instruction set created by Intel and is the fourth iteration of the SSE technology. History SSSE3 was first introduced with Intel processors based on the Core microarchitectu ...
and SSE4.x,
AMD Advanced Micro Devices, Inc. (AMD) is an American multinational semiconductor company based in Santa Clara, California, that develops computer processors and related technologies for business and consumer markets. While it initially manufactur ...
's 3DNow!, ARC's ARC Video subsystem,
SPARC SPARC (Scalable Processor Architecture) is a reduced instruction set computer (RISC) instruction set architecture originally developed by Sun Microsystems. Its design was strongly influenced by the experimental Berkeley RISC system develope ...
's VIS and VIS2,
Sun The Sun is the star at the center of the Solar System. It is a nearly perfect ball of hot plasma, heated to incandescence by nuclear fusion reactions in its core. The Sun radiates this energy mainly as light, ultraviolet, and infrared radi ...
's MAJC,
ARM In human anatomy, the arm refers to the upper limb in common usage, although academically the term specifically means the upper arm between the glenohumeral joint (shoulder joint) and the elbow joint. The distal part of the upper limb between th ...
's Neon technology, MIPS' MDMX (MaDMaX) and MIPS-3D. The IBM, Sony, Toshiba co-developed Cell Processor's SPU's instruction set is heavily SIMD based.
Philips Koninklijke Philips N.V. (), commonly shortened to Philips, is a Dutch multinational conglomerate corporation that was founded in Eindhoven in 1891. Since 1997, it has been mostly headquartered in Amsterdam, though the Benelux headquarters i ...
, now NXP, developed several SIMD processors named Xetal. The Xetal has 320 16-bit processor elements especially designed for vision tasks. Modern
graphics processing unit A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, m ...
s (GPUs) are often wide SIMD implementations, capable of branches, loads, and stores on 128 or 256 bits at a time. Intel's
AVX-512 AVX-512 are 512-bit extensions to the 256-bit Advanced Vector Extensions SIMD instructions for x86 instruction set architecture (ISA) proposed by Intel in July 2013, and implemented in Intel's Xeon Phi x200 (Knights Landing) and Skylake-X CPUs; t ...
SIMD instructions process 512 bits of data at once.


Software

SIMD instructions are widely used to process 3D graphics, although modern graphics cards with embedded SIMD have largely taken over this task from the CPU. Some systems also include permute functions that re-pack elements inside vectors, making them particularly useful for data processing and compression. They are also used in cryptography. The trend of general-purpose computing on GPUs (
GPGPU General-purpose computing on graphics processing units (GPGPU, or less often GPGP) is the use of a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications traditiona ...
) may lead to wider use of SIMD in the future. Adoption of SIMD systems in
personal computer A personal computer (PC) is a multi-purpose microcomputer whose size, capabilities, and price make it feasible for individual use. Personal computers are intended to be operated directly by an end user, rather than by a computer expert or tec ...
software was at first slow, due to a number of problems. One was that many of the early SIMD instruction sets tended to slow overall performance of the system due to the re-use of existing floating point registers. Other systems, like MMX and 3DNow!, offered support for data types that were not interesting to a wide audience and had expensive context switching instructions to switch between using the FPU and MMX registers. Compilers also often lacked support, requiring programmers to resort to assembly language coding. SIMD on
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was intr ...
had a slow start. The introduction of 3DNow! by
AMD Advanced Micro Devices, Inc. (AMD) is an American multinational semiconductor company based in Santa Clara, California, that develops computer processors and related technologies for business and consumer markets. While it initially manufactur ...
and SSE by
Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It is the world's largest semiconductor chip manufacturer by revenue, and is one of the developers of the x86 seri ...
confused matters somewhat, but today the system seems to have settled down (after AMD adopted SSE) and newer compilers should result in more SIMD-enabled software. Intel and AMD now both provide optimized math libraries that use SIMD instructions, and open source alternatives like libSIMD, SIMDx86 and SLEEF have started to appear (see also
libm C mathematical operations are a group of functions in the standard library of the C programming language implementing basic mathematical functions. All functions use floating-point numbers in one manner or another. Different C standards provide d ...
). Apple Computer had somewhat more success, even though they entered the SIMD market later than the rest. AltiVec offered a rich system and can be programmed using increasingly sophisticated compilers from
Motorola Motorola, Inc. () was an American multinational telecommunications company based in Schaumburg, Illinois, United States. After having lost $4.3 billion from 2007 to 2009, the company split into two independent public companies, Motorol ...
, IBM and
GNU GNU () is an extensive collection of free software (383 packages as of January 2022), which can be used as an operating system or can be used in parts with other operating systems. The use of the completed GNU tools led to the family of operat ...
, therefore assembly language programming is rarely needed. Additionally, many of the systems that would benefit from SIMD were supplied by Apple itself, for example iTunes and QuickTime. However, in 2006, Apple computers moved to Intel x86 processors. Apple's
API An application programming interface (API) is a way for two or more computer programs to communicate with each other. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how ...
s and development tools (
XCode Xcode is Apple's integrated development environment (IDE) for macOS, used to develop software for macOS, iOS, iPadOS, watchOS, and tvOS. It was initially released in late 2003; the latest stable release is version 14.2, released on December 13, ...
) were modified to support SSE2 and
SSE3 SSE3, Streaming SIMD Extensions 3, also known by its Intel code name Prescott New Instructions (PNI), is the third iteration of the SSE instruction set for the IA-32 (x86) architecture. Intel introduced SSE3 in early 2004 with the Prescott revis ...
as well as AltiVec. Apple was the dominant purchaser of PowerPC chips from IBM and
Freescale Semiconductor Freescale Semiconductor, Inc. was an American semiconductor manufacturer. It was created by the divestiture of the Semiconductor Products Sector of Motorola in 2004. Freescale focused their integrated circuit products on the automotive, embe ...
and even though they abandoned the platform, further development of AltiVec is continued in several PowerPC and Power ISA designs from Freescale and IBM. ''SIMD within a register'', or
SWAR SIMD within a register (SWAR), also known by the name "packed SIMD" is a technique for performing parallel operations on data contained in a processor register. SIMD stands for ''single instruction, multiple data''. Flynn's 1972 taxonomy categorise ...
, is a range of techniques and tricks used for performing SIMD in general-purpose registers on hardware that doesn't provide any direct support for SIMD instructions. This can be used to exploit parallelism in certain algorithms even on hardware that does not support SIMD directly.


Programmer interface

It is common for publishers of the SIMD instruction sets to make their own C/C++ language extensions with
intrinsic function In computer software, in compiler theory, an intrinsic function (or built-in function) is a function (subroutine) available for use in a given programming language whose implementation is handled specially by the compiler. Typically, it may subst ...
s or special datatypes (with
operator overloading In computer programming, operator overloading, sometimes termed ''operator ad hoc polymorphism'', is a specific case of polymorphism, where different operators have different implementations depending on their arguments. Operator overloading i ...
) guaranteeing the generation of vector code. Intel, AltiVec, and ARM NEON provide extensions widely adopted by the compilers targeting their CPUs. (More complex operations are the task of vector math libraries.) The
GNU C Compiler The GNU Compiler Collection (GCC) is an optimizing compiler produced by the GNU Project supporting various programming languages, hardware architectures and operating systems. The Free Software Foundation (FSF) distributes GCC as free software ...
takes the extensions a step further by abstracting them into a universal interface that can be used on any platform by providing a way of defining SIMD datatypes. The
LLVM LLVM is a set of compiler and toolchain technologies that can be used to develop a front end for any programming language and a back end for any instruction set architecture. LLVM is designed around a language-independent intermediate repre ...
Clang compiler also implements the feature, with an analogous interface defined in the IR. Rust's packed_simd crate uses this interface, and so does
Swift Swift or SWIFT most commonly refers to: * SWIFT, an international organization facilitating transactions between banks ** SWIFT code * Swift (programming language) * Swift (bird), a family of birds It may also refer to: Organizations * SWIFT, ...
2.0+. C++ has an experimental interface that works similarly to the GCC extension. LLVM's libcxx seems to implement it. For GCC and libstdc++, a wrapper library that builds on top of the GCC extension is available.
Microsoft Microsoft Corporation is an American multinational technology corporation producing computer software, consumer electronics, personal computers, and related services headquartered at the Microsoft Redmond campus located in Redmond, Washin ...
added SIMD to .NET in RyuJIT. The package, available on NuGet, implement SIMD datatypes. Java also has a new proposed API for SIMD instructions available in
OpenJDK OpenJDK (Open Java Development Kit) is a free and open-source implementation of the Java Platform, Standard Edition (Java SE). It is the result of an effort Sun Microsystems began in 2006. The implementation is licensed under the GPL-2.0-only w ...
17 in an incubator module. It also has a safe fallback mechanism on unsupported CPUs to simple loops. Instead of providing an SIMD datatype, compilers can also be hinted to auto-vectorize some loops, potentially taking some assertions about the lack of data dependency. This is not as flexible as manipulating SIMD variables directly, but is easier to use.
OpenMP OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran, on many platforms, instruction-set architectures and operating syst ...
4.0+ has a hint. This OpenMP interface has replaced a wide set of nonstandard extensions, including
Cilk Cilk, Cilk++, Cilk Plus and OpenCilk are general-purpose programming languages designed for multithreaded parallel computing. They are based on the C and C++ programming languages, which they extend with constructs to express parallel loops ...
's ., GCC's , and many more.


SIMD multi-versioning

Consumer software is typically expected to work on a range of CPUs covering multiple generations, which could limit the programmer's ability to use new SIMD instructions to improve the computational performance of a program. The solution is to include multiple versions of the same code that uses either older or newer SIMD technologies, and pick one that best fits the user's CPU at run-time (
dynamic dispatch In computer science, dynamic dispatch is the process of selecting which implementation of a polymorphic operation (method or function) to call at run time. It is commonly employed in, and considered a prime characteristic of, object-oriented ...
). There are two main camps of solutions: * Function multi-versioning (FMV): a subroutine in the program or a library is duplicated and compiled for many instruction set extensions, and the program decides which one to use at run-time. * Library multi-versioning (LMV): the entire programming library is duplicated for many instruction set extensions, and the operating system or the program decides which one to load at run-time. FMV, manually coded in assembly language, is quite commonly used in a number of performance-critical libraries such as glibc and libjpeg-turbo. Intel C++ Compiler,
GNU Compiler Collection The GNU Compiler Collection (GCC) is an optimizing compiler produced by the GNU Project supporting various programming languages, hardware architectures and operating systems. The Free Software Foundation (FSF) distributes GCC as free softwar ...
since GCC 6, and
Clang Clang is a compiler front end for the C, C++, Objective-C, and Objective-C++ programming languages, as well as the OpenMP, OpenCL, RenderScript, CUDA, and HIP frameworks. It acts as a drop-in replacement for the GNU Compiler Collection ...
since clang 7 allow for a simplified approach, with the compiler taking care of function duplication and selection. GCC and clang requires explicit labels in the code to "clone" functions, while ICC does so automatically (under the command-line option ). The
Rust programming language Rust is a multi-paradigm, general-purpose programming language. Rust emphasizes performance, type safety, and concurrency. Rust enforces memory safety—that is, that all references point to valid memory—without requiring the use of a garb ...
also supports FMV. The setup is similar to GCC and Clang in that the code defines what instruction sets to compile for, but cloning is manually done via inlining. As using FMV requires code modification on GCC and Clang, vendors more commonly use library multi-versioning: this is easier to achieve as only compiler switches need to be changed. Glibc supports LMV and this functionality is adopted by the Intel-backed Clear Linux project.


SIMD on the web

In 2013 John McCutchan announced that he had created a high-performance interface to SIMD instruction sets for the Dart programming language, bringing the benefits of SIMD to web programs for the first time. The interface consists of two types: * Float32x4, 4 single precision floating point values. * Int32x4, 4 32-bit integer values. Instances of these types are immutable and in optimized code are mapped directly to SIMD registers. Operations expressed in Dart typically are compiled into a single instruction without any overhead. This is similar to C and C++ intrinsics. Benchmarks for
4×4 Four-wheel drive, also called 4×4 ("four by four") or 4WD, refers to a two-axled vehicle drivetrain capable of providing torque to all of its wheels simultaneously. It may be full-time or on-demand, and is typically linked via a transfer case ...
matrix multiplication In mathematics, particularly in linear algebra, matrix multiplication is a binary operation that produces a matrix from two matrices. For matrix multiplication, the number of columns in the first matrix must be equal to the number of rows in the s ...
, 3D vertex transformation, and
Mandelbrot set The Mandelbrot set () is the set of complex numbers c for which the function f_c(z)=z^2+c does not diverge to infinity when iterated from z=0, i.e., for which the sequence f_c(0), f_c(f_c(0)), etc., remains bounded in absolute value. This ...
visualization show near 400% speedup compared to scalar code written in Dart. McCutchan's work on Dart, now called SIMD.js, has been adopted by
ECMAScript ECMAScript (; ES) is a JavaScript standard intended to ensure the interoperability of web pages across different browsers. It is standardized by Ecma International in the documenECMA-262 ECMAScript is commonly used for client-side scripting o ...
and Intel announced at IDF 2013 that they are implementing McCutchan's specification for both V8 and
SpiderMonkey SpiderMonkey is the first JavaScript engine, written by Brendan Eich at Netscape Communications, later released as open source and currently maintained by the Mozilla Foundation. It is used in the Firefox web browser. History Eich "wrote ...
. However, by 2017, SIMD.js has been taken out of the ECMAScript standard queue in favor of pursuing a similar interface in WebAssembly. As of August 2020, the WebAssembly interface remains unfinished, but its portable 128-bit SIMD feature has already seen some use in many engines. Emscripten, Mozilla's C/C++-to-JavaScript compiler, with extensions can enable compilation of C++ programs that make use of SIMD intrinsics or GCC-style vector code to the SIMD API of JavaScript, resulting in equivalent speedups compared to scalar code. It also supports (and now prefers) the WebAssembly 128-bit SIMD proposal.


Commercial applications

Though it has generally proven difficult to find sustainable commercial applications for SIMD-only processors, one that has had some measure of success is the
GAPP GapP is a counting complexity class, consisting of all of the functions ''f'' such that there exists a polynomial-time non-deterministic Turing machine ''M'' where, for any input ''x'', ''f(x)'' is equal to the number of accepting paths of ''M'' ...
, which was developed by Lockheed Martin and taken to the commercial sector by their spin-off Teranex. The GAPP's recent incarnations have become a powerful tool in real-time
video processing In electronics engineering, video processing is a particular case of signal processing, in particular image processing, which often employs video filters and where the input and output signals are video files or video streams. Video processing ...
applications like conversion between various video standards and frame rates (
NTSC The first American standard for analog television broadcast was developed by National Television System Committee (NTSC)National Television System Committee (1951–1953), Report and Reports of Panel No. 11, 11-A, 12–19, with Some supplement ...
to/from
PAL Phase Alternating Line (PAL) is a colour encoding system for analogue television. It was one of three major analogue colour television standards, the others being NTSC and SECAM. In most countries it was broadcast at 625 lines, 50 fields (25 ...
, NTSC to/from
HDTV High-definition television (HD or HDTV) describes a television system which provides a substantially higher image resolution than the previous generation of technologies. The term has been used since 1936; in more recent times, it refers to the g ...
formats, etc.),
deinterlacing Deinterlacing is the process of converting interlaced video into a non-interlaced or Progressive scan, progressive form. Interlaced video signals are commonly found in analog television, digital television (HDTV) when in the 1080i format, some D ...
, image noise reduction, adaptive video compression, and image enhancement. A more ubiquitous application for SIMD is found in
video game Video games, also known as computer games, are electronic games that involves interaction with a user interface or input device such as a joystick, controller, keyboard, or motion sensing device to generate visual feedback. This fee ...
s: nearly every modern
video game console A video game console is an electronic device that outputs a video signal or image to display a video game that can be played with a game controller. These may be home consoles, which are generally placed in a permanent location connected to ...
since
1998 1998 was designated as the ''International Year of the Ocean''. Events January * January 6 – The ''Lunar Prospector'' spacecraft is launched into orbit around the Moon, and later finds evidence for frozen water, in soil in permanently s ...
has incorporated a SIMD processor somewhere in its architecture. The PlayStation 2 was unusual in that one of its vector-float units could function as an autonomous
DSP DSP may refer to: Computing * Digital signal processing, the mathematical manipulation of an information signal * Digital signal processor, a microprocessor designed for digital signal processing * Yamaha DSP-1, a proprietary digital signal ...
executing its own instruction stream, or as a coprocessor driven by ordinary CPU instructions. 3D graphics applications tend to lend themselves well to SIMD processing as they rely heavily on operations with 4-dimensional vectors.
Microsoft Microsoft Corporation is an American multinational technology corporation producing computer software, consumer electronics, personal computers, and related services headquartered at the Microsoft Redmond campus located in Redmond, Washin ...
's Direct3D 9.0 now chooses at runtime processor-specific implementations of its own math operations, including the use of SIMD-capable instructions. One of the recent processors to use vector processing is the Cell Processor developed by IBM in cooperation with
Toshiba , commonly known as Toshiba and stylized as TOSHIBA, is a Japanese multinational conglomerate corporation headquartered in Minato, Tokyo, Japan. Its diversified products and services include power, industrial and social infrastructure systems, ...
and
Sony , commonly stylized as SONY, is a Japanese multinational conglomerate corporation headquartered in Minato, Tokyo, Japan. As a major technology company, it operates as one of the world's largest manufacturers of consumer and professiona ...
. It uses a number of SIMD processors (a
NUMA Nuclear mitotic apparatus protein 1 is a protein that in humans is encoded by the ''NUMA1'' gene. Interactions Nuclear mitotic apparatus protein 1 has been shown to interact with PIM1, Band 4.1, GPSM2 G-protein-signaling modulator 2, also call ...
architecture, each with independent local store and controlled by a general purpose CPU) and is geared towards the huge datasets required by 3D and video processing applications. It differs from traditional ISAs by being SIMD from the ground up with no separate scalar registers. Ziilabs produced an SIMD type processor for use on mobile devices, such as media players and mobile phones. Larger scale commercial SIMD processors are available from ClearSpeed Technology, Ltd. and Stream Processors, Inc. ClearSpeed's CSX600 (2004) has 96 cores each with two double-precision floating point units while the CSX700 (2008) has 192. Stream Processors is headed by computer architect Bill Dally. Their Storm-1 processor (2007) contains 80 SIMD cores controlled by a MIPS CPU.


See also

* Streaming SIMD Extensions, MMX, SSE2,
SSE3 SSE3, Streaming SIMD Extensions 3, also known by its Intel code name Prescott New Instructions (PNI), is the third iteration of the SSE instruction set for the IA-32 (x86) architecture. Intel introduced SSE3 in early 2004 with the Prescott revis ...
,
Advanced Vector Extensions Advanced Vector Extensions (AVX) are extensions to the x86 instruction set architecture for microprocessors from Intel and Advanced Micro Devices (AMD). They were proposed by Intel in March 2008 and first supported by Intel with the Sandy Bri ...
,
AVX-512 AVX-512 are 512-bit extensions to the 256-bit Advanced Vector Extensions SIMD instructions for x86 instruction set architecture (ISA) proposed by Intel in July 2013, and implemented in Intel's Xeon Phi x200 (Knights Landing) and Skylake-X CPUs; t ...
* Instruction set architecture *
Flynn's taxonomy Flynn's taxonomy is a classification of computer architectures, proposed by Michael J. Flynn in 1966 and extended in 1972. The classification system has stuck, and it has been used as a tool in design of modern processors and their functionalities ...
* SIMD within a register (SWAR) * Single Program, Multiple Data (SPMD) *
OpenCL OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-progra ...


References


External links


SIMD architectures (2000)

Cracking Open The Pentium 3 (1999)

Short Vector Extensions in Commercial Microprocessor

Article about Optimizing the Rendering Pipeline of Animated Models Using the Intel Streaming SIMD Extensions

"Yeppp!": cross-platform, open-source SIMD library from Georgia Tech

Introduction to Parallel Computing from LLNL Lawrence Livermore National Laboratory
* : A portable implementation of platform-specific intrinsics for other platforms (e.g. SSE intrinsics for ARM NEON), using C/C++ headers {{DEFAULTSORT:Simd Classes of computers Digital signal processing Flynn's taxonomy Parallel computing de:Flynnsche Klassifikation#SIMD (Single Instruction, Multiple Data)