Single instruction, multiple threads (SIMT) is an execution model used in parallel computing where single instruction, multiple data (SIMD) is combined with multithreading. It is different from SPMD in that all instructions in all "threads" are executed in lock-step. The SIMT execution model has been implemented on several GPUs and is relevant for general-purpose computing on graphics processing units (GPGPU); for example, some supercomputers combine CPUs with GPUs.
The processors, say a number p of them, seem to execute many more than p tasks. This is achieved by each processor having multiple "threads" (or "work-items" or "Sequence of SIMD Lane operations"), which execute in lock-step and are analogous to SIMD lanes.
The simplest way to understand SIMT is to imagine a multi-core system in which each core has its own register file, its own ALUs (both SIMD and scalar) and its own data cache. A standard multi-core system has multiple independent instruction caches and decoders, as well as multiple independent program counter registers. In SIMT, by contrast, the instructions are synchronously broadcast to all SIMT cores from a single unit with a single instruction cache and a single instruction decoder, which reads instructions using a single program counter.
The key difference between SIMT and SIMD lanes is that each of the SIMT cores may have a completely different stack pointer (and thus perform computations on completely different data sets), whereas SIMD lanes are simply parts of an ALU and know nothing about memory per se.
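The broadcast model described above can be illustrated with a small Python simulation. This is only a sketch under simplifying assumptions, not a description of any real GPU: the names `Lane` and `run_lockstep`, the tuple-based instruction format, and the two-instruction program are all hypothetical. It shows one shared program counter and one fetch per instruction, while every lane keeps private registers.

```python
from dataclasses import dataclass, field


@dataclass
class Lane:
    """One SIMT core: private registers, but no program counter of its own."""
    regs: dict = field(default_factory=dict)


def run_lockstep(program, lanes):
    """Fetch each instruction once and broadcast it to every lane.

    Returns the number of instruction fetches performed.
    """
    pc = 0        # the single, shared program counter
    fetches = 0
    while pc < len(program):
        op, dst, a, b = program[pc]   # one fetch/decode serves all lanes
        fetches += 1
        for lane in lanes:            # same instruction, private data
            x, y = lane.regs[a], lane.regs[b]
            lane.regs[dst] = x + y if op == "add" else x * y
        pc += 1
    return fetches


# Hypothetical program: r2 = r0 + r1, then r3 = r2 * r2
program = [("add", "r2", "r0", "r1"), ("mul", "r3", "r2", "r2")]
lanes = [Lane({"r0": i, "r1": 10}) for i in range(4)]

fetches = run_lockstep(program, lanes)
results = [lane.regs["r3"] for lane in lanes]
print(fetches, results)
```

Four lanes produce four different results from two fetches. A conventional four-core system running the same program independently would perform eight.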
History
SIMT was introduced by Nvidia in the Tesla GPU microarchitecture with the G80 chip. ATI Technologies, now AMD, released a competing product slightly later on May 14, 2007, the TeraScale 1-based "R600" GPU chip.
Description
As the access time of all widespread RAM types (e.g. DDR SDRAM, GDDR SDRAM, XDR DRAM, etc.) is still relatively high, engineers came up with the idea of hiding the latency that inevitably comes with each memory access. Strictly speaking, latency hiding is a feature of the zero-overhead scheduling implemented by modern GPUs. This might or might not be considered a property of SIMT itself.
SIMT is intended to limit instruction fetching overhead, i.e. the latency that comes with memory access, and is used in modern GPUs (such as those of Nvidia and AMD) in combination with "latency hiding" to enable high-performance execution despite considerable latency in memory-access operations. With latency hiding, the processor is oversubscribed with computation tasks and is able to switch quickly between tasks when it would otherwise have to wait on memory. This strategy is comparable to multithreading in CPUs (not to be confused with multi-core). As with SIMD, another major benefit is the sharing of the control logic by many data lanes, leading to an increase in computational density. One block of control logic can manage N data lanes, instead of replicating the control logic N times.
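The effect of oversubscription can be sketched with a toy cycle-count model in Python. The function `cycles_to_finish` and its parameters are hypothetical, as is the cost model: each step takes one issue cycle, then stalls its task for a fixed memory latency, and switching to another ready task costs nothing. Real schedulers are considerably more involved.

```python
def cycles_to_finish(num_tasks, steps_per_task, mem_latency):
    """Count cycles until every task has issued all of its steps.

    Assumed model: one issue slot per cycle; after issuing, a task waits
    `mem_latency` cycles on memory; switching tasks is free.
    """
    ready_at = [0] * num_tasks                 # cycle at which each task may issue again
    remaining = [steps_per_task] * num_tasks   # steps left per task
    cycle = 0
    while any(remaining):
        for t in range(num_tasks):
            if remaining[t] and ready_at[t] <= cycle:
                remaining[t] -= 1
                ready_at[t] = cycle + 1 + mem_latency
                break                          # only one issue per cycle
        cycle += 1                             # an idle cycle if no task was ready
    return cycle


one_task = cycles_to_finish(num_tasks=1, steps_per_task=4, mem_latency=3)
four_tasks = cycles_to_finish(num_tasks=4, steps_per_task=4, mem_latency=3)
print(one_task, four_tasks)
```

Under these assumptions a single task needs 13 cycles for 4 steps, because the issue slot sits idle while it waits on memory. Four tasks complete 16 steps in 16 cycles, because some task is always ready to issue.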
A downside of SIMT execution is that thread-specific control flow is performed using "masking", which leads to poor utilization when a processor's threads follow different control-flow paths. For instance, to handle an IF-ELSE block where various threads of a processor execute different paths, all threads must actually process both paths (as all threads of a processor always execute in lock-step), and masking is used to disable and enable the various threads as appropriate. Masking is avoided when control flow is coherent for the threads of a processor, i.e. when they all follow the same path of execution. The masking strategy is what distinguishes SIMT from ordinary SIMD, and it has the benefit of inexpensive synchronization between the threads of a processor.
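The cost of divergence can be made concrete with a short Python sketch of masked IF-ELSE execution. The function `run_if_else`, the branch bodies, and the utilization metric are assumptions chosen for illustration. Each path is issued to the whole group if at least one thread takes it, and the mask decides which threads commit a result.

```python
def run_if_else(values):
    """Run `if v is even: v // 2 else: 3 * v + 1` for a group of threads in lock-step.

    Returns (results, passes issued, fraction of thread slots doing useful work).
    """
    n = len(values)
    out = list(values)
    mask = [v % 2 == 0 for v in values]   # per-thread branch predicate

    passes = 0         # paths issued to the whole group
    active_slots = 0   # thread slots that were enabled by the mask

    if any(mask):                         # THEN path, threads with mask set
        passes += 1
        for i in range(n):
            if mask[i]:
                out[i] = values[i] // 2
                active_slots += 1

    if not all(mask):                     # ELSE path, threads with mask clear
        passes += 1
        for i in range(n):
            if not mask[i]:
                out[i] = 3 * values[i] + 1
                active_slots += 1

    utilization = active_slots / (passes * n) if passes else 1.0
    return out, passes, utilization


divergent = run_if_else([1, 2, 3, 4])   # threads disagree on the branch
coherent = run_if_else([2, 4, 6, 8])    # all threads take the same path
print(divergent, coherent)
```

In the divergent case both paths are issued and half of the thread slots are masked off in each pass, giving a utilization of 0.5. In the coherent case only one path is issued and utilization is 1.0.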
See also
* General-purpose computing on graphics processing units (GPGPU)