Single instruction, multiple threads (SIMT) is an execution model used in
parallel computing
Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different fo ...
where
single instruction, multiple data
Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it should ...
(SIMD) is combined with
multithreading. It is different from
SPMD in that all instructions in all "threads" are executed in lock-step. The SIMT execution model has been implemented on several
GPUs and is relevant for
general-purpose computing on graphics processing units (GPGPU), e.g. some
supercomputer
A supercomputer is a computer with a high level of performance as compared to a general-purpose computer. The performance of a supercomputer is commonly measured in floating-point operations per second ( FLOPS) instead of million instructions ...
s combine CPUs with GPUs.
The processors, say a number of them, seem to execute many more than tasks. This is achieved by each processor having multiple "threads" (or "work-items" or "Sequence of SIMD Lane operations"), which execute in lock-step, and are analogous to
SIMD lanes
Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it should ...
.
The simplest way to understand SIMT is to imagine a multi-core system, where each core has its own register file, its own
ALUs (both SIMD and Scalar) and its own data cache, but that unlike a standard multi-core system which has multiple independent instruction caches and decoders, as well as multiple independent Program Counter registers, the instructions are synchronously broadcast to all SIMT cores from a single unit with a single instruction cache and a single instruction decoder which reads instructions using a single Program Counter.
The key difference between SIMT and
SIMD lanes
Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it should ...
is that each of the SIMT cores may have a completely different Stack Pointer (and thus perform computations on completely different data sets), whereas SIMD lanes are simply part of an ALU that know nothing about memory per se.
History
SIMT was introduced by
Nvidia in the
Tesla GPU microarchitecture with the G80 chip.
ATI Technologies, now
AMD, released a competing product slightly later on May 14, 2007, the
TeraScale 1-based ''"R600"'' GPU chip.
Description
As access time of all the widespread
RAM types (e.g.
DDR SDRAM,
GDDR SDRAM,
XDR DRAM, etc.) is still relatively high, engineers came up with the idea to hide the latency that inevitably comes with each memory access. Strictly, the latency-hiding is a feature of the zero-overhead scheduling implemented by modern GPUs. This might or might not be considered to be a property of 'SIMT' itself.
SIMT is intended to limit
instruction fetching
The instruction cycle (also known as the fetch–decode–execute cycle, or simply the fetch-execute cycle) is the cycle that the central processing unit (CPU) follows from boot-up until the computer has shut down in order to process instruction ...
overhead, i.e. the latency that comes with memory access, and is used in modern GPUs (such as those of
Nvidia and
AMD) in combination with 'latency hiding' to enable high-performance execution despite considerable latency in memory-access operations. This is where the processor is oversubscribed with computation tasks, and is able to quickly switch between tasks when it would otherwise have to wait on memory. This strategy is comparable to
multithreading in CPUs (not to be confused with
multi-core
A multi-core processor is a microprocessor on a single integrated circuit with two or more separate processing units, called cores, each of which reads and executes program instructions. The instructions are ordinary CPU instructions (such a ...
). As with SIMD, another major benefit is the sharing of the control logic by many data lanes, leading to an increase in computational density. One block of control logic can manage N data lanes, instead of replicating the control logic N times.
A downside of SIMT execution is the fact that thread-specific control-flow is performed using "masking", leading to poor utilization where a processor's threads follow different control-flow paths. For instance, to handle an ''IF''-''ELSE'' block where various threads of a processor execute different paths, all threads must actually process both paths (as all threads of a processor always execute in lock-step), but masking is used to disable and enable the various threads as appropriate. Masking is avoided when control flow is coherent for the threads of a processor, i.e. they all follow the same path of execution. The masking strategy is what distinguishes SIMT from ordinary SIMD, and has the benefit of inexpensive synchronization between the threads of a processor.
See also
*
General-purpose computing on graphics processing units (GPGPU)
References
{{Graphics Processing Unit
Classes of computers
Computer architecture
GPGPU
Parallel computing
SIMD computing
Threads (computing)