HOME

TheInfoList



OR:

SSE2 (Streaming SIMD Extensions 2) is one of the Intel SIMD (Single Instruction, Multiple Data) processor supplementary instruction sets first introduced by
Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It is the world's largest semiconductor chip manufacturer by revenue, and is one of the developers of the x86 ser ...
with the initial version of the
Pentium 4 Pentium 4 is a series of single-core CPUs for desktops, laptops and entry-level servers manufactured by Intel. The processors were shipped from November 20, 2000 until August 8, 2008. The production of Netburst processors was active from 2000 ...
in 2000. It extends the earlier SSE instruction set, and is intended to fully replace
MMX MMX may refer to: * 2010, in Roman numerals Science and technology * MMX (instruction set), a single-instruction, multiple-data instruction set designed by Intel * MMX Mineração, a Brazilian mining company * Martian Moons eXploration, a Japane ...
. Intel extended SSE2 to create SSE3 in 2004. SSE2 added 144 new instructions to SSE, which has 70 instructions. Competing chip-maker AMD added support for SSE2 with the introduction of their Opteron and
Athlon 64 The Athlon 64 is a ninth-generation, AMD64-architecture microprocessor produced by Advanced Micro Devices (AMD), released on September 23, 2003. It is the third processor to bear the name ''Athlon'', and the immediate successor to the Athlon XP. T ...
ranges of AMD64 64-bit CPUs in 2003.


Features

Most of the SSE2 instructions implement the integer vector operations also found in MMX. Instead of the MMX registers they use the XMM registers, which are wider and allow for significant performance improvements in specialized applications. Another advantage of replacing MMX with SSE2 is avoiding the mode switching penalty for issuing
x87 x87 is a floating-point-related subset of the x86 architecture instruction set. It originated as an extension of the 8086 instruction set in the form of optional floating-point coprocessors that worked in tandem with corresponding x86 CPUs. These ...
instructions present in MMX because it is sharing register space with the x87 FPU. The SSE2 also complements the floating-point vector operations of the SSE instruction set by adding support for the double precision data type. Other SSE2 extensions include a set of cache control instructions intended primarily to minimize cache pollution when processing infinite streams of information, and a sophisticated complement of numeric format conversion instructions. AMD's implementation of SSE2 on the AMD64 (
x86-64 x86-64 (also known as x64, x86_64, AMD64, and Intel 64) is a 64-bit version of the x86 instruction set, first released in 1999. It introduced two new modes of operation, 64-bit mode and compatibility mode, along with a new 4-level paging ...
) platform includes an additional eight registers, doubling the total number to 16 (XMM0 through XMM15). These additional registers are only visible when running in 64-bit mode. Intel adopted these additional registers as part of their support for x86-64 architecture (or in Intel's parlance, "Intel 64") in 2004.


Differences between x87 FPU and SSE2

FPU (x87) instructions provide higher precision by calculating intermediate results with 80 bits of precision, by default, to minimise
roundoff error A roundoff error, also called rounding error, is the difference between the result produced by a given algorithm using exact arithmetic and the result produced by the same algorithm using finite-precision, rounded arithmetic. Rounding errors are d ...
in numerically unstable algorithms (see IEEE 754 design rationale and references therein). However, the x87 FPU is a scalar unit only whereas SSE2 can process a small vector of operands in parallel. If code designed for x87 is ported to the lower precision double precision SSE2 floating point, certain combinations of math operations or input datasets can result in measurable numerical deviation, which can be an issue in reproducible scientific computations, e.g. if the calculation results must be compared against results generated from a different machine architecture. A related issue is that, historically, language standards and compilers had been inconsistent in their handling of the x87 80-bit registers implementing double extended precision variables, compared with the double and single precision formats implemented in SSE2: the rounding of extended precision intermediate values to double precision variables was not fully defined and was dependent on implementation details such as when registers were spilled to memory.


Differences between MMX and SSE2

SSE2 extends MMX instructions to operate on XMM registers. Therefore, it is possible to convert all existing MMX code to an SSE2 equivalent. Since an SSE2 register is twice as long as an MMX register, loop counters and memory access may need to be changed to accommodate this. However, 8 byte loads and stores to XMM are available, so this is not strictly required. Although one SSE2 instruction can operate on twice as much data as an MMX instruction, performance might not increase significantly. Two major reasons are: accessing SSE2 data in memory not aligned to a 16-byte boundary can incur significant penalty, and the throughput of SSE2 instructions in older x86 implementations was half that for MMX instructions.
Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It is the world's largest semiconductor chip manufacturer by revenue, and is one of the developers of the x86 ser ...
addressed the first problem by adding an instruction in SSE3 to reduce the overhead of accessing unaligned data and improving the overall performance of misaligned loads, and the last problem by widening the execution engine in their Core microarchitecture in Core 2 Duo and later products. Since MMX and x87 register files alias one another, using MMX will prevent x87 instructions from working as desired. Once MMX has been used, the programmer must use the emms instruction (C: _mm_empty()) to restore operation to the x87 register file. On some operating systems, x87 is not used very much, but may still be used in some critical areas like pow() where the extra precision is needed. In such cases, the corrupt floating-point state caused by failure to emit emms may go undetected for millions of instructions before ultimately causing the floating-point routine to fail, returning NaN. Since the problem is not locally apparent in the MMX code, finding and correcting the bug can be very time consuming. As SSE2 does not have this problem and it usually provides much better throughput and provides more registers in 64-bit code, it should be preferred for nearly all vectorization work.


Compiler usage

When first introduced in 2000, SSE2 was not supported by software development tools. For example, to use SSE2 in a
Microsoft Visual Studio Visual Studio is an integrated development environment (IDE) from Microsoft. It is used to develop computer programs including websites, web apps, web services and mobile apps. Visual Studio uses Microsoft software development platforms such ...
project, the programmer had to either manually write inline-assembly or import object-code from an external source. Later the Visual C++ Processor Pack added SSE2 support to Visual C++ and MASM. The
Intel C++ Compiler Intel oneAPI DPC++/C++ Compiler and Intel C++ Compiler Classic are Intel’s C, C++, SYCL, and Data Parallel C++ (DPC++) compilers for Intel processor-based systems, available for Windows, Linux, and macOS operating systems. Overview Intel ...
can automatically generate
SSE4 SSE4 (Streaming SIMD Extensions 4) is a SIMD CPU instruction set used in the Intel Core microarchitecture and AMD K10 (K8L). It was announced on September 27, 2006, at the Fall 2006 Intel Developer Forum, with vague details in a white paper; m ...
, SSSE3, SSE3, SSE2, and SSE code without the use of hand-coded assembly. Since GCC 3, GCC can automatically generate SSE/SSE2 scalar code when the target supports those instructions. Automatic vectorization for SSE/SSE2 has been added since GCC 4. The Sun Studio Compiler Suite can also generate SSE2 instructions when the compiler flag -xvector=simd is used. Since
Microsoft Visual C++ Microsoft Visual C++ (MSVC) is a compiler for the C, C++ and C++/CX programming languages by Microsoft. MSVC is proprietary software; it was originally a standalone product but later became a part of Visual Studio and made available in both tri ...
2012, the compiler option to generate SSE2 instructions is turned on by default.


CPU support

SSE2 is an extension of the
IA-32 IA-32 (short for "Intel Architecture, 32-bit", commonly called i386) is the 32-bit version of the x86 instruction set architecture, designed by Intel and first implemented in the 80386 microprocessor in 1985. IA-32 is the first incarnatio ...
architecture, based on the x86 instruction set. Therefore, only x86 processors can include SSE2. The AMD64 architecture supports the
IA-32 IA-32 (short for "Intel Architecture, 32-bit", commonly called i386) is the 32-bit version of the x86 instruction set architecture, designed by Intel and first implemented in the 80386 microprocessor in 1985. IA-32 is the first incarnatio ...
as a compatibility mode and includes the SSE2 in its specification. It also doubles the number of XMM registers, allowing for better performance. SSE2 is also a requirement for installing Windows 8 (and later) or Microsoft Office 2013 (and later) "to enhance the reliability of third-party apps and drivers running in Windows 8". The following IA-32 CPUs support SSE2: *
Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It is the world's largest semiconductor chip manufacturer by revenue, and is one of the developers of the x86 ser ...
NetBurst-based CPUs (
Pentium 4 Pentium 4 is a series of single-core CPUs for desktops, laptops and entry-level servers manufactured by Intel. The processors were shipped from November 20, 2000 until August 8, 2008. The production of Netburst processors was active from 2000 ...
,
Xeon Xeon ( ) is a brand of x86 microprocessors designed, manufactured, and marketed by Intel, targeted at the non-consumer workstation, server, and embedded system markets. It was introduced in June 1998. Xeon processors are based on the same ar ...
,
Celeron Celeron is Intel's brand name for low-end IA-32 and x86-64 computer microprocessor models targeted at low-cost personal computers. Celeron processors are compatible with IA-32 software. They typically offer less performance per clock speed co ...
,
Pentium D Pentium D is a range of desktop 64-bit x86-64 processors based on the NetBurst microarchitecture, which is the dual-core variant of the Pentium 4 manufactured by Intel. Each CPU comprised two dies, each containing a single core, residing next to ...
, Celeron D) * Intel
Pentium M The Pentium M is a family of mobile 32-bit single-core x86 microprocessors (with the modified Intel P6 microarchitecture) introduced in March 2003 and forming a part of the Intel Carmel notebook platform under the then new Centrino brand. The ...
and
Celeron M Celeron is Intel's brand name for low-end IA-32 and x86-64 computer microprocessor models targeted at low-cost personal computers. Celeron processors are compatible with IA-32 software. They typically offer less performance per clock speed comp ...
* Intel Atom *
AMD Athlon 64 The Athlon 64 is a ninth-generation, AMD64-architecture microprocessor produced by Advanced Micro Devices (AMD), released on September 23, 2003. It is the third processor to bear the name ''Athlon'', and the immediate successor to the Athlon#Athlo ...
* Transmeta Efficeon *
VIA C7 The VIA C7 is an x86 central processing unit designed by Centaur Technology and sold by VIA Technologies. Product history The C7 delivers a number of improvements to the older VIA C3 cores but is nearly identical to the latest VIA C3 Nehemiah ...
The following IA-32 CPUs were released after SSE2 was developed, but did not implement it: * AMD CPUs prior to
Athlon 64 The Athlon 64 is a ninth-generation, AMD64-architecture microprocessor produced by Advanced Micro Devices (AMD), released on September 23, 2003. It is the third processor to bear the name ''Athlon'', and the immediate successor to the Athlon XP. T ...
, such as Athlon XP * VIA C3 *
Transmeta Crusoe The Transmeta Crusoe is a family of x86-compatible microprocessors developed by Transmeta and introduced in 2000. Instead of the instruction set architecture being implemented in hardware, or translated by specialized hardware, the Crusoe runs a ...
*
Intel Quark Intel Quark is a line of 32-bit x86 SoCs and microcontrollers by Intel, designed for small size and low power consumption, and targeted at new markets including wearable devices. The line was introduced at Intel Developer Forum in 2013, and ...


See also

* SSE2 instructions


References

{{Multimedia extensions X86 instructions SIMD computing