SSE4 (Streaming SIMD Extensions 4) is a SIMD CPU instruction set used in the Intel Core microarchitecture and AMD K10 (K8L). It was announced on September 27, 2006, at the Fall 2006 Intel Developer Forum, with vague details in a white paper (Intel, "Intel Streaming SIMD Extensions 4 (SSE4) Instruction Set Innovation"). More precise details of 47 instructions became available in a presentation at the Spring 2007 Intel Developer Forum in Beijing. SSE4 extended the SSE3 instruction set, which was released in early 2004. Software written for earlier Intel SIMD extensions (e.g., SSE3) runs correctly and without modification on microprocessors that incorporate SSE4, including alongside existing and new applications that use SSE4.
Like the SSE generations before it, SSE4 works with up to 16 registers (eight in 32-bit mode), each 128 bits wide. Each register can hold, for example, four 32-bit integers, four 32-bit single-precision floating-point numbers, or two 64-bit double-precision floating-point numbers.
SIMD operations process multiple data elements with a single CPU instruction. Examples include element-wise vector addition or multiplication and vector-scalar addition or multiplication. This data parallelism can yield noticeable performance gains. SSE4.2 introduced new SIMD string operations, including an instruction that compares two string fragments of up to 16 bytes each.
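The following minimal C sketch illustrates an element-wise operation. It multiplies two arrays of 32-bit integers four lanes at a time with `_mm_mullo_epi32`, the intrinsic for the SSE4.1 PMULLD instruction.

The sketch rests on a few assumptions:
* It targets GCC or Clang on x86. The `target` function attribute lets this one function use SSE4.1 without compiling the whole program for it.
* It runs on a CPU that supports SSE4.1.
* The function name `mul_i32_sse41` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <smmintrin.h>  /* SSE4.1 intrinsics, including _mm_mullo_epi32 */

/* Element-wise out[i] = a[i] * b[i] (low 32 bits of each product).
   The target attribute (GCC/Clang) compiles only this function for SSE4.1,
   so callers must make sure the CPU actually supports SSE4.1. */
__attribute__((target("sse4.1")))
void mul_i32_sse41(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    size_t i = 0;

    /* Main loop: one PMULLD multiplies four 32-bit lanes at once. */
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_mullo_epi32(va, vb));
    }

    /* Scalar tail for the last 0-3 elements; unsigned arithmetic wraps
       modulo 2^32 just like the vector lanes (and avoids signed overflow). */
    for (; i < n; i++)
        out[i] = (int32_t)((uint32_t)a[i] * (uint32_t)b[i]);
}
```

PMULLD keeps only the low 32 bits of each product. The scalar tail therefore uses unsigned arithmetic, so it wraps the same way the vector lanes do.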
SSE4.2 is a subset of SSE4. It shipped about a year after the first processors with SSE4 (SSE4.1).
SSE4 subsets
Intel SSE4 consists of 54 instructions. A subset of 47 instructions, referred to as ''SSE4.1'' in some Intel documentation, is available in Penryn. A second subset, ''SSE4.2'', consists of the seven remaining instructions and first became available in Nehalem-based Core i7 processors. Intel credits developer feedback with playing an important role in the development of the instruction set.
Starting with Barcelona-based processors, AMD introduced the ''SSE4a'' instruction set, which has four SSE4 instructions and four new SSE instructions. These instructions are not found in Intel's processors that support SSE4.1. AMD processors only began supporting Intel's SSE4.1 and SSE4.2 (the full SSE4 instruction set) with the Bulldozer-based FX processors.

SSE4a also introduced the misaligned SSE feature, which made unaligned load instructions as fast as their aligned versions on aligned addresses. It also allowed the alignment check to be disabled for non-load SSE operations that access memory. Intel later brought similar speed improvements for unaligned SSE to its Nehalem processors, but did not allow misaligned memory access by non-load SSE instructions until AVX.
Name confusion
What is now known as SSSE3 (Supplemental Streaming SIMD Extensions 3) was introduced in the Intel Core 2 processor line. Some media referred to it as SSE4 until Intel came up with the SSSE3 moniker. The instructions were internally dubbed Merom New Instructions, and Intel originally did not plan to give them a special name, which some journalists criticized (DailyTech, "My Experience With 'Conroe'"). Intel eventually cleared up the confusion and reserved the SSE4 name for its next instruction set extension (Intel, "Extending the World's Most Popular Processor Architecture").
Intel uses the marketing term ''HD Boost'' to refer to SSE4.
New instructions
Unlike all previous iterations of SSE, SSE4 contains instructions that perform operations not specific to multimedia applications. It features a number of instructions whose action is selected by an immediate (constant) operand, and a set of instructions that take XMM0 as an implicit third operand.
Several of these instructions are enabled by the single-cycle shuffle engine in Penryn. (Shuffle operations reorder bytes within a register.)
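The SSE4.1 blend instructions show both kinds of instruction side by side:
* `_mm_blend_ps` (BLENDPS) selects lanes using an immediate constant fixed at compile time.
* `_mm_blendv_ps` (BLENDVPS) selects lanes at run time from a mask, which the non-AVX encoding takes implicitly in XMM0.

This C sketch assumes GCC or Clang on x86, and the function names are hypothetical.

```c
#include <smmintrin.h>  /* SSE4.1: _mm_blend_ps, _mm_blendv_ps */

/* BLENDPS: lane i comes from b if bit i of the immediate is set, else from a.
   The immediate must be a compile-time constant; 0x5 (binary 0101) takes
   lanes 0 and 2 from b. */
__attribute__((target("sse4.1")))
void blend_imm(const float *a, const float *b, float *out)
{
    __m128 r = _mm_blend_ps(_mm_loadu_ps(a), _mm_loadu_ps(b), 0x5);
    _mm_storeu_ps(out, r);
}

/* BLENDVPS: lane i comes from b if the sign bit of mask lane i is set.
   In the legacy SSE4.1 encoding the mask operand is implicitly XMM0;
   the compiler places it there. */
__attribute__((target("sse4.1")))
void blend_var(const float *a, const float *b, const float *mask, float *out)
{
    __m128 r = _mm_blendv_ps(_mm_loadu_ps(a), _mm_loadu_ps(b),
                             _mm_loadu_ps(mask));
    _mm_storeu_ps(out, r);
}
```

The compiler, not the programmer, handles placing the mask in XMM0. If the code is built with AVX enabled, the compiler may instead emit VBLENDVPS, which takes the mask as an explicit fourth operand.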
SSE4.1
These instructions were introduced with the Penryn microarchitecture, the 45 nm shrink of Intel's Core microarchitecture. Support is indicated via the CPUID.01H:ECX.SSE41[bit 19] flag.
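A program can test this flag before running SSE4.1 code. The sketch below assumes GCC or Clang, whose `<cpuid.h>` header provides `__get_cpuid`, and the helper names are hypothetical. SSE4.2 (bit 20) and POPCNT (bit 23) are reported in the same register of the same leaf, so the sketch checks them too.

```c
#include <cpuid.h>    /* GCC/Clang: __get_cpuid() */
#include <stdbool.h>

/* Execute CPUID leaf 01H and return one bit of ECX. */
static bool cpuid1_ecx_bit(unsigned bit)
{
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;   /* leaf 01H not available */
    return (ecx >> bit) & 1u;
}

bool has_sse41(void)  { return cpuid1_ecx_bit(19); }  /* CPUID.01H:ECX.SSE41[bit 19]  */
bool has_sse42(void)  { return cpuid1_ecx_bit(20); }  /* CPUID.01H:ECX.SSE42[bit 20]  */
bool has_popcnt(void) { return cpuid1_ecx_bit(23); }  /* CPUID.01H:ECX.POPCNT[bit 23] */
```

GCC and Clang also offer `__builtin_cpu_supports("sse4.1")`, which performs an equivalent check without hand-decoding bits.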
SSE4.2
SSE4.2 added STTNI (String and Text New Instructions), several new instructions that perform character searches and comparisons on two operands of 16 bytes at a time. These were designed, among other things, to speed up the parsing of XML documents. SSE4.2 also added a CRC32 instruction to compute cyclic redundancy checks as used in certain data transfer protocols. These instructions were first implemented in the Nehalem-based Intel Core i7 product line and complete the SSE4 instruction set. AMD, on the other hand, first added support with the Bulldozer microarchitecture. Support is indicated via the CPUID.01H:ECX.SSE42[bit 20] flag.
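The following C sketch exercises both SSE4.2 additions. It assumes GCC or Clang on x86, and the function names are hypothetical.
* `find_any16` uses `_mm_cmpestri` (PCMPESTRI) in "equal any" mode. It locates the first byte in a 16-byte window that matches any byte of a small set, the kind of scan an XML parser performs when looking for `<` or `&`.
* `crc32c_sse42` feeds a buffer through `_mm_crc32_u8`. The CRC32 instruction computes CRC-32C (Castagnoli polynomial), not the CRC-32 variant used by zlib and Ethernet.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <nmmintrin.h>  /* SSE4.2 intrinsics: _mm_cmpestri, _mm_crc32_u8 */

/* STTNI: return the index of the first byte of s (length s_len) that equals
   any byte of set (length set_len), or 16 if there is no match.
   Only the first 16 bytes of each input are examined; lengths must be >= 0. */
__attribute__((target("sse4.2")))
int find_any16(const char *s, int s_len, const char *set, int set_len)
{
    char sbuf[16] = {0}, setbuf[16] = {0};

    if (s_len > 16) s_len = 16;
    if (set_len > 16) set_len = 16;

    /* Copy into zero-padded buffers so the 16-byte loads never read past the inputs. */
    memcpy(sbuf, s, (size_t)s_len);
    memcpy(setbuf, set, (size_t)set_len);

    __m128i vset = _mm_loadu_si128((const __m128i *)setbuf);
    __m128i vs = _mm_loadu_si128((const __m128i *)sbuf);

    /* PCMPESTRI, "equal any" mode: operand 1 is the character set, operand 2
       the text; the result is the lowest matching index in the text. */
    return _mm_cmpestri(vset, set_len, vs, s_len,
                        _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY |
                        _SIDD_LEAST_SIGNIFICANT);
}

/* CRC-32C of a buffer via the CRC32 instruction, one byte per step,
   with the conventional all-ones initial value and final inversion. */
__attribute__((target("sse4.2")))
uint32_t crc32c_sse42(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++)
        crc = _mm_crc32_u8(crc, p[i]);

    return crc ^ 0xFFFFFFFFu;
}
```

For the standard check string "123456789", CRC-32C evaluates to 0xE3069283. Processing one byte per instruction keeps the sketch short. Real implementations usually use the 64-bit form `_mm_crc32_u64` to consume eight bytes at a time.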
Windows 11 24H2 requires the CPU to support SSE4.2; otherwise the Windows kernel does not boot.
POPCNT and LZCNT
These instructions operate on general-purpose integer registers rather than SSE registers, because they are not SIMD instructions. They appeared at the same time as SSE4 and were introduced by AMD together with the SSE4a instruction set. Even so, they are counted as separate extensions, each with its own dedicated CPUID bit to indicate support. Intel implements POPCNT beginning with the Nehalem microarchitecture and LZCNT beginning with the Haswell microarchitecture. AMD implements both, beginning with the Barcelona microarchitecture.
AMD calls this pair of instructions ''Advanced Bit Manipulation'' (ABM).
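Because POPCNT has its own CPUID bit, portable code usually selects an implementation at run time. This sketch pairs the `_mm_popcnt_u32` intrinsic with a plain C fallback and chooses between them with `__builtin_cpu_supports("popcnt")`. It assumes GCC or Clang on x86, and all function names are hypothetical.

```c
#include <stdint.h>
#include <immintrin.h>  /* _mm_popcnt_u32 (POPCNT) */

/* Hardware path: compiled for POPCNT only within this function (GCC/Clang). */
__attribute__((target("popcnt")))
int popcount_hw(uint32_t x)
{
    return _mm_popcnt_u32(x);
}

/* Portable fallback: repeatedly clear the lowest set bit and count the steps. */
int popcount_sw(uint32_t x)
{
    int n = 0;
    while (x != 0) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Run-time dispatch on the CPU's reported POPCNT support. */
int popcount32(uint32_t x)
{
    return __builtin_cpu_supports("popcnt") ? popcount_hw(x) : popcount_sw(x);
}
```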
LZCNT is encoded as the BSR (bit scan reverse) instruction with an added F3 (REP) prefix. On CPUs that do not support LZCNT, such as Intel CPUs prior to Haswell, the prefix may be ignored. The instruction then executes as BSR instead of raising an ''invalid instruction'' exception. This is a problem because the two instructions return different values: LZCNT counts the leading zero bits of its operand, whereas BSR returns the bit index of the highest set bit.
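Trailing zeros can be counted using the BSF (bit scan forward) or TZCNT instructions. The two return the same value for a nonzero input. For a zero input, TZCNT returns the operand size, while BSF leaves its destination undefined.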
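Here is a portable C model of the results the four instructions produce for 32-bit operands, which makes the difference concrete. These are plain C functions with hypothetical names, not the hardware intrinsics. BSR and BSF leave the destination undefined for a zero input, so the model returns -1 there by assumption.

```c
#include <stdint.h>

/* LZCNT model: number of leading zero bits; 32 for a zero input. */
int lzcnt32(uint32_t x)
{
    int n = 0;
    for (uint32_t m = 0x80000000u; m != 0 && !(x & m); m >>= 1)
        n++;
    return n;
}

/* BSR model: index of the highest set bit. Hardware leaves the result
   undefined (and sets ZF) for x == 0; this model returns -1 instead. */
int bsr32(uint32_t x)
{
    return x ? 31 - lzcnt32(x) : -1;
}

/* TZCNT model: number of trailing zero bits; 32 for a zero input. */
int tzcnt32(uint32_t x)
{
    int n = 0;
    for (uint32_t m = 1u; m != 0 && !(x & m); m <<= 1)
        n++;
    return n;
}

/* BSF model: index of the lowest set bit; undefined for zero (modeled as -1). */
int bsf32(uint32_t x)
{
    return x ? tzcnt32(x) : -1;
}
```

Under this model, an LZCNT that silently executes as BSR on an older CPU returns `bsr32(x)` instead of `lzcnt32(x)`. For example, it returns 0 instead of 31 when x = 1. For nonzero inputs, `bsf32` and `tzcnt32` agree.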
Windows 11 24H2 requires the CPU to support POPCNT; otherwise the Windows kernel does not boot.
SSE4a
The SSE4a instruction group was introduced in AMD's Barcelona microarchitecture. These instructions are not available in Intel processors. Support is indicated via the CPUID.80000001H:ECX.SSE4A[bit 6] flag.
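The SSE4a flag lives in the extended CPUID leaf rather than in leaf 01H. The sketch below is minimal and again assumes GCC or Clang and their `<cpuid.h>` header; the helper names are hypothetical. ECX of leaf 80000001H also carries bit 5, which AMD documents as ABM and Intel as LZCNT.

```c
#include <cpuid.h>    /* GCC/Clang: __get_cpuid() */
#include <stdbool.h>

/* Execute CPUID leaf 80000001H and return one bit of ECX.
   __get_cpuid() fails (returns 0) if the CPU does not implement this leaf. */
static bool cpuid_ext1_ecx_bit(unsigned bit)
{
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return false;
    return (ecx >> bit) & 1u;
}

bool has_sse4a(void) { return cpuid_ext1_ecx_bit(6); }  /* CPUID.80000001H:ECX.SSE4A[bit 6] */
bool has_abm(void)   { return cpuid_ext1_ecx_bit(5); }  /* ABM (AMD) / LZCNT (Intel), bit 5 */
```

On Intel processors `has_sse4a()` is expected to return false, while `has_abm()` returns true from Haswell onward.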
Supporting CPUs
x86-64-v1
* Intel
** Penryn processors (SSE4.1 supported, except Pentium Dual-Core and Celeron)
* AMD
** Bobcat-based processors (SSE4a, POPCNT and LZCNT supported)
** K10-based processors (SSE4a, POPCNT and LZCNT supported)
* VIA
** Nano 3000, X2, QuadCore processors (SSE4.1 supported)
x86-64-v2
* Intel
** Low-power processors (SSE4.1, SSE4.2 and POPCNT supported)
*** Silvermont processors
*** Goldmont processors
*** Goldmont Plus processors
*** Tremont processors
** Nehalem and Westmere processors (SSE4.1, SSE4.2 and POPCNT supported, except Pentium and Celeron)
** Sandy Bridge and Ivy Bridge processors (SSE4.1, SSE4.2 and POPCNT supported, including Pentium and Celeron)
* AMD
** "Cat" low-power processors (SSE4a, SSE4.1, SSE4.2, POPCNT and LZCNT supported)
*** Jaguar-based processors
*** Puma-based processors
** "Heavy Equipment" processors (SSE4a, SSE4.1, SSE4.2, POPCNT and LZCNT supported)
*** Bulldozer-based processors
*** Piledriver-based processors
*** Steamroller-based processors
* VIA
** Nano QuadCore C4000-series processors (SSE4.1 and SSE4.2 supported)
** Eden X4 processors (SSE4.1 and SSE4.2 supported)
* Zhaoxin
** ZX-C processors and newer (SSE4.1 and SSE4.2 supported)
x86-64-v3
* Intel
** Gracemont processors (SSE4.1, SSE4.2, POPCNT and LZCNT supported)
** Haswell and Broadwell processors (SSE4.1, SSE4.2, POPCNT and LZCNT supported)
* AMD
** Excavator-based processors (SSE4a, SSE4.1, SSE4.2, POPCNT and LZCNT supported)
** Zen, Zen+, Zen 2, and Zen 3-based processors (SSE4a, SSE4.1, SSE4.2, POPCNT and LZCNT supported)
x86-64-v4
* Intel
** Skylake processors and newer (SSE4.1, SSE4.2, POPCNT and LZCNT supported)
* AMD
** Zen 4-based processors and newer (SSE4a, SSE4.1, SSE4.2, POPCNT and LZCNT supported)
References
External links
SSE4 Programming Reference by Intel (archived at Ghostarchive.org on May 10, 2022)