
SSE4 (Streaming SIMD Extensions 4) is a SIMD CPU instruction set used in the Intel Core microarchitecture and AMD K10 (K8L). It was announced on September 27, 2006, at the Fall 2006 Intel Developer Forum, with vague details in a white paper (Intel Streaming SIMD Extensions 4 (SSE4) Instruction Set Innovation, Intel); more precise details of 47 instructions were presented at the Spring 2007 Intel Developer Forum in Beijing.

SSE4 extended the SSE3 instruction set, which was released in early 2004. Software using earlier Intel SIMD instructions (e.g. SSE3) runs correctly without modification on microprocessors that incorporate SSE4, alongside existing and new applications that use SSE4.

Like earlier CPU SIMD instruction sets, SSE4 supports up to 16 registers, each 128 bits wide, which can hold four 32-bit integers, four 32-bit single-precision floating-point numbers, or two 64-bit double-precision floating-point numbers. SIMD operations, such as element-wise vector addition and multiplication or vector-scalar addition and multiplication, process multiple data elements in a single CPU instruction, which can noticeably increase performance.

SSE4.2 introduced new SIMD string operations, including an instruction to compare two string fragments of up to 16 bytes each. SSE4.2 is a subset of SSE4 that was released after the initial SSE4 instructions.
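The lane-wise behaviour described above can be illustrated with a small Python model. This is only a sketch of the idea, not SSE4 code: the helper names (`pack_lanes`, `packed_add32`, `unpack_lanes`) are hypothetical, the 128-bit register is represented as 16 little-endian bytes, and each 32-bit lane is assumed to wrap around on overflow, as packed integer addition does.

```python
import struct

MASK32 = 0xFFFFFFFF  # keeps each lane within 32 bits


def pack_lanes(values):
    """Pack four unsigned 32-bit integers into 16 bytes (a 128-bit register image)."""
    if len(values) != 4:
        raise ValueError("a 128-bit register holds exactly four 32-bit lanes")
    return struct.pack("<4I", *(v & MASK32 for v in values))


def unpack_lanes(reg):
    """Return the four 32-bit lanes stored in a 16-byte register image."""
    return list(struct.unpack("<4I", reg))


def packed_add32(a, b):
    """Add two register images lane by lane, wrapping each lane modulo 2**32."""
    lanes_a = struct.unpack("<4I", a)
    lanes_b = struct.unpack("<4I", b)
    return struct.pack("<4I", *((x + y) & MASK32 for x, y in zip(lanes_a, lanes_b)))
```

In hardware the four additions happen in one instruction; the loop here only mimics the result, lane by lane.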


SSE4 subsets

Intel SSE4 consists of 54 instructions. A subset consisting of 47 instructions, referred to as ''SSE4.1'' in some Intel documentation, is available in Penryn. ''SSE4.2'', a second subset consisting of the seven remaining instructions, first became available in the Nehalem-based Core i7. Intel credits feedback from developers as playing an important role in the development of the instruction set.

Starting with Barcelona-based processors, AMD introduced the ''SSE4a'' instruction set, which has four SSE4 instructions and four new SSE instructions. These instructions are not found in Intel's processors supporting SSE4.1, and AMD processors only started supporting Intel's SSE4.1 and SSE4.2 (the full SSE4 instruction set) with the Bulldozer-based FX processors. SSE4a also introduced the misaligned SSE feature, which made unaligned load instructions as fast as their aligned versions when used on aligned addresses. It also allowed disabling the alignment check on non-load SSE operations that access memory. Intel later introduced similar speed improvements for unaligned SSE in its Nehalem processors, but did not introduce misaligned access by non-load SSE instructions until AVX.


Name confusion

What is now known as SSSE3 (Supplemental Streaming SIMD Extensions 3), introduced in the Intel Core 2 processor line, was referred to as SSE4 by some media until Intel came up with the SSSE3 moniker. The instructions were internally dubbed Merom New Instructions, and Intel originally did not plan to assign a special name to them, which was criticized by some journalists ("My Experience With 'Conroe'", DailyTech). Intel eventually cleared up the confusion and reserved the SSE4 name for its next instruction set extension ("Extending the World's Most Popular Processor Architecture", Intel). Intel uses the marketing term ''HD Boost'' to refer to SSE4.


New instructions

Unlike all previous iterations of SSE, SSE4 contains instructions that execute operations which are not specific to multimedia applications. It features a number of instructions whose action is determined by a constant field and a set of instructions that take XMM0 as an implicit third operand. Several of these instructions are enabled by the single-cycle shuffle engine in Penryn. (Shuffle operations reorder bytes within a register.)


SSE4.1

These instructions were introduced with the Penryn microarchitecture, the 45 nm shrink of Intel's Core microarchitecture. Support is indicated via the CPUID.01H:ECX.SSE41[Bit 19] flag.
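As a rough illustration of how such a flag is read, the sketch below decodes bit 19 (SSE4.1) and bit 20 (SSE4.2, covered in the next section) from an ECX value returned by CPUID leaf 01H. Python cannot execute CPUID itself, so the function assumes the register value has already been obtained by some other means; the function name and the sample values in the usage note are hypothetical.

```python
# Bit positions within ECX as returned by CPUID leaf 01H.
SSE41_BIT = 19
SSE42_BIT = 20


def leaf1_sse4_flags(ecx):
    """Report which SSE4 subsets a CPUID.01H:ECX value advertises."""
    return {
        "sse4.1": bool((ecx >> SSE41_BIT) & 1),
        "sse4.2": bool((ecx >> SSE42_BIT) & 1),
    }
```

For example, an ECX value with only bit 19 set, `leaf1_sse4_flags(1 << 19)`, would describe a processor that reports SSE4.1 but not SSE4.2.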


SSE4.2

SSE4.2 added STTNI (String and Text New Instructions), several new instructions that perform character searches and comparisons on two operands of 16 bytes at a time. These were designed (among other things) to speed up the parsing of XML documents. SSE4.2 also added a CRC32 instruction to compute cyclic redundancy checks as used in certain data transfer protocols. These instructions were first implemented in the Nehalem-based Intel Core i7 product line and complete the SSE4 instruction set. AMD first added support with the Bulldozer microarchitecture. Support is indicated via the CPUID.01H:ECX.SSE42[Bit 20] flag. Windows 11 24H2 requires the CPU to support SSE4.2; otherwise the Windows kernel will not boot.
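To make the CRC32 step concrete, here is a bit-by-bit software model in Python. It assumes the Castagnoli polynomial (0x1EDC6F41, often called CRC-32C), which is the polynomial the SSE4.2 CRC32 instruction uses. The function names are hypothetical, and the code is a slow reference sketch rather than a replacement for the instruction: the hardware performs only the accumulate step, while the initial value and final inversion shown in `crc32c` are the usual software convention.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # bit-reflected form of 0x1EDC6F41


def crc32c_update(crc, data):
    """Fold bytes into a running CRC value (models the accumulate step only)."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; apply the polynomial if that bit was set.
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc & 0xFFFFFFFF


def crc32c(data):
    """CRC-32C with the conventional all-ones initial value and final inversion."""
    return crc32c_update(0xFFFFFFFF, data) ^ 0xFFFFFFFF
```

Because the running value is carried between calls, data can be fed in pieces: updating with `b"1234"` and then `b"56789"` gives the same result as a single pass over `b"123456789"`.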


POPCNT and LZCNT

These instructions operate on integer registers rather than SSE registers because they are not SIMD instructions, although they appeared at the same time. Although AMD introduced them alongside the SSE4a instruction set, they are counted as separate extensions with their own dedicated CPUID bits to indicate support. Intel implements POPCNT beginning with the Nehalem microarchitecture and LZCNT beginning with the Haswell microarchitecture. AMD implements both, beginning with the Barcelona microarchitecture. AMD calls this pair of instructions ''Advanced Bit Manipulation'' (ABM). LZCNT shares its encoding path with the BSR (bit scan reverse) instruction. As a result, LZCNT executed on some CPUs that do not support it, such as Intel CPUs prior to Haswell, may silently execute as BSR instead of raising an ''invalid instruction'' exception. This matters because LZCNT and BSR return different results. Trailing zeros can be counted using the BSF (bit scan forward) or TZCNT instructions. Windows 11 24H2 requires the CPU to support POPCNT; otherwise the Windows kernel will not boot.
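The difference between the two results can be seen in a small Python model of the 32-bit forms. The function names are hypothetical and the code only mirrors the documented results, not the instructions themselves; for a zero input, BSR leaves its destination undefined, which the sketch represents as `None`.

```python
def popcnt32(x):
    """Number of set bits in the low 32 bits of x."""
    return bin(x & 0xFFFFFFFF).count("1")


def lzcnt32(x):
    """Count of leading zero bits in a 32-bit value; 32 when the value is zero."""
    x &= 0xFFFFFFFF
    return 32 - x.bit_length()


def bsr32(x):
    """Index of the highest set bit; None models the undefined result for zero."""
    x &= 0xFFFFFFFF
    if x == 0:
        return None
    return x.bit_length() - 1
```

For any nonzero 32-bit input the two are related by `lzcnt32(x) == 31 - bsr32(x)`, so code that silently runs BSR in place of LZCNT gets, for instance, 0 instead of 31 for an input of 1.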


SSE4a

The SSE4a instruction group was introduced in AMD's Barcelona microarchitecture. These instructions are not available in Intel processors. Support is indicated via the CPUID.80000001H:ECX.SSE4A[Bit 6] flag.
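On Linux, the kernel exposes the decoded CPUID bits as feature names in `/proc/cpuinfo`, which offers a simple way to check for the extensions discussed in these sections without issuing CPUID directly. The sketch below parses such text; the function names are hypothetical, and it assumes the Linux flag names `sse4_1`, `sse4_2`, `sse4a`, `popcnt`, and `abm` (the last covering LZCNT).

```python
SSE4_RELATED = ("sse4_1", "sse4_2", "sse4a", "popcnt", "abm")


def cpu_flags(cpuinfo_text):
    """Collect the feature names from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def sse4_support(cpuinfo_text):
    """Map each SSE4-related flag name to whether the text reports it."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in SSE4_RELATED}
```

On a Linux system this could be called as `sse4_support(open("/proc/cpuinfo").read())`; on other operating systems the file does not exist and a different mechanism is needed.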


Supporting CPUs


X86-64 v1

* Intel
** Penryn processors (SSE4.1 supported, except Pentium Dual-Core and Celeron)
* AMD
** Bobcat-based processors (SSE4a, POPCNT and LZCNT supported)
** K10-based processors (SSE4a, POPCNT and LZCNT supported)
* VIA
** Nano 3000, X2, QuadCore processors (SSE4.1 supported)


X86-64 v2

* Intel
** Low-power processors (SSE4.1, SSE4.2 and POPCNT supported)
*** Silvermont processors
*** Goldmont processors
*** Goldmont Plus processors
*** Tremont processors
** Nehalem and Westmere processors (SSE4.1, SSE4.2 and POPCNT supported, except Pentium and Celeron)
** Sandy Bridge and Ivy Bridge processors (SSE4.1, SSE4.2 and POPCNT supported, including Pentium and Celeron)
* AMD
** "Cat" low-power processors (SSE4a, SSE4.1, SSE4.2, POPCNT and LZCNT supported)
*** Jaguar-based processors
*** Puma-based processors
** "Heavy Equipment" processors (SSE4a, SSE4.1, SSE4.2, POPCNT and LZCNT supported)
*** Bulldozer-based processors
*** Piledriver-based processors
*** Steamroller-based processors
* VIA
** Nano QuadCore C4000-series processors (SSE4.1, SSE4.2 supported)
** Eden X4 processors (SSE4.1, SSE4.2 supported)
* Zhaoxin
** ZX-C processors and newer (SSE4.1, SSE4.2 supported)


X86-64 v3

* Intel
** Gracemont processors (SSE4.1, SSE4.2, POPCNT and LZCNT supported)
** Haswell and Broadwell processors (SSE4.1, SSE4.2, POPCNT and LZCNT supported)
* AMD
** Excavator-based processors (SSE4a, SSE4.1, SSE4.2, POPCNT and LZCNT supported)
** Zen, Zen+, Zen 2, and Zen 3 based processors (SSE4a, SSE4.1, SSE4.2, POPCNT and LZCNT supported)


X86-64 v4

* Intel
** Skylake processors and newer (SSE4.1, SSE4.2, POPCNT and LZCNT supported)
* AMD
** Zen 4-based processors and newer (SSE4a, SSE4.1, SSE4.2, POPCNT and LZCNT supported)


External links


* SSE4 Programming Reference by Intel (archived at Ghostarchive.org on May 10, 2022)