Bit manipulation instruction sets (BMI sets) are extensions to the x86 instruction set architecture for microprocessors from Intel and AMD. The purpose of these instruction sets is to improve the speed of bit manipulation. All the instructions in these sets are non-SIMD and operate only on general-purpose registers.
Intel published two sets: BMI (now referred to as BMI1) and BMI2. Both were introduced with the Haswell microarchitecture; BMI1 matches features offered by AMD's ABM instruction set, and BMI2 extends them. AMD published another two sets: ABM (''Advanced Bit Manipulation'', which is also a subset of SSE4a implemented by Intel as part of SSE4.2 and BMI1), and TBM (''Trailing Bit Manipulation'', an extension to BMI1 introduced with Piledriver-based processors but dropped again in Zen-based processors).
ABM (Advanced Bit Manipulation)
AMD was the first to introduce the instructions that now form Intel's BMI1 as part of its ABM (''Advanced Bit Manipulation'') instruction set, then later added support for Intel's new BMI2 instructions. AMD today advertises the availability of these features via Intel's BMI1 and BMI2 cpuflags and instructs programmers to target them accordingly.
While Intel considers POPCNT as part of SSE4.2 and LZCNT as part of BMI1, both Intel and AMD advertise the presence of these two instructions individually. POPCNT has a separate CPUID flag of the same name, and Intel and AMD use AMD's ABM flag to indicate LZCNT support (since LZCNT combined with BMI1 and BMI2 completes the expanded ABM instruction set).
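As a rough illustration of checking these feature flags from software, the sketch below parses text in the style of Linux's /proc/cpuinfo. It assumes the Linux spellings of the flags (popcnt, abm, bmi1, bmi2, tbm); the helper name and the sample text are hypothetical and are not taken from any real machine.

```python
# Hypothetical helper: the function name and SAMPLE text are illustrative only.
# Flag spellings are assumed to follow Linux's /proc/cpuinfo conventions.
BIT_MANIPULATION_FLAGS = ("popcnt", "abm", "bmi1", "bmi2", "tbm")


def bit_manipulation_flags(cpuinfo_text):
    """Return {flag: bool} based on the first 'flags' line in cpuinfo-style text."""
    present = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            present = set(value.split())
            break
    return {flag: flag in present for flag in BIT_MANIPULATION_FLAGS}


# Made-up sample resembling a processor with POPCNT, ABM, BMI1 and BMI2 but no TBM.
SAMPLE = "processor\t: 0\nflags\t\t: fpu sse4_2 popcnt abm bmi1 avx2 bmi2\n"

if __name__ == "__main__":
    print(bit_manipulation_flags(SAMPLE))
```

On a Linux system the same function could be applied to the contents of /proc/cpuinfo; other operating systems expose CPUID information differently.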
LZCNT is related to the Bit Scan Reverse (BSR) instruction, but sets the ZF (if the result is zero) and CF (if the source is zero) flags rather than setting only the ZF (if the source is zero). It also produces a defined result (the source operand size in bits) if the source operand is zero. For a non-zero argument, the sum of the LZCNT and BSR results is the argument bit width minus 1 (for example, if the 32-bit argument is 0x000f0000, LZCNT gives 12 and BSR gives 19).
The encoding of LZCNT is such that if ABM is not supported, then the BSR instruction is executed instead.
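The relationship between the two results can be checked with a small software model. This is a minimal sketch in Python, not the hardware instructions themselves: the function names are chosen for illustration, and None stands in for the undefined destination of BSR on a zero source.

```python
def lzcnt(x, width=32):
    """Model of LZCNT: count of leading zero bits; returns width when x is 0."""
    x &= (1 << width) - 1
    return width - x.bit_length()


def bsr(x, width=32):
    """Model of BSR: index of the highest set bit.

    None stands in for the undefined destination when the source is zero.
    """
    x &= (1 << width) - 1
    return x.bit_length() - 1 if x else None


if __name__ == "__main__":
    value = 0x000F0000
    print(lzcnt(value), bsr(value))   # 12 19
    print(lzcnt(value) + bsr(value))  # 31, i.e. bit width minus 1
    print(lzcnt(0))                   # 32, the operand size in bits
```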
BMI1 (Bit Manipulation Instruction Set 1)
The BMI1 instructions are those enabled by the BMI bit in CPUID. Intel officially considers LZCNT as part of BMI, but advertises LZCNT support using the ABM CPUID feature flag.
BMI1 is available in AMD's Jaguar, Piledriver and newer processors, and in Intel's Haswell and newer processors.
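As a rough sketch of the kind of operation BMI1 provides, the Python model below mirrors the commonly documented semantics of ANDN, BLSI, BLSMSK, BLSR and BEXTR. The mnemonics and formulas are taken from vendor documentation as best understood here, not from the text above; the function names and the 32-bit default width are assumptions, and flag effects are not modelled.

```python
def _mask(width):
    return (1 << width) - 1


def andn(x, y, width=32):
    """ANDN model: bitwise AND of y with the inverse of x."""
    return ~x & y & _mask(width)


def blsi(x, width=32):
    """BLSI model: isolate the lowest set bit."""
    x &= _mask(width)
    return x & -x & _mask(width)


def blsmsk(x, width=32):
    """BLSMSK model: mask up to and including the lowest set bit."""
    x &= _mask(width)
    return (x ^ (x - 1)) & _mask(width)


def blsr(x, width=32):
    """BLSR model: clear the lowest set bit."""
    x &= _mask(width)
    return x & (x - 1) & _mask(width)


def bextr(src, start, length, width=32):
    """BEXTR model: extract `length` bits of src beginning at bit `start`."""
    src &= _mask(width)
    return (src >> start) & _mask(min(length, width))


if __name__ == "__main__":
    print(bin(andn(0b1100, 0b1010)))      # 0b10
    print(bin(blsi(0b10100)))             # 0b100
    print(bin(blsmsk(0b10100)))           # 0b111
    print(bin(blsr(0b10100)))             # 0b10000
    print(hex(bextr(0x12345678, 8, 12)))  # 0x456
```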
TZCNT is almost identical to the Bit Scan Forward (BSF) instruction, but sets the ZF (if the result is zero) and CF (if the source is zero) flags rather than setting only the ZF (if the source is zero). For a non-zero argument, the results of TZCNT and BSF are equal.
As with LZCNT, the encoding of TZCNT is such that if BMI1 is not supported, then the BSF instruction is executed instead.
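The difference in flag behaviour can be made concrete with a small software model. This is a sketch under stated assumptions rather than a description of the hardware: the function names and return shapes are illustrative, TZCNT is assumed to return the operand size for a zero source (as documented by the vendors, though not stated above), and None stands in for the undefined BSF destination on a zero source.

```python
def tzcnt(x, width=32):
    """Model of TZCNT: returns (count, cf, zf).

    cf is set when the source is zero; zf is set when the result is zero.
    """
    x &= (1 << width) - 1
    count = width if x == 0 else (x & -x).bit_length() - 1
    return count, x == 0, count == 0


def bsf(x, width=32):
    """Model of BSF: returns (index, zf).

    zf is set when the source is zero, in which case the index is undefined
    (represented here as None).
    """
    x &= (1 << width) - 1
    if x == 0:
        return None, True
    return (x & -x).bit_length() - 1, False


if __name__ == "__main__":
    print(tzcnt(0x000F0000), bsf(0x000F0000))  # (16, False, False) (16, False)
    print(tzcnt(1), bsf(1))                    # (0, False, True) (0, False)
    print(tzcnt(0), bsf(0))                    # (32, True, False) (None, True)
```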
BMI2 (Bit Manipulation Instruction Set 2)
Intel introduced BMI2 together with BMI1 in its line of Haswell processors. Only AMD has produced processors supporting BMI1 without BMI2; BMI2 is supported by AMD's Excavator architecture and newer.
Parallel bit deposit and extract
The PDEP and PEXT instructions are new generalized bit-level compress and expand instructions. They take two inputs; one is a source, and the other is a selector. The selector is a bitmap selecting the bits that are to be packed or unpacked. PEXT copies selected bits from the source to contiguous low-order bits of the destination; higher-order destination bits are cleared. PDEP does the opposite for the selected bits: contiguous low-order bits are copied to selected bits of the destination; other destination bits are cleared. This can be used to extract any bitfield of the input, and even to do a lot of bit-level shuffling that previously would have been expensive. While what these instructions do is similar to bit-level gather-scatter SIMD instructions, the PDEP and PEXT instructions (like the rest of the BMI instruction sets) operate on general-purpose registers.
The instructions are available in 32-bit and 64-bit versions.
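The behaviour described above can be sketched as a bit-by-bit software model. This is a minimal Python illustration, not how the hardware implements the instructions; the function names, the 32-bit default width and the sample source and selector values are chosen here for illustration.

```python
def pext(source, selector, width=32):
    """Model of PEXT: pack the selected source bits into the low-order bits."""
    result = 0
    out_bit = 0
    for i in range(width):
        if (selector >> i) & 1:
            result |= ((source >> i) & 1) << out_bit
            out_bit += 1
    return result


def pdep(source, selector, width=32):
    """Model of PDEP: scatter low-order source bits to the selected positions."""
    result = 0
    in_bit = 0
    for i in range(width):
        if (selector >> i) & 1:
            result |= ((source >> in_bit) & 1) << i
            in_bit += 1
    return result


if __name__ == "__main__":
    selector = 0xFF00FFF0
    packed = pext(0x12345678, selector)
    print(hex(packed))                  # 0x12567
    print(hex(pdep(packed, selector)))  # 0x12005670
```

Note that depositing the extracted bits with the same selector restores only the selected bits of the original value; the unselected positions are cleared.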
AMD processors before Zen 3 that implement PDEP and PEXT do so in microcode, with a latency of 18 cycles, compared with 3 cycles on Zen 3. As a result, it is often faster to use other instructions on these processors.
TBM (Trailing Bit Manipulation)
TBM consists of instructions complementary to the instruction set started by BMI1; their complementary nature means they do not necessarily need to be used directly but can be generated by an optimizing compiler when supported. AMD introduced TBM together with BMI1 in its Piledriver line of processors; later AMD Jaguar and Zen-based processors do not support TBM. No Intel processors (at least through Alder Lake) support TBM.
Supporting CPUs
* Intel
** Intel Nehalem processors and newer (like Sandy Bridge, Ivy Bridge) (POPCNT supported)
** Intel Silvermont processors (POPCNT supported)
** Intel Haswell processors and newer (like Skylake, Broadwell) (ABM, BMI1 and BMI2 supported)
* AMD
** K10-based processors (ABM supported)
** "Cat" low-power processors
*** Bobcat-based processors (ABM supported)
*** Jaguar-based processors and newer (ABM and BMI1 supported)
*** Puma-based processors and newer (ABM and BMI1 supported)
** "Heavy Equipment" processors
*** Bulldozer-based processors (ABM supported)
*** Piledriver-based processors (ABM, BMI1 and TBM supported)
*** Steamroller-based processors (ABM, BMI1 and TBM supported)
*** Excavator-based processors and newer (ABM, BMI1, BMI2 and TBM supported; microcoded PEXT and PDEP)
** Zen-based, Zen+-based, and Zen 2-based processors (ABM, BMI1 and BMI2 supported; microcoded PEXT and PDEP)
** Zen 3 processors and newer (ABM, BMI1 and BMI2 supported; full hardware implementation)
Note that instruction extension support means the processor is capable of executing the supported instructions for software compatibility purposes. The processor might not perform well doing so. For example, Excavator through Zen 2 processors implement PEXT and PDEP instructions using microcode resulting in the instructions executing significantly slower than the same behaviour recreated using other instructions. (A software method called "zp7" is, in fact, faster on these machines.)
For optimum performance it is recommended that compiler developers choose to use individual instructions in the extensions based on architecture specific performance profiles rather than on extension availability.
See also
* Advanced Vector Extensions (AVX)
* AES instruction set
* CLMUL instruction set
* F16C
* FMA instruction set
* Intel ADX
* XOP instruction set
* Intel BCD opcodes (also used for advanced bit manipulation techniques)
External links
Intel Intrinsics Guide