The VEX prefix (from "vector extensions") and VEX coding scheme are an extension to the IA-32 and x86-64 instruction set architecture for microprocessors from Intel, AMD and others.

Features

The VEX coding scheme allows the definition of new instructions and the extension or modification of previously existing instruction codes. This serves the following purposes:
* The opcode map is extended to make space for future instructions.
* It allows instruction codes to have up to four operands (plus immediate), where the original scheme allows only two operands (plus immediate).
* It allows the size of SIMD vector registers to be extended from the 128-bit XMM registers to the 256-bit YMM registers. There is room for further extensions of the register size.
* It allows existing two-operand instructions to be modified into non-destructive three-operand forms where the destination register is different from both source registers. For example, ''c'' = ''a'' + ''b'' instead of ''a'' = ''a'' + ''b'' (where register ''a'' is changed by the instruction).

The VEX prefix ''replaces'' the most commonly used instruction prefix bytes and escape bytes. In many cases, the number of prefix bytes and escape bytes that are replaced is the same as the number of bytes in the VEX prefix, so that the total length of the VEX-encoded instruction is the same as the length of the legacy instruction code. In other cases, the VEX-encoded version is longer or shorter than the legacy code.

In 32-bit mode, VEX-encoded instructions can only access the first 8 YMM/XMM registers; the encodings for the other registers would be interpreted as the legacy LDS and LES instructions, which are not supported in 64-bit mode.
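The length relationship described above (same, longer, or shorter) can be illustrated with a few hand-assembled encodings. The byte strings below are assumed examples worked out from the published encoding rules, not taken from this article, and the names `EXAMPLES` and `length_delta` are hypothetical.

```python
# Hand-assembled encodings (assumed examples, not from the article).
# Each entry pairs a legacy SSE encoding with its VEX-encoded counterpart.
EXAMPLES = {
    # addpd xmm0, xmm1   vs   vaddpd xmm0, xmm0, xmm1
    # (0x66 prefix and 0x0F escape are folded into a 2-byte VEX prefix)
    "addpd": (bytes.fromhex("660f58c1"), bytes.fromhex("c5f958c1")),
    # addps xmm0, xmm1   vs   vaddps xmm0, xmm0, xmm1
    # (only the 0x0F escape is replaced, by a 2-byte VEX prefix)
    "addps": (bytes.fromhex("0f58c1"), bytes.fromhex("c5f858c1")),
    # pshufb xmm8, xmm1  vs   vpshufb xmm8, xmm8, xmm1
    # (0x66, REX and 0x0F 0x38 are replaced by a 3-byte VEX prefix)
    "pshufb": (bytes.fromhex("66440f3800c1"), bytes.fromhex("c4623900c1")),
}

def length_delta(name):
    """Return len(VEX encoding) - len(legacy encoding) for one example."""
    legacy, vex = EXAMPLES[name]
    return len(vex) - len(legacy)

for name in EXAMPLES:
    print(f"{name}: {length_delta(name):+d} byte(s)")
```

A positive value means the VEX form is longer, zero means the lengths match, and a negative value means the VEX form is shorter.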

SSE Semantic difference

While it is required for 256-bit AVX operations, the VEX prefix simply provides an alternative encoding for 128-bit SSE operations. For the most part, the operation is identical no matter which encoding is used. There is, however, one major difference.

SSE operations without VEX leave the high bits of destination SIMD registers unmodified. In particular, a called function written without knowledge of AVX or VEX may save a callee-saved register, use the register, and restore its value, using 128-bit operations, all without disturbing the more-significant bits.

This merging of unmodified and newly computed portions of a register is difficult for the (now-ubiquitous) optimization of register renaming, as the unchanged portions of the destination register must be copied to the renamed destination register. x86 processors use special techniques to optimize this (such as the VZEROUPPER instruction), but it still comes at a performance penalty.

When a VEX prefix is used, ''the high bits of the destination register are cleared (zeroed).'' This does not affect the SSE computation at all, but does affect any required save and restore operations.
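The difference in how the two encodings treat the upper bits can be modelled in a few lines. This is only a sketch: it assumes a 256-bit YMM register represented as a Python integer whose low 128 bits are the XMM register, and the function names are hypothetical.

```python
# Hypothetical model: a 256-bit YMM register held as a Python int,
# with the XMM register being its low 128 bits.
MASK128 = (1 << 128) - 1

def write_xmm_legacy(ymm_old, result128):
    """Legacy SSE encoding: bits 255:128 of the destination are preserved."""
    return (ymm_old & ~MASK128) | (result128 & MASK128)

def write_xmm_vex(ymm_old, result128):
    """VEX.128 encoding: bits 255:128 of the destination are zeroed."""
    return result128 & MASK128

ymm = (0xAAAA << 128) | 0x1111   # upper half holds live data
merged = write_xmm_legacy(ymm, 0x2222)
zeroed = write_xmm_vex(ymm, 0x2222)
print(hex(merged >> 128), hex(merged & MASK128))
print(hex(zeroed >> 128), hex(zeroed & MASK128))
```

In this model the legacy write depends on the old register value, which is the merge that register renaming has to handle, while the VEX write depends only on the new result.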

Instruction encoding

The VEX coding scheme uses a code prefix consisting of two or three bytes, which may be added to existing or new instruction codes. The VEX prefix replaces the 0x66, 0xF2 and 0xF3 opcode prefixes, the REX prefix, and the 0x0F, 0x0F 0x38 or 0x0F 0x3A opcode prefixes. It may ''not'' be used with one-byte opcodes which do not begin with 0x0F, nor with the LOCK (0xF0) prefix. It may be preceded only by address size (0x67) or segment (0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65) prefixes.

In the x86 architecture, instructions with a memory operand almost always use the ModR/M byte, which specifies the addressing mode. This byte has three bit fields:
* ''mod'', bits [7:6]: combined with the ''r/m'' field, encodes either 8 registers or 24 addressing modes. Also encodes opcode information for some instructions.
* ''reg/opcode'', bits [5:3]: depending on the primary opcode byte, specifies either a register or three more bits of opcode information.
* ''r/m'', bits [2:0]: can specify a register as an operand, or combine with the ''mod'' field to encode an addressing mode.

The base-plus-index and scale-plus-index forms of 32-bit addressing (encoded with r/m = 100 and mod ≠ 11) require another addressing byte, the SIB byte. It has the following fields:
* ''scale'' factor, bits [7:6]
* ''index'' register, bits [5:3]
* ''base'' register, bits [2:0]

The REX prefix provides additional space for encoding 64-bit addressing modes and additional registers present in the x86-64 architecture. Bit-field W changes the operand size to 64 bits, R expands ''reg'' to 4 bits, B expands ''r/m'' (or ''opreg'' in the few opcodes that encode the register in the 3 lowest opcode bits, such as "POP reg"), and X and B expand ''index'' and ''base'' in the SIB byte.

The VEX3 prefix contains all bit-fields from the REX prefix as well as various other prefixes, expanding addressing mode, register enumeration, operand size and width:
* R̅, X̅ and B̅ bits are complements of the REX prefix's R, X and B bits; these provide a fourth (high) bit for register index fields (ModRM reg, SIB index, and ModRM r/m; SIB base; or opcode reg fields, respectively), allowing access to 16 instead of 8 registers.
* One W bit, equivalent to the REX prefix's W bit, specifies a 64-bit operand; for non-integer instructions, it is a general opcode extension bit.
* Four v̅ bits are the complement of an additional source register index.
* One L bit indicates the vector length: 0 for 128-bit SSE (XMM) registers, and 1 for 256-bit AVX (YMM) registers.
* Two p bits encode additional prefix bytes.
The values 0, 1, 2, and 3 correspond to no implied prefix and to implied 0x66, 0xF3, and 0xF2 prefixes, respectively. These encode the operand type for SSE floating-point instructions: packed single, packed double, scalar single and scalar double, respectively.
* Five ''m'' bits specify the ''opcode map'' to use. Of the 32 possible opcode maps that can be encoded with ''m₄m₃m₂m₁m₀'', opcode maps 1, 2 and 3 provide compact replacements for legacy 2-byte and 3-byte opcodes; these three opcode maps are equivalent to the leading escape byte sequences 0x0F, 0x0F 0x38 and 0x0F 0x3A, respectively. The other VEX opcode maps have seen little use: as of December 2023, the only known uses of other maps are map 0 for the Xeon Phi-specific JKZD/JKNZD instructions and map 7 for the planned URDMSR/UWRMSR instructions. Maps 4/5/6 are used with the EVEX prefix, but none of the instructions in those maps are VEX-encodable.

The VEX2 prefix is a 2-byte abbreviation of the VEX3 prefix, which may be used when the omitted fields have the following values:
* W = 0: 32-bit operand size
* B̅ = 1 (B = 0): base register is among the first 8
* X̅ = 1 (X = 0): index register (if a SIB byte is present) is among the first 8
* m = 00001: 2-byte opcode beginning with 0x0F

Instructions which require different values for these fields must be encoded with the VEX3 prefix. VEX2 does include an R̅ bit, an L bit, two p bits, and an additional 4-bit source register (v), so it is useful for many SSE and AVX instructions as long as the register/memory operand uses only the first 8 registers.

The REX2 prefix is a 2-byte variant of the REX prefix, introduced with the Intel APX extensions, which add 16 extended general-purpose registers (EGPRs) for a total of 32.
* R₃, X₃, and B₃ bits are the same as the R, X and B bits in the REX prefix.
* R₄, X₄, and B₄ bits are additional bits used to encode the 32 EGPR registers.
* The W bit is the same as in the REX prefix.
* The M₀ bit selects between legacy map 0 (1-byte opcodes, no escape) and legacy map 1 (2-byte opcodes, escape 0x0F).
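The relationship between the VEX3 and VEX2 prefixes described in this section can be sketched as a small encoder. It assumes the byte layout published in Intel's manuals, which the text here does not spell out: VEX3 is 0xC4 followed by R̅ X̅ B̅ mmmmm and then W v̅v̅v̅v̅ L pp, while VEX2 is 0xC5 followed by R̅ v̅v̅v̅v̅ L pp. The function name and parameter names are hypothetical, and the fields are passed in non-inverted form.

```python
def encode_vex(r, x, b, w, vvvv, l, pp, mmmmm=1):
    """Return VEX prefix bytes for non-inverted field values.

    r, x, b, w, l are single bits; vvvv is a 4-bit register index;
    pp is the 2-bit implied prefix; mmmmm is the 5-bit opcode map.
    Layout assumed from Intel's documentation, not from this article.
    """
    inv_r, inv_x, inv_b = r ^ 1, x ^ 1, b ^ 1   # stored complemented
    inv_v = ~vvvv & 0xF                         # stored complemented
    # VEX2 is usable only when the omitted fields have their default values.
    if w == 0 and x == 0 and b == 0 and mmmmm == 1:
        return bytes([0xC5, (inv_r << 7) | (inv_v << 3) | (l << 2) | pp])
    return bytes([
        0xC4,
        (inv_r << 7) | (inv_x << 6) | (inv_b << 5) | mmmmm,
        (w << 7) | (inv_v << 3) | (l << 2) | pp,
    ])

# vaddps ymm0, ymm1, ymm2: all registers below 8, so VEX2 suffices.
print(encode_vex(r=0, x=0, b=0, w=0, vvvv=1, l=1, pp=0).hex())
# vaddps ymm0, ymm1, ymm10: r/m register needs B = 1, so VEX3 is required.
print(encode_vex(r=0, x=0, b=1, w=0, vvvv=1, l=1, pp=0).hex())
```

The opcode byte (0x58 for this instruction) and the ModR/M byte would follow the prefix; they are left out to keep the sketch focused on the prefix itself.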

Technical description

Instructions coded with the VEX prefix can have up to four variable operands (in registers or memory) and one constant operand (immediate value). Instructions that need more than three variable operands use immediate operand bits to specify the fourth register operand (the IS4 field). At most one of the operands can be a memory operand, and at most one of the operands can be an immediate constant of 4 or 8 bits. The remaining operands are registers.

The AVX instruction set is the first instruction set extension to use the VEX coding scheme. The AVX instruction set uses the VEX prefix only for instructions using the SIMD XMM registers. However, the VEX coding scheme has been used for other instruction types as well in subsequent expansions of the instruction set. For example:
* BMI introduced VEX-coded arithmetic and bit manipulation instructions that operate on general purpose registers.
* AVX-512 introduced 8 mask registers and added VEX-coded instructions to manipulate them. (VEX.B̅ is ignored when the field is used to encode a mask register, but VEX.R̅ and VEX.v̅₃ are not, and must be set to 1 in 64-bit mode.)
* AMX introduced 8 tile registers and added VEX-coded instructions to manipulate them.

The VEX prefix's initial-byte values, 0xC4 and 0xC5, are the same as the opcodes of the LES and LDS instructions, respectively. These instructions are not supported in 64-bit mode; in 32-bit mode, the ambiguity is resolved by exploiting the fact that the ModR/M byte of a legal LDS or LES cannot specify a register source operand, i.e., be of the form ''11xxxxxx''. Various bit-fields in the VEX prefix's second byte are inverted to ensure that the byte is always of this form in 32-bit mode.

Similarly, the REX prefix's one-byte form has the four high-order bits set to four, which replaces sixteen opcodes numbered 0x40–0x4F. Previously, those opcodes were individual INC and DEC instructions for the eight standard processor registers; x86-64 code must use ModR/M INC and DEC instructions.

Legacy SIMD instructions with a VEX prefix added are equivalent to the same instructions without the VEX prefix, with the following differences:
* The VEX-encoded instruction can have one more operand, making it non-destructive.
* A 128-bit XMM instruction without the VEX prefix leaves the upper half of the full 256-bit YMM register unchanged, while the VEX-encoded version sets the upper half to zero.
* 128-bit XMM instructions without the VEX prefix usually require any memory arguments to be 16-byte aligned; VEX-encoded versions allow misaligned memory operands.

Instructions that use the whole 256-bit YMM register should not be mixed with non-VEX instructions that leave the upper half of the register unchanged, for reasons of efficiency.

The VEX prefix is not supported in real mode and virtual-8086 mode (all instructions with the VEX prefix will cause #UD in these modes).
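The 32-bit-mode disambiguation between a VEX prefix and the legacy LDS/LES instructions can be sketched as a check on the top two bits of the byte that follows 0xC4 or 0xC5. This is a simplified illustration that ignores any preceding prefixes; the function name is hypothetical and the sample byte sequences are hand-assembled assumptions.

```python
def is_vex_in_32bit_mode(code: bytes) -> bool:
    """Classify an instruction start in 32-bit mode.

    0xC4/0xC5 followed by a byte of the form 11xxxxxx cannot be a legal
    LES/LDS (those require a memory operand), so it is treated as VEX.
    """
    if len(code) < 2 or code[0] not in (0xC4, 0xC5):
        return False
    return (code[1] & 0xC0) == 0xC0

# vaddps ymm0, ymm1, ymm2 (assumed encoding C5 F4 58 C2)
print(is_vex_in_32bit_mode(bytes.fromhex("c5f458c2")))
# les eax, [esi] (assumed encoding C4 06): ModR/M mod field is 00
print(is_vex_in_32bit_mode(bytes.fromhex("c406")))
```

In 64-bit mode no such check is needed, because LDS and LES do not exist there and 0xC4/0xC5 always introduce a VEX prefix.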

History

* In August 2007, AMD proposed the SSE5 instruction set extension, which includes a new coding scheme for instructions with three operands, using an extra byte named DREX, and was intended for the Bulldozer processor core in 2011. However, in 2009, SSE5 was canceled and never implemented.
* In March 2008, Intel proposed the AVX instruction set, using the new VEX coding scheme.
* In August 2008, commentators deplored the expected incompatibility between AMD and Intel instruction sets, and proposed that AMD revise their plans and replace the DREX scheme with the more flexible and extensible VEX scheme.
* In May 2009, AMD announced a revision of the proposed SSE5 instruction set to make it compatible with the AVX instruction set and the VEX coding scheme. The revised SSE5 is called XOP.
* January 2011. The AVX instruction set is supported in Intel's Sandy Bridge microprocessor architecture.
* 2011. The AVX, XOP and FMA4 instruction sets are supported in the AMD Bulldozer processor.
* 2013. The FMA3 instruction set is supported in Intel Haswell processors.
* In July 2023, Intel announced Advanced Performance Extensions (APX), which use the REX2 prefix and an updated EVEX prefix.
