
In
computer
A computer is a machine that can be Computer programming, programmed to automatically Execution (computing), carry out sequences of arithmetic or logical operations (''computation''). Modern digital electronic computers can perform generic set ...
central processing unit
A central processing unit (CPU), also called a central processor, main processor, or just processor, is the primary Processor (computing), processor in a given computer. Its electronic circuitry executes Instruction (computing), instructions ...
s, micro-operations (also known as micro-ops or μops, historically also as micro-actions
) are detailed low-level instructions used in some designs to implement complex machine instructions (sometimes termed
macro-instructions in this context).
Usually, micro-operations perform basic operations on data stored in one or more
registers, including transferring data between registers or between registers and external
buses
A bus (contracted from omnibus, with variants multibus, motorbus, autobus, etc.) is a motor vehicle that carries significantly more passengers than an average car or van, but fewer than the average rail transport. It is most commonly used ...
of the
central processing unit
A central processing unit (CPU), also called a central processor, main processor, or just processor, is the primary Processor (computing), processor in a given computer. Its electronic circuitry executes Instruction (computing), instructions ...
(CPU), and performing arithmetic or logical operations on registers. In a typical
fetch-decode-execute cycle The instruction cycle (also known as the fetch–decode–execute cycle, or simply the fetch–execute cycle) is the cycle that the central processing unit (CPU) follows from boot-up until the computer has shut down in order to process instructions ...
, each step of a macro-instruction is decomposed during its execution so the CPU determines and steps through a series of micro-operations. The execution of micro-operations is performed under control of the CPU's
control unit
The control unit (CU) is a component of a computer's central processing unit (CPU) that directs the operation of the processor. A CU typically uses a binary decoder to convert coded instructions into timing and control signals that direct the op ...
, which decides on their execution while performing various optimizations such as reordering, fusion and caching.
Optimizations
Various forms of μops have long been the basis for traditional
microcode
In processor design, microcode serves as an intermediary layer situated between the central processing unit (CPU) hardware and the programmer-visible instruction set architecture of a computer. It consists of a set of hardware-level instructions ...
routines used to simplify the implementation of a particular
CPU design
Processor design is a subfield of computer science and computer engineering (fabrication) that deals with creating a processor (computing), processor, a key component of computer hardware.
The design process involves choosing an instruction set an ...
or perhaps just the sequencing of certain multi-step operations or addressing modes. More recently, μops have also been employed in a different way in order to let modern
CISC processors more easily handle asynchronous parallel and speculative execution: As with traditional microcode, one or more table lookups (or equivalent) is done to locate the appropriate μop-sequence based on the encoding and semantics of the machine instruction (the decoding or translation step), however, instead of having rigid μop-sequences controlling the CPU directly from a microcode-
ROM
Rom, or ROM may refer to:
Biomechanics and medicine
* Risk of mortality, a medical classification to estimate the likelihood of death for a patient
* Rupture of membranes, a term used during pregnancy to describe a rupture of the amniotic sac
* ...
, μops are here dynamically buffered for rescheduling before being executed.
This buffering means that the fetch and decode stages can be more detached from the execution units than is feasible in a more traditional microcoded (or hard-wired) design. As this allows a degree of freedom regarding execution order, it makes some extraction of
instruction-level parallelism
Instruction-level parallelism (ILP) is the Parallel computing, parallel or simultaneous execution of a sequence of Instruction set, instructions in a computer program. More specifically, ILP refers to the average number of instructions run per st ...
out of a normal single-threaded program possible (provided that dependencies are checked, etc.). It opens up for more analysis and therefore also for reordering of code sequences in order to dynamically optimize mapping and scheduling of μops onto machine resources (such as
ALUs, load/store units, etc.). As this happens on the μop-level, sub-operations of different machine (macro) instructions may often intermix in a particular μop-sequence, forming partially reordered machine instructions as a direct consequence of the out-of-order dispatching of microinstructions from several macro instructions. However, this is not the same as the ''micro-op fusion'', which aims at the fact that a more complex microinstruction may replace a few simpler microinstructions in certain cases, typically in order to minimize state changes and usage of the queue and
re-order buffer
A re-order buffer (ROB) is a hardware unit used in an extension to Tomasulo's algorithm to support out-of-order and speculative instruction execution. The extension forces instructions to be committed in-order.
The buffer is a circular buffer ...
space, therefore reducing power consumption. Micro-op fusion is used in some modern CPU designs.
Execution optimization has gone even further; processors not only translate many machine instructions into a series of μops, but also do the opposite when appropriate; they combine certain machine instruction sequences (such as a compare followed by a conditional jump) into a more complex μop which fits the execution model better and thus can be executed faster or with less machine resources involved. This is also known as ''macro-op fusion''.
Another way to try to improve performance is to cache the decoded micro-operations in a
micro-operation cache
A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, whic ...
, so that if the same macroinstruction is executed again, the processor can directly access the decoded micro-operations from the cache, instead of decoding them again. The
execution trace cache
The NetBurst microarchitecture, called P68 inside Intel, was the successor to the P6 microarchitecture in the x86 family of central processing units (CPUs) made by Intel. The first CPU to use this architecture was the Willamette-core Pentium ...
found in
Intel
Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California, and Delaware General Corporation Law, incorporated in Delaware. Intel designs, manufactures, and sells computer compo ...
NetBurst
The NetBurst microarchitecture, called P68 inside Intel, was the successor to the P6 microarchitecture in the x86 family of central processing units (CPUs) made by Intel. The first CPU to use this architecture was the Willamette-core Pentium ...
microarchitecture (
Pentium 4
Pentium 4 is a series of single-core central processing unit, CPUs for Desktop computer, desktops, laptops and entry-level Server (computing), servers manufactured by Intel. The processors were shipped from November 20, 2000 until August 8, 20 ...
) is a widespread example of this technique. The size of this cache may be stated in terms of how many thousands (or strictly multiple of 1024) of micro-operations it can store: ''Kμops''.
References
{{CPU technologies
Instruction processing
Central processing unit