Amdahl Corporation's 470 series general-purpose mainframe had a 7-step pipeline and a patented branch prediction circuit.
Hazards
The model of sequential execution assumes that each instruction completes before the next one begins; this assumption is not true on a pipelined processor. A situation where the expected result is problematic is known as a
hazard. Imagine the following two register instructions to a hypothetical processor:
1: add 1 to R5
2: copy R5 to R6
If the processor has the 5 steps listed in the initial illustration (the 'Basic five-stage pipeline' at the start of the article), instruction 1 would be fetched at time ''t''<sub>1</sub> and its execution would be complete at ''t''<sub>5</sub>. Instruction 2 would be fetched at ''t''<sub>2</sub> and would be complete at ''t''<sub>6</sub>. The first instruction might deposit the incremented number into R5 as its fifth step (register write back) at ''t''<sub>5</sub>. But the second instruction might get the number from R5 (to copy to R6) in its second step (instruction decode and register fetch) at time ''t''<sub>3</sub>. It seems that the first instruction would not have incremented the value by then. The above code invokes a hazard.
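The timing can be made concrete with a minimal sketch that models no particular processor: it simply assigns each instruction's stages to consecutive cycles, using the classic five-stage names as an assumption, and shows that the second instruction reads R5 before the first has written it back.
<syntaxhighlight lang="python">
# Minimal sketch (no specific processor): when do two back-to-back
# instructions occupy the classic five pipeline stages, with no stalls?

STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # fetch, decode/reg-read, execute, memory, write-back

def stage_times(issue_cycle):
    """Map each stage name to the cycle in which it runs, assuming no stalls."""
    return {stage: issue_cycle + i for i, stage in enumerate(STAGES)}

i1 = stage_times(1)   # "add 1 to R5", fetched at t1
i2 = stage_times(2)   # "copy R5 to R6", fetched at t2

print(f"instruction 1 writes R5 in WB at t{i1['WB']}")   # t5
print(f"instruction 2 reads  R5 in ID at t{i2['ID']}")   # t3
assert i2["ID"] < i1["WB"]  # the read happens before the write: a data hazard
</syntaxhighlight>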
Writing computer programs in a
compiled language might not raise these concerns, as the compiler could be designed to generate machine code that avoids hazards.
Workarounds
In some early DSP and RISC processors, the documentation advises programmers to avoid such dependencies in adjacent and nearly adjacent instructions (called delay slots), or declares that the second instruction uses an old value rather than the desired value (in the example above, the processor might counter-intuitively copy the unincremented value), or declares that the value it uses is undefined. The programmer may have unrelated work that the processor can do in the meantime; or, to ensure correct results, the programmer may insert
NOPs into the code, partly negating the advantages of pipelining.
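A rough sketch of the NOP workaround follows; the tuple encoding of instructions and the two-instruction gap are assumptions for illustration, since the real gap depends on the pipeline's depth and register-file timing.
<syntaxhighlight lang="python">
# Toy illustration of the software workaround: insert NOPs so that a
# dependent instruction does not read a register written by one of the
# immediately preceding instructions.

def insert_nops(program, gap=2):
    """program: list of (dest_reg, src_regs) tuples in program order."""
    out = []
    for dest, srcs in program:
        # If this instruction reads a register written by one of the
        # previous `gap` slots, pad with NOPs first.
        recent = [d for d, _ in out[-gap:] if d is not None]
        while any(s in recent for s in srcs):
            out.append((None, ()))            # (None, ()) stands for a NOP
            recent = [d for d, _ in out[-gap:] if d is not None]
        out.append((dest, srcs))
    return out

prog = [("R5", ("R5",)),   # add 1 to R5
        ("R6", ("R5",))]   # copy R5 to R6
for dest, srcs in insert_nops(prog):
    print("NOP" if dest is None else f"write {dest} using {srcs}")
</syntaxhighlight>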
Solutions
Pipelined processors commonly use three techniques to work as expected when the programmer assumes that each instruction completes before the next one begins:
*The pipeline could
stall, or cease scheduling new instructions until the required values are available. This results in empty slots in the pipeline, or ''bubbles'', in which no work is performed.
*An additional data path can be added that routes a computed value to a future instruction elsewhere in the pipeline before the instruction that produced it has been fully retired, a process called
operand forwarding.
*The processor can locate other instructions which are not dependent on the current ones and which can be immediately executed without hazards, an optimization known as
out-of-order execution.
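The first two techniques can be contrasted with a minimal sketch applied to the add/copy pair from the hazard example; the stage numbering is an illustrative assumption (register read in stage 2, write-back in stage 5) rather than a description of any particular microarchitecture.
<syntaxhighlight lang="python">
# Compare how many cycles the add/copy pair takes when the pipeline
# stalls until write-back versus when it forwards the ALU result
# directly to the dependent instruction.

STAGE_COUNT = 5          # IF, ID, EX, MEM, WB
WRITE_STAGE = 5          # register file written in stage 5 (WB)
READ_STAGE = 2           # register file read in stage 2 (ID)

def completion_cycle_with_stall():
    # Producer fetched in cycle 1, so it writes back in cycle WRITE_STAGE.
    producer_wb = 1 + (WRITE_STAGE - 1)
    # Consumer may not read the register file before that cycle, so its
    # ID stage is delayed (bubbles are inserted) until producer_wb.
    consumer_id = max(2 + (READ_STAGE - 1), producer_wb)
    return consumer_id + (STAGE_COUNT - READ_STAGE)

def completion_cycle_with_forwarding():
    # The ALU result is routed straight into the consumer's EX stage,
    # so no stall is needed for back-to-back ALU instructions.
    return 2 + (STAGE_COUNT - 1)

print("stall only :", completion_cycle_with_stall())        # cycle 8
print("forwarding :", completion_cycle_with_forwarding())   # cycle 6
</syntaxhighlight>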
Branches
A branch out of the normal instruction sequence often involves a hazard. Unless the processor can give effect to the branch in a single time cycle, the pipeline will continue fetching instructions sequentially. Such instructions cannot be allowed to take effect because the programmer has diverted control to another part of the program.
A conditional branch is even more problematic. The processor may or may not branch, depending on a calculation that has not yet occurred. Various processors may stall, may attempt branch prediction, and may be able to begin to execute two different program sequences (eager execution), each assuming the branch is or is not taken, discarding all work that pertains to the incorrect guess.
A processor with an implementation of branch prediction that usually makes correct predictions can minimize the performance penalty from branching. However, if branches are predicted poorly, it may create more work for the processor, such as
flushing from the pipeline the incorrect code path that has begun execution before resuming execution at the correct location.
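One common scheme, a two-bit saturating counter kept per branch, can be sketched as follows; the three-cycle flush penalty is an assumed constant for illustration rather than a figure for any particular processor, and real predictors are considerably more elaborate.
<syntaxhighlight lang="python">
# Two-bit saturating counter: two wrong guesses in a row are needed to
# flip the prediction, so a loop branch is mispredicted only at warm-up
# and at loop exit.

FLUSH_PENALTY = 3   # assumed cycles of fetched-but-discarded work per miss

class TwoBitPredictor:
    def __init__(self):
        self.state = 0          # 0,1 = predict not taken; 2,3 = predict taken

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

# A loop branch that is taken nine times and then falls through.
outcomes = [True] * 9 + [False]
p = TwoBitPredictor()
penalty = 0
for taken in outcomes:
    if p.predict() != taken:
        penalty += FLUSH_PENALTY   # mispredicted path must be flushed
    p.update(taken)
print("cycles lost to flushes:", penalty)
</syntaxhighlight>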
Programs written for a pipelined processor deliberately avoid branching to minimize possible loss of speed. For example, the programmer can handle the usual case with sequential execution and branch only on detecting unusual cases. Using programs such as
gcov to analyze
code coverage
lets the programmer measure how often particular branches are actually executed and gain insight with which to optimize the code.
In some cases, a programmer can handle both the usual case and unusual case with
branch-free code.
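As a sketch of the branch-free pattern (whether such source code actually compiles to branch-free machine code depends on the compiler and the target's support for conditional moves), a maximum can be selected arithmetically rather than with a conditional branch:
<syntaxhighlight lang="python">
# Replace control flow with data flow: compute max(a, b) without if/else
# by turning the comparison into an arithmetic selector.

def branchy_max(a, b):
    if a > b:          # conditional branch
        return a
    return b

def branchless_max(a, b):
    gt = int(a > b)                    # 1 if a > b, else 0
    return a * gt + b * (1 - gt)       # select without branching

assert branchless_max(3, 7) == branchy_max(3, 7) == 7
assert branchless_max(9, 2) == branchy_max(9, 2) == 9
</syntaxhighlight>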
Special situations
; Self-modifying programs
: The technique of
self-modifying code
can be problematic on a pipelined processor. In this technique, one of the effects of a program is to modify its own upcoming instructions. If the processor has an
instruction cache, the original instruction may already have been copied into a
prefetch input queue
and the modification will not take effect. Some processors such as the
Zilog Z280 can configure their on-chip cache memories for data-only fetches, or as part of their ordinary memory address space, and avoid such difficulties with self-modifying instructions.
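A toy model of the difficulty, describing no real processor, is the following: instructions already copied into a prefetch queue keep their original form even after the running program rewrites them in memory.
<syntaxhighlight lang="python">
from collections import deque

# Program memory: instruction 0 rewrites instruction 2, but instruction 2
# has already been prefetched by the time the rewrite happens.
memory = ["rewrite 2", "print old", "print old"]
PREFETCH_DEPTH = 3

queue, pc = deque(), 0
while pc < len(memory) or queue:
    # Keep the prefetch queue topped up from program memory.
    while len(queue) < PREFETCH_DEPTH and pc < len(memory):
        queue.append(memory[pc])
        pc += 1
    instr = queue.popleft()
    if instr.startswith("rewrite"):
        memory[int(instr.split()[1])] = "print new"   # self-modification
    else:
        print(instr)   # the stale, prefetched copy runs: "print old" twice
</syntaxhighlight>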
; Uninterruptible instructions
: An instruction may be uninterruptible to ensure its
atomicity, such as when it swaps two items. A sequential processor permits
interrupts between instructions, but a pipelining processor overlaps instructions, so executing an uninterruptible instruction renders portions of ordinary instructions uninterruptible too. The
Cyrix coma bug would
hang a single-core system using an infinite loop in which an uninterruptible instruction was always in the pipeline.
Design considerations
; Speed
: Pipelining keeps all portions of the processor occupied and increases the amount of useful work the processor can do in a given time. Pipelining typically reduces the processor's cycle time and increases the throughput of instructions. The speed advantage is diminished to the extent that execution encounters
hazards that require execution to slow below its ideal rate. A non-pipelined processor executes only a single instruction at a time; the start of the next instruction is always delayed until the current one completes, regardless of hazards. (An idealized model of the throughput gain from pipelining is sketched after this list.)
: A pipelined processor's need to organize all its work into modular steps may require the duplication of registers, which increases the latency of some instructions.
; Economy
: By making each dependent step simpler, pipelining can enable complex operations more economically than adding complex circuitry, such as for numerical calculations. However, a processor that declines to pursue increased speed with pipelining may be simpler and cheaper to manufacture.
; Predictability
: Compared to environments where the programmer needs to avoid or work around hazards, use of a non-pipelined processor may make it easier to program and to train programmers. The non-pipelined processor also makes it easier to predict the exact timing of a given sequence of instructions.
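The idealized model mentioned under ''Speed'' can be written out as a back-of-the-envelope sketch. It assumes equal stage delays, no hazards and no pipeline-register overhead: with k stages, n instructions complete in k + (n - 1) cycles instead of n·k, so throughput approaches k times the unpipelined rate as n grows.
<syntaxhighlight lang="python">
# Back-of-the-envelope model: equal stage delays, no hazards,
# no pipeline-register (latch) overhead.

def unpipelined_cycles(n, k):
    return n * k                # each instruction runs its k steps alone

def pipelined_cycles(n, k):
    return k + (n - 1)          # fill the pipeline once, then one result per cycle

n, k = 1000, 5
print("speedup ~", unpipelined_cycles(n, k) / pipelined_cycles(n, k))   # about 4.98, approaching k
</syntaxhighlight>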
Illustrated example
To the right is a generic pipeline with four stages: fetch, decode, execute and write-back. The top gray box is the list of instructions waiting to be executed, the bottom gray box is the list of instructions that have had their execution completed, and the middle white box is the pipeline.
The execution is as follows:
Pipeline bubble

A pipelined processor may deal with hazards by stalling and creating a bubble in the pipeline, resulting in one or more cycles in which nothing useful happens.
In the illustration at right, in cycle 3, the processor cannot decode the purple instruction, perhaps because the processor determines that decoding depends on results produced by the execution of the green instruction. The green instruction can proceed to the Execute stage and then to the Write-back stage as scheduled, but the purple instruction is stalled for one cycle at the Fetch stage. The blue instruction, which was due to be fetched during cycle 3, is stalled for one cycle, as is the red instruction after it.
Because of the bubble (the blue ovals in the illustration), the processor's Decode circuitry is idle during cycle 3. Its Execute circuitry is idle during cycle 4 and its Write-back circuitry is idle during cycle 5.
When the bubble moves out of the pipeline (at cycle 6), normal execution resumes. But everything now is one cycle late. It will take 8 cycles (cycle 1 through 8) rather than 7 to completely execute the four instructions shown in colors.
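The timeline just described can be reproduced with a short sketch: a four-stage pipeline, four instructions, and a one-cycle stall of the purple instruction before its Decode stage, with every later instruction delayed by the same amount. The stall accounting is simplified for illustration.
<syntaxhighlight lang="python">
# Reproduce the bubble timeline: four instructions, four stages, and a
# one-cycle stall of "purple" before Decode that also delays "blue" and "red".

INSTRUCTIONS = ["green", "purple", "blue", "red"]
STALLS = {"purple": 1}          # extra cycles spent waiting before Decode

def timeline(instructions, stalls):
    table = {}
    delay = 0                                   # accumulated delay from earlier stalls
    for i, name in enumerate(instructions):
        fetch = 1 + i + delay                   # fetched one per cycle, plus prior delays
        delay += stalls.get(name, 0)
        decode = fetch + 1 + stalls.get(name, 0)
        table[name] = {"Fetch": fetch, "Decode": decode,
                       "Execute": decode + 1, "Write-back": decode + 2}
    return table

t = timeline(INSTRUCTIONS, STALLS)
for name, cycles in t.items():
    print(name, cycles)
print("total cycles:", max(c["Write-back"] for c in t.values()))   # 8, not 7
</syntaxhighlight>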
See also
* Wait state
* Classic RISC pipeline
External links
* Branch Prediction in the Pentium Family
* ArsTechnica article on pipelining