In
computer architecture, predication is a feature that provides an alternative to
conditional transfer of
control, as implemented by conditional
branch
A branch, also called a ramus in botany, is a stem that grows off from another stem, or when structures like veins in leaves are divided into smaller veins.
History and etymology
In Old English, there are numerous words for branch, includ ...
machine
instructions. Predication works by having conditional (''predicated'') non-branch instructions associated with a ''predicate'', a
Boolean value used by the instruction to control whether the instruction is allowed to modify the architectural state or not. If the predicate specified in the instruction is true, the instruction modifies the architectural state; otherwise, the architectural state is unchanged. For example, a predicated move instruction (a conditional move) will only modify the destination if the predicate is true. Thus, instead of using a conditional branch to select an instruction or a sequence of instructions to
execute based on the predicate that controls whether the branch occurs, the instructions to be executed are associated with that predicate, so that they will be executed, or not executed, based on whether that predicate is true or false.
Vector processors, some
SIMD
Single instruction, multiple data (SIMD) is a type of parallel computer, parallel processing in Flynn's taxonomy. SIMD describes computers with multiple processing elements that perform the same operation on multiple data points simultaneousl ...
ISAs (such as
AVX2
Advanced Vector Extensions (AVX, also known as Gesher New Instructions and then Sandy Bridge New Instructions) are SIMD extensions to the x86 instruction set architecture for microprocessors from Intel and Advanced Micro Devices (AMD). They w ...
and
AVX-512) and
GPUs in general make heavy use of predication, applying one bit of a conditional ''mask vector'' to the corresponding elements in the vector registers being processed, whereas scalar predication in scalar instruction sets only need the one predicate bit. Where predicate masks become particularly powerful in
vector processing
In computing, a vector processor or array processor is a central processing unit (CPU) that implements an instruction set where its Instruction (computer science), instructions are designed to operate efficiently and effectively on large Array d ...
is if an ''array'' of
condition codes, one per vector element, may feed back into predicate masks that are then applied to subsequent vector instructions.
Overview
Most
computer program
A computer program is a sequence or set of instructions in a programming language for a computer to Execution (computing), execute. It is one component of software, which also includes software documentation, documentation and other intangibl ...
s contain
conditional code, which will be executed only under specific conditions depending on factors that cannot be determined beforehand, for example depending on user input. As the majority of
processors simply execute the next
instruction in a sequence, the traditional solution is to insert ''branch'' instructions that allow a program to conditionally branch to a different section of code, thus changing the next step in the sequence. This was sufficient until designers began improving performance by implementing
instruction pipelining, a method which is slowed down by branches. For a more thorough description of the problems which arose, and a popular solution, see
branch predictor.
Luckily, one of the more common patterns of code that normally relies on branching has a more elegant solution. Consider the following
pseudocode:
if condition
else
On a system that uses conditional branching, this might translate to
machine instructions looking similar to:
branch_if_condition_to label1
do_something_else
branch_always_to label2
label1:
do_something
label2:
...
With predication, all possible branch paths are coded inline, but some instructions execute while others do not. The basic idea is that each instruction is associated with a predicate (the word here used similarly to its usage in
predicate logic
First-order logic, also called predicate logic, predicate calculus, or quantificational logic, is a collection of formal systems used in mathematics, philosophy, linguistics, and computer science. First-order logic uses quantified variables ove ...
) and that the instruction will only be executed if the predicate is true. The machine code for the above example using predication might look something like this:
(condition) do_something
(not condition) do_something_else
Besides eliminating branches, less code is needed in total, provided the architecture provides predicated instructions. While this does not guarantee faster execution in general, it will if the
do_something
and
do_something_else
blocks of code are short enough.
Predication's simplest form is ''partial predication'', where the architecture has ''conditional move'' or ''conditional select'' instructions. Conditional move instructions write the contents of one register over another only if the predicate's value is true, whereas conditional select instructions choose which of two registers has its contents written to a third based on the predicate's value. A more generalized and capable form is ''full predication''. Full predication has a set of predicate registers for storing predicates (which allows multiple nested or sequential branches to be simultaneously eliminated) and most instructions in the architecture have a register specifier field to specify which predicate register supplies the predicate.
Advantages
The main purpose of predication is to avoid jumps over very small sections of program code, increasing the effectiveness of
pipelined execution and avoiding problems with the
cache. It also has a number of more subtle benefits:
*Functions that are traditionally computed using simple arithmetic and
bitwise operation
In computer programming, a bitwise operation operates on a bit string, a bit array or a binary numeral (considered as a bit string) at the level of its individual bits. It is a fast and simple action, basic to the higher-level arithmetic operatio ...
s may be quicker to compute using predicated instructions.
*Predicated instructions with different predicates can be mixed with each other and with unconditional code, allowing better
instruction scheduling and so even better performance.
*Elimination of unnecessary branch instructions can make the execution of necessary branches, such as those that make up loops, faster by lessening the load on
branch prediction mechanisms.
*Elimination of the cost of a branch misprediction which can be high on deeply pipelined architectures.
* Instruction sets that have comprehensive
Condition Codes generated by instructions may reduce code size further by directly using the Condition Registers in or as predication.
Disadvantages
Predication's primary drawback is in increased encoding space. In typical implementations, every instruction reserves a bitfield for the predicate specifying under what conditions that instruction should have an effect. When available memory is limited, as on
embedded devices, this space cost can be prohibitive. However, some architectures such as
Thumb-2 are able to avoid this issue (see below). Other detriments are the following:
*Predication complicates the hardware by adding levels of
logic
Logic is the study of correct reasoning. It includes both formal and informal logic. Formal logic is the study of deductively valid inferences or logical truths. It examines how conclusions follow from premises based on the structure o ...
to critical
paths and potentially degrades clock speed.
*A predicated block includes cycles for all operations, so shorter
paths may take longer and be penalized.
*Predication is not usually speculated and causes a longer dependency chain. For ordered data this translates to a performance loss compared to a predictable branch.
Predication is most effective when paths are balanced or when the longest path is the most frequently executed,
but determining such a path is very difficult at compile time, even in the presence of
profiling information.
History
Predicated instructions were popular in European computer designs of the 1950s, including the
Mailüfterl (1955), the
Zuse Z22 (1955), the
ZEBRA
Zebras (, ) (subgenus ''Hippotigris'') are African equines with distinctive black-and-white striped coats. There are three living species: Grévy's zebra (''Equus grevyi''), the plains zebra (''E. quagga''), and the mountain zebra (''E. ...
(1958), and the
Electrologica X1 (1958). The
IBM ACS-1 design of 1967 allocated a "skip" bit in its instruction formats, and the CDC Flexible Processor in 1976 allocated three conditional execution bits in its microinstruction formats.
Hewlett-Packard
The Hewlett-Packard Company, commonly shortened to Hewlett-Packard ( ) or HP, was an American multinational information technology company. It was founded by Bill Hewlett and David Packard in 1939 in a one-car garage in Palo Alto, California ...
's
PA-RISC
Precision Architecture reduced instruction set computer, RISC (PA-RISC) or Hewlett Packard Precision Architecture (HP/PA or simply HPPA), is a computer, general purpose computer instruction set architecture (ISA) developed by Hewlett-Packard f ...
architecture (1986) had a feature called ''nullification'', which allowed most instructions to be predicated by the previous instruction.
IBM
International Business Machines Corporation (using the trademark IBM), nicknamed Big Blue, is an American Multinational corporation, multinational technology company headquartered in Armonk, New York, and present in over 175 countries. It is ...
's
POWER architecture (1990) featured conditional move instructions. POWER's successor,
PowerPC
PowerPC (with the backronym Performance Optimization With Enhanced RISC – Performance Computing, sometimes abbreviated as PPC) is a reduced instruction set computer (RISC) instruction set architecture (ISA) created by the 1991 Apple Inc., App ...
(1993), dropped these instructions.
Digital Equipment Corporation
Digital Equipment Corporation (DEC ), using the trademark Digital, was a major American company in the computer industry from the 1960s to the 1990s. The company was co-founded by Ken Olsen and Harlan Anderson in 1957. Olsen was president until ...
's
Alpha
Alpha (uppercase , lowercase ) is the first letter of the Greek alphabet. In the system of Greek numerals, it has a value of one. Alpha is derived from the Phoenician letter ''aleph'' , whose name comes from the West Semitic word for ' ...
architecture (1992) also featured conditional move instructions.
MIPS gained conditional move instructions in 1994 with the MIPS IV version; and
SPARC was extended in Version 9 (1994) with conditional move instructions for both integer and floating-point registers.
In the
Hewlett-Packard
The Hewlett-Packard Company, commonly shortened to Hewlett-Packard ( ) or HP, was an American multinational information technology company. It was founded by Bill Hewlett and David Packard in 1939 in a one-car garage in Palo Alto, California ...
/
Intel
Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California, and Delaware General Corporation Law, incorporated in Delaware. Intel designs, manufactures, and sells computer compo ...
IA-64 architecture, most instructions are predicated. The predicates are stored in 64 special-purpose predicate
registers; and one of the predicate registers is always true so that ''unpredicated'' instructions are simply instructions predicated with the value true. The use of predication is essential in IA-64's implementation of
software pipelining because it avoids the need for writing separated code for prologs and epilogs.
In the
x86
x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel, based on the 8086 microprocessor and its 8-bit-external-bus variant, the 8088. Th ...
architecture, a family of conditional move instructions (
CMOV
and
FCMOV
) were added to the architecture by the
Intel
Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California, and Delaware General Corporation Law, incorporated in Delaware. Intel designs, manufactures, and sells computer compo ...
Pentium Pro
The Pentium Pro is a sixth-generation x86 microprocessor developed and manufactured by Intel and introduced on November 1, 1995. It implements the P6 (microarchitecture), P6 microarchitecture (sometimes termed i686), and was the first x86 Intel C ...
(1995) processor. The
CMOV
instructions copied the contents of the source register to the destination register depending on a predicate supplied by the value of the flag register.
In the
ARM architecture
ARM (stylised in lowercase as arm, formerly an acronym for Advanced RISC Machines and originally Acorn RISC Machine) is a family of reduced instruction set computer, RISC instruction set architectures (ISAs) for central processing unit, com ...
, the original 32-bit instruction set provides a feature called ''conditional execution'' that allows most instructions to be predicated by one of 13 predicates that are based on some combination of the four condition codes set by the previous instruction. ARM's
Thumb
The thumb is the first digit of the hand, next to the index finger. When a person is standing in the medical anatomical position (where the palm is facing to the front), the thumb is the outermost digit. The Medical Latin English noun for thumb ...
instruction set (1994) dropped conditional execution to reduce the size of instructions so they could fit in 16 bits, but its successor,
Thumb-2 (2003) overcame this problem by using a special instruction which has no effect other than to supply predicates for the following four instructions. The 64-bit instruction set introduced in ARMv8-A (2011) replaced conditional execution with conditional selection instructions.
SIMD, SIMT and vector predication
Some
SIMD
Single instruction, multiple data (SIMD) is a type of parallel computer, parallel processing in Flynn's taxonomy. SIMD describes computers with multiple processing elements that perform the same operation on multiple data points simultaneousl ...
instruction sets, like AVX2, have the ability to use a logical
mask to conditionally load/store values to memory, a parallel form of the conditional move, and may also apply individual mask bits to individual arithmetic units executing a parallel operation. The technique
is known in Flynn's taxonomy as "associative processing".
This form of predication is also used in
vector processors and
single instruction, multiple threads GPU computing. All the techniques, advantages and disadvantages of single scalar predication apply just as well to the parallel processing case.
See also
*
Branch predictor
*
Control flow
In computer science, control flow (or flow of control) is the order in which individual statements, instructions or function calls of an imperative program are executed or evaluated. The emphasis on explicit control flow distinguishes an '' ...
*
Delay slot
In computer architecture, a delay slot is an instruction slot being executed without the effects of a preceding instruction. The most common form is a single arbitrary instruction located immediately after a branch instruction (computer science) ...
*
Instruction-level parallelism
Instruction-level parallelism (ILP) is the Parallel computing, parallel or simultaneous execution of a sequence of Instruction set, instructions in a computer program. More specifically, ILP refers to the average number of instructions run per st ...
*
Optimizing compiler
*
Pipeline stall
In the design of instruction pipeline, pipelined computer processors, a pipeline stall is a delay in execution of an instruction set, instruction in order to resolve a hazard (computer architecture), hazard.
Details
In a standard classic RISC pip ...
*
Software pipelining
*
Speculative execution
Speculative execution is an optimization (computer science), optimization technique where a computer system performs some task that may not be needed. Work is done before it is known whether it is actually needed, so as to prevent a delay that woul ...
*
Vector processor
*
Very long instruction word
References
Further reading
*
{{DEFAULTSORT:Predication
Conditional constructs
Instruction processing