
A data dependency in computer science is a situation in which a program statement (instruction) refers to the data of a preceding statement. In compiler theory, the technique used to discover data dependencies among statements (or instructions) is called dependence analysis.


Description

Assuming statements S_1 and S_2, S_2 depends on S_1 if:

:\left[I(S_1) \cap O(S_2)\right] \cup \left[O(S_1) \cap I(S_2)\right] \cup \left[O(S_1) \cap O(S_2)\right] \neq \varnothing

where:
* I(S_i) is the set of memory locations read by S_i,
* O(S_j) is the set of memory locations written by S_j, and
* there is a feasible run-time execution path from S_1 to S_2.

These conditions are called Bernstein's conditions, named after Arthur J. Bernstein.

Three cases exist:
* Anti-dependence: I(S_1) \cap O(S_2) \neq \varnothing, S_1 \rightarrow S_2, and S_1 reads something before S_2 overwrites it
* Flow (data) dependence: O(S_1) \cap I(S_2) \neq \varnothing, S_1 \rightarrow S_2, and S_1 writes something before it is read by S_2
* Output dependence: O(S_1) \cap O(S_2) \neq \varnothing, S_1 \rightarrow S_2, and both write the same memory location
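Bernstein's conditions can be checked mechanically. The following Python sketch (the function name and the encoding of statements as read/write sets are illustrative, not from the source) classifies the dependence between two statements, assuming S_1 executes before S_2 on some feasible path:

```python
def dependences(i1, o1, i2, o2):
    """Return the kinds of dependence of S2 on S1, given the read sets
    (i1, i2) and write sets (o1, o2) of the two statements."""
    kinds = set()
    if i1 & o2:
        kinds.add("anti")            # S1 reads a location S2 overwrites
    if o1 & i2:
        kinds.add("flow")            # S1 writes a location S2 reads
    if o1 & o2:
        kinds.add("output")          # both write the same location
    return kinds

# S1: R2 <- R5 + R8   reads {R5, R8}, writes {R2}
# S2: R4 <- R2 + R8   reads {R2, R8}, writes {R4}
print(dependences({"R5", "R8"}, {"R2"}, {"R2", "R8"}, {"R4"}))  # {'flow'}
```

A nonempty result means the union in Bernstein's condition is nonempty, so S_2 depends on S_1.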


Types


Data hazards

Data hazards occur when instructions that exhibit data dependence modify data in different stages of a pipeline. Ignoring potential data hazards can result in race conditions (also termed race hazards). There are three situations in which a data hazard can occur:
# read after write (RAW), a ''true dependency''
# write after read (WAR), an ''anti-dependency''
# write after write (WAW), an ''output dependency''

Read after read (RAR) is not a hazard case. In the following, consider two instructions i1 and i2, with i1 occurring before i2 in program order.
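The three hazard cases can be expressed over register operands. This Python sketch (a simplified model with one destination register and a set of source registers per instruction; the names are illustrative) reports the hazards that arise if i2 enters the pipeline before i1 completes:

```python
def hazards(i1_dst, i1_src, i2_dst, i2_src):
    """Hazard kinds between i1 and i2, where i1 precedes i2 in
    program order. Each instruction has one destination register
    and a set of source registers."""
    h = []
    if i1_dst in i2_src:
        h.append("RAW")   # i2 reads a register before i1 writes it
    if i2_dst in i1_src:
        h.append("WAR")   # i2 writes a register before i1 reads it
    if i1_dst == i2_dst:
        h.append("WAW")   # both write the same register
    return h

# i1. R2 <- R5 + R8 ; i2. R4 <- R2 + R8
print(hazards("R2", {"R5", "R8"}, "R4", {"R2", "R8"}))  # ['RAW']
```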


Read after write (RAW)

(i2 tries to read a source before i1 writes to it) A read after write (RAW) data hazard refers to a situation where an instruction refers to a result that has not yet been calculated or retrieved. This can occur because even though an instruction is executed after a prior instruction, the prior instruction has been processed only partly through the pipeline.


Example

For example:

 i1. R2 <- R5 + R8
 i2. R4 <- R2 + R8

The first instruction calculates a value to be saved in register R2, and the second is going to use this value to compute a result for register R4. However, in a pipeline, when operands are fetched for the second operation, the results from the first have not yet been saved, and hence a data dependency occurs. A data dependency occurs with instruction i2, as it is dependent on the completion of instruction i1.


Write after read (WAR)

(i2 tries to write a destination before it is read by i1) A write after read (WAR) data hazard represents a problem with concurrent execution.


Example

For example:

 i1. R4 <- R1 + R5
 i2. R5 <- R1 + R2

In any situation with a chance that i2 may finish before i1 (i.e., with concurrent execution), it must be ensured that the result of register R5 is not stored before i1 has had a chance to fetch its operands.


Write after write (WAW)

(i2 tries to write an operand before it is written by i1) A write after write (WAW) data hazard may occur in a concurrent execution environment.


Example

For example:

 i1. R5 <- R4 + R7
 i2. R5 <- R1 + R3

The write back (WB) of i2 must be delayed until i1 finishes executing.


True dependency (read-after-write)

A true dependency, also known as a flow dependency or data dependency, occurs when an instruction depends on the result of a previous instruction. A violation of a true dependency leads to a read-after-write (RAW) hazard.

 1. A = 3
 2. B = A
 3. C = B

Instruction 3 is truly dependent on instruction 2, as the final value of C depends on the instruction updating B. Instruction 2 is truly dependent on instruction 1, as the final value of B depends on the instruction updating A. Since instruction 3 is truly dependent upon instruction 2 and instruction 2 is truly dependent on instruction 1, instruction 3 is also truly dependent on instruction 1. Instruction-level parallelism is therefore not an option in this example.
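The transitivity argued above can be computed directly. This Python sketch (the set encoding is illustrative) builds the direct flow edges of the three-statement example and then takes their transitive closure:

```python
# Write and read sets of the example: 1. A = 3 ; 2. B = A ; 3. C = B
writes = {1: "A", 2: "B", 3: "C"}
reads = {1: set(), 2: {"A"}, 3: {"B"}}

# Direct flow (true) dependencies: an earlier write reaches a later read.
direct = {(i, j) for i in writes for j in reads
          if i < j and writes[i] in reads[j]}

# Transitive closure: if a -> b and b -> d, then a -> d.
closure = set(direct)
changed = True
while changed:
    changed = False
    for (a, b) in list(closure):
        for (c, d) in list(closure):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True

print(sorted(closure))  # [(1, 2), (1, 3), (2, 3)]
```

The derived edge (1, 3) is the transitive dependency of instruction 3 on instruction 1.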


Anti-dependency (write-after-read)

An anti-dependency occurs when an instruction requires a value that is later updated. A violation of an anti-dependency leads to a write-after-read (WAR) hazard. In the following example, instruction 2 anti-depends on instruction 3: the ordering of these instructions cannot be changed, nor can they be executed in parallel (possibly changing the instruction ordering), as this would affect the final value of A.

 1. B = 3
 2. A = B + 1
 3. B = 7

Example:

 MUL R3,R1,R2
 ADD R2,R5,R6

There is an anti-dependence between these two instructions: the first instruction reads R2, and the second then writes a new value to it. An anti-dependency is an example of a ''name dependency''. That is, renaming of variables could remove the dependency, as in the next example:

 1. B = 3
 N. B2 = B
 2. A = B2 + 1
 3. B = 7

A new variable, B2, has been declared as a copy of B in a new instruction, instruction N. The anti-dependency between 2 and 3 has been removed, meaning that these instructions may now be executed in parallel. Note that there is still a read-after-write dependency: instruction 2 is truly dependent on instruction N, which is truly dependent upon instruction 1. This dependency existed in the original version, where instruction 2 was truly dependent on instruction 1. It cannot be safely removed.
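The renaming idea behind the B/B2 example can be automated. The following Python sketch (the `rename` function and the `(destination, sources)` encoding are illustrative, in the style of a simple SSA-like renaming) gives every definition a fresh numbered name, so later writes no longer clash with earlier reads or writes of the same variable:

```python
# Program: 1. B = 3 ; 2. A = B + 1 ; 3. B = 7
prog = [
    ("B", []),       # 1. B = 3
    ("A", ["B"]),    # 2. A = B + 1
    ("B", []),       # 3. B = 7
]

def rename(prog):
    """Give each definition a fresh versioned name and rewrite source
    operands to refer to the most recent version."""
    version = {}
    out = []
    for dst, srcs in prog:
        new_srcs = [f"{s}{version.get(s, 0)}" for s in srcs]
        version[dst] = version.get(dst, 0) + 1
        out.append((f"{dst}{version[dst]}", new_srcs))
    return out

print(rename(prog))  # [('B1', []), ('A1', ['B1']), ('B2', [])]
```

After renaming, instructions 2 and 3 touch different names (B1 versus B2), so the anti-dependency (and any output dependency on B) is gone; only the true dependency of A1 on B1 remains.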


Output dependency (write-after-write)

An output dependency occurs when the ordering of instructions will affect the final output value of a variable. A violation of an output dependency leads to a write-after-write (WAW) hazard. In the example below, there is an output dependency between instructions 3 and 1: changing the ordering of instructions in this example will change the final value of A, thus these instructions cannot be executed in parallel.

 1. B = 3
 2. A = B + 1
 3. B = 7

As with anti-dependencies, output dependencies are ''name dependencies''. That is, they may be removed through renaming of variables, as in the below modification of the above example:

 1. B2 = 3
 2. A = B2 + 1
 3. B = 7


Implications

Conventional programs are written assuming the sequential execution model. Under this model, instructions execute one after the other, atomically (i.e., at any given point in time, only one instruction is executed) and in the order specified by the program. However, dependencies among statements or instructions may hinder parallelism, that is, the parallel execution of multiple instructions, either by a parallelizing compiler or by a processor exploiting instruction-level parallelism. Recklessly executing multiple instructions without considering related dependences risks producing incorrect results, namely hazards.


Relevance in computing

Data dependencies are relevant in various areas of computing, particularly in processor design, compiler construction, parallel computing, and concurrent programming.


Processor design

* Instruction pipelining: In pipelined processors, multiple instructions are executed in parallel in multiple pipeline stages. Data dependencies between registers must therefore be respected and handled in the processor pipeline. Most relevant are true dependencies, which are resolved e.g. by stalling the pipeline or by operand forwarding.
* Out-of-order execution: Modern processors often execute instructions out of their original order to improve performance. Here, name dependencies between registers must be respected (in addition to data dependencies) and are resolved e.g. by register renaming or scoreboarding. Data dependencies are also relevant for memory accesses and must be respected by memory disambiguation techniques that execute memory access instructions (loads and stores) out of program order.


Compiler construction

Data dependencies are relevant for various compiler optimizations, e.g.:
* Instruction scheduling: Compilers must schedule instructions in a way that respects data dependencies. This is crucial in optimizing compilers that rearrange code for better performance.
* Loop transformations: In optimizing loops, compilers need to consider data dependencies to apply transformations like loop unrolling, fusion, or tiling without changing the semantics of the program.
* Code motion: When a compiler considers moving a piece of code, it must ensure that data dependencies are not violated.
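At its core, dependency-respecting instruction scheduling means emitting instructions in a topological order of the dependence graph. This Python sketch (using the standard library's `graphlib`, available since Python 3.9; the instruction labels are illustrative) orders the true-dependency chain from the earlier A/B/C example. A real scheduler would additionally model anti- and output dependencies, latencies, and machine resources:

```python
from graphlib import TopologicalSorter

# Map each instruction to the set of instructions it depends on
# ("must run first"): i1. A = 3 ; i2. B = A ; i3. C = B
deps = {
    "i3": {"i2"},   # C = B needs B = A
    "i2": {"i1"},   # B = A needs A = 3
    "i1": set(),
}

# Any valid schedule is a topological order of this graph.
order = list(TopologicalSorter(deps).static_order())
print(order)  # ['i1', 'i2', 'i3']
```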


See also

* Dependency analysis
* Control dependency
* Loop dependence analysis § Data dependence
* Hazard (computer architecture) § Data hazards

