Replay System
   HOME

TheInfoList



OR:

The replay system is a subsystem within the
Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California, and Delaware General Corporation Law, incorporated in Delaware. Intel designs, manufactures, and sells computer compo ...
Pentium 4 Pentium 4 is a series of single-core central processing unit, CPUs for Desktop computer, desktops, laptops and entry-level Server (computing), servers manufactured by Intel. The processors were shipped from November 20, 2000 until August 8, 20 ...
processor. Its primary function is to catch operations that have been mistakenly sent for execution by the processor's
scheduler A schedule (, ) or a timetable, as a basic time-management tool, consists of a list of times at which possible tasks, events, or actions are intended to take place, or of a sequence of events in the chronological order in which such things ...
. Operations caught by the replay system are then re-executed in a loop until the conditions necessary for their proper execution have been fulfilled.


Overview

The replay system came about as a result of Intel's quest for ever-increasing clock speeds. These higher clock speeds necessitated very lengthy
pipelines A pipeline is a system of pipes for long-distance transportation of a liquid or gas, typically to a market area for consumption. The latest data from 2014 gives a total of slightly less than of pipeline in 120 countries around the world. The Un ...
(up to 31 stages in the Prescott core). Because of this, there are six stages between the scheduler and the
execution unit In computer engineering, an execution unit (E-unit or EU) is a part of a processing unit that performs the operations and calculations forwarded from the instruction unit. It may have its own internal control sequence unit (not to be confused w ...
s in the Prescott core. In an attempt to maintain acceptable performance, Intel engineers had to design the scheduler to be very optimistic. The scheduler in a Pentium 4 processor is so aggressive that it will send operations for execution without a guarantee that they can be successfully executed. (Among other things, the scheduler assumes all data is in level 1 "
trace cache In computer architecture, a trace cache or execution trace cache is a specialized instruction cache which stores the dynamic stream of instructions known as trace. It helps in increasing the instruction fetch bandwidth and decreasing power cons ...
"
CPU cache A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, whi ...
.) The most common reason execution fails is that the requisite data is not available, which itself is most likely due to a cache miss. When this happens, the replay system signals the scheduler to stop, then repeatedly executes the failed string of dependent operations until they have completed successfully.


Performance considerations

Not surprisingly, in some cases the replay system can have a very bad impact on performance. Under normal circumstances, the execution units in the Pentium 4 are in use roughly 33% of the time. When the replay system is invoked, it will occupy execution units nearly every available cycle. This wastes power, which is an increasingly important architectural design metric, but poses no performance penalty because the execution units would be sitting idle anyway. However, if hyper-threading is in use, the replay system will prevent the other thread from utilizing the execution units. This is the true cause of any performance degradation concerning hyper-threading. In Prescott, the Pentium 4 gained a replay queue, which reduces the time the replay system will occupy the execution units. In other cases, where each thread is processing different types of operations, the replay system will not interfere, and a performance increase can appear. This explains why performance with hyper-threading is application-dependent.


See also

*
Instruction pipeline In computer engineering, instruction pipelining is a technique for implementing instruction-level parallelism within a single processor. Pipelining attempts to keep every part of the processor busy with some instruction by dividing incoming Mac ...
*
Speculative execution Speculative execution is an optimization (computer science), optimization technique where a computer system performs some task that may not be needed. Work is done before it is known whether it is actually needed, so as to prevent a delay that woul ...
*
Out-of-order execution In computer engineering, out-of-order execution (or more formally dynamic execution) is an instruction scheduling paradigm used in high-performance central processing units to make use of instruction cycles that would otherwise be wasted. In t ...
*
Simultaneous multithreading Simultaneous multithreading (SMT) is a technique for improving the overall efficiency of superscalar CPUs with hardware multithreading. SMT permits multiple independent threads of execution to better use the resources provided by modern proces ...
*
Data dependency A data dependency in computer science is a situation in which a program statement (instruction) refers to the data of a preceding statement. In compiler theory, the technique used to discover data dependencies among statements (or instructions) i ...


References

{{Reflist X86 microprocessors