Hardware scout is a technique that uses otherwise idle processor execution resources to perform prefetching during cache misses. When a thread is stalled by a cache miss, the processor pipeline checkpoints the register file, switches to run-ahead mode, and continues to issue instructions from the thread that is waiting for memory. The thread of execution in run-ahead mode is known as a "scout thread". When the data returns from memory, the processor restores the register file contents from the checkpoint and switches back to normal execution mode.
The computation performed during run-ahead mode is discarded by the processor; nevertheless, scouting provides a speedup because it increases memory-level parallelism (MLP). The cache lines brought into the cache hierarchy during run-ahead are often used again by the processor after it switches back to normal mode.
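The control flow above can be illustrated with a small software model. The following is a minimal sketch, not any real processor's implementation: the cache is modelled as a set of resident addresses, instructions are simple tuples, loads "return" their own address as the value, and MISS_LATENCY is an assumed, illustrative number.

```python
MISS_LATENCY = 200   # cycles to service a miss from memory (assumed value)

def execute(program, cache):
    """program: list of ('load', dest_reg, addr) or ('add', dest_reg, imm)."""
    regs, pc, cycles = {}, 0, 0
    while pc < len(program):
        op, reg, arg = program[pc]
        if op == 'load' and arg not in cache:
            checkpoint = (pc, dict(regs))        # checkpoint the register file
            # Scout (run-ahead) mode: keep issuing from the same stream and
            # turn every further miss into a prefetch that overlaps with the
            # miss already outstanding -- this is the gain in MLP.
            for s_op, _ignored, s_arg in program[pc + 1:]:
                if s_op == 'load' and s_arg not in cache:
                    cache.add(s_arg)
            cache.add(arg)                       # the original miss returns
            cycles += MISS_LATENCY               # one latency covers all of them
            pc, regs = checkpoint                # discard scout results, restore
            continue                             # resume normal execution mode
        # Normal mode: architectural updates are kept.
        regs[reg] = arg if op == 'load' else regs.get(reg, 0) + arg
        pc += 1
        cycles += 1
    return regs, cycles

# Two loads to distinct lines that both miss: executed serially they would pay
# roughly 2 * MISS_LATENCY, but the second line is prefetched during run-ahead.
prog = [('load', 'r1', 0x100), ('add', 'r1', 1), ('load', 'r2', 0x200)]
print(execute(prog, cache=set()))
```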
Rock processor scout
Sun's Rock processor (later cancelled) used a form of hardware scout. However, any computations in run-ahead mode that do not depend on the cache miss may be retired immediately. This allows the processor to benefit both from prefetching and from traditional instruction-level parallelism (ILP).
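A minimal sketch of that idea, assuming a simple dependence-tracking scheme (this is illustrative only, not a description of Rock's actual microarchitecture): registers whose values depend on the outstanding miss are marked "not available", and only instructions with available sources are allowed to retire during run-ahead.

```python
def scout_with_retirement(program, miss_reg):
    """program: list of (dest_reg, [source_regs]) dependency tuples."""
    not_available = {miss_reg}        # the register the missing load targets
    retired, discarded = [], []
    for dest, sources in program:
        if not_available.intersection(sources):
            not_available.add(dest)   # result depends on the miss: scout only
            discarded.append(dest)
        else:
            retired.append(dest)      # independent of the miss: retire now
    return retired, discarded

# Example: r1 is the missing load's destination; r2 = f(r1) must be discarded,
# while r3 = f(r4) does not depend on the miss and can retire immediately.
print(scout_with_retirement([('r2', ['r1']), ('r3', ['r4'])], 'r1'))
# -> (['r3'], ['r2'])
```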
Scouting vs. SMT
Scouting and simultaneous multithreading (SMT) both use hardware threads to fight the memory wall. With scouting, the scout thread runs instructions from the same instruction stream as the instruction that caused the pipeline stall; with SMT, the additional hardware thread executes instructions from a different context.
Thus, SMT increases the throughput of the processor, while scouting improves single-thread performance by lowering the number of cache misses experienced in normal mode.
See also
* Rock processor
* Runahead
ttp://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/mags/mi/&toc=comp/mags/mi/2005/03/m3toc.xml&DOI=10.1109/MM.2005.49 High Performance Throughput ComputingRunahead execution: an alternative to very large instruction windows for out-of-order processors Instruction processing