Overview
The average memory reference time is

T = m × T_m + T_h + E

where:
m = miss ratio = 1 - (hit ratio)
T_m = time to make a main-memory access when there is a miss (or, with a multi-level cache, the average memory reference time for the next-lower cache)
T_h = latency: the time to reference the cache (should be the same for hits and misses)
E = secondary effects, such as queuing effects in multiprocessor systems

A cache has two primary figures of merit: latency and hit ratio. A number of secondary factors also affect cache performance. Alan Jay Smith. "Design of CPU Cache Memories". Proc. IEEE TENCON, 1987.
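As a worked illustration of the formula above, the following sketch evaluates it for hypothetical values (the function name and the numbers are made up for illustration):

```python
def avg_memory_reference_time(miss_ratio, miss_penalty, cache_latency, secondary=0.0):
    """Average memory reference time: T = m * T_m + T_h + E."""
    return miss_ratio * miss_penalty + cache_latency + secondary

# Hypothetical numbers: 5% miss ratio, 100 ns miss penalty, 2 ns cache latency.
print(avg_memory_reference_time(0.05, 100.0, 2.0))  # 7.0 ns
```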
Policies
Bélády's algorithm
The most efficient caching algorithm would be to discard the information which will not be needed for the longest time; this is known as Bélády's optimal algorithm, optimal replacement policy, or the clairvoyant algorithm. Since it is generally impossible to predict how far in the future information will be needed, this is not feasible in practice. The practical minimum can be calculated only after experimentation, and the effectiveness of a chosen cache algorithm can then be compared against it.
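Because the policy needs the full future access trace, it can only be simulated offline. A minimal sketch of such a simulation (the function names and the example trace are illustrative, not from any particular source):

```python
def belady_evict(cache, future_accesses):
    """Pick the cached item whose next use is farthest in the future (or never)."""
    farthest_item, farthest_distance = None, -1
    for item in cache:
        try:
            distance = future_accesses.index(item)  # position of this item's next use
        except ValueError:
            return item                             # never used again: evict immediately
        if distance > farthest_distance:
            farthest_item, farthest_distance = item, distance
    return farthest_item

def simulate_belady(trace, capacity):
    cache, misses = set(), 0
    for i, item in enumerate(trace):
        if item not in cache:
            misses += 1
            if len(cache) >= capacity:
                cache.discard(belady_evict(cache, trace[i + 1:]))
            cache.add(item)
    return misses

print(simulate_belady(list("ABCABDEAB"), capacity=3))  # 5 misses
```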
Random replacement (RR)
Random replacement randomly selects an item and discards it to make space when necessary. This algorithm does not require keeping any access history. It has been used in ARM processors due to its simplicity, and it allows efficient stochastic simulation.
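A minimal sketch of random replacement, assuming a simple key-value interface (the class and method names are illustrative):

```python
import random

class RandomReplacementCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = {}

    def access(self, key, value=None):
        if key in self.items:
            return self.items[key]                    # hit: no bookkeeping needed
        if len(self.items) >= self.capacity:
            victim = random.choice(list(self.items))  # evict a randomly chosen entry
            del self.items[victim]
        self.items[key] = value
        return value
```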
Simple queue-based policies
First in first out (FIFO)
With this algorithm, the cache behaves like a FIFO queue; it evicts blocks in the order in which they were added, regardless of how often or how many times they were accessed before.
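A minimal sketch of FIFO eviction (names are illustrative):

```python
from collections import deque

class FIFOCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()   # keys in insertion order
        self.items = {}

    def access(self, key, value=None):
        if key in self.items:
            return self.items[key]          # hit: insertion order is not updated
        if len(self.items) >= self.capacity:
            oldest = self.queue.popleft()   # evict the block that was added first
            del self.items[oldest]
        self.queue.append(key)
        self.items[key] = value
        return value
```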
Last in first out (LIFO) or First in last out (FILO)
The cache behaves like a stack rather than a FIFO queue: it evicts the block added most recently, regardless of how often or how many times it was accessed before.
SIEVE
SIEVE is a simple eviction algorithm designed specifically for web caches, such as key-value caches and content delivery networks. It uses the ideas of lazy promotion and quick demotion: SIEVE does not update the global data structure on cache hits, delaying the update until eviction time, and it quickly evicts newly inserted objects because cache workloads tend to show high one-hit-wonder ratios, meaning most new objects are not worth keeping in the cache. SIEVE uses a single FIFO queue and a moving hand to select objects to evict. Objects in the cache carry one bit of metadata indicating whether the object has been requested after being admitted into the cache. The eviction hand points to the tail of the queue at the beginning and moves toward the head over time. In contrast to the CLOCK eviction algorithm, retained objects in SIEVE stay in their old position, so new objects are always at the head and old objects are always at the tail. As the hand moves toward the head, new objects are quickly evicted (quick demotion), which is the key to SIEVE's high efficiency. SIEVE is simpler than LRU, yet achieves miss ratios lower than LRU and on par with state-of-the-art eviction algorithms. Moreover, on stationary skewed workloads, SIEVE performs better than existing known algorithms, including LFU.
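A minimal sketch of SIEVE as described above, with a single queue, one visited bit per object and a moving hand (the class layout is illustrative, not the authors' reference implementation):

```python
class SieveCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = []        # index 0 = tail (oldest), end of list = head (newest)
        self.visited = {}
        self.hand = 0          # eviction hand starts at the tail

    def _evict(self):
        while True:
            if self.hand >= len(self.queue):   # past the head: wrap around to the tail
                self.hand = 0
            key = self.queue[self.hand]
            if self.visited[key]:
                self.visited[key] = False      # retained object stays in its old position
                self.hand += 1                 # hand moves toward the head
            else:
                self.queue.pop(self.hand)      # evict; hand now points at the next object
                del self.visited[key]
                return

    def access(self, key):
        if key in self.visited:
            self.visited[key] = True           # hit: only set the visited bit
            return True
        if len(self.queue) >= self.capacity:
            self._evict()
        self.queue.append(key)                 # new object is inserted at the head
        self.visited[key] = False
        return False
```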
Simple recency-based policies
Least Recently Used (LRU)
Discards least-recently-used items first. This algorithm requires keeping track of what was used and when, which is cumbersome. It requires "age bits" for cache lines, and tracks the least recently used cache line based on these age bits; when a cache line is used, the ages of the other cache lines change. LRU is a family of caching algorithms that includes 2Q by Theodore Johnson and Dennis Shasha and LRU/K by Pat O'Neil, Betty O'Neil and Gerhard Weikum. The access sequence for the example is A B C D E D F.
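A minimal sketch of LRU using an ordered dictionary (illustrative only; hardware implementations use age bits or equivalent structures instead):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # ordered from least- to most-recently used

    def access(self, key, value=None):
        if key in self.items:
            self.items.move_to_end(key)      # hit: mark as most recently used
            return self.items[key]
        if len(self.items) >= self.capacity:
            self.items.popitem(last=False)   # evict the least recently used entry
        self.items[key] = value
        return value
```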
Time-aware, least-recently used
Time-aware, least-recently-used (TLRU) is a variant of LRU designed for situations where the contents of a cache have a valid lifetime. The algorithm is suitable for network cache applications such as information-centric networking (ICN), content delivery networks (CDNs), and distributed networks in general.
Most-recently-used (MRU)
Unlike LRU, MRU discards the most-recently-used items first. At the 11th VLDB conference, Chou and DeWitt said: "When a file is being repeatedly scanned in a looping sequential reference pattern, MRU is the best replacement algorithm." Researchers presenting at the 22nd VLDB conference noted that for random access patterns and repeated scans over large datasets (also known as cyclic access patterns), MRU cache algorithms have more hits than LRU due to their tendency to retain older data. MRU algorithms are most useful in situations where the older an item is, the more likely it is to be accessed. The access sequence for the example is A B C D E C D B.
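A minimal sketch of MRU; it differs from the LRU sketch above only in which end of the recency order is evicted (names are illustrative):

```python
from collections import OrderedDict

class MRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # ordered from least- to most-recently used

    def access(self, key, value=None):
        if key in self.items:
            self.items.move_to_end(key)     # hit: mark as most recently used
            return self.items[key]
        if len(self.items) >= self.capacity:
            self.items.popitem(last=True)   # evict the most recently used entry
        self.items[key] = value
        return value
```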
Segmented LRU (SLRU)
An SLRU cache is divided into two segments: probationary and protected. Lines in each segment are ordered from most- to least-recently-accessed. Data from misses is added to the cache at the most-recently-accessed end of the probationary segment. Hits are removed from where they reside and added to the most-recently-accessed end of the protected segment; lines in the protected segment have been accessed at least twice. The protected segment is finite; migration of a line from the probationary segment to the protected segment may force the migration of the LRU line in the protected segment to the most-recently-used end of the probationary segment, giving this line another chance to be accessed before being replaced. The size limit of the protected segment is an SLRU parameter which varies according to I/O workload patterns. When data must be discarded from the cache, lines are obtained from the LRU end of the probationary segment.
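A minimal sketch of SLRU with two ordered segments, assuming the protected-segment limit is smaller than the total capacity (names are illustrative):

```python
from collections import OrderedDict

class SLRUCache:
    def __init__(self, capacity, protected_capacity):
        # Sketch assumes protected_capacity < capacity.
        self.capacity = capacity                      # total number of lines
        self.protected_capacity = protected_capacity  # size limit of the protected segment
        self.probationary = OrderedDict()             # least- to most-recently accessed
        self.protected = OrderedDict()

    def access(self, key, value=None):
        if key in self.protected:                     # hit in protected: refresh its position
            self.protected.move_to_end(key)
            return self.protected[key]
        if key in self.probationary:                  # hit in probationary: promote
            value = self.probationary.pop(key)
            if len(self.protected) >= self.protected_capacity:
                demoted, demoted_value = self.protected.popitem(last=False)
                self.probationary[demoted] = demoted_value   # gets another chance
            self.protected[key] = value
            return value
        # Miss: insert at the most-recently-accessed end of the probationary segment.
        if len(self.probationary) + len(self.protected) >= self.capacity:
            self.probationary.popitem(last=False)     # discard the probationary LRU line
        self.probationary[key] = value
        return value
```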
LRU approximations
LRU may be expensive in caches with higher associativity. Practical hardware usually employs an approximation to achieve similar performance at a lower hardware cost.
Pseudo-LRU (PLRU)
For CPU caches with large associativity (generally more than four ways), the implementation cost of LRU becomes prohibitive. In many CPU caches, an algorithm that almost always discards one of the least recently used items is sufficient; many CPU designers choose a PLRU algorithm, which only needs one bit per cache item to work. PLRU typically has a slightly worse miss ratio, slightly better latency, uses slightly less power than LRU, and has a lower overhead than LRU. The bits work as a binary tree of one-bit pointers which point to the less-recently-used sub-tree; following the pointer chain to the leaf node identifies the replacement candidate. On an access, all pointers in the chain from the accessed way's leaf node to the root node are set to point to the sub-tree which does not contain the accessed way. The access sequence in the example is A B C D E.
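A minimal sketch of tree-based PLRU for a single four-way set; the bit layout and names are illustrative:

```python
class TreePLRUSet:
    """Tree-PLRU for one 4-way cache set: three one-bit pointers forming a binary tree.
    Each bit points toward the sub-tree holding the (approximately) less recently used ways."""

    def __init__(self):
        self.root = 0    # 0 -> victim among ways 0-1, 1 -> victim among ways 2-3
        self.left = 0    # 0 -> way 0, 1 -> way 1
        self.right = 0   # 0 -> way 2, 1 -> way 3

    def victim(self):
        """Follow the pointer chain to the replacement candidate."""
        if self.root == 0:
            return 0 if self.left == 0 else 1
        return 2 if self.right == 0 else 3

    def touch(self, way):
        """On an access, make every pointer on the path point away from this way."""
        if way in (0, 1):
            self.root = 1                    # point at the other half (ways 2-3)
            self.left = 1 if way == 0 else 0
        else:
            self.root = 0                    # point at the other half (ways 0-1)
            self.right = 1 if way == 2 else 0
```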
Clock-Pro
The LRU algorithm cannot be implemented in the critical path of computer systems, such as operating systems, because of its high overhead; Clock, an approximation of LRU, is commonly used instead. Clock-Pro is an approximation of LIRS suited to low-cost implementation in such systems.
Simple frequency-based policies
Least frequently used (LFU)
The LFU algorithm counts how often an item is needed; those used less often are discarded first. This is similar to LRU, except that how many times a block was accessed is stored instead of how recently. While running an access sequence, the block which was used the fewest times will be removed from the cache.
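A minimal sketch of LFU eviction (names are illustrative; practical implementations use priority queues or frequency buckets rather than a linear scan):

```python
class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.values = {}
        self.counts = {}

    def access(self, key, value=None):
        if key in self.values:
            self.counts[key] += 1                               # hit: bump the use count
            return self.values[key]
        if len(self.values) >= self.capacity:
            victim = min(self.counts, key=self.counts.get)      # fewest uses so far
            del self.values[victim], self.counts[victim]
        self.values[key] = value
        self.counts[key] = 1
        return value
```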
Least frequent recently used (LFRU)
The least frequent recently used (LFRU) algorithm combines the benefits of LFU and LRU. LFRU is suitable for network cache applications such as ICN, CDNs, and distributed networks in general. In LFRU, the cache is divided into two partitions: privileged and unprivileged. The privileged partition is protected, and popular content is pushed into it. To make room in the privileged partition, LFRU evicts content from the unprivileged partition, pushes content from the privileged partition to the unprivileged partition, and inserts the new content into the privileged partition. LRU is used for the privileged partition and an approximated LFU (ALFU) algorithm for the unprivileged partition.
LFU with dynamic aging (LFUDA)
A variant, LFU with dynamic aging (LFUDA), uses dynamic aging to accommodate shifts in the set of popular objects; it adds a cache-age factor to the reference count when a new object is added to the cache or an existing object is re-referenced. LFUDA increments the cache age when evicting blocks by setting it to the evicted object's key value, so the cache age is always less than or equal to the minimum key value in the cache. If an object was frequently accessed in the past and becomes unpopular, it would otherwise remain in the cache for a long time, preventing newly- or less-popular objects from replacing it. Dynamic aging reduces the number of such objects, making them eligible for replacement, and LFUDA reduces the cache pollution caused by LFU when a cache is small.
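A minimal sketch of LFUDA in which each object's priority key is its reference count plus the cache age in effect when it was last referenced (names are illustrative):

```python
class LFUDACache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.values = {}
        self.counts = {}      # reference counts
        self.keys = {}        # priority key = count + cache age at last reference
        self.cache_age = 0    # set to the evicted object's key on each eviction

    def access(self, key, value=None):
        if key in self.values:
            self.counts[key] += 1
            self.keys[key] = self.counts[key] + self.cache_age   # re-add the current age
            return self.values[key]
        if len(self.values) >= self.capacity:
            victim = min(self.keys, key=self.keys.get)
            self.cache_age = self.keys[victim]    # age never exceeds the minimum key value
            del self.values[victim], self.counts[victim], self.keys[victim]
        self.values[key] = value
        self.counts[key] = 1
        self.keys[key] = 1 + self.cache_age
        return value
```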
RRIP-style policies
RRIP-style policies are the basis for other cache replacement policies, including Hawkeye.
Re-Reference Interval Prediction (RRIP)
RRIP is a flexible policy, proposed by Intel, which attempts to provide good scan resistance while allowing older cache lines that have not been reused to be evicted. Each cache line carries a re-reference prediction value (RRPV) which estimates how soon the line is expected to be reused: lines whose RRPV equals the maximum (maxRRPV) are predicted to be re-referenced furthest in the future and are evicted first. On a hit, a line's RRPV is lowered; on a miss, if no line in the set has an RRPV of maxRRPV, all RRPVs in the set are incremented until one does.
Static RRIP (SRRIP)
SRRIP inserts lines with an RRPV value of maxRRPV; a line which has just been inserted will be the most likely to be evicted on a cache miss.
Bimodal RRIP (BRRIP)
SRRIP performs well normally, but suffers when the working set is much larger than the cache size and causes cache thrashing. This is remedied by inserting lines with an RRPV value of maxRRPV most of the time, and inserting lines with an RRPV value of maxRRPV - 1 randomly with a low probability. This causes some lines to "stick" in the cache, and helps prevent thrashing. BRRIP degrades performance, however, on non-thrashing accesses. SRRIP performs best when the working set is smaller than the cache, and BRRIP performs best when the working set is larger than the cache.
Dynamic RRIP (DRRIP)
DRRIP uses set dueling to select whether to use SRRIP or BRRIP. It dedicates a few sets (typically 32) to use SRRIP and another few to use BRRIP, and uses a policy counter which monitors set performance to determine which policy will be used by the rest of the cache.
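A minimal sketch of an RRIP-managed set, following the SRRIP and BRRIP insertion rules described above; DRRIP would simply choose between the two insertion modes per the set-dueling counter (names and the 2-bit RRPV width are illustrative):

```python
import random

MAX_RRPV = 3   # 2-bit RRPV chosen for illustration

class RRIPSet:
    """One cache set under an RRIP-style policy (illustrative sketch)."""

    def __init__(self, num_ways, brrip=False, brrip_low_insert_prob=1 / 32):
        self.rrpv = {}                 # tag -> RRPV
        self.num_ways = num_ways
        self.brrip = brrip
        self.low_prob = brrip_low_insert_prob

    def _find_victim(self):
        # Evict a line predicted to be re-referenced in the distant future;
        # if no line has RRPV == MAX_RRPV, age every line in the set and retry.
        while True:
            for tag, val in self.rrpv.items():
                if val == MAX_RRPV:
                    return tag
            for tag in self.rrpv:
                self.rrpv[tag] += 1

    def access(self, tag):
        if tag in self.rrpv:
            self.rrpv[tag] = 0             # hit: predict near-immediate re-reference
            return True
        if len(self.rrpv) >= self.num_ways:
            del self.rrpv[self._find_victim()]
        if self.brrip and random.random() < self.low_prob:
            self.rrpv[tag] = MAX_RRPV - 1  # BRRIP: rarely insert with a longer horizon so the line can "stick"
        else:
            self.rrpv[tag] = MAX_RRPV      # SRRIP insertion (and the common BRRIP case)
        return False
```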
Policies approximating Bélády's algorithm
Bélády's algorithm is the optimal cache replacement policy, but it requires knowledge of the future to evict lines that will be reused farthest in the future. A number of replacement policies have been proposed which attempt to predict future reuse distances from past access patterns, allowing them to approximate the optimal replacement policy. Some of the best-performing cache replacement policies attempt to imitate Bélády's algorithm.
Hawkeye
Hawkeye attempts to emulate Bélády's algorithm by using the past accesses made by a PC to predict whether the accesses it produces are cache-friendly (used again later) or cache-averse (not used again). It samples a number of non-aligned cache sets, keeps a long history of their accesses, and emulates Bélády's algorithm on these accesses. This allows the policy to determine which lines should have been cached and which should not, predicting whether an instruction is cache-friendly or cache-averse. This data is then fed into an RRIP backend: accesses from cache-friendly instructions insert lines with a lower RRPV value (likely to be evicted later), and accesses from cache-averse instructions insert lines with a higher RRPV value (likely to be evicted sooner). The RRIP backend makes the eviction decisions; the sampled cache and the OPT generator only set the initial RRPV value of inserted cache lines. Hawkeye won the CRC2 cache championship in 2017, and Harmony is an extension of Hawkeye which improves prefetching performance.
Mockingjay
Mockingjay tries to improve on Hawkeye in several ways. It drops the binary prediction, allowing it to make more fine-grained decisions about which cache lines to evict, and it leaves the decision about which cache line to evict until more information is available. Mockingjay keeps a sampled cache of unique accesses, the PCs that produced them, and their timestamps. When a line in the sampled cache is accessed again, the time difference is sent to the reuse distance predictor (RDP). The RDP uses temporal difference learning, where the new RDP value is increased or decreased by a small number to compensate for outliers; if the value has not been initialized, the observed reuse distance is inserted directly. If the sampled cache is full and a line needs to be discarded, the RDP is instructed that the PC which last accessed it produces streaming accesses. On an access or insertion, the estimated time of reuse (ETR) for the line is updated to reflect the predicted reuse distance. On a cache miss, the line with the highest ETR value is evicted. Mockingjay achieves results close to those of the optimal Bélády's algorithm.
Machine-learning policies
A number of policies have attempted to use perceptrons, Markov chains or other types of machine learning to predict which line to evict.
Other policies
Low inter-reference recency set (LIRS)
LIRS is a page replacement algorithm with better performance than LRU and other, newer replacement algorithms. Reuse distance is a metric for dynamically ranking accessed pages to make a replacement decision. LIRS addresses the limits of LRU by using recency to evaluate inter-reference recency (IRR), which informs the replacement decision.
Adaptive replacement cache
Adaptive replacement cache (ARC) constantly balances between LRU and LFU to improve the combined result. It was introduced by Nimrod Megiddo and Dharmendra S. Modha.
Clock with adaptive replacement
Clock with adaptive replacement (CAR) combines the advantages of ARC and Clock. CAR performs comparably to ARC, and outperforms LRU and Clock. Like ARC, CAR is self-tuning and requires no user-specified parameters.
Multi-queue
The multi-queue replacement (MQ) algorithm was developed to improve the performance of a second-level buffer cache, such as a server buffer cache; it was introduced in a paper by Yuanyuan Zhou, James Philbin, and Kai Li.
Pannier
Pannier is a container-based flash caching mechanism introduced by Cheng Li, Philip Shilane, Fred Douglis and Grant Wallace.
Static analysis
Static analysis determines which accesses are cache hits or misses to indicate the worst-case execution time of a program. An approach to analyzing properties of LRU caches is to give each block in the cache an "age" (0 for the most recently used) and compute intervals for possible ages. This analysis can be refined to distinguish cases where the same program point is accessible by paths that result in misses or hits. An efficient analysis may be obtained by abstracting sets of cache states by antichains which are represented by compact binary decision diagrams. LRU static analysis does not extend to pseudo-LRU policies: according to computational complexity theory, static-analysis problems posed by pseudo-LRU and FIFO are in higher complexity classes than those for LRU.