Introduction to types of cache misses
Processor performance increase due to cache hierarchy depends on the number of accesses to the cache that satisfy block requests from the cache (cache hits) versus those that do not. Unsuccessful attempts to read or write data from the cache (cache misses) result in lower level or main memory access, which increases latency. There are three basic types of cache misses known as the 3Cs and some other less popular cache misses.Compulsory misses
Each memory block when first referenced causes a compulsory miss. This implies that the number of compulsory misses is the number of distinct memory blocks ever referenced. They are sometimes called cold misses too. Cold misses cannot be avoided unless the block is prefetched. It has been observed that an increase in block size to a certain extent to exploit spatial locality leads to a decrease in cold misses. Increasing block size leads to prefetching of nearby words in a block and preventing future cold misses. Increasing the block size too much can lead to prefetching of useless data, thus increasing the number of cold misses.Conflict misses
Conflict misses occur when the data required was in the cache previously, but got evicted. These evictions occur because another request was mapped to the same cache line. Generally, conflict misses are measured by subtracting the number of misses in a cache with limited associativity by the number of misses of a fully associative cache of the same size and cache block size. Since conflict misses can be attributed to the lack of sufficient associativity, increasing the associativity to a certain extent (8‐way associativity almost as effective as fully‐associative) decreases the amount of conflict misses, however, such an approach increases the cache access time and consumes a lot more power than a set associative cache.Capacity misses
A capacity miss occurs due to the limited size of a cache and not the cache's mapping function. When the working set, i.e., the data that is currently important to the program, is bigger than the cache, capacity misses occur frequently. Out of the 3Cs capacity misses are the hardest to identify, and can be thought of as non-compulsory misses in a fully associative cache. In a single processor system, the misses that exist after subtracting the number of compulsory misses and conflict misses can be categorized as capacity misses. Since capacity misses can be attributed to the limited size of a cache, a simple way to reduce the number of such misses is to increase the cache size. Although this method is very intuitive, it leads to a longer access time and an increase in cache area and its power consumption. Also, after a certain cache size, the number of misses saturate and do not decrease even on increasing the cache size. The above three kinds of misses only address uni-processor misses.Coherence misses
The 3Cs group of cache misses can be extended to 4Cs when a multi-processor system with cache is involved, the fourth C being coherence misses. The coherence miss count is the number of memory accesses that miss because a cache line that would otherwise be present in the thread's cache has been invalidated by a write from another thread.Coverage misses
The 4Cs group of cache misses can be further extended to 5Cs when the multi-processor system includes a coherence directory organized as a cache, i.e., that can replace entries. This fifth C stands for Coverage. The coverage miss count is the number of memory accesses that miss because a cache line that would otherwise be present in the processor's cache has been invalidated as a consequence of a directory eviction. If the directory is not able to track a cache line due to its limited capacity, the line must be invalidated from the processors' cache to maintainSystem-related misses
System activities such asReplaced misses
When a context switch occurs, the cache state is modified and some of its blocks are replaced. The misses on access to these blocks are called replaced misses.Reordered misses
Some blocks in the cache may not be replaced due to context switching but their recency is changed. Reordered misses are said to occur when misses occur due to change in recency and not due to the blocks being replaced. However, when the suspended process resumes execution, reordered blocks don't lead to context switch misses when no other misses cause these reordered blocks to be evicted. System-related misses become significant when context switching occurs regularly. Increasing the cache size leads to a decrease in capacity and conflict misses but it has been observed that it leads to an increase in system-related misses if the cache is still smaller than the working set of the processes sharing the cache. Hence reducing the number of system-related misses presents a challenge.Average memory access time
These cache misses directly correlate to the increase in cycles per instruction (CPI). However the amount of effect the cache misses have on the CPI also depends on how much of the cache miss can be overlapped with computations due to the ILP ( Instruction-level parallelism ) and how much of it can be overlapped with other cache misses due to Memory-level parallelism. If we ignore both these effects, then the average memory access time becomes an important metric. It provides a measure of the performance of the memory systems and hierarchies. It refers to the average time it takes to perform a memory access. It is the addition of the execution time for the memory instructions and the memory stall cycles. The execution time is the time for a cache access, and the memory stall cycles include the time to service a cache miss and access lower levels of memory. If the access latency, miss rate and miss penalty are known, the average memory access time can be calculated with: where is the access latency of level one cache, is the miss rate of level one cache and is the additional cycles a miss at a higher level takes to be serviced compared to a hit at higher level, and is calculated with: this formula can be expanded further and used recursively for all the further levels in the memory hierarchy to get the .Power law of cache misses
The Power law of cache misses shows a trend in the capacity misses in a particular application of the program as affected by the cache size. This empirical observation led to the mathematical form of power law, which shows the relation between the miss rate and the cache size. It can be stated aswhere ''M'' is the miss rate for a cache of size ''C'' and ''M0'' is the miss rate of a baseline cache. The exponent ''α'' is workload-specific and typically ranges from 0.3 to 0.7, with an average of 0.5. The power law was validated on quite a few of real-world benchmarks. This relation shows that only a small fraction of cache misses can be eliminated for constant increase in cache size. This law holds true only for a certain finite range of cache sizes, up to which the miss rate doesn't flatten out. The miss rate eventually becomes stagnant at a certain, large enough cache size, and after that the relation does not give correct estimates.Stack distance profile
The stack distance profile is a better representation of how the cache misses are affected by the cache size. The power law of cache misses just showed an rough approximation of the same. A stack distance profile captures the temporal reuse behavior of an application in a fully or set-associative cache. Applications that exhibit more temporal reuse behavior generally access data that is more recently used. Let us assume the associativity of a cache to be . To collect the stack distance profile information of this cache, assuming it has LRU replacement policy, counters are used starting from to and one additional counter