HOME

TheInfoList



OR:

In
computer engineering Computer engineering (CoE or CpE) is a branch of electrical engineering and computer science that integrates several fields of computer science and electronic engineering required to develop computer hardware and software. Computer enginee ...
, directory-based cache coherence is a type of cache coherence mechanism, where directories are used to manage caches in place of
bus snooping Bus snooping or bus sniffing is a scheme by which a coherency controller (snooper) in a cache (a snoopy cache) monitors or snoops the bus transactions, and its goal is to maintain a cache coherency in distributed shared memory systems. A cache con ...
. Bus snooping methods scale poorly due to the use of
broadcasting Broadcasting is the distribution of audio or video content to a dispersed audience via any electronic mass communications medium, but typically one using the electromagnetic spectrum (radio waves), in a one-to-many model. Broadcasting began ...
. These methods can be used to target both
performance A performance is an act of staging or presenting a play, concert, or other form of entertainment. It is also defined as the action or process of carrying out or accomplishing an action, task, or function. Management science In the work place ...
and
scalability Scalability is the property of a system to handle a growing amount of work by adding resources to the system. In an economic context, a scalable business model implies that a company can increase sales given increased resources. For example, a ...
of directory systems.


Full bit vector format

In the full bit vector format, for each possible
cache line A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, which ...
in
memory Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembered ...
, a
bit The bit is the most basic unit of information in computing and digital communications. The name is a portmanteau of binary digit. The bit represents a logical state with one of two possible values. These values are most commonly represented a ...
is used to track whether every individual processor has that line stored in its cache. The full bit vector format is the simplest structure to implement, but the least scalable. The SGI Origin 2000 uses a combination of full bit vector and coarse bit vector depending on the number of processors. Each directory entry must have 1 bit stored per processor per cache line, along with bits for tracking the state of the directory. This leads to the total size required being ''(number of processors)×number of cache lines'', having a storage overhead ratio of ''(number of processors)/(cache block size×8)''. It can be observed that directory overhead scales linearly with the number of processors. While this may be fine for a small number of processors, when implemented in large systems the size requirements for the directory becomes excessive. For example, with a block size of 32 bytes and 1024 processors, the storage overhead ratio becomes 1024/(32×8) = 400%.


Coarse bit vector format

The coarse bit vector format has a similar structure to the full bit vector format, though rather than tracking one bit per processor for every cache line, the directory groups several processors into nodes, storing whether a cache line is stored in a node rather than a line. This improves size requirements at the expense of
bus A bus (contracted from omnibus, with variants multibus, motorbus, autobus, etc.) is a road vehicle that carries significantly more passengers than an average car or van. It is most commonly used in public transport, but is also in use for cha ...
traffic saving (processors per node)×(total lines) bits of space. Thus the ratio overhead is the same, just replacing number of processors with number of processor groups. When a bus request is made for a cache line that one processor in the group has, the directory broadcasts the signal into every processor in the node rather than just the caches that contain it, leading to unnecessary traffic to nodes that do not have the data cached. In this case the directory entry uses 1 bit for a group of processors for each cache line. For the same example as Full Bit Vector format if we consider 1 bit for 8 processors as a group, then the storage overhead will be 128/(32×8)=50%. This is a significant improvement over the Full Bit Vector format.


Sparse directory format

A cache only stores a small subset of blocks in main memory at a particular time. Hence most of the entries in the directory will belong to uncached blocks. In the sparse directory format the wastage is reduced by storing only the cached blocks in the directory. Consider a processor with a cache size of 64KB with a block size of 32 bytes and the main memory size to be 4MB. The maximum number of entries that the directory can have in the sparse directory format is 2048. If the directory has an entry for all the blocks in the memory the number of entries in the directory will be 131072. Thus it is evident that the storage improvement provided by sparse directory format is very significant.


Number-balanced binary tree format

In this format the directory is decentralised and distributed among the caches that share a memory block. Different caches that share a memory block are arranged in the form of a
binary tree In computer science, a binary tree is a k-ary k = 2 tree data structure in which each node has at most two children, which are referred to as the ' and the '. A recursive definition using just set theory notions is that a (non-empty) binar ...
. The cache that accesses a memory block first is the root node. Each memory block has the root node information (HEAD) and Sharing counter field (SC). The SC field has the number of caches that share the block. Each cache entry has pointers to the next sharing caches known as L-CHD and R-CHD. A condition for this directory is that the binary tree should be number balanced, i.e the number of nodes in the left sub tree must be equal to or one greater than the number of nodes in the right subtree. All the subtrees should also be number balanced.


Chained directory format

In this format the memory holds the directory pointer to the latest cache that accessed the block and each cache has the pointer to the previous cache that accessed the block. So when a processor sends a write request to a block in memory, the processor sends invalidations down the chain of pointers. In this directory when a cache block is replaced we need to traverse the
list A ''list'' is any set of items in a row. List or lists may also refer to: People * List (surname) Organizations * List College, an undergraduate division of the Jewish Theological Seminary of America * SC Germania List, German rugby uni ...
in order to change the directory which increases latency. In order to prevent this
doubly linked list In computer science, a doubly linked list is a linked data structure that consists of a set of sequentially linked records called nodes. Each node contains three fields: two link fields (references to the previous and to the next node in the se ...
s are widely used now in which each cached copy has pointers to previous and the next cache that accesses the block.


Limited pointer format

The limited pointer format uses a set number of pointers to track the processors that are caching the data. When a new processor caches a block, a free pointer is chosen from a pool to point to that processor. There are a few options for handling cases when the number of sharers exceeds the number of free pointers. One method is to invalidate one of the sharers, using its pointer for the new requestor, though this can be costly in cases where a block has a large number of readers, such as a lock. Another method is to have a separate pool of free pointers available to all the blocks. This method is usually effective as the number of blocks shared by a large number of processors is not normally very large.


References

{{Reflist Computer architecture