computer engineering Computer engineering (CE, CoE, or CpE) is a branch of engineering specialized in developing computer hardware and software. It integrates several fields of electrical engineering, electronics engineering and computer science. Computer engi ...

, directory-based cache coherence is a type of cache coherence mechanism, where directories are used to manage caches in place of

bus snooping Bus snooping or bus sniffing is a scheme by which a coherency controller (snooper) in a cache (a snoopy cache) monitors or snoops the bus transactions, and its goal is to maintain a cache coherency in distributed shared memory systems. This schem ...

. Bus snooping methods scale poorly due to the use of

broadcasting Broadcasting is the data distribution, distribution of sound, audio audiovisual content to dispersed audiences via a electronic medium (communication), mass communications medium, typically one using the electromagnetic spectrum (radio waves), ...

. These methods can be used to target both

performance A performance is an act or process of staging or presenting a play, concert, or other form of entertainment. It is also defined as the action or process of carrying out or accomplishing an action, task, or function. Performance has evolved glo ...

and

scalability Scalability is the property of a system to handle a growing amount of work. One definition for software systems specifies that this may be done by adding resources to the system. In an economic context, a scalable business model implies that ...

of directory systems.

Full bit vector format

In the full bit vector format, for each possible

cache line A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, which ...

memory Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembe ...

, a

bit The bit is the most basic unit of information in computing and digital communication. The name is a portmanteau of binary digit. The bit represents a logical state with one of two possible values. These values are most commonly represented as ...

is used to track whether every individual processor has that line stored in its

cache Cache, caching, or caché may refer to: Science and technology * Cache (computing), a technique used in computer storage for easier data access * Cache (biology) or hoarding, a food storing behavior of animals * Cache (archaeology), artifacts p ...

. The full bit vector format is the simplest structure to implement, but the least scalable. The

SGI Origin 2000 The SGI Origin 2000 is a family of mid-range and high-end server computers developed and manufactured by Silicon Graphics (SGI). They were introduced in 1996 to succeed the SGI Challenge and POWER Challenge. At the time of introduction, these ...

uses a combination of full bit vector and coarse bit vector depending on the number of processors. Each directory entry must have 1 bit stored per processor per cache line, along with bits for tracking the state of the directory. This leads to the total size required being ''(number of processors)×number of cache lines'', having a storage overhead ratio of ''(number of processors)/(cache block size×8)''. It can be observed that directory overhead scales linearly with the number of processors. While this may be fine for a small number of processors, when implemented in large systems the size requirements for the directory becomes excessive. For example, with a block size of 32 bytes and 1024 processors, the storage overhead ratio becomes 1024/(32×8) = 400%.

Coarse bit vector format

The coarse bit vector format has a similar structure to the full bit vector format, though rather than tracking one bit per processor for every cache line, the directory groups several processors into nodes, storing whether a cache line is stored in a node rather than a processor. This improves size requirements at the expense of

bus A bus (contracted from omnibus, with variants multibus, motorbus, autobus, etc.) is a motor vehicle that carries significantly more passengers than an average car or van, but fewer than the average rail transport. It is most commonly used ...

traffic saving (processors per node - 1)×(total lines) bits of space. Thus the ratio overhead is the same, just replacing number of processors with number of processor groups. When a bus request is made for a cache line that one processor in the group has, the directory broadcasts the signal into every processor in the node rather than just the caches that contain it, leading to unnecessary traffic to nodes that do not have the data cached. In this case the directory entry uses 1 bit for a group of processors for each cache line. For the same example as Full Bit Vector format if we consider 1 bit for 8 processors as a group, then the storage overhead will be 128/(32×8)=50%. This is a significant improvement over the Full Bit Vector format.

Sparse directory format

A cache only stores a small subset of blocks in main memory at a particular time. Hence most of the entries in the directory will belong to uncached blocks. In the sparse directory format the wastage is reduced by storing only the cached blocks in the directory. Consider a processor with a cache size of 64KB with a block size of 32 bytes and the main memory size to be 4MB. The maximum number of entries that the directory can have in the sparse directory format is 2048. If the directory has an entry for all the blocks in the memory the number of entries in the directory will be 131072. Thus it is evident that the storage improvement provided by sparse directory format is very significant.

Number-balanced binary tree format

In this format the directory is decentralised and distributed among the caches that share a memory block. Different caches that share a memory block are arranged in the form of a

binary tree In computer science, a binary tree is a tree data structure in which each node has at most two children, referred to as the ''left child'' and the ''right child''. That is, it is a ''k''-ary tree with . A recursive definition using set theor ...

. The cache that accesses a memory block first is the

root node In computer science, a tree is a widely used abstract data type that represents a hierarchical tree structure with a set of connected nodes. Each node in the tree can be connected to many children (depending on the type of tree), but must be co ...

. Each memory block has the root node information (HEAD) and Sharing counter field (SC). The SC field has the number of caches that share the block. Each cache entry has

pointers Pointer may refer to: People with the name * Pointer (surname), a surname (including a list of people with the name) * Pointer Williams (born 1974), American former basketball player Arts, entertainment, and media * ''Pointer'' (journal), the ...

to the next sharing caches known as L-CHD and R-CHD. A condition for this directory is that the binary tree should be number balanced, i.e the number of nodes in the left sub tree must be equal to or one greater than the number of nodes in the right subtree. All the subtrees should also be number balanced.

Chained directory format

In this format the memory holds the directory pointer to the latest cache that accessed the block and each cache has the pointer to the previous cache that accessed the block. So when a processor sends a write request to a block in memory, the processor sends invalidations down the chain of pointers. In this directory when a cache block is replaced we need to traverse the

list A list is a Set (mathematics), set of discrete items of information collected and set forth in some format for utility, entertainment, or other purposes. A list may be memorialized in any number of ways, including existing only in the mind of t ...

in order to change the directory which increases latency. In order to prevent this

doubly linked list In computer science, a doubly linked list is a linked data structure that consists of a set of sequentially linked records called nodes. Each node contains three fields: two link fields (references to the previous and to the next node in the se ...

s are widely used now in which each cached copy has pointers to previous and the next cache that accesses the block.

Limited pointer format

The limited pointer format uses a set number of pointers to track the processors that are caching the data. When a new processor caches a block, a free pointer is chosen from a pool to point to that processor. There are a few options for handling cases when the number of sharers exceeds the number of free pointers. One method is to invalidate one of the sharers, using its pointer for the new requestor, though this can be costly in cases where a block has a large number of readers, such as a lock. Another method is to have a separate pool of free pointers available to all the blocks. This method is usually effective as the number of blocks shared by a large number of processors is not normally very large.

References

{{Reflist Computer architecture