In computer science, false sharing is a performance-degrading usage pattern that can arise in systems with distributed, coherent caches at the granularity of the smallest resource block managed by the caching mechanism. When a system participant attempts to periodically access data that is not being altered by another party, but that data shares a cache block with data that is being altered, the caching protocol may force the first participant to reload the whole cache block despite a lack of logical necessity. The caching system is unaware of activity within this block and forces the first participant to bear the caching system overhead required by true shared access of a resource.


Multiprocessor CPU caches

By far the most common usage of this term is in modern multiprocessor CPU caches, where memory is cached in lines of some small power-of-two word size (e.g., 64 aligned, contiguous bytes). If two processors operate on independent data in the same memory address region storable in a single line, the cache coherency mechanisms in the system may force the whole line across the bus or interconnect with every data write, forcing memory stalls in addition to wasting system bandwidth. In some cases, the elimination of false sharing can result in order-of-magnitude performance improvements. False sharing is an inherent artifact of automatically synchronized cache protocols and can also exist in environments such as distributed file systems or databases, but current prevalence is limited to RAM caches.
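To make the "same line" condition concrete, the following is a minimal sketch that checks whether two addresses fall into one cache line. It assumes the 64-byte line size given as the example above and a flat address space where integer division of an address by the line size identifies the line; `kLineSize`, `line_index`, `same_cache_line`, and `AdjacentCounters` are illustrative names, not part of any library.

```cpp
#include <cassert>
#include <cstdint>

// Assumed line size: the 64-byte figure used as the example in the text.
constexpr std::uintptr_t kLineSize = 64;

// Index of the cache line containing a given byte address.
constexpr std::uintptr_t line_index(std::uintptr_t address) {
    return address / kLineSize;
}

// True when both bytes fall into the same line, i.e. when writes to one
// can invalidate a cached copy of the other.
inline bool same_cache_line(const void* a, const void* b) {
    assert(a != nullptr && b != nullptr);
    return line_index(reinterpret_cast<std::uintptr_t>(a)) ==
           line_index(reinterpret_cast<std::uintptr_t>(b));
}

// Two logically independent counters that happen to be neighbours in memory.
// alignas pins the struct to a line boundary so the layout is predictable.
struct alignas(kLineSize) AdjacentCounters {
    long first;   // intended for thread A only
    long second;  // intended for thread B only
};
```

For an `AdjacentCounters` object, `same_cache_line(&c.first, &c.second)` holds, so two threads that each write only their own member would still contend for the line.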


Example

    #include <chrono>
    #include <cstddef>
    #include <new>

    using namespace std;
    using namespace chrono;

    #if defined(__cpp_lib_hardware_interference_size)
    // cache-line size reported by the standard library (a compile-time constant)
    constexpr size_t CL_SIZE = hardware_constructive_interference_size;
    #else
    // most common cache-line size otherwise
    constexpr size_t CL_SIZE = 64;
    #endif

    int main() { /* body not preserved */ }

This code shows the effect of false sharing. It creates an increasing number of threads, from one thread up to the number of hardware threads in the system. Each thread repeatedly increments one byte of a cache line that, as a whole, is shared among all threads. The higher the level of contention between threads, the longer each increment takes. These are the results, given as thread count followed by nanoseconds per increment, on a Zen4 system with 16 cores and 32 threads:
1: 1
2: 4
3: 6
4: 9
5: 11
6: 13
7: 15
8: 17
9: 16
10: 18
11: 21
12: 25
13: 29
14: 35
15: 39
16: 41
17: 43
18: 44
19: 48
20: 49
21: 51
22: 53
23: 58
24: 61
25: 68
26: 75
27: 79
28: 82
29: 85
30: 88
31: 91
32: 94
On the system in question, an increment operation on the shared cache line can take up to about 100 nanoseconds, which corresponds to approximately 420 clock cycles on this CPU.
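The measurement described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the original listing: it assumes a 64-byte line, uses a simple spin barrier so that all threads start together, and reports the per-thread loop time averaged over all increments. The names `SharedLine` and `ns_per_increment` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

constexpr std::size_t kLineSize = 64;  // assumed cache-line size

// One cache line's worth of bytes; thread t only ever touches bytes[t].
struct alignas(kLineSize) SharedLine {
    volatile char bytes[kLineSize];
};

// Average nanoseconds per increment when n_threads threads each increment
// their own byte of the same line `iterations` times.
double ns_per_increment(unsigned n_threads, std::size_t iterations) {
    assert(n_threads >= 1 && n_threads <= kLineSize && iterations > 0);
    SharedLine line{};
    std::atomic<unsigned> waiting{n_threads};  // crude start barrier
    std::atomic<long long> total_ns{0};        // sum of per-thread loop times
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < n_threads; ++t) {
        threads.emplace_back([&, t] {
            volatile char& c = line.bytes[t];
            waiting.fetch_sub(1);
            while (waiting.load() != 0) {}  // spin until every thread is ready
            const auto start = std::chrono::steady_clock::now();
            for (std::size_t i = 0; i < iterations; ++i) {
                c = c + 1;  // volatile keeps the store from being optimized away
            }
            const auto stop = std::chrono::steady_clock::now();
            total_ns += std::chrono::duration_cast<std::chrono::nanoseconds>(
                            stop - start).count();
        });
    }
    for (auto& th : threads) th.join();
    return static_cast<double>(total_ns.load()) /
           (static_cast<double>(iterations) * n_threads);
}
```

Calling `ns_per_increment(n, 10'000'000)` for `n` from 1 up to `std::thread::hardware_concurrency()` (capped at the line size) would produce a table of the same shape as the one above; the absolute numbers depend on the machine.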


Mitigation

There are ways of mitigating the effects of false sharing. For instance, false sharing in CPU caches can be prevented by reordering variables or adding padding (unused bytes) between variables. However, some of these program changes may increase the size of the objects, leading to higher memory use. Compile-time data transformations can also mitigate false sharing, but some of these transformations are not always allowed. For instance, the C++23 draft standard mandates that data members be laid out so that later members have higher addresses. There are tools for detecting false sharing, and there are also systems that both detect and repair false sharing in executing programs, although these systems incur some execution overhead.
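A minimal sketch of the padding approach, and of the memory cost it carries, might look like this. It assumes a 64-byte line; where the standard library provides it, `std::hardware_destructive_interference_size` from `<new>` could be used in place of the hard-coded constant. The struct and function names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLineSize = 64;  // assumed cache-line size

// Packed layout: both counters fit in one line, so writers on different
// threads would contend for it.
struct PackedCounters {
    long a;
    long b;
};

// Padded layout: alignas rounds each counter up to a full line of its own.
struct PaddedCounter {
    alignas(kLineSize) long value;
};
struct PaddedCounters {
    PaddedCounter a;
    PaddedCounter b;
};

static_assert(sizeof(PaddedCounter) == kLineSize,
              "each padded counter occupies exactly one line");

// Extra bytes the padded layout costs compared with the packed one.
constexpr std::size_t padding_overhead() {
    return sizeof(PaddedCounters) - sizeof(PackedCounters);
}

// Runtime check that the two members of an actual object start in different lines.
inline bool on_separate_lines(const PaddedCounters& c) {
    const auto a = reinterpret_cast<std::uintptr_t>(&c.a);
    const auto b = reinterpret_cast<std::uintptr_t>(&c.b);
    assert(a % kLineSize == 0);  // alignas guarantees line alignment
    return a / kLineSize != b / kLineSize;
}
```

With these definitions the padded struct occupies two full lines, which illustrates the trade-off noted above: contention is avoided at the price of a larger object.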



External links


Easy Understanding on False Sharing



Dr. Dobb's article: Eliminate False Sharing

