In computing, a memory barrier, also known as a membar, memory fence or fence instruction, is a type of barrier instruction that causes a central processing unit (CPU) or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction. This typically means that operations issued prior to the barrier are guaranteed to be performed before operations issued after the barrier.

Memory barriers are necessary because most modern CPUs employ performance optimizations that can result in out-of-order execution. This reordering of memory operations (loads and stores) normally goes unnoticed within a single thread of execution, but can cause unpredictable behavior in concurrent programs and device drivers unless carefully controlled. The exact nature of an ordering constraint is hardware dependent and defined by the architecture's memory ordering model. Some architectures provide multiple barriers for enforcing different ordering constraints.

Memory barriers are typically used when implementing low-level machine code that operates on memory shared by multiple devices. Such code includes synchronization primitives and lock-free data structures on multiprocessor systems, and device drivers that communicate with computer hardware.

Example

When a program runs on a single-CPU machine, the hardware performs the necessary bookkeeping to ensure that the program executes as if all memory operations were performed in the order specified by the programmer (program order), so memory barriers are not necessary. However, when the memory is shared with multiple devices, such as other CPUs in a multiprocessor system or memory-mapped peripherals, out-of-order access may affect program behavior. For example, a second CPU may see memory changes made by the first CPU in a sequence that differs from program order.

A program runs as a process, which can be multi-threaded (that is, it can contain several software threads, such as pthreads, as opposed to hardware threads). Different processes do not share a memory space, so this discussion does not apply to two programs each running in its own process. It applies to two or more software threads running in a single process, where all threads share a single memory space. Multiple software threads within a single process may run concurrently on a multi-core processor.

The following multi-threaded program, running on a multi-core processor, gives an example of how such out-of-order execution can affect program behavior.

Initially, memory locations x and f both hold the value 0. The software thread running on processor #1 loops while the value of f is zero, then prints the value of x. The software thread running on processor #2 stores the value 42 into x and then stores the value 1 into f. Pseudo-code for the two program fragments is shown below. The steps of the program correspond to individual processor instructions.

In the case of the PowerPC processor, the eieio instruction ensures, as a memory fence, that any load or store operations previously initiated by the processor are fully completed with respect to the main memory before any subsequent load or store operations initiated by the processor access the main memory. In the case of the ARM architecture family, the DMB, DSB and ISB instructions are used.

Thread #1 Core #1:

    while (f == 0);
    // Memory fence required here
    print x;

Thread #2 Core #2:

    x = 42;
    // Memory fence required here
    f = 1;

One might expect the print statement to always print the number "42"; however, if thread #2's store operations are executed out-of-order, it is possible for f to be updated before x, and the print statement might therefore print "0". Similarly, thread #1's load operations may be executed out-of-order, so it is possible for x to be read before f is checked, and again the print statement might print an unexpected value. For most programs neither of these situations is acceptable. A memory barrier must be inserted before thread #2's assignment to f to ensure that the new value of x is visible to other processors at or prior to the change in the value of f. A memory barrier must also be inserted before thread #1's access to x to ensure that the value of x is not read before the change in the value of f is seen.

Another example is when a driver performs the following sequence:

    prepare data for a hardware module
    // Memory fence required here
    trigger the hardware module to process the data

If the processor's store operations are executed out-of-order, the hardware module may be triggered before the data is ready in memory.

For another illustrative example (a non-trivial one that arises in actual practice), see double-checked locking.
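
The x and f example above can be sketched in portable C++11 rather than in processor-specific instructions. This is only an illustration under stated assumptions: the function names `writer`, `reader` and `run_trials` are hypothetical, `std::atomic_thread_fence` stands in for the "memory fence required here" comments, and the flag f is made a `std::atomic<int>` because the C++ memory model requires the flag itself to be atomic for the fences to take effect. Which machine instruction each fence becomes (if any) depends on the compiler and target architecture.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int x = 0;                 // ordinary (non-atomic) data
std::atomic<int> f{0};     // flag, accessed with relaxed ordering only

// Thread #2 in the text: store 42 into x, then set the flag.
void writer() {
    x = 42;
    // Release fence: the store to x may not be reordered after the store to f.
    std::atomic_thread_fence(std::memory_order_release);
    f.store(1, std::memory_order_relaxed);
}

// Thread #1 in the text: spin until the flag is set, then read x.
int reader() {
    while (f.load(std::memory_order_relaxed) == 0) {
        std::this_thread::yield();
    }
    // Acquire fence: the load of x may not be reordered before the load of f.
    std::atomic_thread_fence(std::memory_order_acquire);
    return x;
}

// Runs the two threads `trials` times and counts how often the reader saw 42.
int run_trials(int trials) {
    assert(trials > 0);
    int successes = 0;
    for (int i = 0; i < trials; ++i) {
        x = 0;
        f.store(0, std::memory_order_relaxed);
        int seen = -1;
        std::thread t1([&seen] { seen = reader(); });
        std::thread t2(writer);
        t1.join();
        t2.join();
        if (seen == 42) {
            ++successes;
        }
    }
    return successes;
}
```

With both fences in place, the reader is guaranteed by the C++ memory model to observe 42 once it has seen f equal to 1, so `run_trials(n)` returns n. Removing either fence makes the program formally racy, even if a particular machine happens to produce the expected value.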

Multithreaded programming and memory visibility

Multithreaded programs usually use synchronization primitives provided by a high-level programming environment, such as Java or .NET, or an application programming interface (API) such as POSIX Threads or Windows API. Synchronization primitives such as mutexes and semaphores are provided to synchronize access to resources from parallel threads of execution. These primitives are usually implemented with the memory barriers required to provide the expected memory visibility semantics. In such environments explicit use of memory barriers is not generally necessary.
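
As a minimal sketch of this point, the following C++ fragment shares a plain counter between threads using only `std::mutex`, with no explicit fences or atomics in user code. The names `add_many` and `run_workers` are hypothetical and chosen for illustration. The ordering and visibility guarantees come from the lock and unlock operations themselves.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

std::mutex counter_mutex;   // protects shared_counter
long shared_counter = 0;    // plain variable: no atomics, no explicit fences

// Each worker increments the shared counter while holding the mutex.
void add_many(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        std::lock_guard<std::mutex> guard(counter_mutex);
        ++shared_counter;
    }
}

// Starts `threads` workers, waits for them, and returns the final count.
long run_workers(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    shared_counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back(add_many, iterations);
    }
    for (auto& worker : pool) {
        worker.join();
    }
    return shared_counter;
}
```

Because every access to `shared_counter` inside the workers happens under the same mutex, a call such as `run_workers(4, 10000)` returns 40000 without the programmer writing any barrier.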

Out-of-order execution versus compiler reordering optimizations

Memory barrier instructions address reordering effects only at the hardware level. Compilers may also reorder instructions as part of the program optimization process. Although the effects on parallel program behavior can be similar in both cases, in general it is necessary to take separate measures to inhibit compiler reordering optimizations for data that may be shared by multiple threads of execution.

In C and C++, the volatile keyword was intended to allow C and C++ programs to directly access memory-mapped I/O. Memory-mapped I/O generally requires that the reads and writes specified in source code happen in the exact order specified with no omissions. Omissions or reorderings of reads and writes by the compiler would break the communication between the program and the device accessed by memory-mapped I/O. A C or C++ compiler may not omit reads from and writes to volatile memory locations, nor may it reorder reads and writes relative to other such actions for the same volatile location (variable).

The volatile keyword does not guarantee a memory barrier to enforce cache consistency. Therefore, the use of volatile alone is not sufficient to use a variable for inter-thread communication on all systems and processors.

The C and C++ standards prior to C11 and C++11 do not address multiple threads (or multiple processors), and as such, the usefulness of volatile depends on the compiler and hardware. Although volatile guarantees that the volatile reads and volatile writes will happen in the exact order specified in the source code, the compiler may generate code (or the CPU may reorder execution) such that a volatile read or write is reordered with regard to non-volatile reads or writes, thus limiting its usefulness as an inter-thread flag or mutex.
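
From C++11 onward, one common alternative to a volatile flag is `std::atomic` with release and acquire ordering, which constrains both the compiler and the processor. The sketch below is an illustration only: the names `produce`, `consume`, `handoff`, `compiler_barrier` and `self_check` are hypothetical. The helper `compiler_barrier` uses `std::atomic_signal_fence`, which restricts compiler reordering but issues no CPU fence instruction, so it is not sufficient by itself for communication between threads.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                  // ordinary, non-volatile data
std::atomic<bool> ready{false};   // used where a volatile flag is often misused

void produce(int value) {
    payload = value;
    // Release store: neither the compiler nor the CPU may move the write
    // to payload after this store.
    ready.store(true, std::memory_order_release);
}

int consume() {
    // Acquire load: neither the compiler nor the CPU may move the read of
    // payload before the load that observes ready == true.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;
}

// Compiler-only barrier: suppresses compiler reordering across this point
// but emits no hardware fence instruction.
inline void compiler_barrier() {
    std::atomic_signal_fence(std::memory_order_seq_cst);
}

// Hands one value from a producer thread to a consumer thread.
int handoff(int value) {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int received = -1;
    std::thread consumer([&received] { received = consume(); });
    std::thread producer(produce, value);
    consumer.join();
    producer.join();
    return received;
}

// Small self-check of the handoff.
void self_check() {
    assert(handoff(7) == 7);
}
```

The design choice here is to attach the ordering to the flag operations themselves instead of using standalone fences; either form inhibits both compiler and hardware reordering around the flag, which volatile alone does not.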

External links

Memory Barriers: a Hardware View for Software Hackers

Linux kernel memory barrier issues on multiple types of CPUs

Documentation on memory barriers in the Linux kernel

Handling Memory Ordering in Multithreaded Applications with Oracle Solaris Studio 12 Update 2: Part 1, Compiler Barriers

Handling Memory Ordering in Multithreaded Applications with Oracle Solaris Studio 12 Update 2: Part 2, Memory Barriers and Memory Fences

User-space RCU: Memory-barrier menagerie