In computer science, an algorithm is called non-blocking if failure or suspension of any thread cannot cause failure or suspension of another thread; for some operations, these algorithms provide a useful alternative to traditional blocking implementations. A non-blocking algorithm is lock-free if there is guaranteed system-wide progress, and wait-free if there is also guaranteed per-thread progress. "Non-blocking" was used as a synonym for "lock-free" in the literature until the introduction of obstruction-freedom in 2003.

The word "non-blocking" was traditionally used to describe telecommunications networks that could route a connection through a set of relays "without having to re-arrange existing calls" (see Clos network). Also, if the telephone exchange "is not defective, it can always make the connection" (see nonblocking minimal spanning switch).


Motivation

The traditional approach to multi-threaded programming is to use locks to synchronize access to shared resources. Synchronization primitives such as mutexes, semaphores, and critical sections are all mechanisms by which a programmer can ensure that certain sections of code do not execute concurrently, if doing so would corrupt shared memory structures. If one thread attempts to acquire a lock that is already held by another thread, the thread will block until the lock is free.

Blocking a thread can be undesirable for many reasons. An obvious reason is that while the thread is blocked, it cannot accomplish anything: if the blocked thread had been performing a high-priority or real-time task, it would be highly undesirable to halt its progress. Other problems are less obvious. For example, certain interactions between locks can lead to error conditions such as deadlock, livelock, and priority inversion. Using locks also involves a trade-off between coarse-grained locking, which can significantly reduce opportunities for parallelism, and fine-grained locking, which requires more careful design, increases locking overhead, and is more prone to bugs.

Unlike blocking algorithms, non-blocking algorithms do not suffer from these downsides, and in addition are safe for use in interrupt handlers: even though the preempted thread cannot be resumed, progress is still possible without it. In contrast, global data structures protected by mutual exclusion cannot safely be accessed in an interrupt handler, as the preempted thread may be the one holding the lock, though this can be rectified easily by masking the interrupt request during the critical section.

A lock-free data structure can be used to improve performance: it increases the amount of time spent in parallel execution rather than serial execution, improving performance on a multi-core processor, because access to the shared data structure does not need to be serialized to stay coherent.


Implementation

With few exceptions, non-blocking algorithms use atomic read-modify-write primitives that the hardware must provide, the most notable of which is compare and swap (CAS).
Critical sections are almost always implemented using standard interfaces over these primitives (in the general case, critical sections will be blocking, even when implemented with these primitives). In the 1990s all non-blocking algorithms had to be written "natively" with the underlying primitives to achieve acceptable performance. However, the emerging field of software transactional memory promises standard abstractions for writing efficient non-blocking code.
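As a rough illustration of how a CAS primitive is typically employed, the following is a minimal sketch, in C++11, of the classic Treiber lock-free stack. The names LockFreeStack and Node are assumptions for this example, not a standard API, and the sketch deliberately leaks popped nodes, because safe memory reclamation is the hard part of a production implementation:

    #include <atomic>
    #include <utility>

    // Illustrative sketch of a Treiber (CAS-based) lock-free stack.
    // LockFreeStack and Node are names assumed for this example only.
    template <typename T>
    class LockFreeStack {
        struct Node {
            T value;
            Node* next;
        };
        std::atomic<Node*> head{nullptr};

    public:
        void push(T value) {
            Node* node = new Node{std::move(value),
                                  head.load(std::memory_order_relaxed)};
            // Retry until the CAS installs the new node on top of the
            // current head. On failure, compare_exchange_weak reloads
            // the current head into node->next for the next attempt.
            while (!head.compare_exchange_weak(node->next, node,
                                               std::memory_order_release,
                                               std::memory_order_relaxed)) {
            }
        }

        bool pop(T& out) {
            Node* node = head.load(std::memory_order_acquire);
            while (node &&
                   !head.compare_exchange_weak(node, node->next,
                                               std::memory_order_acquire,
                                               std::memory_order_acquire)) {
            }
            if (!node) return false;    // stack was empty
            out = std::move(node->value);
            // Node deliberately leaked: safe reclamation (hazard pointers,
            // epochs, RCU) is required to avoid ABA and use-after-free.
            return true;
        }
    };

Each failed compare-and-swap implies that some other thread's CAS succeeded, so the system as a whole always makes progress even though an individual thread may retry indefinitely; this is precisely the lock-free guarantee.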
Much research has also been done in providing basic data structures such as stacks, queues, sets, and hash tables. These allow programs to easily exchange data between threads asynchronously.

Additionally, some non-blocking data structures are weak enough to be implemented without special atomic primitives. These exceptions include:

* A single-reader single-writer ring buffer FIFO, with a size which evenly divides the overflow of one of the available unsigned integer types, can unconditionally be implemented safely using only a memory barrier (a sketch appears at the end of this section).
* Read-copy-update with a single writer and any number of readers. (The readers are wait-free; the writer is usually lock-free, until it needs to reclaim memory.)
* Read-copy-update with multiple writers and any number of readers. (The readers are wait-free; multiple writers generally serialize with a lock and are not obstruction-free.)

Several libraries internally use lock-free techniques, but it is difficult to write lock-free code that is correct (Herb Sutter). Non-blocking algorithms generally involve a series of read, read-modify-write, and write instructions in a carefully designed order. Optimizing compilers can aggressively re-arrange operations. Even when they don't, many modern CPUs often re-arrange such operations (they have a "weak consistency model"), unless a memory barrier is used to tell the CPU not to reorder.
C++11 programmers can use std::atomic in <atomic>, and C11 programmers can use <stdatomic.h>, both of which supply types and functions that tell the compiler not to re-arrange such instructions, and to insert the appropriate memory barriers. (Bruce Dawson, "ARM and Lock-Free Programming".)


Wait-freedom

Wait-freedom is the strongest non-blocking guarantee of progress, combining guaranteed system-wide throughput with starvation-freedom. An algorithm is wait-free if every operation has a bound on the number of steps the algorithm will take before the operation completes. (Anthony Williams, "Safety: off: How not to shoot yourself in the foot with C++ atomics", 2015, p. 20.)
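To illustrate the bounded-step requirement, consider a shared event counter built on fetch_add; this is a minimal sketch, and the names events, record_event, and snapshot are assumptions for this example. On hardware where fetch_add compiles to a single unconditional atomic instruction (such as lock xadd on x86), every call completes in a bounded number of steps regardless of contention and is therefore wait-free; on LL/SC machines the same source may compile to a retry loop and then is only lock-free:

    #include <atomic>
    #include <cstdint>

    std::atomic<uint64_t> events{0};

    // Wait-free where fetch_add is one unconditional atomic instruction:
    // each call finishes in a bounded number of steps no matter what any
    // other thread does, so no caller can be starved.
    void record_event() {
        events.fetch_add(1, std::memory_order_relaxed);
    }

    uint64_t snapshot() {
        // A plain atomic load is likewise wait-free.
        return events.load(std::memory_order_relaxed);
    }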
Wait-freedom is critical for real-time systems and is always nice to have as long as the performance cost is not too high. It was shown in the 1980s that all algorithms can be implemented wait-free, and many transformations from serial code, called ''universal constructions'', have been demonstrated. However, the resulting performance does not in general match even naïve blocking designs. Several papers have since improved the performance of universal constructions, but still, their performance is far below blocking designs.

Several papers have investigated the difficulty of creating wait-free algorithms. For example, it has been shown that the widely available atomic ''conditional'' primitives, CAS and LL/SC, cannot provide starvation-free implementations of many common data structures without memory costs growing linearly in the number of threads. In practice, however, these lower bounds do not present a real barrier, as spending a cache line or exclusive reservation granule (up to 2 KB on ARM) of store per thread in the shared memory is not considered too costly for practical systems. (Typically the amount of store logically required is a word, but physically CAS operations on the same cache line will collide, and LL/SC operations in the same exclusive reservation granule will collide, so the amount of store physically required is greater.)

Wait-free algorithms were rare until 2011, both in research and in practice. However, in 2011 Kogan and Petrank presented a wait-free queue building on the CAS primitive, generally available on common hardware. Their construction expanded the lock-free queue of Michael and Scott, which is an efficient queue often used in practice. A follow-up paper by Kogan and Petrank provided a method for making wait-free algorithms fast and used this method to make the wait-free queue practically as fast as its lock-free counterpart. A subsequent paper by Timnat and Petrank provided an automatic mechanism for generating wait-free data structures from lock-free ones. Thus, wait-free implementations are now available for many data structures.


Lock-freedom

Lock-freedom allows individual threads to starve but guarantees system-wide throughput. An algorithm is lock-free if, when the program threads are run for a sufficiently long time, at least one of the threads makes progress (for some sensible definition of progress).