software engineering Software engineering is a branch of both computer science and engineering focused on designing, developing, testing, and maintaining Application software, software applications. It involves applying engineering design process, engineering principl ...

, a spinlock is a

lock Lock(s) or Locked may refer to: Common meanings *Lock and key, a mechanical device used to secure items of importance *Lock (water navigation), a device for boats to transit between different levels of water, as in a canal Arts and entertainme ...

that causes a thread trying to acquire it to simply wait in a loop ("spin") while repeatedly checking whether the lock is available. Since the thread remains active but is not performing a useful task, the use of such a lock is a kind of

busy waiting In computer science and software engineering, busy-waiting, busy-looping or spinning is a technique in which a process repeatedly checks to see if a condition is true, such as whether keyboard input or a lock is available. Spinning can also be use ...

. Once acquired, spinlocks will usually be held until they are explicitly released, although in some implementations they may be automatically released if the thread being waited on (the one that holds the lock) blocks or "goes to sleep". Because they avoid overhead from

operating system An operating system (OS) is system software that manages computer hardware and software resources, and provides common daemon (computing), services for computer programs. Time-sharing operating systems scheduler (computing), schedule tasks for ...

process rescheduling or

context switch In computing, a context switch is the process of storing the state of a process or thread, so that it can be restored and resume execution at a later point, and then restoring a different, previously saved, state. This allows multiple processes ...

ing, spinlocks are efficient if threads are likely to be blocked for only short periods. For this reason, operating-system kernels often use spinlocks. However, spinlocks become wasteful if held for longer durations, as they may prevent other threads from running and require rescheduling. The longer a thread holds a lock, the greater the risk that the thread will be interrupted by the OS scheduler while holding the lock. If this happens, other threads will be left "spinning" (repeatedly trying to acquire the lock), while the thread holding the lock is not making progress towards releasing it. The result is an indefinite postponement until the thread holding the lock can finish and release it. This is especially true on a single-processor system, where each waiting thread of the same priority is likely to waste its quantum (allocated time where a thread can run) spinning until the thread that holds the lock is finally finished. Implementing spinlocks correctly is challenging because programmers must take into account the possibility of simultaneous access to the lock, which could cause

race condition A race condition or race hazard is the condition of an electronics, software, or other system where the system's substantive behavior is dependent on the sequence or timing of other uncontrollable events, leading to unexpected or inconsistent ...

s. Generally, such an implementation is possible only with special

assembly language In computing, assembly language (alternatively assembler language or symbolic machine code), often referred to simply as assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence bet ...

instructions, such as atomic (i.e. un-interruptible)

test-and-set In computer science, the test-and-set instruction is an instruction used to write (set) 1 to a memory location and return its old value as a single atomic (i.e., non- interruptible) operation. The caller can then "test" the result to see if the ...

operations and cannot be easily implemented in programming languages not supporting truly atomic operations. On architectures without such operations, or if high-level language implementation is required, a non-atomic locking algorithm may be used, e.g.

Peterson's algorithm Peterson's algorithm (or Peterson's solution) is a concurrent programming algorithm for mutual exclusion that allows two or more processes to share a single-use resource without conflict, using only shared memory for communication. It was formulate ...

. However, such an implementation may require more

memory Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembe ...

than a spinlock, be slower to allow progress after unlocking, and may not be implementable in a high-level language if

out-of-order execution In computer engineering, out-of-order execution (or more formally dynamic execution) is an instruction scheduling paradigm used in high-performance central processing units to make use of instruction cycles that would otherwise be wasted. In t ...

is allowed.

Example implementation

The following example uses x86 assembly language to implement a spinlock. It will work on any

Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California, and Delaware General Corporation Law, incorporated in Delaware. Intel designs, manufactures, and sells computer compo ...

80386 The Intel 386, originally released as the 80386 and later renamed i386, is the third-generation x86 architecture microprocessor from Intel. It was the first 32-bit processor in the line, making it a significant evolution in the x86 architect ...

compatible processor. ; Intel syntax locked: ; The lock variable. 1 = locked, 0 = unlocked. dd 0 spin_lock: mov eax, 1 ; Set the EAX register to 1. xchg eax, ocked ; Atomically swap the EAX register with ; the lock variable. ; This will always store 1 to the lock, leaving ; the previous value in the EAX register. test eax, eax ; Test EAX with itself. Among other things, this will ; set the processor's Zero Flag if EAX is 0. ; If EAX is 0, then the lock was unlocked and ; we just locked it. ; Otherwise, EAX is 1 and we didn't acquire the lock. jnz spin_lock ; Jump back to the MOV instruction if the Zero Flag is ; not set; the lock was previously locked, and so ; we need to spin until it becomes unlocked. ret ; The lock has been acquired, return to the calling ; function. spin_unlock: xor eax, eax ; Set the EAX register to 0. xchg eax, ocked ; Atomically swap the EAX register with ; the lock variable. ret ; The lock has been released.

Significant optimizations

The simple implementation above works on all CPUs using the x86 architecture. However, a number of performance optimizations are possible: On later implementations of the x86 architecture, ''spin_unlock'' can safely use an unlocked MOV instead of the slower locked XCHG. This is due to subtle

memory ordering Memory ordering is the order of accesses to computer memory by a CPU. Memory ordering depends on both the order of the instructions generated by the compiler at compile time and the execution order of the CPU at runtime. However, memory order is ...

rules which support this, even though MOV is not a full

memory barrier In computing, a memory barrier, also known as a membar, memory fence or fence instruction, is a type of barrier instruction that causes a central processing unit (CPU) or compiler to enforce an ordering constraint on memory operations issued ...

. However, some processors (some

Cyrix Cyrix Corporation was a microprocessor developer that was founded in 1988 in Richardson, Texas, as a specialist supplier of floating point units for 286 and 386 microprocessors. The company was founded by Tom Brightman and Jerry Rogers. Ter ...

processors, some revisions of the

Pentium Pro The Pentium Pro is a sixth-generation x86 microprocessor developed and manufactured by Intel and introduced on November 1, 1995. It implements the P6 (microarchitecture), P6 microarchitecture (sometimes termed i686), and was the first x86 Intel C ...

(due to bugs), and earlier

Pentium Pentium is a series of x86 architecture-compatible microprocessors produced by Intel from 1993 to 2023. The Pentium (original), original Pentium was Intel's fifth generation processor, succeeding the i486; Pentium was Intel's flagship proce ...

and

i486 The Intel 486, officially named i486 and also known as 80486, is a microprocessor introduced in 1989. It is a higher-performance follow-up to the i386, Intel 386. It represents the fourth generation of binary compatible CPUs following the Inte ...

SMP systems) will do the wrong thing and data protected by the lock could be corrupted. On most non-x86 architectures, explicit memory barrier or atomic instructions (as in the example) must be used. On some systems, such as

IA-64 IA-64 (Intel Itanium architecture) is the instruction set architecture (ISA) of the discontinued Itanium family of 64-bit Intel microprocessors. The basic ISA specification originated at Hewlett-Packard (HP), and was subsequently implemented by ...

, there are special "unlock" instructions which provide the needed memory ordering. To reduce inter-CPU bus traffic, code trying to acquire a lock should loop reading without trying to write anything until it reads a changed value. Because of MESI caching protocols, this causes the cache line for the lock to become "Shared"; then there is remarkably ''no'' bus traffic while a CPU waits for the lock. This optimization is effective on all CPU architectures that have a cache per CPU, because MESI is so widespread. On Hyper-Threading CPUs, pausing with rep nop gives additional performance by hinting to the core that it can work on the other thread while the lock spins waiting.

Transactional Synchronization Extensions Transactional Synchronization Extensions (TSX), also called Transactional Synchronization Extensions New Instructions (TSX-NI), is an extension to the x86 instruction set architecture (ISA) that adds hardware transactional memory support, speeding ...

and other hardware

transactional memory In computer science and computer engineering, engineering, transactional memory attempts to simplify concurrent programming by allowing a group of load and store instructions to execute in an linearizability, atomic way. It is a concurrency control ...

instruction sets serve to replace locks in most cases. Although locks are still required as a fallback, they have the potential to greatly improve performance by having the processor handle entire blocks of atomic operations. This feature is built into some mutex implementations, for example in

glibc The GNU C Library, commonly known as glibc, is the GNU Project implementation of the C standard library. It provides a wrapper around the system calls of the Linux kernel and other kernels for application use. Despite its name, it now also dir ...

. The Hardware Lock Elision (HLE) in x86 is a weakened but backwards-compatible version of TSE, and we can use it here for locking without losing any compatibility. In this particular case, the processor can choose to not lock until two threads actually conflict with each other. A simpler version of the test can use the cmpxchg instruction on x86, or the __sync_bool_compare_and_swap built into many Unix compilers. With the optimizations applied, a sample would look like: ; In C: while (!__sync_bool_compare_and_swap(&locked, 0, 1)) while (locked) __builtin_ia32_pause(); spin_lock: mov ecx, 1 ; Set the ECX register to 1. retry: xor eax, eax ; Zero out EAX, because cmpxchg compares against EAX. XACQUIRE lock cmpxchg ocked ecx ; atomically decide: if locked is zero, write ECX to it. ; XACQUIRE hints to the processor that we are acquiring a lock. je out ; If we locked it (old value equal to EAX: 0), return. pause: mov eax, ocked ; Read locked into EAX. test eax, eax ; Perform the zero-test as before. jz retry ; If it's zero, we can retry. rep nop ; Tell the CPU that we are waiting in a spinloop, so it can ; work on the other thread now. Also written as the "pause". jmp pause ; Keep check-pausing. out: ret ; All done. spin_unlock: XRELEASE mov ocked 0 ; Assuming the memory ordering rules apply, release the ; lock variable with a "lock release" hint. ret ; The lock has been released. On any multi-processor system that uses the MESI contention protocol, such a test-and-test-and-set lock (TTAS) performs much better than the simple test-and-set lock (TAS) approach. Maurice Herlihy and Nir Shavit. "The Art of Multiprocessor Programming"
"Spin Locks and Contention"
With large numbers of processors, adding a random

exponential backoff Exponential backoff is an algorithm that uses feedback to multiplicatively decrease the rate of some process, in order to gradually find an acceptable rate. These algorithms find usage in a wide range of systems and processes, with radio networ ...

delay before re-checking the lock performs even better than TTAS. A few multi-core processors have a "power-conscious spin-lock" instruction that puts a processor to sleep, then wakes it up on the next cycle after the lock is freed. A spin-lock using such instructions is more efficient and uses less energy than spin locks with or without a back-off loop.

Alternatives

The primary disadvantage of a spinlock is that, while waiting to acquire a lock, it wastes time that might be productively spent elsewhere. There are two ways to avoid this: # Do not acquire the lock. In many situations it is possible to design data structures that do not require locking, e.g. by using per-thread or per-CPU data and disabling

interrupt In digital computers, an interrupt (sometimes referred to as a trap) is a request for the processor to ''interrupt'' currently executing code (when permitted), so that the event can be processed in a timely manner. If the request is accepted ...

s. #

Switch In electrical engineering, a switch is an electrical component that can disconnect or connect the conducting path in an electrical circuit, interrupting the electric current or diverting it from one conductor to another. The most common type o ...

to a different thread while waiting. This typically involves attaching the current thread to a queue of threads waiting for the lock, followed by switching to another thread that is ready to do some useful work. This scheme also has the advantage that it guarantees that

resource starvation In computer science, resource starvation is a problem encountered in concurrent computing where a process is perpetually denied necessary resources ''Resource'' refers to all the materials available in our environment which are Technology, te ...

does not occur as long as all threads eventually relinquish locks they acquire and scheduling decisions can be made about which thread should progress first. Spinlocks that never entail switching, usable by

real-time operating systems A real-time operating system (RTOS) is an operating system (OS) for real-time computing applications that processes data and events that have critically defined time constraints. A RTOS is distinct from a time-sharing operating system, such as Unix ...

, are sometimes called ''raw spinlocks''. Most operating systems (including

Solaris Solaris is the Latin word for sun. It may refer to: Arts and entertainment Literature, television and film * ''Solaris'' (novel), a 1961 science fiction novel by Stanisław Lem ** ''Solaris'' (1968 film), directed by Boris Nirenburg ** ''Sol ...

Mac OS X macOS, previously OS X and originally Mac OS X, is a Unix, Unix-based operating system developed and marketed by Apple Inc., Apple since 2001. It is the current operating system for Apple's Mac (computer), Mac computers. With ...

and

FreeBSD FreeBSD is a free-software Unix-like operating system descended from the Berkeley Software Distribution (BSD). The first version was released in 1993 developed from 386BSD, one of the first fully functional and free Unix clones on affordable ...

) use a hybrid approach called "adaptive

mutex In computer science, a lock or mutex (from mutual exclusion) is a synchronization primitive that prevents state from being modified or accessed by multiple threads of execution at once. Locks enforce mutual exclusion concurrency control policies, ...

". The idea is to use a spinlock when trying to access a resource locked by a currently-running thread, but to sleep if the thread is not currently running. (The latter is ''always'' the case on single-processor systems.)

OpenBSD OpenBSD is a security-focused operating system, security-focused, free software, Unix-like operating system based on the Berkeley Software Distribution (BSD). Theo de Raadt created OpenBSD in 1995 by fork (software development), forking NetBSD ...

attempted to replace spinlocks with ticket locks which enforced

first-in-first-out FIFO and LIFO accounting are methods used in managing inventory and financial matters involving the amount of money a company has to have tied up within inventory of produced goods, raw materials, parts, components, or feedstocks. They are used to ...

behaviour, however this resulted in more CPU usage in the kernel and larger applications, such as

Firefox Mozilla Firefox, or simply Firefox, is a free and open-source web browser developed by the Mozilla Foundation and its subsidiary, the Mozilla Corporation. It uses the Gecko rendering engine to display web pages, which implements curr ...

, becoming much slower.{{cite web, url=https://lobste.rs/c/6cybxn, title=tedu comment on Locking in WebKit - Lobsters, date=2016-05-06, author=Ted Unangst

References

External links

pthread_spin_lock documentation
from The Open Group Base Specifications Issue 6, IEEE Std 1003.1, 2004 Edition
Variety of spinlock Implementations
from Concurrency Kit *Article
User-Level Spin Locks - Threads, Processes & IPC
by Gert Boddaert *Articl
Spin Lock Example in Java
*Paper

by Thomas E. Anderson *Paper
Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors
by John M. Mellor-Crummey and Michael L. Scott. This paper received th
2006 Dijkstra Prize in Distributed Computing

Spin-Wait Lock
by Jeffrey Richter
Austria C++ SpinLock Class ReferenceInterlocked Variable Access(Windows)Operating Systems: Three Easy Pieces (Chapter: Locks)
Concurrency control algorithms Programming constructs

Example implementation

Significant optimizations

Alternatives

See also

References

External links