
A
concurrent
Concurrent means happening at the same time. Concurrency, concurrent, or concurrence may refer to:
Law
* Concurrence, in jurisprudence, the need to prove both ''actus reus'' and ''mens rea''
* Concurring opinion (also called a "concurrence"), a ...
hash table
In computing, a hash table, also known as hash map, is a data structure that implements an associative array or dictionary. It is an abstract data type that maps keys to values. A hash table uses a hash function to compute an ''index'', ...
(concurrent hash map) is an implementation of hash tables allowing ''concurrent access'' by ''multiple''
threads
Thread may refer to:
Objects
* Thread (yarn), a kind of thin yarn used for sewing
** Thread (unit of measurement), a cotton yarn measure
* Screw thread, a helical ridge on a cylindrical fastener
Arts and entertainment
* ''Thread'' (film), 2016 ...
using a
hash function
A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called ''hash values'', ''hash codes'', ''digests'', or simply ''hashes''. The values are usually ...
.
[
][
]
Concurrent hash tables represent a key
concurrent data structure
In computer science, a concurrent data structure is a
particular way of storing and organizing data for access by
multiple computing threads (or processes) on a computer.
Historically, such data structures were used on uniprocessor
machines w ...
for use in
concurrent computing
Concurrent computing is a form of computing in which several computations are executed '' concurrently''—during overlapping time periods—instead of ''sequentially—''with one completing before the next starts.
This is a property of a sys ...
which allow multiple threads to more efficiently cooperate for a computation among shared data.
Due to the natural problems associated with concurrent access - namely
contention - the way and scope in which the table can be concurrently accessed differs depending on the implementation. Furthermore, the resulting speed up might not be linear with the amount of threads used as contention needs to be resolved, producing processing
overhead.
There exist multiple solutions to mitigate the effects of contention, that each preserve the
correctness of operations on the table.
[
][
]
As with their sequential counterpart, concurrent hash tables can be generalized and extended to fit broader applications, such as allowing more complex data types to be used for keys and values. These generalizations can however negatively impact performance and should thus be chosen in accordance to the requirements of the application.
Concurrent hashing
When creating concurrent hash tables, the functions accessing the table with the chosen hashing algorithm need to be adapted for concurrency by adding a conflict resolution strategy. Such a strategy requires managing accesses in a way such that conflicts caused by them do not result in corrupt data, while ideally increasing their efficiency when used in parallel.
Herlihy and Shavit describe how the accesses to a hash table without such a strategy - in its example based on a basic implementation of the
Cuckoo hashing
Cuckoo hashing is a scheme in computer programming for resolving hash collisions of values of hash functions in a table, with worst-case constant lookup time. The name derives from the behavior of some species of cuckoo, where the cuckoo chick p ...
algorithm - can be adapted for concurrent use. Fan et al.
[
]
further describe a table access scheme based on cuckoo hashing that is not only concurrent, but also keeps the space efficiency of its hashing function while also improving cache locality as well as the throughput of insertions.
When hash tables are not bound in size and are thus allowed to grow/shrink when necessary, the hashing algorithm needs to be adapted to allow this operation. This entails modifying the used hash function to reflect the new key-space of the resized table. A concurrent growing algorithm is described by Maier et al.
Mega-KV is a high performance key-value store system, where the
cuckoo hashing
Cuckoo hashing is a scheme in computer programming for resolving hash collisions of values of hash functions in a table, with worst-case constant lookup time. The name derives from the behavior of some species of cuckoo, where the cuckoo chick p ...
is used and the KV indexing is massively parallelized in batch mode by GPU. With further optimizations of GPU acceleration by
NVIDIA
Nvidia CorporationOfficially written as NVIDIA and stylized in its logo as VIDIA with the lowercase "n" the same height as the uppercase "VIDIA"; formerly stylized as VIDIA with a large italicized lowercase "n" on products from the mid 1990s to ...
and
Oak Ridge National Lab
Oak Ridge National Laboratory (ORNL) is a U.S. multiprogram science and technology national laboratory sponsored by the U.S. Department of Energy (DOE) and administered, managed, and operated by UT–Battelle as a federally funded research and ...
, Mega-KV was pushed to another high record of the throughput in 2018 (up to 888 millions of key-value operations per second).
Contention handling

As with any concurrent data structure, concurrent hash tables suffer from a variety of problems known in the field of concurrent computing as a result of contention.
Examples for such are the
ABA problem
In multithreaded computing, the ABA problem occurs during synchronization, when a location is read twice, has the same value for both reads, and "value is the same" is used to indicate "nothing has changed". However, another thread can execute be ...
,
race condition
A race condition or race hazard is the condition of an electronics, software, or other system where the system's substantive behavior is dependent on the sequence or timing of other uncontrollable events. It becomes a bug when one or more of t ...
s, and
deadlock
In concurrent computing, deadlock is any situation in which no member of some group of entities can proceed because each waits for another member, including itself, to take action, such as sending a message or, more commonly, releasing a lo ...
s.
The extent in which these problems manifest or even occur at all depends on the implementation of the concurrent hash table; specifically which operations the table allows to be run concurrently, as well as its strategies for mitigating problems associated with contention. When handling contention, the main goal is the same as with any other concurrent data structure, namely ensuring correctness for every operation on the table. At the same time, it should naturally be done in such a way as to be more efficient than a sequential solution when used concurrently. This is also known as
concurrency control
In information technology and computer science, especially in the fields of computer programming, operating systems, multiprocessors, and databases, concurrency control ensures that correct results for concurrent operations are generated, while ...
.
Atomic instructions
Using
atomic instructions such as
compare-and-swap
In computer science, compare-and-swap (CAS) is an atomic instruction used in multithreading to achieve synchronization. It compares the contents of a memory location with a given value and, only if they are the same, modifies the contents of th ...
or
fetch-and-add In computer science, the fetch-and-add CPU instruction (FAA) atomically increments the contents of a memory location by a specified value.
That is, fetch-and-add performs the operation
:increment the value at address by , where is a memory locat ...
, problems caused by contention can be reduced by ensuring that an access is completed before another access has the chance to interfere. Operations such as compare-and-swap often present limitations as to what size of data they can handle, meaning that the types of keys and values of a table have to be chosen or converted accordingly.
Using so called Hardware
Transactional Memory In computer science and engineering, transactional memory attempts to simplify concurrent programming by allowing a group of load and store instructions to execute in an atomic way. It is a concurrency control mechanism analogous to database trans ...
(HTM), table operations can be thought of much like
database transaction
A database transaction symbolizes a unit of work, performed within a database management system (or similar system) against a database, that is treated in a coherent and reliable way independent of other transactions. A transaction generally repr ...
s,
ensuring atomicity. An example of HTM in practice are the
Transactional Synchronization Extensions
Transactional Synchronization Extensions (TSX), also called Transactional Synchronization Extensions New Instructions (TSX-NI), is an extension to the x86 instruction set architecture (ISA) that adds hardware transactional memory support, speeding ...
.
Locking
With the help of
locks, operations trying to concurrently access the table or values within it can be handled in a way that ensures correct behavior. This can however lead to negative performance impacts,
in particular when the locks used are too restrictive, thus blocking accesses that would otherwise not contend and could execute without causing any problems. Further considerations have to be made to avoid even more critical problems that threaten correctness, as with livelocks, deadlocks or
starvation
Starvation is a severe deficiency in caloric energy intake, below the level needed to maintain an organism's life. It is the most extreme form of malnutrition. In humans, prolonged starvation can cause permanent organ damage and eventually, dea ...
.
Phase concurrency

A phase concurrent hash table groups accesses by creating phases in which only one type of operation is allowed (i.e. a pure write-phase), followed by a
synchronization of the table state across all threads. A formally proven algorithm for this is given by Shun and Blelloch.
Read-Copy-Update
Widely used within the
Linux kernel
The Linux kernel is a free and open-source, monolithic, modular, multitasking, Unix-like operating system kernel. It was originally authored in 1991 by Linus Torvalds for his i386-based PC, and it was soon adopted as the kernel for the GNU ...
,
read-copy-update In computer science, read-copy-update (RCU) is a synchronization mechanism that avoids the use of lock primitives while multiple threads concurrently read and update elements that are linked through pointers and that belong to shared data structu ...
(RCU) is especially useful in cases where the number of reads far exceeds the number of writes.
Applications
Naturally, concurrent hash tables find application wherever sequential hash tables are useful. The advantage that concurrency delivers herein lies within the potential speedup of these use-cases, as well as the increased scalability.
Considering hardware such as
multi-core processor
A multi-core processor is a microprocessor on a single integrated circuit with two or more separate processing units, called cores, each of which reads and executes program instructions. The instructions are ordinary CPU instructions (suc ...
s that become increasingly more capable of concurrent computation, the importance of concurrent data structures within these applications grow steadily.
Performance Analysis
Maier et al.
perform a thorough analysis on a variety of concurrent hash table implementations, giving insight into the effectiveness of each in different situations that are likely to occur in real use-cases. The most important findings can be summed up as the following:
As expected low contention leads to positive behavior across every operation, whereas high contention becomes problematic when it comes to writing.
The latter however is a problem of high contention in general, wherein the benefit of concurrent computation is negated due to the natural requirement for concurrency control restricting contending accesses. The resulting overhead causes worse performance than that of the ideal sequential version.
In spite of this, concurrent hash tables still prove invaluable even in such high contention scenarios when observing that a well-designed implementation can still achieve very high speedups by leveraging the benefits of concurrency to read data concurrently.
However, real use-cases of concurrent hash tables are often not simply sequences of the same operation, but rather a mixture of multiple types.
As such, when a mixture of
insert
and
find
operations is used the speedup and resulting usefulness of concurrent hash tables become more obvious, especially when observing
find
heavy workloads.
Ultimately the resulting performance of a concurrent hash table depends on a variety of factors based upon its desired application. When choosing the implementation, it is important to determine the necessary amount of generality, contention handling strategies and some thoughts on whether the size of the desired table can be determined in advance or a growing approach must be used instead.
Implementations
* Since
Java
Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's mo ...
1.5, concurrent hash maps are provided based upon
concurrent map interface.
* libcuckoo provides concurrent hash tables for
C/
C++ allowing concurrent reads and writes. The library is available on GitHub.
*
Threading Building Blocks
oneAPI Threading Building Blocks (oneTBB; formerly Threading Building Blocks or TBB), is a C++ template library developed by Intel for parallel programming on multi-core processors. Using TBB, a computation is broken down into tasks that can r ...
provide concurrent unordered maps for C++ which allow concurrent insertion and traversal and are kept in a similar style to the
C++11
C++11 is a version of the ISO/ IEC 14882 standard for the C++ programming language. C++11 replaced the prior version of the C++ standard, called C++03, and was later replaced by C++14. The name follows the tradition of naming language versio ...
std::unordered_map
interface. Included within are the concurrent unordered multimaps, which allow multiple values to exist for the same key in a concurrent unordered map. Additionally, concurrent hash maps are provided which build upon the concurrent unordered map and further allow concurrent erasure and contain built-in locking.
* growt provides concurrent growing hash tables for C++ on the basis of the so-called ''folklore'' implementation. Based on this non-growing implementation, a variety of different growing hash tables is given. These implementations allow for concurrent reads, inserts, updates (notably updating values based on the current value at the key) and removals (based upon updating using
tombstones
A headstone, tombstone, or gravestone is a stele or marker, usually stone, that is placed over a grave. It is traditional for burials in the Christian, Jewish, and Muslim religions, among others. In most cases, it has the deceased's name, ...
). Beyond that, variants on the basis of
Intel TSX
Transactional Synchronization Extensions (TSX), also called Transactional Synchronization Extensions New Instructions (TSX-NI), is an extension to the x86 instruction set architecture (ISA) that adds hardware transactional memory support, speedin ...
are provided. The library is available on GitHub.
* folly provides concurrent hash tables for
C++14
C14, C.XIV or C-14 may be:
* Autovía C-14, a highway in Catalonia in Spain
* Fokker C.XIV, a 1937 Dutch reconnaissance seaplane
* , a 1908 British C-class submarine
* LSWR C14 class, a London and South Western Railway locomotive
* Ramal C-14, th ...
and later ensuring wait-free readers and lock-based,
sharded writers. As stated on its GitHub page, this library provides useful functionality for
Facebook
Facebook is an online social media and social networking service owned by American company Meta Platforms. Founded in 2004 by Mark Zuckerberg with fellow Harvard College students and roommates Eduardo Saverin, Andrew McCollum, Dustin ...
.
* Junction provides several implementations of concurrent hash tables for C++ on the basis of atomic operations to ensure thread-safety for any given member function of the table. The library is available on GitHub.
GitHub repository for Junction
/ref>
See also
* Parallel computing
* Liveness Properties of an execution of a computer program —particularly for concurrent and distributed systems— have long been formulated by giving ''safety properties'' ("bad things don't happen") and ''liveness properties'' ("good things do happen").
...
* Ctrie
A concurrent hash-trie or CtrieProkopec, A. et al. (2011Cache-Aware Lock-Free Concurrent Hash Tries Technical Report, 2011.Prokopec, A., Bronson N., Bagwell P., Odersky M. (2012Concurrent Tries with Efficient Non-Blocking Snapshots/ref> is a concu ...
References
Further reading
* {{cite book , last1=Herlihy , first1=Maurice , last2=Shavit , first2=Nir , title=The Art of Multiprocessor Programming , year=2008 , publisher=Morgan Kaufmann Publishers Inc. , location=San Francisco, CA, USA , isbn=978-0-12-370591-4 , pages=299–328 , chapter=Chapter 13: Concurrent Hashing and Natural Parallelism
Hash based data structures
Concurrent computing