HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
, an associative array, key-value store, map, symbol table, or dictionary is an abstract data type that stores a collection of (key, value) pairs, such that each possible key appears at most once in the collection. In mathematical terms, an associative array is a function with ''finite'' domain. It supports 'lookup', 'remove', and 'insert' operations. The dictionary problem is the classic problem of designing efficient
data structure In computer science, a data structure is a data organization and storage format that is usually chosen for Efficiency, efficient Data access, access to data. More precisely, a data structure is a collection of data values, the relationships amo ...
s that implement associative arrays. The two major solutions to the dictionary problem are
hash table In computer science, a hash table is a data structure that implements an associative array, also called a dictionary or simply map; an associative array is an abstract data type that maps Unique key, keys to Value (computer science), values. ...
s and search trees..Dietzfelbinger, M., Karlin, A., Mehlhorn, K., Meyer auf der Heide, F., Rohnert, H., and Tarjan, R. E. 1994
"Dynamic Perfect Hashing: Upper and Lower Bounds"
. SIAM J. Comput. 23, 4 (Aug. 1994), 738-761. http://portal.acm.org/citation.cfm?id=182370
It is sometimes also possible to solve the problem using directly addressed arrays, binary search trees, or other more specialized structures. Many programming languages include associative arrays as primitive data types, while many other languages provide software libraries that support associative arrays. Content-addressable memory is a form of direct hardware-level support for associative arrays. Associative arrays have many applications including such fundamental programming patterns as memoization and the decorator pattern., pp. 597–599. The name does not come from the
associative property In mathematics, the associative property is a property of some binary operations that rearranging the parentheses in an expression will not change the result. In propositional logic, associativity is a Validity (logic), valid rule of replaceme ...
known in mathematics. Rather, it arises from the association of values with keys. It is not to be confused with associative processors.


Operations

In an associative array, the association between a key and a value is often known as a "mapping"; the same word may also be used to refer to the process of creating a new association. The operations that are usually defined for an associative array are: ;Insert or put : add a new (key, value) pair to the collection, mapping the key to its new value. Any existing mapping is overwritten. The arguments to this operation are the key and the value. ;Remove or delete : remove a (key, value) pair from the collection, unmapping a given key from its value. The argument to this operation is the key. ;Lookup, find, or get : find the value (if any) that is bound to a given key. The argument to this operation is the key, and the value is returned from the operation. If no value is found, some lookup functions raise an exception, while others return a default value (such as zero, null, or a specific value passed to the constructor). Associative arrays may also include other operations such as determining the number of mappings or constructing an iterator to loop over all the mappings. For such operations, the order in which the mappings are returned is usually implementation-defined. A multimap generalizes an associative array by allowing multiple values to be associated with a single key. A bidirectional map is a related abstract data type in which the mappings operate in both directions: each value must be associated with a unique key, and a second lookup operation takes a value as an argument and looks up the key associated with that value.


Properties

The operations of the associative array should satisfy various properties: * lookup(k, insert(j, v, D)) = if k

j then v else lookup(k, D)
* lookup(k, new()) = fail, where fail is an exception or default value * remove(k, insert(j, v, D)) = if k

j then remove(k, D) else insert(j, v, remove(k, D))
* remove(k, new()) = new() where k and j are keys, v is a value, D is an associative array, and new() creates a new, empty associative array.


Example

Suppose that the set of loans made by a library is represented in a data structure. Each book in a library may be checked out by one patron at a time. However, a single patron may be able to check out multiple books. Therefore, the information about which books are checked out to which patrons may be represented by an associative array, in which the books are the keys and the patrons are the values. Using notation from Python or
JSON JSON (JavaScript Object Notation, pronounced or ) is an open standard file format and electronic data interchange, data interchange format that uses Human-readable medium and data, human-readable text to store and transmit data objects consi ...
, the data structure would be: A lookup operation on the key "Great Expectations" would return "John". If John returns his book, that would cause a deletion operation, and if Pat checks out a book, that would cause an insertion operation, leading to a different state:


Implementation

For dictionaries with very few mappings, it may make sense to implement the dictionary using an association list, which is a
linked list In computer science, a linked list is a linear collection of data elements whose order is not given by their physical placement in memory. Instead, each element points to the next. It is a data structure consisting of a collection of nodes whi ...
of mappings. With this implementation, the time to perform the basic dictionary operations is linear in the total number of mappings. However, it is easy to implement and the constant factors in its running time are small. Another very simple implementation technique, usable when the keys are restricted to a narrow range, is direct addressing into an array: the value for a given key ''k'' is stored at the array cell ''A'' 'k'' or if there is no mapping for ''k'' then the cell stores a special
sentinel value In computer programming, a sentinel value (also referred to as a flag value, trip value, rogue value, signal value, or dummy data) is a special value in the context of an algorithm which uses its presence as a condition of termination, typically ...
that indicates the lack of a mapping. This technique is simple and fast, with each dictionary operation taking constant time. However, the space requirement for this structure is the size of the entire keyspace, making it impractical unless the keyspace is small. The two major approaches for implementing dictionaries are a
hash table In computer science, a hash table is a data structure that implements an associative array, also called a dictionary or simply map; an associative array is an abstract data type that maps Unique key, keys to Value (computer science), values. ...
or a search tree.


Hash table implementations

The most frequently used general-purpose implementation of an associative array is with a
hash table In computer science, a hash table is a data structure that implements an associative array, also called a dictionary or simply map; an associative array is an abstract data type that maps Unique key, keys to Value (computer science), values. ...
: an array combined with a
hash function A hash function is any Function (mathematics), function that can be used to map data (computing), data of arbitrary size to fixed-size values, though there are some hash functions that support variable-length output. The values returned by a ...
that separates each key into a separate "bucket" of the array. The basic idea behind a hash table is that accessing an element of an array via its index is a simple, constant-time operation. Therefore, the average overhead of an operation for a hash table is only the computation of the key's hash, combined with accessing the corresponding bucket within the array. As such, hash tables usually perform in O(1) time, and usually outperform alternative implementations. Hash tables must be able to handle collisions: the mapping by the hash function of two different keys to the same bucket of the array. The two most widespread approaches to this problem are separate chaining and open addressing.. In separate chaining, the array does not store the value itself but stores a pointer to another container, usually an association list, that stores all the values matching the hash. By contrast, in open addressing, if a hash collision is found, the table seeks an empty spot in an array to store the value in a deterministic manner, usually by looking at the next immediate position in the array. Open addressing has a lower cache miss ratio than separate chaining when the table is mostly empty. However, as the table becomes filled with more elements, open addressing's performance degrades exponentially. Additionally, separate chaining uses less memory in most cases, unless the entries are very small (less than four times the size of a pointer).


Tree implementations


Self-balancing binary search trees

Another common approach is to implement an associative array with a self-balancing binary search tree, such as an AVL tree or a red–black tree. Compared to hash tables, these structures have both strengths and weaknesses. The worst-case performance of self-balancing binary search trees is significantly better than that of a hash table, with a time complexity in big O notation of O(log ''n''). This is in contrast to hash tables, whose worst-case performance involves all elements sharing a single bucket, resulting in O(''n'') time complexity. In addition, and like all binary search trees, self-balancing binary search trees keep their elements in order. Thus, traversing its elements follows a least-to-greatest pattern, whereas traversing a hash table can result in elements being in seemingly random order. Because they are in order, tree-based maps can also satisfy range queries (find all values between two bounds) whereas a hashmap can only find exact values. However, hash tables have a much better average-case time complexity than self-balancing binary search trees of O(1), and their worst-case performance is highly unlikely when a good
hash function A hash function is any Function (mathematics), function that can be used to map data (computing), data of arbitrary size to fixed-size values, though there are some hash functions that support variable-length output. The values returned by a ...
is used. A self-balancing binary search tree can be used to implement the buckets for a hash table that uses separate chaining. This allows for average-case constant lookup, but assures a worst-case performance of O(log ''n''). However, this introduces extra complexity into the implementation and may cause even worse performance for smaller hash tables, where the time spent inserting into and balancing the tree is greater than the time needed to perform a linear search on all elements of a linked list or similar data structure.


Other trees

Associative arrays may also be stored in unbalanced binary search trees or in data structures specialized to a particular type of keys such as radix trees,
trie In computer science, a trie (, ), also known as a digital tree or prefix tree, is a specialized search tree data structure used to store and retrieve strings from a dictionary or set. Unlike a binary search tree, nodes in a trie do not store t ...
s,
Judy array In computer science, a Judy array is an early-2000s Hewlett-Packard hand-optimized implementation of a 256-ary radix tree that uses many situational node types to reduce latency from CPU cache-line fills.Alan Silverstein,Judy IV Shop Manual, 2002 ...
s, or van Emde Boas trees, though the relative performance of these implementations varies. For instance, Judy trees have been found to perform less efficiently than hash tables, while carefully selected hash tables generally perform more efficiently than adaptive radix trees, with potentially greater restrictions on the data types they can handle. The advantages of these alternative structures come from their ability to handle additional associative array operations, such as finding the mapping whose key is the closest to a queried key when the query is absent in the set of mappings.


Comparison


Ordered dictionary

The basic definition of a dictionary does not mandate an order. To guarantee a fixed order of enumeration, ordered versions of the associative array are often used. There are two senses of an ordered dictionary: * The order of enumeration is always deterministic for a given set of keys by sorting. This is the case for tree-based implementations, one representative being the container of C++. * The order of enumeration is key-independent and is instead based on the order of insertion. This is the case for the "ordered dictionary" in .NET Framework, the LinkedHashMap of
Java Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...
and Python. The latter is more common. Such ordered dictionaries can be implemented using an association list, by overlaying a doubly linked list on top of a normal dictionary, or by moving the actual data out of the sparse (unordered) array and into a dense insertion-ordered one.


Language support

Associative arrays can be implemented in any programming language as a package and many language systems provide them as part of their standard library. In some languages, they are not only built into the standard system, but have special syntax, often using array-like subscripting. Built-in syntactic support for associative arrays was introduced in 1969 by SNOBOL4, under the name "table". TMG offered tables with string keys and integer values.
MUMPS MUMPS ("Massachusetts General Hospital Utility Multi-Programming System"), or M, is an imperative, high-level programming language with an integrated transaction processing key–value database. It was originally developed at Massachusetts Gen ...
made multi-dimensional associative arrays, optionally persistent, its key data structure. SETL supported them as one possible implementation of sets and maps. Most modern scripting languages, starting with AWK and including
Rexx Rexx (restructured extended executor) is a high-level programming language developed at IBM by Mike Cowlishaw. Both proprietary and open-source software, open source Rexx interpreter (computing), interpreters exist for a wide range of comput ...
,
Perl Perl is a high-level, general-purpose, interpreted, dynamic programming language. Though Perl is not officially an acronym, there are various backronyms in use, including "Practical Extraction and Reporting Language". Perl was developed ...
, PHP, Tcl,
JavaScript JavaScript (), often abbreviated as JS, is a programming language and core technology of the World Wide Web, alongside HTML and CSS. Ninety-nine percent of websites use JavaScript on the client side for webpage behavior. Web browsers have ...
, Maple, Python,
Ruby Ruby is a pinkish-red-to-blood-red-colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called sapph ...
, Wolfram Language, Go, and Lua, support associative arrays as a primary container type. In many more languages, they are available as library functions without special syntax. In
Smalltalk Smalltalk is a purely object oriented programming language (OOP) that was originally created in the 1970s for educational use, specifically for constructionist learning, but later found use in business. It was created at Xerox PARC by Learni ...
,
Objective-C Objective-C is a high-level general-purpose, object-oriented programming language that adds Smalltalk-style message passing (messaging) to the C programming language. Originally developed by Brad Cox and Tom Love in the early 1980s, it was ...
,
.NET The .NET platform (pronounced as "''dot net"'') is a free and open-source, managed code, managed computer software framework for Microsoft Windows, Windows, Linux, and macOS operating systems. The project is mainly developed by Microsoft emplo ...
, Python, REALbasic, Swift, VBA and
Delphi Delphi (; ), in legend previously called Pytho (Πυθώ), was an ancient sacred precinct and the seat of Pythia, the major oracle who was consulted about important decisions throughout the ancient Classical antiquity, classical world. The A ...
they are called ''dictionaries''; in
Perl Perl is a high-level, general-purpose, interpreted, dynamic programming language. Though Perl is not officially an acronym, there are various backronyms in use, including "Practical Extraction and Reporting Language". Perl was developed ...
,
Ruby Ruby is a pinkish-red-to-blood-red-colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called sapph ...
and Seed7 they are called ''hashes''; in C++, C#,
Java Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...
, Go,
Clojure Clojure (, like ''closure'') is a dynamic programming language, dynamic and functional programming, functional dialect (computing), dialect of the programming language Lisp (programming language), Lisp on the Java (software platform), Java platfo ...
, Scala,
OCaml OCaml ( , formerly Objective Caml) is a General-purpose programming language, general-purpose, High-level programming language, high-level, Comparison of multi-paradigm programming languages, multi-paradigm programming language which extends the ...
,
Haskell Haskell () is a general-purpose, statically typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research, and industrial applications, Haskell pioneered several programming language ...
they are called ''maps'' (see map (C++), unordered_map (C++), and ); in
Common Lisp Common Lisp (CL) is a dialect of the Lisp programming language, published in American National Standards Institute (ANSI) standard document ''ANSI INCITS 226-1994 (S2018)'' (formerly ''X3.226-1994 (R1999)''). The Common Lisp HyperSpec, a hyperli ...
and Windows PowerShell, they are called ''hash tables'' (since both typically use this implementation); in Maple and Lua, they are called ''tables''. In PHP and R, all arrays can be associative, except that the keys are limited to integers and strings. In JavaScript (see also
JSON JSON (JavaScript Object Notation, pronounced or ) is an open standard file format and electronic data interchange, data interchange format that uses Human-readable medium and data, human-readable text to store and transmit data objects consi ...
), all objects behave as associative arrays with string-valued keys, while the Map and WeakMap types take arbitrary objects as keys. In Lua, they are used as the primitive building block for all data structures. In
Visual FoxPro Visual FoxPro is a programming language that was developed by Microsoft. It is a data-centric and procedural programming language with object-oriented programming (OOP) features. It was derived from FoxPro (which was itself descended from FoxB ...
, they are called ''Collections''. The D language also supports associative arrays.


Permanent storage

Many programs using associative arrays will need to store that data in a more permanent form, such as a
computer file A computer file is a System resource, resource for recording Data (computing), data on a Computer data storage, computer storage device, primarily identified by its filename. Just as words can be written on paper, so too can data be written to a ...
. A common solution to this problem is a generalized concept known as ''archiving'' or ''
serialization In computing, serialization (or serialisation, also referred to as pickling in Python (programming language), Python) is the process of translating a data structure or object (computer science), object state into a format that can be stored (e. ...
'', which produces a text or binary representation of the original objects that can be written directly to a file. This is most commonly implemented in the underlying object model, like .Net or Cocoa, which includes standard functions that convert the internal data into text. The program can create a complete text representation of any group of objects by calling these methods, which are almost always already implemented in the base associative array class."Archives and Serializations Programming Guide"
Apple Inc., 2012
For programs that use very large data sets, this sort of individual file storage is not appropriate, and a
database management system In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and an ...
(DB) is required. Some DB systems natively store associative arrays by serializing the data and then storing that serialized data and the key. Individual arrays can then be loaded or saved from the database using the key to refer to them. These key–value stores have been used for many years and have a history as long as that of the more common
relational database A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A Relational Database Management System (RDBMS) is a type of database management system that stores data in a structured for ...
(RDBs), but a lack of standardization, among other reasons, limited their use to certain niche roles. RDBs were used for these roles in most cases, although saving objects to a RDB can be complicated, a problem known as object-relational impedance mismatch. After approximately 2010, the need for high-performance databases suitable for
cloud computing Cloud computing is "a paradigm for enabling network access to a scalable and elastic pool of shareable physical or virtual resources with self-service provisioning and administration on-demand," according to International Organization for ...
and more closely matching the internal structure of the programs using them led to a renaissance in the key–value store market. These systems can store and retrieve associative arrays in a native fashion, which can greatly improve performance in common web-related workflows.


See also

*
Tuple In mathematics, a tuple is a finite sequence or ''ordered list'' of numbers or, more generally, mathematical objects, which are called the ''elements'' of the tuple. An -tuple is a tuple of elements, where is a non-negative integer. There is o ...
*
Function (mathematics) In mathematics, a function from a set (mathematics), set to a set assigns to each element of exactly one element of .; the words ''map'', ''mapping'', ''transformation'', ''correspondence'', and ''operator'' are sometimes used synonymously. ...


References


External links


NIST's Dictionary of Algorithms and Data Structures: Associative Array
{{Data types Abstract data types * Composite data types Data types