
In
computing
Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the study and experimentation of algorithmic processes, and development of both hardware and software. Computing has scientific, ...
, a hash table, also known as hash map, is a
data structure
In computer science, a data structure is a data organization, management, and storage format that is usually chosen for Efficiency, efficient Data access, access to data. More precisely, a data structure is a collection of data values, the rel ...
that implements an
associative array or dictionary. It is an
abstract data type that maps
keys to
values.
A hash table uses a
hash function
A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called ''hash values'', ''hash codes'', ''digests'', or simply ''hashes''. The values are usually ...
to compute an ''index'', also called a ''hash code'', into an array of ''buckets'' or ''slots'', from which the desired value can be found. During lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored.
Ideally, the hash function will assign each key to a unique bucket, but most hash table designs employ an imperfect hash function, which might cause hash ''
collisions'' where the hash function generates the same index for more than one key. Such collisions are typically accommodated in some way.
In a well-dimensioned hash table, the average time complexity for each lookup is independent of the number of elements stored in the table. Many hash table designs also allow arbitrary insertions and deletions of
key–value pairs, at
amortized constant average cost per operation.
[ Charles E. Leiserson]
''Amortized Algorithms, Table Doubling, Potential Method''
Lecture 13, course MIT 6.046J/18.410J Introduction to Algorithms—Fall 2005
Hashing is an example of a
space-time tradeoff
In physics, spacetime is a mathematical model that combines the three dimensions of space and one dimension of time into a single four-dimensional manifold. Spacetime diagrams can be used to visualize relativistic effects, such as why differ ...
. If
memory
Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembered ...
is infinite, the entire key can be used directly as an index to locate its value with a single memory access. On the other hand, if infinite time is available, values can be stored without regard for their keys, and a
binary search or
linear search can be used to retrieve the element.
In many situations, hash tables turn out to be on average more efficient than
search trees or any other
table lookup structure. For this reason, they are widely used in many kinds of computer
software
Software is a set of computer programs and associated software documentation, documentation and data (computing), data. This is in contrast to Computer hardware, hardware, from which the system is built and which actually performs the work.
...
, particularly for
associative arrays,
database indexing,
caches, and
sets.
History
The idea of hashing arose independently in different places. In January 1953,
Hans Peter Luhn wrote an internal
IBM memorandum that used hashing with chaining. Open addressing was later proposed by A. D. Linh building on Luhn's paper.
Around the same time,
Gene Amdahl,
Elaine M. McGraw
Elaine M. McGraw (née Boehme) was an American computer programmer who, together with Arthur Samuel and Gene Amdahl, invented open addressing based hash tables in 1954.
After studying economics, McGraw began working as a computer programmer for ...
,
Nathaniel Rochester, and
Arthur Samuel of
IBM Research
IBM Research is the research and development division for IBM, an American multinational information technology company headquartered in Armonk, New York, with operations in over 170 countries. IBM Research is the largest industrial research or ...
implemented hashing for the
IBM 701 assembler. Open addressing with linear probing is credited to Amdahl, although
Ershov independently had the same idea.
The term "open addressing" was coined by
W. Wesley Peterson
William Wesley Peterson (April 22, 1924 – May 6, 2009) was an American mathematician and computer scientist. He was best known for designing the cyclic redundancy check (CRC),
– The original paper on CRCs
for which research he was awarded ...
on his article which discusses the problem of search in large files.
The first
published work on hashing with chaining is credited to
Arnold Dumey, who discussed the idea of using remainder module a prime as a hash function. The word "hashing" was first published by an article by Robert Morris. A
theoretical analysis of linear probing was submitted originally by Konheim and Weiss.
Overview
An
associative array stores a
set of (key, value) pairs and allows insertion, deletion, and lookup (search), with the constraint of
unique key In relational database management systems, a unique key is a candidate key that is not the primary key of the relation. All the candidate keys of a relation can uniquely identify the records of the relation, but only one of them is used as the pr ...
s. In the hash table implementation of associative arrays, an array
of length
is partially filled with
elements, where
. A value
gets stored at an index location