
In
computing
Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the study and experimentation of algorithmic processes, and development of both hardware and software. Computing has scientific, ...
, a hash table, also known as hash map, is a
data structure
In computer science, a data structure is a data organization, management, and storage format that is usually chosen for Efficiency, efficient Data access, access to data. More precisely, a data structure is a collection of data values, the rel ...
that implements an
associative array
In computer science, an associative array, map, symbol table, or dictionary is an abstract data type that stores a collection of (key, value) pairs, such that each possible key appears at most once in the collection. In mathematical terms an ...
or dictionary. It is an
abstract data type
In computer science, an abstract data type (ADT) is a mathematical model for data types. An abstract data type is defined by its behavior (semantics) from the point of view of a '' user'', of the data, specifically in terms of possible values, po ...
that maps
keys
Key or The Key may refer to:
Common meanings
* Key (cryptography), a piece of information that controls the operation of a cryptography algorithm
* Key (lock), device used to control access to places or facilities restricted by a lock
* Key (ma ...
to
values
In ethics and social sciences, value denotes the degree of importance of something or action, with the aim of determining which actions are best to do or what way is best to live (normative ethics in ethics), or to describe the significance of dif ...
.
A hash table uses a
hash function
A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called ''hash values'', ''hash codes'', ''digests'', or simply ''hashes''. The values are usually ...
to compute an ''index'', also called a ''hash code'', into an array of ''buckets'' or ''slots'', from which the desired value can be found. During lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored.
Ideally, the hash function will assign each key to a unique bucket, but most hash table designs employ an imperfect hash function, which might cause hash ''
collisions
In physics, a collision is any event in which two or more bodies exert forces on each other in a relatively short time. Although the most common use of the word ''collision'' refers to incidents in which two or more objects collide with great fo ...
'' where the hash function generates the same index for more than one key. Such collisions are typically accommodated in some way.
In a well-dimensioned hash table, the average time complexity for each lookup is independent of the number of elements stored in the table. Many hash table designs also allow arbitrary insertions and deletions of
key–value pairs, at
amortized
In computer science, amortized analysis is a method for Analysis of algorithms, analyzing a given algorithm's Computational complexity, complexity, or how much of a resource, especially time or memory, it takes to Execution (computing), execute. ...
constant average cost per operation.
[ Charles E. Leiserson]
''Amortized Algorithms, Table Doubling, Potential Method''
Lecture 13, course MIT 6.046J/18.410J Introduction to Algorithms—Fall 2005
Hashing is an example of a
space-time tradeoff. If
memory
Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembered ...
is infinite, the entire key can be used directly as an index to locate its value with a single memory access. On the other hand, if infinite time is available, values can be stored without regard for their keys, and a
binary search
In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the m ...
or
linear search
In computer science, a linear search or sequential search is a method for finding an element within a list. It sequentially checks each element of the list until a match is found or the whole list has been searched.
A linear search runs in at ...
can be used to retrieve the element.
In many situations, hash tables turn out to be on average more efficient than
search tree
In computer science, a search tree is a tree data structure used for locating specific keys from within a set. In order for a tree to function as a search tree, the key for each node must be greater than any keys in subtrees on the left, and les ...
s or any other
table
Table may refer to:
* Table (furniture), a piece of furniture with a flat surface and one or more legs
* Table (landform), a flat area of land
* Table (information), a data arrangement with rows and columns
* Table (database), how the table data ...
lookup structure. For this reason, they are widely used in many kinds of computer
software
Software is a set of computer programs and associated software documentation, documentation and data (computing), data. This is in contrast to Computer hardware, hardware, from which the system is built and which actually performs the work.
...
, particularly for
associative array
In computer science, an associative array, map, symbol table, or dictionary is an abstract data type that stores a collection of (key, value) pairs, such that each possible key appears at most once in the collection. In mathematical terms an ...
s,
database index
A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure. Indexes are used to quickly locate data without ...
ing,
caches, and
sets.
History
The idea of hashing arose independently in different places. In January 1953,
Hans Peter Luhn
Hans Peter Luhn (July 1, 1896 – August 19, 1964) was a German researcher in the field of computer science and Library & Information Science for IBM, and creator of the Luhn algorithm, KWIC (Key Words In Context) indexing, and Selective ...
wrote an internal
IBM memorandum that used hashing with chaining. Open addressing was later proposed by A. D. Linh building on Luhn's paper.
Around the same time,
Gene Amdahl
Gene Myron Amdahl (November 16, 1922 – November 10, 2015) was an American computer architect and high-tech entrepreneur, chiefly known for his work on mainframe computers at IBM and later his own companies, especially Amdahl Corporation ...
,
Elaine M. McGraw
Elaine M. McGraw (née Boehme) was an American computer programmer who, together with Arthur Samuel and Gene Amdahl, invented open addressing based hash tables in 1954.
After studying economics, McGraw began working as a computer programmer for ...
,
Nathaniel Rochester
Nathaniel Rochester (February 21, 1752 – May 17, 1831) was an American Revolutionary War soldier, and land speculator, most noted for founding the settlement which would become Rochester, New York.
Early life
Nathaniel Rochester was born ...
, and
Arthur Samuel
Arthur Lee Samuel (December 5, 1901 – July 29, 1990) was an American pioneer in the field of computer gaming and artificial intelligence. He popularized the term "machine learning" in 1959. The Samuel Checkers-playing Program was among the wo ...
of
IBM Research
IBM Research is the research and development division for IBM, an American multinational information technology company headquartered in Armonk, New York, with operations in over 170 countries. IBM Research is the largest industrial research or ...
implemented hashing for the
IBM 701
The IBM 701 Electronic Data Processing Machine, known as the Defense Calculator while in development, was IBM’s first commercial scientific computer and its first series production mainframe computer, which was announced to the public on May ...
assembler
Assembler may refer to:
Arts and media
* Nobukazu Takemura, avant-garde electronic musician, stage name Assembler
* Assemblers, a fictional race in the ''Star Wars'' universe
* Assemblers, an alternative name of the superhero group Champions of A ...
. Open addressing with linear probing is credited to Amdahl, although
Ershov independently had the same idea.
The term "open addressing" was coined by
W. Wesley Peterson on his article which discusses the problem of search in large files.
The first
published
Publishing is the activity of making information, literature, music, software and other content available to the public for sale or for free. Traditionally, the term refers to the creation and distribution of printed works, such as books, new ...
work on hashing with chaining is credited to
Arnold Dumey, who discussed the idea of using remainder module a prime as a hash function. The word "hashing" was first published by an article by Robert Morris. A
theoretical analysis of linear probing was submitted originally by Konheim and Weiss.
Overview
An
associative array
In computer science, an associative array, map, symbol table, or dictionary is an abstract data type that stores a collection of (key, value) pairs, such that each possible key appears at most once in the collection. In mathematical terms an ...
stores a
set of (key, value) pairs and allows insertion, deletion, and lookup (search), with the constraint of
unique key In relational database management systems, a unique key is a candidate key that is not the primary key of the relation. All the candidate keys of a relation can uniquely identify the records of the relation, but only one of them is used as the pr ...
s. In the hash table implementation of associative arrays, an array
of length
is partially filled with
elements, where
. A value
gets stored at an index location