
In
computer science
Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
, a hash table is a
data structure
In computer science, a data structure is a data organization and storage format that is usually chosen for Efficiency, efficient Data access, access to data. More precisely, a data structure is a collection of data values, the relationships amo ...
that implements an
associative array
In computer science, an associative array, key-value store, map, symbol table, or dictionary is an abstract data type that stores a collection of (key, value) pairs, such that each possible key appears at most once in the collection. In math ...
, also called a dictionary or simply map; an associative array is an
abstract data type
In computer science, an abstract data type (ADT) is a mathematical model for data types, defined by its behavior (semantics) from the point of view of a '' user'' of the data, specifically in terms of possible values, possible operations on data ...
that maps
keys to
values
In ethics and social sciences, value denotes the degree of importance of some thing or action, with the aim of determining which actions are best to do or what way is best to live ( normative ethics), or to describe the significance of different a ...
.
A hash table uses a
hash function
A hash function is any Function (mathematics), function that can be used to map data (computing), data of arbitrary size to fixed-size values, though there are some hash functions that support variable-length output. The values returned by a ...
to compute an ''index'', also called a ''hash code'', into an array of ''buckets'' or ''slots'', from which the desired value can be found. During lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored. A map implemented by a hash table is called a hash map.
Most hash table designs employ an
imperfect hash function.
Hash collisions, where the hash function generates the same index for more than one key, therefore typically must be accommodated in some way.
In a well-dimensioned hash table, the average time complexity for each lookup is independent of the number of elements stored in the table. Many hash table designs also allow arbitrary insertions and deletions of
key–value pairs, at
amortized constant average cost per operation.
Hashing is an example of a
space-time tradeoff. If
memory
Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembe ...
is infinite, the entire key can be used directly as an index to locate its value with a single memory access. On the other hand, if infinite time is available, values can be stored without regard for their keys, and a
binary search
In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the m ...
or
linear search can be used to retrieve the element.
In many situations, hash tables turn out to be on average more efficient than
search tree
In computer science, a search tree is a tree data structure used for locating specific keys from within a set. In order for a tree to function as a search tree, the key for each node must be greater than any keys in subtrees on the left, and les ...
s or any other
table
Table may refer to:
* Table (database), how the table data arrangement is used within the databases
* Table (furniture), a piece of furniture with a flat surface and one or more legs
* Table (information), a data arrangement with rows and column ...
lookup structure. For this reason, they are widely used in many kinds of computer
software
Software consists of computer programs that instruct the Execution (computing), execution of a computer. Software also includes design documents and specifications.
The history of software is closely tied to the development of digital comput ...
, particularly for
associative array
In computer science, an associative array, key-value store, map, symbol table, or dictionary is an abstract data type that stores a collection of (key, value) pairs, such that each possible key appears at most once in the collection. In math ...
s,
database index
A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure. Indexes are used to quickly locate data withou ...
ing,
caches, and
sets.
History
The idea of hashing arose independently in different places. In January 1953,
Hans Peter Luhn
Hans Peter Luhn (July 1, 1896 – August 19, 1964) was a German-American researcher in the field of computer science and Library & Information Science for IBM, and creator of the Luhn algorithm, KWIC (Key Words In Context) indexing, and s ...
wrote an internal
IBM
International Business Machines Corporation (using the trademark IBM), nicknamed Big Blue, is an American Multinational corporation, multinational technology company headquartered in Armonk, New York, and present in over 175 countries. It is ...
memorandum that used hashing with chaining. The first example of
open addressing
Open addressing, or closed hashing, is a method of Hash table#Collision resolution, collision resolution in hash tables. With this method a hash collision is resolved by probing, or searching through alternative locations in the array (the ''prob ...
was proposed by A. D. Linh, building on Luhn's memorandum.
Around the same time,
Gene Amdahl,
Elaine M. McGraw,
Nathaniel Rochester, and
Arthur Samuel of
IBM Research
IBM Research is the research and development division for IBM, an American Multinational corporation, multinational information technology company. IBM Research is headquartered at the Thomas J. Watson Research Center in Yorktown Heights, New York ...
implemented hashing for the
IBM 701
The IBM 701 Electronic Data Processing Machine, known as the Defense Calculator while in development, was IBM’s first commercial scientific computer and its first series production mainframe computer, which was announced to the public on May 2 ...
assembler. Open addressing with linear probing is credited to Amdahl, although
Andrey Ershov independently had the same idea.
The term "open addressing" was coined by
W. Wesley Peterson in his article which discusses the problem of search in large files.
The first
published
Publishing is the activities of making information, literature, music, software, and other content, physical or digital, available to the public for sale or free of charge. Traditionally, the term publishing refers to the creation and distribu ...
work on hashing with chaining is credited to
Arnold Dumey, who discussed the idea of using remainder modulo a prime as a hash function. The word "hashing" was first published in an article by Robert Morris. A
theoretical analysis of linear probing was submitted originally by Konheim and Weiss.
Overview
An
associative array
In computer science, an associative array, key-value store, map, symbol table, or dictionary is an abstract data type that stores a collection of (key, value) pairs, such that each possible key appears at most once in the collection. In math ...
stores a
set
Set, The Set, SET or SETS may refer to:
Science, technology, and mathematics Mathematics
*Set (mathematics), a collection of elements
*Category of sets, the category whose objects and morphisms are sets and total functions, respectively
Electro ...
of (key, value) pairs and allows insertion, deletion, and lookup (search), with the constraint of
unique keys. In the hash table implementation of associative arrays, an array
of length
is partially filled with
elements, where
. A key
is hashed using a hash function
to compute an index location