A cryptographic hash function (CHF) is a
hash algorithm (a
map of an arbitrary binary string to a binary string with fixed size of
bits) that has special properties desirable for
cryptography
Cryptography, or cryptology (from grc, , translit=kryptós "hidden, secret"; and ''graphein'', "to write", or ''-logia'', "study", respectively), is the practice and study of techniques for secure communication in the presence of adve ...
:
* the probability of a particular
-bit output result (
hash value
A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called ''hash values'', ''hash codes'', ''digests'', or simply ''hashes''. The values are usually ...
) for a random input string ("message") is
(like for any good hash), so the hash value can be used as a representative of the message;
* finding an input string that matches a given hash value (a ''pre-image'') is unfeasible, unless the value is selected from a known pre-calculated dictionary ("
rainbow table
A rainbow table is an efficient way to store data that has been computed in advance to facilitate cracking passwords. To protect stored passwords from compromise in case of a data breach, organizations avoid storing them directly, instead transfo ...
"). The ''resistance'' to such search is quantified as
security strength, a cryptographic hash with
bits of hash value is expected to have a ''preimage resistance'' strength of
bits. A ''second preimage'' resistance strength, with the same expectations, refers to a similar problem of finding a second message that matches the given hash value when one message is already known;
* finding any pair of different messages that yield the same hash value (a ''collision'') is also unfeasible, a cryptographic hash is expected to have a ''collision resistance'' strength of
bits (lower due to the
birthday paradox
In probability theory, the birthday problem asks for the probability that, in a set of randomly chosen people, at least two will share a birthday. The birthday paradox is that, counterintuitively, the probability of a shared birthday exceeds 5 ...
).
Cryptographic hash functions have many
information-security applications, notably in
digital signature
A digital signature is a mathematical scheme for verifying the authenticity of digital messages or documents. A valid digital signature, where the prerequisites are satisfied, gives a recipient very high confidence that the message was created b ...
s,
message authentication code
In cryptography, a message authentication code (MAC), sometimes known as a ''tag'', is a short piece of information used for authenticating a message. In other words, to confirm that the message came from the stated sender (its authenticity) and ...
s (MACs), and other forms of
authentication
Authentication (from ''authentikos'', "real, genuine", from αὐθέντης ''authentes'', "author") is the act of proving an assertion, such as the identity of a computer system user. In contrast with identification, the act of indicat ...
. They can also be used as ordinary
hash function
A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called ''hash values'', ''hash codes'', ''digests'', or simply ''hashes''. The values are usually ...
s, to index data in
hash table
In computing, a hash table, also known as hash map, is a data structure that implements an associative array or dictionary. It is an abstract data type that maps keys to values. A hash table uses a hash function to compute an ''index'', ...
s, for
fingerprinting
A fingerprint is an impression left by the friction ridges of a human finger. The recovery of partial fingerprints from a crime scene is an important method of forensic science. Moisture and grease on a finger result in fingerprints on surfac ...
, to detect duplicate data or uniquely identify files, and as
checksum
A checksum is a small-sized block of data derived from another block of digital data for the purpose of detecting errors that may have been introduced during its transmission or storage. By themselves, checksums are often used to verify dat ...
s to detect accidental data corruption. Indeed, in information-security contexts, cryptographic hash values are sometimes called (''digital'') ''fingerprints'', ''checksums'', or just ''hash values'', even though all these terms stand for more general functions with rather different properties and purposes.
Properties
Most cryptographic hash functions are designed to take a
string
String or strings may refer to:
*String (structure), a long flexible structure made from threads twisted together, which is used to tie, bind, or hang other objects
Arts, entertainment, and media Films
* ''Strings'' (1991 film), a Canadian anim ...
of any length as input and produce a fixed-length hash value.
A cryptographic hash function must be able to withstand all known
types of cryptanalytic attack. In theoretical cryptography, the security level of a cryptographic hash function has been defined using the following properties:
; Pre-image resistance: Given a hash value , it should be difficult to find any message such that . This concept is related to that of a
one-way function
In computer science, a one-way function is a function that is easy to compute on every input, but hard to invert given the image of a random input. Here, "easy" and "hard" are to be understood in the sense of computational complexity theory, sp ...
. Functions that lack this property are vulnerable to
preimage attack
In cryptography, a preimage attack on cryptographic hash functions tries to find a message that has a specific hash value. A cryptographic hash function should resist attacks on its preimage (set of possible inputs).
In the context of attack, the ...
s.
; Second pre-image resistance: Given an input , it should be difficult to find a different input such that . This property is sometimes referred to as ''weak collision resistance''. Functions that lack this property are vulnerable to
second-preimage attacks.
;
Collision resistance: It should be difficult to find two different messages and such that . Such a pair is called a cryptographic
hash collision
In computer science, a hash collision or hash clash is when two pieces of data in a hash table share the same hash value. The hash value in this case is derived from a hash function which takes a data input and returns a fixed length of bits.
Al ...
. This property is sometimes referred to as ''strong collision resistance''. It requires a hash value at least twice as long as that required for pre-image resistance; otherwise collisions may be found by a
birthday attack.
Collision resistance implies second pre-image resistance but does not imply pre-image resistance. The weaker assumption is always preferred in theoretical cryptography, but in practice, a hash-function which is only second pre-image resistant is considered insecure and is therefore not recommended for real applications.
Informally, these properties mean that a
malicious adversary cannot replace or modify the input data without changing its digest. Thus, if two strings have the same digest, one can be very confident that they are identical. Second pre-image resistance prevents an attacker from crafting a document with the same hash as a document the attacker cannot control. Collision resistance prevents an attacker from creating two distinct documents with the same hash.
A function meeting these criteria may still have undesirable properties. Currently, popular cryptographic hash functions are vulnerable to
''length-extension'' attacks: given and but not , by choosing a suitable an attacker can calculate , where ∥ denotes
concatenation
In formal language theory and computer programming, string concatenation is the operation of joining character strings end-to-end. For example, the concatenation of "snow" and "ball" is "snowball". In certain formalisations of concatenat ...
.
This property can be used to break naive authentication schemes based on hash functions. The
HMAC
In cryptography, an HMAC (sometimes expanded as either keyed-hash message authentication code or hash-based message authentication code) is a specific type of message authentication code (MAC) involving a cryptographic hash function and a secre ...
construction works around these problems.
In practice, collision resistance is insufficient for many practical uses. In addition to collision resistance, it should be impossible for an adversary to find two messages with substantially similar digests; or to infer any useful information about the data, given only its digest. In particular, a hash function should behave as much as possible like a
random function (often called a
random oracle
In cryptography, a random oracle is an oracle (a theoretical black box) that responds to every ''unique query'' with a (truly) random response chosen uniformly from its output domain. If a query is repeated, it responds the same way every time tha ...
in proofs of security) while still being deterministic and efficiently computable. This rules out functions like the
SWIFFT function, which can be rigorously proven to be collision-resistant assuming that certain problems on ideal lattices are computationally difficult, but, as a linear function, does not satisfy these additional properties.
Checksum algorithms, such as
CRC32 and other
cyclic redundancy checks, are designed to meet much weaker requirements and are generally unsuitable as cryptographic hash functions. For example, a CRC was used for message integrity in the
WEP encryption standard, but an attack was readily discovered, which exploited the linearity of the checksum.
Degree of difficulty
In cryptographic practice, "difficult" generally means "almost certainly beyond the reach of any adversary who must be prevented from breaking the system for as long as the security of the system is deemed important". The meaning of the term is therefore somewhat dependent on the application since the effort that a malicious agent may put into the task is usually proportional to their expected gain. However, since the needed effort usually multiplies with the digest length, even a thousand-fold advantage in processing power can be neutralized by adding a dozen bits to the latter.
For messages selected from a limited set of messages, for example
password
A password, sometimes called a passcode (for example in Apple devices), is secret data, typically a string of characters, usually used to confirm a user's identity. Traditionally, passwords were expected to be memorized, but the large number of ...
s or other short messages, it can be feasible to invert a hash by trying all possible messages in the set. Because cryptographic hash functions are typically designed to be computed quickly, special
key derivation function
In cryptography, a key derivation function (KDF) is a cryptographic algorithm that derives one or more secret keys from a secret value such as a master key, a password, or a passphrase using a pseudorandom function (which typically uses a cryp ...
s that require greater computing resources have been developed that make such
brute-force attack
In cryptography, a brute-force attack consists of an attacker submitting many passwords or passphrases with the hope of eventually guessing correctly. The attacker systematically checks all possible passwords and passphrases until the correct ...
s more difficult.
In some
theoretical analyses "difficult" has a specific mathematical meaning, such as "not solvable in
asymptotic
In analytic geometry, an asymptote () of a curve is a line such that the distance between the curve and the line approaches zero as one or both of the ''x'' or ''y'' coordinates tends to infinity. In projective geometry and related contexts, ...
polynomial time
In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by ...
". Such interpretations of ''difficulty'' are important in the study of
provably secure cryptographic hash functions but do not usually have a strong connection to practical security. For example, an
exponential-time algorithm can sometimes still be fast enough to make a feasible attack. Conversely, a polynomial-time algorithm (e.g., one that requires steps for -digit keys) may be too slow for any practical use.
Illustration
An illustration of the potential use of a cryptographic hash is as follows:
Alice
Alice may refer to:
* Alice (name), most often a feminine given name, but also used as a surname
Literature
* Alice (''Alice's Adventures in Wonderland''), a character in books by Lewis Carroll
* ''Alice'' series, children's and teen books by ...
poses a tough math problem to
Bob and claims that she has solved it. Bob would like to try it himself, but would yet like to be sure that Alice is not bluffing. Therefore, Alice writes down her solution, computes its hash, and tells Bob the hash value (whilst keeping the solution secret). Then, when Bob comes up with the solution himself a few days later, Alice can prove that she had the solution earlier by revealing it and having Bob hash it and check that it matches the hash value given to him before. (This is an example of a simple