A birthday attack is a brute-force collision attack that exploits the mathematics behind the birthday problem in probability theory. This attack can be used to abuse communication between two or more parties. The attack depends on the higher likelihood of collisions found between random attack attempts and a fixed degree of permutations (
pigeonholes). Let <math>H</math> be the number of possible values of a hash function, with <math>H = 2^l</math>. With a birthday attack, it is possible to find a collision of a hash function with 50% chance in <math>\sqrt{2^l} = 2^{l/2}</math> evaluations, where <math>l</math> is the bit length of the hash output, and with <math>2^{l-1}</math> being the classical preimage resistance security with the same probability.
There is a general (though disputed) result that quantum computers can perform birthday attacks, thus breaking collision resistance, in <math>\sqrt[3]{2^l} = 2^{l/3}</math>.
Although there are some digital signature vulnerabilities associated with the birthday attack, it cannot be used to break an encryption scheme any faster than a brute-force attack.
Understanding the problem
As an example, consider the scenario in which a teacher with a class of 30 students (''n'' = 30) asks for everybody's birthday (for simplicity, ignore leap years) to determine whether any two students have the same birthday (corresponding to a hash collision as described further). Intuitively, this chance may seem small. Counter-intuitively, the probability that at least one student has the same birthday as ''any'' other student on any day is around 70% (for ''n'' = 30), from the formula <math>1 - \frac{365!}{(365-n)! \cdot 365^n}</math>.
If the teacher had picked a ''specific'' day (say, 16 September), then the chance that at least one student was born on that specific day is <math>1 - (364/365)^{30}</math>, about 7.9%.
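The two probabilities in the classroom example can be checked numerically. The following is a minimal sketch, assuming 365 equally likely birthdays; the function names are illustrative and not part of the original text.

```python
from math import prod

def p_shared_birthday(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday."""
    if n > days:
        return 1.0  # pigeonhole principle: a shared birthday is certain
    # P(all distinct) = (days/days) * ((days-1)/days) * ... * ((days-n+1)/days)
    p_all_distinct = prod((days - k) / days for k in range(n))
    return 1.0 - p_all_distinct

def p_specific_day(n: int, days: int = 365) -> float:
    """Probability that at least one of n people was born on one fixed day."""
    return 1.0 - ((days - 1) / days) ** n

print(f"any shared birthday, n=30: {p_shared_birthday(30):.3f}")
print(f"one specific day,    n=30: {p_specific_day(30):.3f}")
```

The first value corresponds to finding any collision, the second to hitting one fixed target, which is why the two differ so much.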
In a birthday attack, the attacker prepares many different variants of benign and malicious contracts, each having a digital signature. A pair of benign and malicious contracts with the same signature is sought. In this fictional example, suppose that the digital signature of a string is the first byte of its SHA-256 hash. The pair found is indicated in green; note that finding a pair of benign contracts (blue) or a pair of malicious contracts (red) is useless. After the victim accepts the benign contract, the attacker substitutes it with the malicious one and claims the victim signed it, as proven by the digital signature.
Relation to the balls into bins problem
The birthday attack can be modelled as a variation of the balls into bins problem, where balls (hash function inputs) are randomly placed into bins (hash function outputs). A hash collision occurs when at least two balls are placed into the same bin.
Mathematics
Given a function <math>f</math>, the goal of the attack is to find two different inputs <math>x_1, x_2</math> such that <math>f(x_1) = f(x_2)</math>. Such a pair <math>x_1, x_2</math> is called a collision. The method used to find a collision is simply to evaluate the function <math>f</math> for different input values that may be chosen randomly or pseudorandomly until the same result is found more than once. Because of the birthday problem, this method can be rather efficient. Specifically, if a function <math>f(x)</math> yields any of <math>H</math> different outputs with equal probability and <math>H</math> is sufficiently large, then we expect to obtain a pair of different arguments <math>x_1</math> and <math>x_2</math> with <math>f(x_1) = f(x_2)</math> after evaluating the function for about <math>1.25\sqrt{H}</math> different arguments on average.
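The search described above can be sketched in a few lines. This is an illustration only: it assumes SHA-256 truncated to a small number of bits as a stand-in for the function under attack, and the names `truncated_hash` and `find_collision` are hypothetical.

```python
import hashlib
import os

def truncated_hash(data: bytes, bits: int) -> int:
    """Toy hash: the top `bits` bits of SHA-256 (an assumption for the demo)."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def find_collision(bits: int = 24, input_len: int = 8):
    """Evaluate the hash on random inputs until two different inputs agree.

    Returns (x1, x2, evaluations).
    """
    seen = {}  # hash value -> first input that produced it
    evaluations = 0
    while True:
        x = os.urandom(input_len)
        evaluations += 1
        h = truncated_hash(x, bits)
        if h in seen and seen[h] != x:
            return seen[h], x, evaluations
        seen[h] = x

x1, x2, evaluations = find_collision(bits=24)
print(x1.hex(), x2.hex(), evaluations)
```

For a 24-bit output the number of evaluations is typically a few thousand, far below the roughly 16.8 million possible outputs. The dictionary trades memory for time; memory-light variants exist but are not shown here.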
We consider the following experiment. From a set of ''H'' values we choose ''n'' values uniformly at random thereby allowing repetitions. Let ''p''(''n''; ''H'') be the probability that during this experiment at least one value is chosen more than once. This probability can be approximated as
:<math>p(n; H) \approx 1 - e^{-n(n-1)/(2H)} \approx 1 - e^{-n^2/(2H)},</math>
where <math>n</math> is the number of chosen values (inputs) and <math>H</math> is the number of possible outcomes (possible hash outputs).
Let ''n''(''p''; ''H'') be the smallest number of values we have to choose, such that the probability for finding a collision is at least ''p''. By inverting the expression above, we find the following approximation
:<math>n(p; H) \approx \sqrt{2H \ln\frac{1}{1-p}},</math>
and assigning a 0.5 probability of collision we arrive at
:<math>n(0.5; H) \approx 1.1774 \sqrt{H}.</math>
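As a rough numerical companion, the sketch below implements the standard birthday approximations p ≈ 1 − exp(−n(n−1)/(2H)) and n ≈ sqrt(2H ln(1/(1−p))). The function names are illustrative, and the formulas are the usual textbook approximations rather than exact values.

```python
import math

def collision_probability(n: float, H: float) -> float:
    """Approximate p(n; H) = 1 - exp(-n(n-1)/(2H))."""
    # expm1 keeps precision when the exponent is close to zero
    return -math.expm1(-n * (n - 1) / (2.0 * H))

def hashes_needed(p: float, H: float) -> float:
    """Approximate n(p; H) = sqrt(2H * ln(1/(1-p)))."""
    # -log1p(-p) equals ln(1/(1-p)) but stays accurate for tiny p
    return math.sqrt(2.0 * H * -math.log1p(-p))

print(collision_probability(23, 365))  # close to one half
print(hashes_needed(0.5, 365))         # about 1.1774 * sqrt(365)
print(hashes_needed(0.5, 2**64))       # roughly 5 billion
```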
Let ''Q''(''H'') be the expected number of values we have to choose before finding the first collision. This number can be approximated by
:<math>Q(H) \approx \sqrt{\frac{\pi}{2} H}.</math>
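The expected number of draws can also be estimated empirically. The following Monte Carlo sketch assumes uniform draws from ''H'' values and compares the sample mean with sqrt(πH/2); the trial count and seed are arbitrary choices for the illustration.

```python
import math
import random

def draws_until_collision(H: int, rng: random.Random) -> int:
    """Draw uniformly from range(H) until a value repeats; return the draw count."""
    seen = set()
    while True:
        value = rng.randrange(H)
        if value in seen:
            return len(seen) + 1  # count includes the colliding draw
        seen.add(value)

def estimate_Q(H: int, trials: int = 2000, seed: int = 0) -> float:
    """Sample mean of the number of draws needed for the first collision."""
    rng = random.Random(seed)
    return sum(draws_until_collision(H, rng) for _ in range(trials)) / trials

H = 2**16
print("simulated:", estimate_Q(H))
print("predicted:", math.sqrt(math.pi * H / 2))
```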
As an example, if a 64-bit hash is used, there are approximately <math>1.8 \times 10^{19}</math> different outputs. If these are all equally probable (the best case), then it would take 'only' approximately 5 billion attempts (<math>5.38 \times 10^9</math>) to generate a collision using brute force. This value is called the birthday bound and it can be approximated as <math>2^{l/2}</math>, where ''l'' is the number of bits in ''H''. Other examples are as follows:
:''Table shows number of hashes n''(''p'')'' needed to achieve the given probability of success, assuming all hashes are equally likely. For comparison, <math>10^{-18}</math> to <math>10^{-15}</math> is the uncorrectable bit error rate of a typical hard disk. In theory, MD5 hashes or UUIDs, being roughly 128 bits, should stay within that range until about 820 billion documents, even if their possible outputs are many more.''
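Values of this kind can be regenerated from the approximation n ≈ sqrt(2H ln(1/(1−p))). The sketch below is an illustration only: the chosen bit lengths and probabilities are assumptions, not necessarily the rows and columns of the original table.

```python
import math

def hashes_needed(p: float, bits: int) -> float:
    """Approximate number of hashes for collision probability p with a bits-bit hash."""
    H = 2.0 ** bits
    return math.sqrt(2.0 * H * -math.log1p(-p))

# Assumed column and row choices for the illustration
probabilities = [1e-18, 1e-15, 1e-12, 1e-9, 1e-6, 0.001, 0.01, 0.25, 0.5, 0.75]
bit_lengths = [16, 32, 64, 128, 256]

print("bits " + " ".join(f"{p:>9.0e}" for p in probabilities))
for bits in bit_lengths:
    row = " ".join(f"{hashes_needed(p, bits):9.2g}" for p in probabilities)
    print(f"{bits:4d} {row}")
```

For 128 bits and a probability of 10<sup>−15</sup> this gives roughly 8.2 × 10<sup>11</sup>, in line with the figure of about 820 billion documents quoted above. Results below 2 mean that a collision at that probability is not meaningfully possible for such a short hash.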
It is easy to see that if the outputs of the function are distributed unevenly, then a collision could be found even faster. The notion of 'balance' of a hash function quantifies the resistance of the function to birthday attacks (exploiting uneven key distribution). However, determining the balance of a hash function will typically require all possible inputs to be calculated and thus is infeasible for popular hash functions such as the MD and SHA families.
The subexpression <math>\ln\frac{1}{1-p}</math> in the equation for <math>n(p; H)</math> is not computed accurately for small <math>p</math> when directly translated into common programming languages as <code>log(1/(1-p))</code> due to loss of significance. When <code>log1p</code> is available (as it is in C99), for example, the equivalent expression <code>-log1p(-p)</code> should be used instead.
If this is not done, the first column of the above table is computed as zero, and several items in the second column do not have even one correct significant digit.
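The effect is easy to reproduce in any language with IEEE 754 double precision. The sketch below uses Python's `math.log` and `math.log1p`; the helper names and the choice of a 64-bit hash are illustrative.

```python
import math

def log_term_naive(p: float) -> float:
    """ln(1/(1-p)) computed literally; 1-p rounds to 1.0 for very small p."""
    return math.log(1.0 / (1.0 - p))

def log_term_stable(p: float) -> float:
    """The same quantity via log1p, accurate for small p."""
    return -math.log1p(-p)

H = 2.0 ** 64  # number of outputs of a 64-bit hash
for p in (1e-18, 1e-15, 1e-12):
    naive = math.sqrt(2.0 * H * log_term_naive(p))
    stable = math.sqrt(2.0 * H * log_term_stable(p))
    print(f"p={p:.0e}  naive={naive:.4g}  stable={stable:.4g}")
```

For p = 10<sup>−18</sup> the naive form evaluates to exactly zero, because 1 − p is not representable as anything other than 1.0 in double precision.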
Simple approximation
A good rule of thumb which can be used for mental calculation is the relation
:<math>p(n) \approx \frac{n^2}{2H},</math>
which can also be written as
:<math>H \approx \frac{n^2}{2p(n)}</math>
or
:<math>n \approx \sqrt{2H \times p(n)}.</math>
This works well for probabilities less than or equal to 0.5.
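This approximation scheme is especially easy to use when working with exponents. For instance, suppose you are building 32-bit hashes (<math>H = 2^{32}</math>) and want the chance of a collision to be at most one in a million (<math>p \approx 2^{-20}</math>). How many documents could you have at most?
:<math>n \approx \sqrt{2 \times 2^{32} \times 2^{-20}} = \sqrt{2^{1+32-20}} = \sqrt{2^{13}} = 2^{6.5} \approx 90.5,</math>
which is close to the correct answer of 93.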
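The exponent arithmetic in this example can be written out as a short check. This is a sketch assuming the rule of thumb n ≈ sqrt(2Hp) with H and p expressed as powers of two; the helper name is hypothetical.

```python
import math

def rule_of_thumb_log2(hash_bits: int, log2_p: float) -> float:
    """log2 of n for n = sqrt(2 * H * p), with H = 2**hash_bits and p = 2**log2_p."""
    return (1 + hash_bits + log2_p) / 2

log2_n = rule_of_thumb_log2(32, -20)  # (1 + 32 - 20) / 2
n_rule = 2 ** log2_n

# Reference value from n = sqrt(2H ln(1/(1-p))) with p exactly one in a million
n_reference = math.sqrt(2 * 2**32 * -math.log1p(-1e-6))

print(log2_n, n_rule, n_reference)
```

Most of the gap between the two results comes from rounding one in a million to 2<sup>−20</sup>, not from the rule of thumb itself.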
Digital signature susceptibility
Digital signatures can be susceptible to a birthday attack or more precisely a chosen-prefix collision attack. A message <math>m</math> is typically signed by first computing <math>f(m)</math>, where <math>f</math> is a cryptographic hash function, and then using some secret key to sign <math>f(m)</math>. Suppose Mallory wants to trick Bob into signing a fraudulent contract. Mallory prepares a fair contract <math>m</math> and a fraudulent one <math>m'</math>. She then finds a number of positions where <math>m</math> can be changed without changing the meaning, such as inserting commas, empty lines, one versus two spaces after a sentence, replacing synonyms, etc. By combining these changes, she can create a huge number of variations on <math>m</math> which are all fair contracts.
In a similar manner, Mallory also creates a huge number of variations on the fraudulent contract <math>m'</math>. She then applies the hash function to all these variations until she finds a version of the fair contract and a version of the fraudulent contract which have the same hash value, <math>f(m) = f(m')</math>. She presents the fair version to Bob for signing. After Bob has signed, Mallory takes the signature and attaches it to the fraudulent contract. This signature then "proves" that Bob signed the fraudulent contract.
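Mallory's procedure can be sketched as follows. This is a toy illustration under stated assumptions: the hash is SHA-256 truncated to 16 bits so that a match is found quickly, the only meaning-preserving change is one versus two spaces between words, and the contract texts and function names are hypothetical.

```python
import hashlib
from itertools import product

def truncated_hash(text: str, bits: int = 16) -> int:
    """Toy hash: top `bits` bits of SHA-256 (assumed weak on purpose)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def variants(words):
    """Yield 2**(len(words) - 1) texts; each gap holds one or two spaces."""
    for gaps in product((" ", "  "), repeat=len(words) - 1):
        parts = [words[0]]
        for gap, word in zip(gaps, words[1:]):
            parts.append(gap + word)
        yield "".join(parts)

def find_matching_pair(fair: str, fraudulent: str, bits: int = 16):
    """Return (fair_variant, fraudulent_variant) with equal hashes, or None."""
    fair_by_hash = {truncated_hash(v, bits): v for v in variants(fair.split())}
    for candidate in variants(fraudulent.split()):
        h = truncated_hash(candidate, bits)
        if h in fair_by_hash:
            return fair_by_hash[h], candidate
    return None

fair = "Bob agrees to pay Mallory the sum of ten dollars on the first of May"
fraudulent = "Bob agrees to pay Mallory the sum of ten thousand dollars on the first of May"
pair = find_matching_pair(fair, fraudulent)
print(pair)
```

Only fair-versus-fraudulent matches are useful, so the fair variants are indexed by hash and each fraudulent variant is looked up against that index. With a real hash of adequate length the same search would be computationally infeasible.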
The probabilities differ slightly from the original birthday problem, as Mallory gains nothing by finding two fair or two fraudulent contracts with the same hash. Mallory's strategy is to generate pairs of one fair and one fraudulent contract. For a given hash function, <math>2^l</math> is the number of possible hashes, where <math>l</math> is the bit length of the hash output.
The birthday problem equations do not exactly apply here. For a 50% chance of a collision, Mallory would need to generate approximately <math>2^{(l+1)/2}</math> hashes in total, which is about <math>\sqrt{2}</math> times the roughly <math>2^{l/2}</math> required for a simple collision under the classical birthday problem.
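The difference between the two settings can be estimated with a standard approximation that is not taken from the text above: if Mallory hashes ''n'' fair and ''n'' fraudulent variants, there are ''n''² useful pairs, so the probability of no match is roughly exp(−''n''²/''H''). The sketch below compares the resulting total with the classical single-set estimate; function names are illustrative.

```python
import math

def classical_hashes(p: float, bits: int) -> float:
    """Hashes for any collision within one set: sqrt(2H ln(1/(1-p)))."""
    H = 2.0 ** bits
    return math.sqrt(2.0 * H * -math.log1p(-p))

def cross_hashes_total(p: float, bits: int) -> float:
    """Total hashes for a fair/fraudulent match, n variants on each side.

    Assumes P(no match) ~ exp(-n*n/H), so n = sqrt(H ln(1/(1-p))).
    """
    H = 2.0 ** bits
    n_each = math.sqrt(H * -math.log1p(-p))
    return 2.0 * n_each

bits = 64
classical = classical_hashes(0.5, bits)
cross = cross_hashes_total(0.5, bits)
print(classical, cross, cross / classical)
```

Under this approximation the ratio is the square root of two for every hash length and target probability, so the two-set attack costs only about 41% more hash evaluations than the classical one.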
To avoid this attack, the output length of the hash function used for a signature scheme can be chosen large enough so that the birthday attack becomes computationally infeasible, i.e. about twice as many bits as are needed to prevent an ordinary brute-force attack.
Besides using a larger bit length, the signer (Bob) can protect himself by making some random, inoffensive changes to the document before signing it, and by keeping a copy of the contract he signed in his own possession, so that he can at least demonstrate in court that his signature matches that contract, not just the fraudulent one.
Pollard's rho algorithm for logarithms is an example of an algorithm using a birthday attack for the computation of discrete logarithms.
Reverse attack
The same fraud is possible if the signer is Mallory, not Bob. Bob could suggest a contract to Mallory for a signature. Mallory could find an inoffensively modified version of this fair contract that has the same signature as a fraudulent contract, and could provide the modified fair contract and signature to Bob. Later, Mallory could produce the fraudulent copy. If Bob doesn't have the inoffensively modified version of the contract (perhaps only finding his original proposal), Mallory's fraud is perfect. If Bob does have it, Mallory can at least claim that it is Bob who is the fraudster.
See also
* Collision attack
* Meet-in-the-middle attack
* BHT algorithm
References
* Mihir Bellare, Tadayoshi Kohno: Hash Function Balance and Its Impact on Birthday Attacks. EUROCRYPT 2004: pp. 401–418
* ''Applied Cryptography, 2nd ed.'' by Bruce Schneier
External links
* "What is a digital signature and what is authentication?" from RSA Security's crypto FAQ.
* "Birthday Attack", X5 Networks Crypto FAQs