Birthday problem

In probability theory, the birthday problem asks for the probability that, in a set of ''n'' randomly chosen people, at least two will share a birthday. The birthday paradox is that, counterintuitively, the probability of a shared birthday exceeds 50% in a group of only 23 people.

The birthday paradox is a veridical paradox: it appears wrong, but is in fact true. While it may seem surprising that only 23 individuals are required to reach a 50% probability of a shared birthday, this result is made more intuitive by considering that the comparisons of birthdays will be made between every possible pair of individuals. With 23 individuals, there are (23 × 22) / 2 = 253 pairs to consider, much more than half the number of days in a year.

Real-world applications for the birthday problem include a cryptographic attack called the birthday attack, which uses this probabilistic model to reduce the complexity of finding a collision for a hash function, as well as calculating the approximate risk of a hash collision existing within the hashes of a given size of population.

The problem is generally attributed to Harold Davenport in about 1927, though he did not publish it at the time. Davenport did not claim to be its discoverer "because he could not believe that it had not been stated earlier". The first publication of a version of the birthday problem was by Richard von Mises in 1939.


Calculating the probability

From a permutations perspective, let ''A'' be the event that a group of 23 people has no repeated birthdays, and let ''B'' be the event that at least two people in the group share a birthday, so that P(B) = 1 - P(A). P(A) is the ratio of V_{nr}, the number of ordered birthday assignments without repetition (e.g. for a group of 2 people in mm/dd format, one possible outcome is (01/02, 05/20)), to V_t, the number of ordered birthday assignments with repetition allowed, which is the total space of outcomes of the experiment (e.g. for 2 people, one possible outcome is (01/02, 01/02)). Therefore V_{nr} and V_t are permutations.
:\begin{align} V_{nr} &= \frac{n!}{(n-k)!} = \frac{365!}{(365-23)!} \\ V_t &= n^k = 365^{23} \\ P(A) &= \frac{V_{nr}}{V_t} \approx 0.492703 \\ P(B) &= 1 - P(A) \approx 1 - 0.492703 \approx 0.507297 \quad (50.7297\%)\end{align}

Another way the birthday problem can be solved is by asking for an approximate probability that in a group of ''n'' people at least two have the same birthday. For simplicity, leap years, twins, selection bias, and seasonal and weekly variations in birth rates are generally disregarded, and instead it is assumed that there are 365 possible birthdays, and that each person's birthday is equally likely to be any of these days, independent of the other people in the group.

For independent birthdays, the uniform distribution on birthdays is the distribution that minimizes the probability of two people with the same birthday; any unevenness increases this probability. The problem of a non-uniform number of births occurring during each day of the year was first addressed by Murray Klamkin in 1967. As it happens, the real-world distribution yields a critical size of 23 to reach 50%.

The goal is to compute P(B), the probability that at least two people in the room have the same birthday. However, it is simpler to calculate P(A), the probability that no two people in the room have the same birthday. Then, because ''A'' and ''B'' are the only two possibilities and are also mutually exclusive,
: P(B) = 1 - P(A).

Here is the calculation of P(A) for 23 people. Let the 23 people be numbered 1 to 23. The event that all 23 people have different birthdays is the same as the event that person 2 does not have the same birthday as person 1, and that person 3 does not have the same birthday as either person 1 or person 2, and so on, and finally that person 23 does not have the same birthday as any of persons 1 through 22. Let these events be called Event 2, Event 3, and so on. Event 1 is the event of person 1 having a birthday, which occurs with probability 1. This conjunction of events may be computed using conditional probability: the probability of Event 2 is 364/365, as person 2 may have any birthday other than the birthday of person 1. Similarly, the probability of Event 3 given that Event 2 occurred is 363/365, as person 3 may have any of the birthdays not already taken by persons 1 and 2. This continues until finally the probability of Event 23 given that all preceding events occurred is 343/365.

Finally, the principle of conditional probability implies that P(A) is equal to the product of these individual probabilities:
: P(A) = \frac{365}{365} \times \frac{364}{365} \times \frac{363}{365} \times \cdots \times \frac{343}{365} \qquad (1)
The terms of equation (1) can be collected to arrive at:
: P(A) = \left(\frac{1}{365}\right)^{23} \times (365 \times 364 \times 363 \times \cdots \times 343) \qquad (2)
Evaluating equation (2) gives P(A) \approx 0.492703. Therefore, P(B) \approx 1 - 0.492703 = 0.507297 (50.7297%).

This process can be generalized to a group of ''n'' people, where p(n) is the probability of at least two of the ''n'' people sharing a birthday. It is easier to first calculate the probability \bar p(n) that all ''n'' birthdays are ''different''. According to the pigeonhole principle, \bar p(n) is zero when n > 365. When n \le 365:
: \begin{align} \bar p(n) &= 1 \times \left(1-\frac{1}{365}\right) \times \left(1-\frac{2}{365}\right) \times \cdots \times \left(1-\frac{n-1}{365}\right) \\ &= \frac{365 \times 364 \times \cdots \times (365-n+1)}{365^n} \\ &= \frac{365!}{365^n (365-n)!} = \frac{n! \cdot \binom{365}{n}}{365^n} = \frac{{}_{365}P_n}{365^n}\end{align}
where ! is the factorial operator, \binom{365}{n} is the binomial coefficient and {}_kP_r denotes permutation.

The equation expresses the fact that the first person has no one to share a birthday, the second person cannot have the same birthday as the first (364/365), the third cannot have the same birthday as either of the first two (363/365), and in general the ''n''th birthday cannot be the same as any of the n - 1 preceding birthdays.

The event of at least two of the ''n'' persons having the same birthday is complementary to all ''n'' birthdays being different. Therefore, its probability p(n) is
: p(n) = 1 - \bar p(n).
The following table shows the probability for some other values of ''n'' (for this table, the existence of leap years is ignored, and each birthday is assumed to be equally likely):
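As a minimal sketch, the product formula for \bar p(n) in this section can be evaluated exactly with rational arithmetic. The function names `p_distinct` and `p_shared` are illustrative choices rather than anything from the source, and the code assumes equally likely birthdays, as the text does.

```python
from fractions import Fraction


def p_distinct(n: int, d: int = 365) -> Fraction:
    """Exact probability that n people all have different birthdays among d days."""
    if n > d:
        return Fraction(0)  # pigeonhole principle: a repeat is certain
    prob = Fraction(1)
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        prob *= Fraction(d - k, d)
    return prob


def p_shared(n: int, d: int = 365) -> Fraction:
    """Exact probability that at least two of n people share a birthday."""
    return 1 - p_distinct(n, d)


if __name__ == "__main__":
    for n in (10, 20, 23, 30, 50):
        print(n, round(float(p_shared(n)), 6))
```

Using `Fraction` avoids rounding in the long product; converting to `float` only at the end is enough for display purposes.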


Approximations

The Taylor series expansion of the exponential function (the constant e \approx 2.71828)
: e^x = 1 + x + \frac{x^2}{2!}+\cdots
provides a first-order approximation for e^x for |x| \ll 1:
: e^x \approx 1 + x.
To apply this approximation to the first expression derived for \bar p(n), set x = -a/365. Thus,
: e^{-a/365} \approx 1 - \frac{a}{365}.
Then, replace ''a'' with non-negative integers for each term in the formula of \bar p(n) until a = n - 1, for example, when a = 1,
: e^{-1/365} \approx 1 - \frac{1}{365}.
The first expression derived for \bar p(n) can be approximated as
: \begin{align} \bar p(n) & \approx 1 \cdot e^{-1/365} \cdot e^{-2/365} \cdots e^{-(n-1)/365} \\ & = e^{-\left(1+2+\,\cdots\,+(n-1)\right)/365} \\ & = e^{-\frac{n(n-1)/2}{365}} = e^{-\frac{n(n-1)}{730}}. \end{align}
Therefore,
: p(n) = 1-\bar p(n) \approx 1 - e^{-\frac{n(n-1)}{730}}.
An even coarser approximation is given by
:p(n)\approx 1-e^{-\frac{n^2}{730}},
which, as the graph illustrates, is still fairly accurate.

According to the approximation, the same approach can be applied to any number of "people" and "days". If rather than 365 days there are ''d'', if there are ''n'' persons, and if n \ll d, then using the same approach as above we achieve the result that if p(n, d) is the probability that at least two out of ''n'' people share the same birthday from a set of ''d'' available days, then:
:\begin{align} p(n, d) & \approx 1-e^{-\frac{n(n-1)}{2d}} \\ & \approx 1-e^{-\frac{n^2}{2d}}. \end{align}
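To get a feel for how close these approximations are, the following sketch compares the exact product with the two exponential forms 1 - e^{-n(n-1)/(2d)} and 1 - e^{-n^2/(2d)}. The function names are hypothetical and the comparison uses ordinary floating point, so it is an illustration rather than a rigorous error analysis.

```python
import math


def p_exact(n: int, d: int = 365) -> float:
    """Exact shared-birthday probability via the product formula."""
    prob_distinct = 1.0
    for k in range(n):
        prob_distinct *= (d - k) / d
    return 1.0 - prob_distinct


def p_taylor(n: int, d: int = 365) -> float:
    """First-order (Taylor) approximation 1 - exp(-n(n-1)/(2d))."""
    return 1.0 - math.exp(-n * (n - 1) / (2 * d))


def p_coarse(n: int, d: int = 365) -> float:
    """Coarser approximation 1 - exp(-n^2/(2d))."""
    return 1.0 - math.exp(-n * n / (2 * d))


if __name__ == "__main__":
    for n in (10, 23, 40, 60):
        print(n, p_exact(n), p_taylor(n), p_coarse(n))
```

Because 1 - x never exceeds e^{-x}, the Taylor form slightly underestimates the exact probability for every ''n''.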


A simple exponentiation

The probability of any two people not having the same birthday is 364/365. In a room containing ''n'' people, there are \binom{n}{2} = n(n-1)/2 pairs of people, i.e. \binom{n}{2} events. The probability of no two people sharing the same birthday can be approximated by assuming that these events are independent and hence by multiplying their probabilities together. In short, 364/365 can be multiplied by itself \binom{n}{2} times, which gives us
:\bar p(n) \approx \left(\frac{364}{365}\right)^{\binom{n}{2}}.
Since this is the probability of no one having the same birthday, the probability of someone sharing a birthday is
:p(n) \approx 1 - \left(\frac{364}{365}\right)^{\binom{n}{2}}.
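The pairwise-independence shortcut is a one-liner to try out. The sketch below uses an illustrative function name, `p_pairs`, and generalizes 364/365 to (d-1)/d as an assumption that matches the rest of the article.

```python
from math import comb


def p_pairs(n: int, d: int = 365) -> float:
    """Approximate shared-birthday probability, treating all pairs as independent."""
    return 1.0 - ((d - 1) / d) ** comb(n, 2)


if __name__ == "__main__":
    print(p_pairs(23))  # slightly above one half
```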


Poisson approximation

Applying the Poisson approximation for the binomial on the group of 23 people,
:\operatorname{Poi}\left(\frac{\binom{23}{2}}{365}\right) =\operatorname{Poi}\left(\frac{253}{365}\right) \approx \operatorname{Poi}(0.6932)
so
:\Pr(X>0)=1-\Pr(X=0) \approx 1-e^{-0.6932} \approx 1-0.499998=0.500002.
The result is over 50%, in agreement with the previous calculations. This approximation is the same as the one above based on the Taylor expansion that uses e^x \approx 1 + x.


Square approximation

A good rule of thumb which can be used for mental calculation is the relation
:p(n) \approx \frac{n^2}{2m}
which can also be written as
:n \approx \sqrt{2m \times p(n)}
which works well for probabilities less than or equal to 1/2. In these equations, ''m'' is the number of days in a year.

For instance, to estimate the number of people required for a 1/2 chance of a shared birthday, we get
:n \approx \sqrt{2 \times 365 \times \tfrac{1}{2}} = \sqrt{365} \approx 19
which is not too far from the correct answer of 23.


Approximation of number of people

This can also be approximated using the following formula for the ''number'' of people necessary to have at least a 1/2 chance of matching:
:n \approx \tfrac{1}{2} + \sqrt{\tfrac{1}{4} + 2 \times \ln(2) \times 365} = 22.999943.
This is a result of the good approximation that an event with probability 1/k will have a 1/2 chance of occurring at least once if it is repeated k \ln 2 times.
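The closed form above is easy to check numerically. In this sketch the function name `people_for_even_chance` and the generalization from 365 to an arbitrary number of days ''d'' are assumptions made for illustration.

```python
import math


def people_for_even_chance(d: int = 365) -> float:
    """Approximate group size giving a 1/2 chance of a shared birthday."""
    return 0.5 + math.sqrt(0.25 + 2 * math.log(2) * d)


if __name__ == "__main__":
    print(people_for_even_chance())  # just under 23 for d = 365
```

The value falls just short of 23, so rounding up recovers the exact answer for the classical problem.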


Probability table

The lighter fields in this table show the number of hashes needed to achieve the given probability of collision (column) given a hash space of a certain size in bits (row). Using the birthday analogy: the "hash space size" resembles the "available days", the "probability of collision" resembles the "probability of shared birthday", and the "required number of hashed elements" resembles the "required number of people in a group". One could also use this chart to determine the minimum hash size required (given upper bounds on the number of hashes and the probability of error), or the probability of collision (for a fixed number of hashes and hash size).

For comparison, 10^{-18} to 10^{-15} is the uncorrectable bit error rate of a typical hard disk. In theory, 128-bit hash functions, such as MD5, should stay within that range until about 8.2 \times 10^{11} documents, even though their possible outputs are many more.
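Entries of such a table can be estimated with the birthday approximation n \approx \sqrt{2d \ln(1/(1-p))}, taking d = 2^{bits}. The sketch below assumes that this approximation is adequate for the table's purposes; the function name `hashes_for_collision` is hypothetical.

```python
import math


def hashes_for_collision(bits: int, p: float) -> float:
    """Approximate number of random hashes giving collision probability p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    d = 2.0 ** bits  # size of the hash space
    # -log1p(-p) equals ln(1/(1-p)) and stays accurate for very small p.
    return math.sqrt(2.0 * d * -math.log1p(-p))


if __name__ == "__main__":
    for bits in (32, 64, 128):
        print(bits, f"{hashes_for_collision(bits, 0.5):.3g}")
```

`math.log1p` is used because p values such as 10^{-15} would lose most of their precision if 1 - p were formed directly.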


An upper bound on the probability and a lower bound on the number of people

The argument below is adapted from an argument of Paul Halmos.

As stated above, the probability that no two birthdays coincide is
:1-p(n) = \bar p(n) = \prod_{k=1}^{n-1}\left(1-\frac{k}{365}\right) .
As in earlier paragraphs, interest lies in the smallest ''n'' such that p(n) > 1/2; or equivalently, the smallest ''n'' such that \bar p(n) < 1/2.

Using the inequality 1 - x < e^{-x} (valid for x > 0) in the above expression, we replace 1 - k/365 with e^{-k/365}. This yields
:\bar p(n) = \prod_{k=1}^{n-1}\left(1-\frac{k}{365}\right) < \prod_{k=1}^{n-1}\left(e^{-\frac{k}{365}}\right) = e^{-\frac{n(n-1)}{730}} .
Therefore, the expression above is not only an approximation, but also an upper bound of \bar p(n). The inequality
: e^{-\frac{n(n-1)}{730}} < \frac{1}{2}
implies \bar p(n) < 1/2. Solving for ''n'' gives
:n^2-n > 730 \ln 2 .
Now, 730 \ln 2 is approximately 505.997, which is barely below 506, the value of n^2 - n attained when n = 23. Therefore, 23 people suffice. Incidentally, solving n^2 - n = 730 \ln 2 for ''n'' gives the approximate formula of Frank H. Mathis cited above.

This derivation only shows that ''at most'' 23 people are needed to ensure a birthday match with even chance; it leaves open the possibility that ''n'' = 22 or fewer could also work.


Generalizations


Arbitrary number of days

Given a year with ''d'' days, the generalized birthday problem asks for the minimal number n(d) such that, in a set of ''n'' randomly chosen people, the probability of a birthday coincidence is at least 50%. In other words, n(d) is the minimal integer ''n'' such that
:1-\left(1-\frac{1}{d}\right)\left(1-\frac{2}{d}\right)\cdots\left(1-\frac{n-1}{d}\right)\geq \frac{1}{2}.
The classical birthday problem thus corresponds to determining n(365). The first 99 values of n(d) are given here:
:
A similar calculation shows that n(d) = 23 when ''d'' is in the range 341–372.

A number of bounds and formulas for n(d) have been published. For any d \geq 1, the number n(d) satisfies
:\frac{3-2\ln 2}{6} < n(d)-\sqrt{2d\ln 2} \leq 9-\sqrt{86\ln 2}.
These bounds are optimal in the sense that the sequence n(d)-\sqrt{2d\ln 2} gets arbitrarily close to
:\frac{3-2\ln 2}{6} \approx 0.27,
while it has
:9-\sqrt{86\ln 2}\approx 1.28
as its maximum, taken for d = 43.

The bounds are sufficiently tight to give the exact value of n(d) in most of the cases. For example, for d = 365 these bounds imply that 22.7633 < n(365) < 23.7736 and 23 is the only integer in that range. In general, it follows from these bounds that n(d) always equals either
:\left\lceil\sqrt{2d\ln 2}\,\right\rceil \quad\text{or}\quad \left\lceil\sqrt{2d\ln 2}\,\right\rceil+1
where \lceil \cdot \rceil denotes the ceiling function. The formula
:n(d) = \left\lceil\sqrt{2d\ln 2}\,\right\rceil
holds for 73% of all integers ''d''. The formula
:n(d) = \left\lceil\sqrt{2d\ln 2}+\frac{3-2\ln 2}{6}\right\rceil
holds for almost all ''d'', i.e., for a set of integers ''d'' with asymptotic density 1.

The formula
:n(d)=\left\lceil \sqrt{2d\ln 2}+\frac{3-2\ln 2}{6}+\frac{9-4(\ln 2)^2}{72\sqrt{2d\ln 2}}\right\rceil
holds for all d \leq 10^{18}, but it is conjectured that there are infinitely many counterexamples to this formula.

The formula
:n(d)=\left\lceil \sqrt{2d\ln 2}+\frac{3-2\ln 2}{6}+\frac{9-4(\ln 2)^2}{72\sqrt{2d\ln 2}}-\frac{2(\ln 2)^2}{135d}\right\rceil
holds for all d \leq 10^{18}, and it is conjectured that this formula holds for all ''d''.
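As a rough check of these statements, the sketch below computes n(d) directly from its definition and compares it with the ceiling formula that holds for almost all ''d''. The names `n_exact` and `n_formula` are illustrative, and floating-point arithmetic is assumed to be accurate enough for the small values of ''d'' tried here.

```python
import math


def n_exact(d: int) -> int:
    """Smallest n with shared-birthday probability >= 1/2 for a d-day year."""
    prob_distinct = 1.0  # probability that n people are all distinct (n = 1)
    n = 1
    while prob_distinct > 0.5:
        prob_distinct *= (d - n) / d  # add one more person
        n += 1
    return n


def n_formula(d: int) -> int:
    """Ceiling formula stated to hold for almost all d."""
    root = math.sqrt(2 * d * math.log(2))
    return math.ceil(root + (3 - 2 * math.log(2)) / 6)


if __name__ == "__main__":
    for d in (43, 100, 365, 1000):
        print(d, n_exact(d), n_formula(d))
```

For d = 365 both functions return 23, and `n_exact` stays at 23 across the range 341–372 quoted above.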


More than two people sharing a birthday

It is possible to extend the problem to ask how many people in a group are necessary for there to be a greater than 50% probability that at least 3, 4, 5, etc. of the group share the same birthday. The first few values are as follows: a greater than 50% probability of 3 people sharing a birthday requires 88 people; a greater than 50% probability of 4 people sharing a birthday requires 187 people.


Probability of a shared birthday (collision)

The birthday problem can be generalized as follows:
:Given ''n'' random integers drawn from a discrete uniform distribution with range [1, d], what is the probability p(n; d) that at least two numbers are the same? (d = 365 gives the usual birthday problem.)
The generic results can be derived using the same arguments given above.
:\begin{align} p(n;d) &= \begin{cases} 1-\displaystyle\prod_{k=1}^{n-1}\left(1-\frac{k}{d}\right) & n\le d \\ 1 & n > d \end{cases} \\ & \approx 1 - e^{-\frac{n(n-1)}{2d}} \\ & \approx 1 - \left( \frac{d-1}{d} \right)^{\frac{n(n-1)}{2}} \end{align}
Conversely, if n(p; d) denotes the number of random integers drawn from [1, d] to obtain a probability ''p'' that at least two numbers are the same, then
:n(p;d)\approx \sqrt{2d \cdot \ln\left(\frac{1}{1-p}\right)}.
The birthday problem in this more generic sense applies to hash functions: the expected number of ''N''-bit hashes that can be generated before getting a collision is not 2^N, but rather only 2^{N/2}. This is exploited by birthday attacks on cryptographic hash functions and is the reason why a small number of collisions in a hash table are, for all practical purposes, inevitable.

The theory behind the birthday problem was used by Zoe Schnabel under the name of capture-recapture statistics to estimate the size of fish population in lakes.


Generalization to multiple types of people

The basic problem considers all trials to be of one "type". The birthday problem has been generalized to consider an arbitrary number of types. In the simplest extension there are two types of people, say ''m'' men and ''n'' women, and the problem becomes characterizing the probability of a shared birthday between at least one man and one woman. (Shared birthdays between two men or two women do not count.) The probability of no shared birthdays here is
:p_0 =\frac{1}{d^{m+n}} \sum_{i=1}^m \sum_{j=1}^n S_2(m,i) S_2(n,j) \prod_{k=0}^{i+j-1} (d - k)
where d = 365 and S_2 are Stirling numbers of the second kind. Consequently, the desired probability is 1 - p_0.

This variation of the birthday problem is interesting because there is not a unique solution for the total number of people m + n. For example, the usual 50% probability value is realized for both a 32-member group of 16 men and 16 women and a 49-member group of 43 women and 6 men.
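The double sum over Stirling numbers can be evaluated exactly with integer arithmetic. The following is a sketch under the assumption that ''m'' and ''n'' count the two groups and ''d'' is the number of days; the helper names `stirling2`, `falling` and `p_no_cross_match` are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Stirling number of the second kind S_2(n, k)."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def falling(d: int, r: int) -> int:
    """Falling factorial d (d-1) ... (d-r+1)."""
    out = 1
    for k in range(r):
        out *= d - k
    return out


def p_no_cross_match(m: int, n: int, d: int = 365) -> Fraction:
    """Probability that no member of one group shares a birthday with the other."""
    total = 0
    for i in range(1, m + 1):        # distinct birthdays used by the first group
        for j in range(1, n + 1):    # distinct birthdays used by the second group
            total += stirling2(m, i) * stirling2(n, j) * falling(d, i + j)
    return Fraction(total, d ** (m + n))


if __name__ == "__main__":
    print(float(1 - p_no_cross_match(16, 16)))
    print(float(1 - p_no_cross_match(43, 6)))
```

Both group compositions mentioned in the text come out close to one half, which illustrates why the total group size is not unique.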


Other birthday problems


First match

A related question is, as people enter a room one at a time, which one is most likely to be the first to have the same birthday as someone already in the room? That is, for what ''n'' is p(n) - p(n-1) maximum? The answer is 20: if there is a prize for first match, the best position in line is 20th.
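The claim can be checked by brute force. In this sketch, `first_match_pmf` is an illustrative name for p(n) - p(n-1), written as the probability that the first n - 1 people are all distinct times the chance that person ''n'' hits one of their birthdays.

```python
def first_match_pmf(n: int, d: int = 365) -> float:
    """Probability that the n-th person to enter produces the first match."""
    if n < 2 or n > d + 1:
        return 0.0
    prob_distinct = 1.0
    for k in range(n - 1):           # first n-1 people all different
        prob_distinct *= (d - k) / d
    return prob_distinct * (n - 1) / d  # person n matches one of them


best_position = max(range(1, 367), key=first_match_pmf)

if __name__ == "__main__":
    print(best_position, first_match_pmf(best_position))
```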


Same birthday as you

In the birthday problem, neither of the two people is chosen in advance. By contrast, the probability q(n) that someone in a room of ''n'' other people has the same birthday as a ''particular'' person (for example, you) is given by
: q(n) = 1 - \left( \frac{365-1}{365} \right)^n
and for general ''d'' by
: q(n;d) = 1 - \left( \frac{d-1}{d} \right)^n.
In the standard case of d = 365, substituting n = 23 gives about 6.1%, which is less than 1 chance in 16. For a greater than 50% chance that one person in a roomful of ''n'' people has the same birthday as ''you'', ''n'' would need to be at least 253. This number is significantly higher than 365/2 = 182.5: the reason is that it is likely that there are some birthday matches among the other people in the room.
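A short sketch of q(n; d) and of the threshold of 253 follows. The helper `people_needed` is a hypothetical name; it simply solves q(n; d) \geq p for the smallest integer ''n''.

```python
import math


def q(n: int, d: int = 365) -> float:
    """Probability that at least one of n other people shares your birthday."""
    return 1.0 - ((d - 1) / d) ** n


def people_needed(p: float = 0.5, d: int = 365) -> int:
    """Smallest n with q(n; d) >= p, for 0 < p < 1."""
    return math.ceil(math.log(1.0 - p) / math.log((d - 1) / d))


if __name__ == "__main__":
    print(q(23), people_needed(0.5))
```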


Number of people with a shared birthday

For any one person in a group of ''n'' people, the probability that he or she shares a birthday with someone else is q(n-1;d), as explained above. The expected number of people with a shared (non-unique) birthday can now be calculated easily by multiplying that probability by the number of people (''n''), so it is:
: n\left(1 - \left( \frac{d-1}{d} \right)^{n-1}\right)
(This multiplication can be done this way because of the linearity of the expected value of indicator variables.) This implies that the expected number of people with a non-shared (unique) birthday is:
: n \left( \frac{d-1}{d} \right)^{n-1}
Similar formulas can be derived for the expected number of people who share with three, four, etc. other people.
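The two expectations in this section are complementary, which the following sketch makes explicit. The function names are illustrative only.

```python
def expected_sharing(n: int, d: int = 365) -> float:
    """Expected number of people whose birthday is shared with someone else."""
    return n * (1.0 - ((d - 1) / d) ** (n - 1))


def expected_unique(n: int, d: int = 365) -> float:
    """Expected number of people whose birthday is unique in the group."""
    return n * ((d - 1) / d) ** (n - 1)


if __name__ == "__main__":
    for n in (23, 50, 100):
        print(n, expected_sharing(n), expected_unique(n))
```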


Number of people until every birthday is achieved

Finding the expected number of people needed until every birthday is achieved is known as the coupon collector's problem. The answer is n H_n, where ''n'' here denotes the number of possible birthdays and H_n is the ''n''th harmonic number. For 365 possible dates (the birthday problem), the answer is about 2365.
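The value can be reproduced by summing the harmonic series directly. This is a minimal sketch; `expected_people_to_cover` is a hypothetical name and ''d'' stands for the number of possible birthdays.

```python
def expected_people_to_cover(d: int = 365) -> float:
    """Expected number of people until all d birthdays have appeared (d * H_d)."""
    harmonic = sum(1.0 / k for k in range(1, d + 1))
    return d * harmonic


if __name__ == "__main__":
    print(expected_people_to_cover())
```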


Near matches

Another generalization is to ask for the probability of finding at least one pair in a group of ''n'' people with birthdays within ''k'' calendar days of each other, if there are ''d'' equally likely birthdays (M. Abramson and W. O. J. Moser (1970), ''More Birthday Surprises'', ''American Mathematical Monthly'' 77, 856–858).
: \begin{align} p(n,k,d) &= 1 - \frac{(d - nk - 1)!}{d^{n-1} \bigl(d - n(k+1)\bigr)!}\end{align}
The number of people required so that the probability that some pair will have a birthday separated by ''k'' days or fewer will be higher than 50% is given in the following table:
:
Thus in a group of just seven random people, it is more likely than not that two of them will have a birthday within a week of each other.
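The near-match formula can be evaluated exactly with big integers. The sketch below assumes the form p(n, k, d) = 1 - (d - nk - 1)! / (d^{n-1} (d - n(k+1))!) and treats the case n(k+1) > d as a certain match; the names `p_near` and `min_people` are illustrative.

```python
from fractions import Fraction
from math import factorial


def p_near(n: int, k: int, d: int = 365) -> Fraction:
    """Probability that some pair among n >= 1 people is within k days."""
    if d - n * (k + 1) < 0:
        return Fraction(1)  # not enough room to keep everyone more than k days apart
    numerator = factorial(d - n * k - 1)
    denominator = d ** (n - 1) * factorial(d - n * (k + 1))
    return 1 - Fraction(numerator, denominator)


def min_people(k: int, d: int = 365) -> int:
    """Smallest n for which p_near(n, k, d) exceeds 1/2."""
    n = 1
    while p_near(n, k, d) <= Fraction(1, 2):
        n += 1
    return n


if __name__ == "__main__":
    for k in range(8):
        print(k, min_people(k))
```

With k = 0 the formula reduces to the classical birthday problem, which is a convenient sanity check.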


Number of days with a certain number of birthdays


Number of days with at least one birthday

The expected number of different birthdays, i.e. the number of days that are at least one person's birthday, is:
:d - d \left(\frac{d-1}{d}\right)^n
This follows from the expected number of days that are no one's birthday:
:d \left(\frac{d-1}{d}\right)^n
which follows from the probability that a particular day is no one's birthday, ((d-1)/d)^n, easily summed because of the linearity of the expected value.

For instance, with ''d'' = 365, you should expect about 21.4 different birthdays when there are 22 people, or about 46.8 different birthdays when there are 50 people. When there are 1000 people, there will be around 341.5 different birthdays (about 23.5 unclaimed birthdays).


Number of days with at least two birthdays

The above can be generalized from the distribution of the number of people with their birthday on any particular day, which is a binomial distribution with probability 1/''d''. Multiplying the relevant probability by ''d'' will then give the expected number of days. For example, the expected number of days which are shared, i.e. which are at least two (i.e. not zero and not one) people's birthday, is:
:d - d \left(\frac{d-1}{d}\right)^n - d \cdot \binom{n}{1} \left(\frac{1}{d}\right)^1 \left(\frac{d-1}{d}\right)^{n-1} = d - d \left(\frac{d-1}{d}\right)^n - n \left(\frac{d-1}{d}\right)^{n-1}
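Both day-count expectations, for days with at least one birthday and for days with at least two, follow from the same binomial reasoning. The sketch below uses hypothetical function names and plain floating point.

```python
def expected_days_at_least_one(n: int, d: int = 365) -> float:
    """Expected number of days that are somebody's birthday."""
    return d - d * ((d - 1) / d) ** n


def expected_days_at_least_two(n: int, d: int = 365) -> float:
    """Expected number of days shared by two or more people."""
    none = d * ((d - 1) / d) ** n            # days with zero birthdays
    exactly_one = n * ((d - 1) / d) ** (n - 1)  # days with exactly one birthday
    return d - none - exactly_one


if __name__ == "__main__":
    for n in (22, 50, 1000):
        print(n, expected_days_at_least_one(n), expected_days_at_least_two(n))
```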


Number of people who repeat a birthday

The probability that the ''k''th integer randomly chosen from [1, d] will repeat at least one previous choice equals q(k-1;d) above. The expected total number of times a selection will repeat a previous selection as ''n'' such integers are chosen equals
:\sum_{k=1}^n q(k-1;d) = n - d + d \left(\frac{d-1}{d}\right)^n
This can be seen to equal the number of people minus the expected number of different birthdays.
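The identity between the sum and the closed form can be confirmed numerically. This sketch redefines q(n; d) locally so that it is self-contained; the function names are illustrative.

```python
def q(n: int, d: int = 365) -> float:
    """Probability that a new draw repeats one of n earlier draws' target value."""
    return 1.0 - ((d - 1) / d) ** n


def expected_repeats_sum(n: int, d: int = 365) -> float:
    """Expected number of repeats, summed term by term."""
    return sum(q(k - 1, d) for k in range(1, n + 1))


def expected_repeats_closed(n: int, d: int = 365) -> float:
    """Closed form n - d + d ((d-1)/d)^n."""
    return n - d + d * ((d - 1) / d) ** n


if __name__ == "__main__":
    print(expected_repeats_sum(50), expected_repeats_closed(50))
```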


Average number of people to get at least one shared birthday

In an alternative formulation of the birthday problem, one asks the ''average'' number of people required to find a pair with the same birthday. If we consider the probability function Pr people have at least one shared birthday this ''average'' is determining the
mean There are several kinds of mean in mathematics, especially in statistics. Each mean serves to summarize a given group of data, often to better understand the overall value ( magnitude and sign) of a given data set. For a data set, the '' ar ...
of the distribution, as opposed to the customary formulation, which asks for the
median In statistics and probability theory, the median is the value separating the higher half from the lower half of a data sample, a population, or a probability distribution. For a data set, it may be thought of as "the middle" value. The basic f ...
. The problem is relevant to several hashing algorithms analyzed by
Donald Knuth Donald Ervin Knuth ( ; born January 10, 1938) is an American computer scientist, mathematician, and professor emeritus at Stanford University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of computer sc ...
in his book ''
The Art of Computer Programming ''The Art of Computer Programming'' (''TAOCP'') is a comprehensive monograph written by the computer scientist Donald Knuth presenting programming algorithms and their analysis. Volumes 1–5 are intended to represent the central core of com ...
''. It may be shown that if one samples uniformly, with replacement, from a population of size , the number of trials required for the first repeated sampling of ''some'' individual has
expected value In probability theory, the expected value (also called expectation, expectancy, mathematical expectation, mean, average, or first moment) is a generalization of the weighted average. Informally, the expected value is the arithmetic mean of a ...
, where : Q(M)=\sum_^ \frac. The function : Q(M)= 1 + \frac + \frac + \cdots + \frac has been studied by
Srinivasa Ramanujan Srinivasa Ramanujan (; born Srinivasa Ramanujan Aiyangar, ; 22 December 188726 April 1920) was an Indian mathematician. Though he had almost no formal training in pure mathematics, he made substantial contributions to mathematical analysis, ...
and has
asymptotic expansion In mathematics, an asymptotic expansion, asymptotic series or Poincaré expansion (after Henri Poincaré) is a formal series of functions which has the property that truncating the series after a finite number of terms provides an approximation to ...
: : Q(M)\sim\sqrt-\frac+\frac\sqrt-\frac+\cdots. With days in a year, the average number of people required to find a pair with the same birthday is , somewhat more than 23, the number required for a 50% chance. In the best case, two people will suffice; at worst, the maximum possible number of people is needed; but on average, only 25 people are required An analysis using indicator random variables can provide a simpler but approximate analysis of this problem. For each pair (''i'', ''j'') for k people in a room, we define the indicator random variable ''Xij'', for 1\leq i \leq j\leq k, by \begin X_ & = I \ \\ & = \begin 1, & \texti\textj\text \\ 0, & \text \end \end \begin E _& = \Pr \ \\ & = 1/n \end Let X be a random variable counting the pairs of individuals with the same birthday. X =\sum_^k \sum_^k X_ \begin E & = \sum_^k \sum_^k E _\ & = \binom \frac\\ & = \frac\\ \end For , if , the expected number of people with the same birthday is (28 \cdot 27) / (2 \cdot 365) \approx 1.0356. Therefore, we can expect at least one matching pair with at least 28 people. An informal demonstration of the problem can be made from the list of prime ministers of Australia, of which there have been 29 , in which
Paul Keating, the 24th prime minister, and Edmund Barton, the first prime minister, share the same birthday, 18 January.

In the 2014 FIFA World Cup, each of the 32 squads had 23 players. An analysis of the official squad lists suggested that 16 squads had pairs of players sharing birthdays, and of these 5 squads had two pairs: Argentina, France, Iran, South Korea and Switzerland each had two pairs, and Australia, Bosnia and Herzegovina, Brazil, Cameroon, Colombia, Honduras, Netherlands, Nigeria, Russia, Spain and USA each had one pair.

Voracek, Tran and Formann showed that the majority of people markedly overestimate the number of people needed to achieve a given probability of a shared birthday, and markedly underestimate the probability of a shared birthday when a specific sample size is given. Further results showed that psychology students and women did better on the task than casino visitors/personnel or men, but were less confident about their estimates.
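
As a numerical check on the two estimates earlier in this section (the average number of people 1 + Q(M) and the expected number of matching pairs k(k − 1)/(2n)), the following Python sketch evaluates Q(M) directly from its sum and compares it with the four displayed terms of the asymptotic expansion. The function names are illustrative and not taken from any source, and birthdays are assumed uniform and independent.

```python
import math


def q_exact(m: int) -> float:
    """Q(M) = sum_{k=1}^{M} M! / ((M-k)! * M^k), accumulated term by term."""
    total = 0.0
    term = 1.0  # value of M! / ((M-k)! * M^k) at k = 0
    for k in range(1, m + 1):
        term *= (m - k + 1) / m  # step from k-1 to k
        total += term
    return total


def q_asymptotic(m: int) -> float:
    """First four terms of the asymptotic expansion of Q(M)."""
    return (
        math.sqrt(math.pi * m / 2)
        - 1 / 3
        + math.sqrt(math.pi / (2 * m)) / 12
        - 4 / (135 * m)
    )


def expected_matching_pairs(k: int, n: int = 365) -> float:
    """E[X] = C(k, 2) / n = k(k-1) / (2n) for k people and n possible birthdays."""
    return k * (k - 1) / (2 * n)
```

With M = 365, `1 + q_exact(365)` comes out near 24.6 and agrees closely with `1 + q_asymptotic(365)`, while `expected_matching_pairs(k)` first exceeds 1 at k = 28, in line with the figures quoted above.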


Reverse problem

The reverse problem is to find, for a fixed probability ''p'', the greatest ''n'' for which the probability ''p''(''n'') is smaller than the given ''p'', or the smallest ''n'' for which the probability ''p''(''n'') is greater than the given ''p''. Taking the above formula with 365 days in a year, one has

: n(p;365)\approx \sqrt{730\ln\left(\frac{1}{1-p}\right)}.

The following table gives some sample calculations. Some values falling outside the bounds have been colored to show that the approximation is not always exact.
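
The reverse problem can be explored with a short Python sketch that compares the square-root approximation with an exact search. The helper names (`n_approx`, `p_shared`, `smallest_n`) are hypothetical, and the exact probability assumes uniform, independent birthdays over `d` days.

```python
import math


def n_approx(p: float, d: int = 365) -> float:
    """Approximate group size for target probability p: sqrt(2 d ln(1/(1-p)))."""
    return math.sqrt(2 * d * math.log(1 / (1 - p)))


def p_shared(n: int, d: int = 365) -> float:
    """Exact probability that at least two of n people share a birthday."""
    prob_all_distinct = 1.0
    for i in range(n):
        prob_all_distinct *= (d - i) / d
    return 1 - prob_all_distinct


def smallest_n(p: float, d: int = 365) -> int:
    """Smallest n whose exact shared-birthday probability is greater than p (p < 1)."""
    n = 1
    while p_shared(n, d) <= p:
        n += 1
    return n
```

For p = 0.5 the approximation gives about 22.49 and the exact search returns 23, so rounding up agrees. For p = 0.99 the approximation gives about 57.98 while the exact search returns 57, an instance where rounding the approximation up overshoots by one.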


Partition problem

A related problem is the partition problem, a variant of the knapsack problem from operations research. Some weights are put on a balance scale; each weight is an integer number of grams randomly chosen between one gram and one million grams (one tonne). The question is whether one can usually (that is, with probability close to 1) transfer the weights between the left and right arms to balance the scale. (In case the sum of all the weights is an odd number of grams, a discrepancy of one gram is allowed.) If there are only two or three weights, the answer is very clearly no; although there are some combinations which work, the majority of randomly selected combinations of three weights do not. If there are very many weights, the answer is clearly yes. The question is, how many are just sufficient? That is, what is the number of weights such that it is equally likely for it to be possible to balance them as it is to be impossible?

People's intuition is often that the answer is large: most guess that it is in the thousands or tens of thousands, while others feel it should at least be in the hundreds. The correct answer is 23.

The reason is that the correct comparison is to the number of partitions of the weights into left and right. There are 2^{N-1} different partitions for ''N'' weights, and the left sum minus the right sum can be thought of as a new random quantity for each partition. The distribution of the sum of weights is approximately Gaussian, with a peak at 500000N and width 1000000\sqrt{N}, so that the transition occurs when 2^{N-1} is approximately equal to 1000000\sqrt{N}. For N = 23, 2^{23-1} is about 4 million, while the width of the distribution is about 5 million.
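
The balancing question can be tested empirically. The following Python sketch is one possible approach, not a method given in the source: it decides whether a set of weights can be balanced by tracking all reachable subset sums in the bits of a single integer, then estimates the probability of success by random sampling. The function names, the trial count and the fixed seed are assumptions chosen for illustration.

```python
import random


def can_balance(weights: list[int]) -> bool:
    """True if the weights split into two groups whose sums differ by at most
    one gram (zero when the total is even, one when it is odd)."""
    total = sum(weights)
    reachable = 1  # bit s is set when some subset sums to s; bit 0 is the empty subset
    for w in weights:
        reachable |= reachable << w  # each weight is either left out or added
    # A balanced split exists exactly when some subset sums to floor(total / 2).
    return bool((reachable >> (total // 2)) & 1)


def estimate_balance_probability(
    n_weights: int,
    trials: int = 100,
    max_weight: int = 1_000_000,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of the chance that n_weights random weights balance."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        weights = [rng.randint(1, max_weight) for _ in range(n_weights)]
        if can_balance(weights):
            successes += 1
    return successes / trials
```

Calling `estimate_balance_probability(N)` for a range of N around 23 gives a rough picture of where the transition occurs; the estimate is noisy and depends on the number of trials and the seed. The arithmetic in the text can also be checked directly: `2 ** 22` is 4194304, and `1_000_000 * 23 ** 0.5` is roughly 4.8 million.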


In fiction

Arthur C. Clarke's novel ''A Fall of Moondust'', published in 1961, contains a section where the main characters, trapped underground for an indefinite amount of time, are celebrating a birthday and find themselves discussing the validity of the birthday problem. As stated by a physicist passenger: "If you have a group of more than twenty-four people, the odds are better than even that two of them have the same birthday." Eventually, out of 22 present, it is revealed that two characters share the same birthday, May 23.


External links


* The Birthday Paradox accounting for leap year birthdays
* A humorous article explaining the paradox
* SOCR EduMaterials activities birthday experiment
* Understanding the Birthday Problem (Better Explained)
* A practical football example of the birthday paradox
* Computing the probabilities of the Birthday Problem at WolframAlpha