Principle of Insufficient Reason
The principle of indifference (also called the principle of insufficient reason) is a rule for assigning epistemic probabilities. The principle of indifference states that in the absence of any relevant evidence, agents should distribute their credence (or "degrees of belief") equally among all the possible outcomes under consideration. In Bayesian probability, this is the simplest non-informative prior.


Examples

The textbook examples for the application of the principle of indifference are coins, dice, and cards. In a macroscopic system, at least, it must be assumed that the physical laws that govern the system are not known well enough to predict the outcome. As John Arbuthnot observed some centuries ago (in the preface of ''Of the Laws of Chance'', 1692):

:It is impossible for a Die, with such determin'd force and direction, not to fall on such determin'd side, only I don't know the force and direction which makes it fall on such determin'd side, and therefore I call it Chance, which is nothing but the want of art....

Given enough time and resources, there is no fundamental reason to suppose that suitably precise measurements could not be made, which would enable the prediction of the outcome of coins, dice, and cards with high accuracy: Persi Diaconis's work with coin-flipping machines is a practical example of this.


Coins

A symmetric coin has two sides, arbitrarily labeled ''heads'' (many coins have the head of a person portrayed on one side) and ''tails''. Assuming that the coin must land on one side or the other, the outcomes of a coin toss are mutually exclusive, exhaustive, and interchangeable. According to the principle of indifference, we assign each of the possible outcomes a probability of 1/2. It is implicit in this analysis that the forces acting on the coin are not known with any precision. If the momentum imparted to the coin as it is launched were known with sufficient accuracy, the flight of the coin could be predicted according to the laws of mechanics. Thus the uncertainty in the outcome of a coin toss is derived (for the most part) from the uncertainty with respect to initial conditions. This point is discussed at greater length in the article on coin flipping.


Dice

A symmetric die has ''n'' faces, arbitrarily labeled from 1 to ''n''. An ordinary cubical die has ''n'' = 6 faces, although a symmetric die with a different number of faces can be constructed. We assume that the die will land with one face or another upward, and that there are no other possible outcomes. Applying the principle of indifference, we assign each of the possible outcomes a probability of 1/''n''. As with coins, it is assumed that the initial conditions of throwing the dice are not known with enough precision to predict the outcome according to the laws of mechanics. Dice are typically thrown so as to bounce on a table or other surface(s); this interaction makes prediction of the outcome much more difficult.

The assumption of symmetry is crucial here. Suppose that we are asked to bet for or against the outcome "6". We might reason that there are two relevant outcomes here, "6" or "not 6", and that these are mutually exclusive and exhaustive. A common fallacy is assigning the probability 1/2 to each of the two outcomes, when in fact "not 6" is five times more likely than "6".


Cards

A standard deck contains 52 cards, each given a unique label in an arbitrary fashion, i.e. arbitrarily ordered. We draw a card from the deck; applying the principle of indifference, we assign each of the possible outcomes a probability of 1/52. This example, more than the others, shows the difficulty of actually applying the principle of indifference in real situations. What we really mean by the phrase "arbitrarily ordered" is simply that we do not have any information that would lead us to favor a particular card. In actual practice, this is rarely the case: a new deck of cards is certainly not in arbitrary order, and neither is a deck immediately after a hand of cards. In practice, we therefore shuffle the cards; this does not destroy the information we have, but instead (hopefully) renders it practically unusable, although it is still usable in principle. In fact, some expert blackjack players can track aces through the deck; for them, the condition for applying the principle of indifference is not satisfied.


Application to continuous variables

Applying the principle of indifference incorrectly can easily lead to nonsensical results, especially in the case of multivariate, continuous variables. A typical case of misuse is the following example:

*Suppose there is a cube hidden in a box. A label on the box says the cube has a side length between 3 and 5 cm.
*We don't know the actual side length, but we might assume that all values are equally likely and simply pick the mid-value of 4 cm.
*The information on the label allows us to calculate that the surface area of the cube is between 54 and 150 cm^2. We don't know the actual surface area, but we might assume that all values are equally likely and simply pick the mid-value of 102 cm^2.
*The information on the label allows us to calculate that the volume of the cube is between 27 and 125 cm^3. We don't know the actual volume, but we might assume that all values are equally likely and simply pick the mid-value of 76 cm^3.
*However, we have now reached the impossible conclusion that the cube has a side length of 4 cm, a surface area of 102 cm^2, and a volume of 76 cm^3 — yet a cube of side 4 cm has surface area 96 cm^2 and volume 64 cm^3!

In this example, mutually contradictory estimates of the length, surface area, and volume of the cube arise because we have assumed three mutually contradictory distributions for these parameters: a uniform distribution for any one of the variables implies a non-uniform distribution for the other two. In general, the principle of indifference does not indicate which variable (in this case, length, surface area, or volume) is to have a uniform epistemic probability distribution. Another classic example of this kind of misuse is the Bertrand paradox.

Edwin T. Jaynes introduced the principle of transformation groups, which can yield an epistemic probability distribution for this problem. This generalises the principle of indifference by saying that one is indifferent between ''equivalent problems'' rather than indifferent between propositions. It still reduces to the ordinary principle of indifference when one considers a permutation of the labels as generating equivalent problems (i.e. using the permutation transformation group).

To apply this to the above box example, we have three random variables related by geometric equations. If we have no reason to favour one trio of values over another, then our prior probabilities must be related by the rule for changing variables in continuous distributions. Let ''L'' be the length and ''V'' = ''L''^3 be the volume. Then we must have

:f_L(L) = \left|\frac{dV}{dL}\right| f_V(V) = 3L^2 f_V(L^3),

where f_L,\,f_V are the probability density functions (pdfs) of the stated variables. This equation has the general solution f_L(L) = \frac{K}{L}, where ''K'' is a normalization constant determined by the range of ''L'', in this case equal to

:K^{-1} = \int_3^5 \frac{dL}{L} = \log\left(\frac{5}{3}\right).

To put this "to the test", we ask for the probability that the length is less than 4. This has probability

:\Pr(L<4) = \int_3^4 \frac{dL}{L \log(5/3)} = \frac{\log(4/3)}{\log(5/3)} \approx 0.56.

For the volume, this should be equal to the probability that the volume is less than 4^3 = 64. The pdf of the volume is

:f_V(V) = f_L(V^{1/3})\,\frac{1}{3}V^{-2/3} = \frac{1}{3V \log(5/3)},

and then the probability of volume less than 64 is

:\Pr(V<64) = \int_{27}^{64} \frac{dV}{3V \log(5/3)} = \frac{\log(64/27)}{3\log(5/3)} = \frac{\log(4/3)}{\log(5/3)} \approx 0.56.

Thus we have achieved invariance with respect to volume and length. One can also show the same invariance with respect to surface area being less than 6(4^2) = 96 cm^2. However, note that this probability assignment is not necessarily a "correct" one, for the exact distribution of lengths, volumes, or surface areas will depend on how the "experiment" is conducted.

The fundamental hypothesis of statistical physics, that any two microstates of a system with the same total energy are equally probable at equilibrium, is in a sense an example of the principle of indifference. However, when the microstates are described by continuous variables (such as positions and momenta), an additional physical basis is needed in order to explain under ''which'' parameterization the probability density will be uniform. Liouville's theorem justifies the use of canonically conjugate variables, such as positions and their conjugate momenta. The wine/water paradox shows a dilemma with linked variables, and which one to choose.
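The invariance of the 1/''L'' prior can be checked numerically (an illustrative sketch using the densities derived in the cube example above):

```python
import math

log53 = math.log(5 / 3)

# Prior on length L over [3, 5]: f_L(L) = 1 / (L * log(5/3))
pr_L_lt_4 = math.log(4 / 3) / log53

# Induced prior on volume V = L^3 over [27, 125]:
# f_V(V) = 1 / (3 * V * log(5/3))
pr_V_lt_64 = math.log(64 / 27) / (3 * log53)

# Induced prior on surface area A = 6 L^2 over [54, 150]:
# f_A(A) = 1 / (2 * A * log(5/3))
pr_A_lt_96 = math.log(96 / 54) / (2 * log53)

print(f"P(L < 4)  = {pr_L_lt_4:.4f}")   # ~0.563
print(f"P(V < 64) = {pr_V_lt_64:.4f}")  # ~0.563
print(f"P(A < 96) = {pr_A_lt_96:.4f}")  # ~0.563
# All three probabilities agree: the 1/x-type prior is invariant under
# these changes of variables, unlike the uniform prior in the cube
# example, which gave three mutually contradictory mid-value estimates.
```

The agreement of the three values is exactly the invariance claimed in the text; the uniform prior, by contrast, cannot be uniform in length, area, and volume simultaneously.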


History

This principle stems from Epicurus' principle of "multiple explanations" (pleonachos tropos), according to which "if more than one theory is consistent with the data, keep them all". The epicurean Lucretius developed this point with an analogy of the multiple causes of death of a corpse.

The original writers on probability, primarily Jacob Bernoulli and Pierre-Simon Laplace, considered the principle of indifference to be intuitively obvious and did not even bother to give it a name. Laplace wrote:

:The theory of chance consists in reducing all the events of the same kind to a certain number of cases equally possible, that is to say, to such as we may be equally undecided about in regard to their existence, and in determining the number of cases favorable to the event whose probability is sought. The ratio of this number to that of all the cases possible is the measure of this probability, which is thus simply a fraction whose numerator is the number of favorable cases and whose denominator is the number of all the cases possible.

These earlier writers, Laplace in particular, naively generalized the principle of indifference to the case of continuous parameters, giving the so-called "uniform prior probability distribution", a function that is constant over all real numbers. He used this function to express a complete lack of knowledge as to the value of a parameter. According to Stigler (page 135), Laplace's assumption of uniform prior probabilities was not a metaphysical assumption; it was an implicit assumption made for the ease of analysis.

The principle of insufficient reason was its first name, given to it by Johannes von Kries, possibly as a play on Leibniz's principle of sufficient reason. Later writers (George Boole, John Venn, and others) objected to the use of the uniform prior for two reasons. The first is that the constant function is not normalizable, and thus is not a proper probability distribution. The second is its inapplicability to continuous variables, as described above. The "principle of insufficient reason" was renamed the "principle of indifference" by John Maynard Keynes, who was careful to note that it applies only when there is no knowledge indicating unequal probabilities.

Attempts to put the notion on firmer philosophical ground have generally begun with the concept of equipossibility and progressed from it to equiprobability. The principle of indifference can be given a deeper logical justification by noting that equivalent states of knowledge should be assigned equivalent epistemic probabilities. This argument was propounded by Edwin Thompson Jaynes: it leads to two generalizations, namely the principle of transformation groups, as in the Jeffreys prior, and the principle of maximum entropy. More generally, one speaks of uninformative priors.
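The maximum-entropy route back to indifference can be illustrated numerically (a sketch, not part of the original text): with no constraints other than normalization, the distribution over ''n'' outcomes with the largest Shannon entropy is the uniform one, recovering the 1/''n'' assignment.

```python
import math
import random

def entropy(p):
    """Shannon entropy in nats, skipping zero-probability outcomes."""
    return -sum(x * math.log(x) for x in p if x > 0)

n = 6
uniform = [1 / n] * n

random.seed(1)
# Compare against many random distributions over the same n outcomes.
best_other = 0.0
for _ in range(10_000):
    w = [random.random() for _ in range(n)]
    p = [x / sum(w) for x in w]
    best_other = max(best_other, entropy(p))

print(f"H(uniform)      = {entropy(uniform):.4f}")  # log(6) ~ 1.7918
print(f"max H(random p) = {best_other:.4f}")
# With no constraints beyond normalization, entropy is maximized by the
# uniform distribution: maximum entropy reduces to indifference.
```

This is the discrete, unconstrained case; with additional constraints (e.g. a known mean), maximum entropy yields non-uniform distributions, which is precisely how it generalizes the principle of indifference.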


See also

* Bayesian epistemology
* Clinical equipoise
* Rule of succession: a formula for estimating underlying probabilities when there are few observations, or for events that have not been observed to occur at all in (finite) sample data

