An ''a priori'' probability is a probability that is derived purely by deductive reasoning. One way of deriving ''a priori'' probabilities is the principle of indifference, which says that if there are ''N'' mutually exclusive and collectively exhaustive events, and if they are equally likely, then the probability of a given event occurring is 1/''N''. Similarly, the probability of one of a given collection of ''K'' events occurring is ''K''/''N''.

One disadvantage of defining probabilities in this way is that it applies only to finite collections of events.

In Bayesian inference, "uninformative priors" or "objective priors" are particular choices of ''a priori'' probabilities. Note that "prior probability" is a broader concept.

Similar to the distinction in philosophy between a priori and a posteriori, in Bayesian inference ''a priori'' denotes general knowledge about the data distribution before making an inference, while ''a posteriori'' denotes knowledge that incorporates the results of making an inference.
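As a small illustration of the principle of indifference, the following Python sketch computes 1/''N'' and ''K''/''N'' with exact fractions for a fair six-sided die. The function name `indifference_probability` and the choice of a die as the example are assumptions made here for illustration only.

```python
from fractions import Fraction


def indifference_probability(n_events: int, k_favourable: int = 1) -> Fraction:
    """Return K/N for N equally likely, mutually exclusive, exhaustive events."""
    if n_events <= 0:
        raise ValueError("the principle only applies to a finite, non-empty collection")
    if not 0 <= k_favourable <= n_events:
        raise ValueError("K must lie between 0 and N")
    return Fraction(k_favourable, n_events)


# A fair die: N = 6 faces, reasoned about without any throw.
faces = [1, 2, 3, 4, 5, 6]
p_six = indifference_probability(len(faces))                      # 1/N
even_faces = [f for f in faces if f % 2 == 0]
p_even = indifference_probability(len(faces), len(even_faces))    # K/N
print(p_six, p_even)  # prints "1/6 1/2"
```

Exact fractions are used rather than floating-point numbers so that the result stays a purely deductive quantity, with no rounding involved.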

A priori probability in statistical mechanics

The a priori probability has an important application in statistical mechanics. The classical version is defined as the ratio of the number of elementary events of a given kind (e.g. the faces of a die that show a 6) to the total number of elementary events, with these considered purely deductively, i.e. without any experimenting. In the case of a die lying on the table, each elementary event is reasoned deductively to have the same probability without the die being thrown: simply counting the faces of the (perfect) die gives a probability of 1/6 for each outcome of an imaginary throw. Each face of the die appears with equal probability, probability being a measure defined for each elementary event. The result is different if we throw the die twenty times and ask how many times (out of 20) the number 6 appears on the upper face. In this case time comes into play, and we have a different type of probability that depends on time or on the number of times the die is thrown. The a priori probability, on the other hand, is independent of time: you can look at the die on the table as long as you like without touching it, and you still deduce that the probability for the number 6 to appear on the upper face is 1/6.

In statistical mechanics, e.g. that of a gas contained in a finite volume

V

, both the spatial coordinates

q_i

and the momentum coordinates

p_i

of the individual gas elements (atoms or molecules) are finite in the phase space spanned by these coordinates. In analogy to the case of the die, the a priori probability is here (in the case of a continuum) proportional to the phase space volume element

\Delta q\Delta p

divided by

h

, and is the number of standing waves (i.e. states) therein, where

\Delta q

is the range of the variable

q

and

\Delta p

is the range of the variable

p

(here for simplicity considered in one dimension). In 1 dimension (length

L

) this number or statistical weight or a priori weighting is

L \Delta p/h

. In customary 3 dimensions (volume

V

) the corresponding number can be calculated to be

V 4\pi p^2\Delta p/h^3

. In order to understand this quantity as giving a number of states in quantum (i.e. wave) mechanics, recall that in quantum mechanics every particle is associated with a matter wave which is the solution of a Schrödinger equation. In the case of free particles (of energy

\epsilon = {\bf p}^2/2m

) like those of a gas in a box of volume

V = L^3

such a matter wave is explicitly :

\psi \propto \sin(l\pi x/L)\sin(m\pi y/L)\sin(n\pi z/L)

, where

l, m, n

are integers. The number of different

(l,m,n)

values and hence states in the region between

p, p+dp, p^2 = {\bf p}^2,

is then found to be the above expression

V4\pi p^2dp/h^3

by considering the area covered by these points. Moreover, in view of the uncertainty relation, which in 1 spatial dimension is :

\Delta q\Delta p \geq h

, these states are indistinguishable (i.e. these states do not carry labels). An important consequence is a result known as Liouville's theorem, i.e. the time independence of this phase space volume element and thus of the a priori probability. A time dependence of this quantity would imply known information about the dynamics of the system, and hence would not be an a priori probability (A. Ben-Naim, Entropy Demystified, World Scientific, Singapore, 2007). Thus the region :

\Omega := \frac{\Delta q\Delta p}{\int \Delta q\Delta p},\;\;\; \int \Delta q\Delta p = \mathrm{const.},

when differentiated with respect to time

t

yields zero (with the help of Hamilton's equations): The volume at time

t

is the same as at time zero. One describes this also as conservation of information. In the full quantum theory one has an analogous conservation law. In this case, the phase space region is replaced by a subspace of the space of states expressed in terms of a projection operator

P

, and instead of the probability in phase space, one has the probability density :

\Sigma := \frac{P}{\text{Tr}(P)},\;\;\; N = \text{Tr}(P) = \mathrm{const.},

where

N

is the dimensionality of the subspace. The conservation law in this case is expressed by the unitarity of the

S-matrix

. In either case, the considerations assume a closed isolated system. This closed isolated system is a system with (1) a fixed energy

E

and (2) a fixed number of particles

N

in (3) a state of equilibrium. If one considers a huge number of replicas of this system, one obtains what is called a "microcanonical ensemble". It is for this system that one postulates in quantum statistics the "fundamental postulate of equal a priori probabilities of an isolated system". This says that the isolated system in equilibrium occupies each of its accessible states with the same probability. This fundamental postulate therefore allows us to equate the a priori probability to the degeneracy of a system, i.e. to the number of different states with the same energy.
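The state count V 4\pi p^2\Delta p/h^3 quoted in this section can be checked by brute force. The sketch below counts the standing waves \sin(l\pi x/L)\sin(m\pi y/L)\sin(n\pi z/L) whose momentum magnitude falls in a given shell and compares the count with the phase space estimate. It assumes the de Broglie relation, so that the momentum magnitude is h\sqrt{l^2+m^2+n^2}/(2L) for positive integers l, m, n; the function names and the units h = 1, L = 1 are choices made only for this illustration.

```python
import math


def count_box_states(p_lo: float, p_hi: float, length: float, h: float) -> int:
    """Count standing waves in a cubic box with momentum magnitude in [p_lo, p_hi).

    Assumes |p| = h * sqrt(l^2 + m^2 + n^2) / (2 L) with positive integers l, m, n.
    """
    r_lo_sq = (2.0 * length * p_lo / h) ** 2
    r_hi_sq = (2.0 * length * p_hi / h) ** 2
    n_max = int(math.sqrt(r_hi_sq)) + 1
    count = 0
    for l in range(1, n_max + 1):
        for m in range(1, n_max + 1):
            lm_sq = l * l + m * m
            if lm_sq >= r_hi_sq:
                break  # larger m only moves the point further out
            for n in range(1, n_max + 1):
                r_sq = lm_sq + n * n
                if r_sq >= r_hi_sq:
                    break  # larger n only moves the point further out
                if r_sq >= r_lo_sq:
                    count += 1
    return count


def a_priori_weight(p: float, dp: float, volume: float, h: float) -> float:
    """Phase space estimate V * 4 pi p^2 dp / h^3 for the number of states."""
    return volume * 4.0 * math.pi * p * p * dp / h ** 3


# Illustrative units with h = 1 and L = 1, so that V = 1.
h, length = 1.0, 1.0
p_lo, p_hi = 30.0, 32.5
counted = count_box_states(p_lo, p_hi, length, h)
estimated = a_priori_weight(0.5 * (p_lo + p_hi), p_hi - p_lo, length ** 3, h)
ratio = counted / estimated
print(counted, round(estimated), round(ratio, 3))
```

For these values the direct count and the estimate should agree to within a few percent. The estimate is evaluated at the midpoint of the shell because the shell used here is not infinitesimally thin.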

Example

The following example illustrates the a priori probability (or a priori weighting) in (a) classical and (b) quantal contexts.

(a) Classical a priori probability

Consider the rotational energy E of a diatomic molecule with moment of inertia I in spherical polar coordinates

\theta, \phi

(this means

q

above is here

\theta, \phi

), i.e. :

E = \frac{1}{2I}\left(p^2_{\theta} + \frac{p^2_{\phi}}{\sin^2\theta}\right).

The

(p_{\theta}, p_{\phi})

-curve for constant E and

\theta

is an ellipse of area :

\oint dp_{\theta}dp_{\phi} = \pi \sqrt{2IE}\sqrt{2IE}\sin\theta = 2\pi IE\sin\theta

. By integrating over

\theta

and

\phi

the total volume of phase space covered for constant energy E is :

\int^{2\pi}_0\int^{\pi}_0 2I\pi E\sin\theta d\theta d\phi = 8\pi^2 IE = \oint dp_{\theta}dp_{\phi}d\theta d\phi

, and hence the classical a priori weighting in the energy range

dE

is :

\Omega \propto

(phase space volume at

E+dE

) minus (phase space volume at

E

) is given by

8\pi^2 I dE.
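The phase space volume of the rotating molecule can be checked numerically. The following sketch integrates the area of the constant-energy ellipse, with semi-axes \sqrt{2IE} and \sqrt{2IE}\sin\theta, over \theta and \phi using a simple midpoint rule, and compares the result with 8\pi^2 IE. The function names and the numerical values of I, E and dE are arbitrary choices for this illustration.

```python
import math


def ellipse_area(inertia: float, energy: float, theta: float) -> float:
    """Area of the constant-E curve in the (p_theta, p_phi) plane.

    From E = (p_theta^2 + p_phi^2 / sin^2(theta)) / (2 I) the semi-axes are
    sqrt(2 I E) and sqrt(2 I E) * sin(theta).
    """
    a = math.sqrt(2.0 * inertia * energy)
    b = a * math.sin(theta)
    return math.pi * a * b


def phase_space_volume(inertia: float, energy: float, steps: int = 2000) -> float:
    """Midpoint-rule integral of the ellipse area over theta in [0, pi], phi in [0, 2 pi]."""
    d_theta = math.pi / steps
    theta_integral = sum(
        ellipse_area(inertia, energy, (i + 0.5) * d_theta) for i in range(steps)
    ) * d_theta
    return 2.0 * math.pi * theta_integral  # the integrand does not depend on phi


inertia, energy, d_energy = 1.5, 2.0, 1e-3   # arbitrary illustrative values
volume = phase_space_volume(inertia, energy)
shell = phase_space_volume(inertia, energy + d_energy) - volume
print(volume, 8 * math.pi ** 2 * inertia * energy)      # total volume vs 8 pi^2 I E
print(shell, 8 * math.pi ** 2 * inertia * d_energy)     # shell volume vs 8 pi^2 I dE
```

Because the volume is linear in E, the difference between the volumes at E + dE and at E reproduces 8\pi^2 I dE for any dE, not only for small ones.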

(b) Quantum a priori probability

Assuming that the number of quantum states in a range

\Delta q \Delta p

for each direction of motion is given, per element, by a factor

\Delta q\Delta p/h

, the number of states in the energy range dE is, as seen under (a)

8\pi^2I dE/h^2

for the rotating diatomic molecule. From wave mechanics it is known that the energy levels of a rotating diatomic molecule are given by :

E_n = \frac{n(n+1)h^2}{8\pi^2 I},

each such level being (2n+1)-fold degenerate. By evaluating

dn/dE_n=1/(dE_n/dn)

one obtains :

\frac{dn}{dE_n} = \frac{8\pi^2 I}{(2n+1)h^2}, \;\;\; (2n+1)dn=\frac{8\pi^2 I}{h^2}dE_n.

Thus by comparison with

\Omega

above, one finds that the approximate number of states in the range dE is given by the degeneracy, i.e. :

\Sigma \propto (2n+1)dn.

Thus the a priori weighting in the classical context (a) corresponds to the a priori weighting here in the quantal context (b). In the case of the one-dimensional simple harmonic oscillator of natural frequency

\nu

one finds correspondingly: (a)

\Omega \propto dE/\nu

, and (b)

\Sigma \propto dn

(no degeneracy). Thus in quantum mechanics the a priori probability is effectively a measure of the degeneracy, i.e. the number of states having the same energy. In the case of the hydrogen atom or Coulomb potential (where the evaluation of the phase space volume for constant energy is more complicated) one knows that the quantum mechanical degeneracy is

n^2

with

E\propto 1/n^2

. Thus in this case

\Sigma \propto n^2 dn
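The correspondence between the classical weighting and the quantum degeneracy for the rotating diatomic molecule can be illustrated numerically. The sketch below sums the degeneracies (2n+1) of all rotational levels in an energy window and compares the sum with the classical value 8\pi^2 I dE/h^2. It uses the standard rigid-rotor levels E_n = n(n+1)h^2/(8\pi^2 I); the function names, the units I = h = 1 and the chosen energy window are assumptions made only for this example.

```python
import math


def rotor_level(n: int, inertia: float, h: float) -> float:
    """Rotational energy level E_n = n (n + 1) h^2 / (8 pi^2 I)."""
    return n * (n + 1) * h ** 2 / (8.0 * math.pi ** 2 * inertia)


def quantum_states_in_range(e_lo: float, e_hi: float, inertia: float, h: float) -> int:
    """Sum the degeneracies (2n + 1) of all levels with e_lo <= E_n < e_hi."""
    total, n = 0, 0
    while rotor_level(n, inertia, h) < e_hi:
        if rotor_level(n, inertia, h) >= e_lo:
            total += 2 * n + 1
        n += 1
    return total


def classical_weight(d_energy: float, inertia: float, h: float) -> float:
    """Classical a priori weighting 8 pi^2 I dE / h^2."""
    return 8.0 * math.pi ** 2 * inertia * d_energy / h ** 2


inertia, h = 1.0, 1.0                              # illustrative units
unit = h ** 2 / (8.0 * math.pi ** 2 * inertia)     # natural energy unit of the rotor
e_lo, d_energy = 1.0e6 * unit, 1.0e5 * unit        # window far above the ground state
quantum = quantum_states_in_range(e_lo, e_lo + d_energy, inertia, h)
classical = classical_weight(d_energy, inertia, h)
print(quantum, classical, quantum / classical)
```

The window is placed at large n so that many levels fall inside it; for a window containing only a few levels the discreteness of the spectrum makes the comparison much cruder.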

A priori probability and distribution functions

In statistical mechanics (see any standard textbook) one derives the so-called distribution functions

f

for various statistics. In the case of

Fermi–Dirac statistics

and

Bose–Einstein statistics

these functions are respectively :

f^{FD}_i = \frac{1}{e^{(\epsilon_i - \epsilon_0)/kT}+1}, \quad f^{BE}_i = \frac{1}{e^{(\epsilon_i-\epsilon_0)/kT}-1}.

These functions are derived for (1) a system in dynamic equilibrium (i.e. under steady, uniform conditions) with (2) total (and huge) number of particles

N = \Sigma_in_i

(this condition determines the constant

\epsilon_0

), and (3) total energy

E = \Sigma_in_i\epsilon_i

, i.e. with each of the

n_i

particles having the energy

\epsilon_i

. An important aspect of the derivation is that it takes into account the indistinguishability of particles and states in quantum statistics, i.e. particles and states there do not carry labels. In the case of fermions, like electrons, obeying the Pauli principle (only one particle per state or none allowed), one has therefore :

0 \leq f^{FD}_i \leq 1, \quad \text{whereas} \quad 0 \leq f^{BE}_i \leq \infty.

Thus

f^{FD}_i

is a measure of the fraction of states actually occupied by electrons at energy

\epsilon_i

and temperature

T

. On the other hand, the a priori probability

g_i

is a measure of the number of wave mechanical states available. Hence :

n_i = f_ig_i.
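The relation between the occupation number, the distribution function and the a priori weight can be sketched numerically. The example below evaluates the standard Fermi–Dirac and Bose–Einstein occupation functions and multiplies them by a weight g_i to obtain n_i. The level energies, the weights and the values of \epsilon_0 and kT are hypothetical numbers chosen only for illustration, and the function names are not taken from any particular library.

```python
import math


def fermi_dirac(eps: float, eps0: float, kT: float) -> float:
    """Mean occupation 1 / (exp((eps - eps0) / kT) + 1); always between 0 and 1."""
    return 1.0 / (math.exp((eps - eps0) / kT) + 1.0)


def bose_einstein(eps: float, eps0: float, kT: float) -> float:
    """Mean occupation 1 / (exp((eps - eps0) / kT) - 1); requires eps > eps0."""
    if eps <= eps0:
        raise ValueError("Bose-Einstein occupation needs eps > eps0")
    return 1.0 / (math.exp((eps - eps0) / kT) - 1.0)


def occupation_number(f_i: float, g_i: float) -> float:
    """Number of particles n_i = f_i * g_i at a level with a priori weight g_i."""
    return f_i * g_i


eps0, kT = 0.0, 1.0                 # illustrative values in arbitrary units
levels = [0.5, 1.0, 2.0, 4.0]       # hypothetical level energies eps_i
weights = [2, 4, 6, 8]              # hypothetical a priori weights g_i
for eps, g in zip(levels, weights):
    f_fd = fermi_dirac(eps, eps0, kT)
    f_be = bose_einstein(eps, eps0, kT)
    print(eps, occupation_number(f_fd, g), occupation_number(f_be, g))
```

The Fermi–Dirac occupation never exceeds one particle per state, whereas the Bose–Einstein occupation grows without bound as the level energy approaches \epsilon_0 from above.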

Since

n_i

is constant under uniform conditions (as many particles as flow out of a volume element also flow in steadily, so that the situation in the element appears static), i.e. independent of time

t

, and

g_i

is also independent of time

t

as shown earlier, we obtain :

\frac{df_i}{dt} = 0, \quad f_i = f_i(t, {\bf v}_i, {\bf r}_i).

Expressing this equation in terms of its partial derivatives, one obtains the Boltzmann transport equation. How do coordinates

{\bf r}

etc. appear here suddenly? Above no mention was made of electric or other fields. Thus with no such fields present we have the Fermi–Dirac distribution as above. But with such fields present we have this additional dependence of

f

.
