In probability theory, the Borel–Kolmogorov paradox (sometimes known as Borel's paradox) is a paradox relating to conditional probability with respect to an event of probability zero (also known as a null set). It is named after Émile Borel and Andrey Kolmogorov.

A great circle puzzle

Suppose that a random variable has a uniform distribution on a unit sphere. What is its conditional distribution on a great circle? Because of the symmetry of the sphere, one might expect that the distribution is uniform and independent of the choice of coordinates. However, two analyses give contradictory results. First, note that choosing a point uniformly on the sphere is equivalent to choosing the longitude $\lambda$ uniformly from $[-\pi, \pi]$ and, independently, choosing the latitude $\varphi$ from $\left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$ with density $\frac{1}{2} \cos \varphi$. Then we can look at two different great circles:
# If the coordinates are chosen so that the great circle is an equator (latitude $\varphi = 0$), the conditional density for a longitude $\lambda$ defined on the interval $[-\pi, \pi]$ is $f(\lambda \mid \varphi = 0) = \frac{1}{2\pi}$.
# If the great circle is a line of longitude with $\lambda = 0$, the conditional density for $\varphi$ on the interval $\left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$ is $f(\varphi \mid \lambda = 0) = \frac{1}{2} \cos \varphi$.

One distribution is uniform on the circle, the other is not. Yet both seem to be referring to the same great circle in different coordinate systems.
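A small Monte Carlo sketch makes the two answers concrete. It approximates each probability-zero great circle by a thin neighborhood in the corresponding coordinate: a ring $|\varphi| < \varepsilon$ around the equator and a lune $|\lambda| < \varepsilon$ around the meridian $\lambda = 0$. Points are drawn uniformly by normalizing Gaussian vectors (a standard technique, not taken from the text). The helper names, `eps = 0.05`, the sample size and the seed are illustrative assumptions. Comparing mean absolute angles is simply a cheap summary of each conditional distribution.

```python
import math
import random


def uniform_sphere_point(rng: random.Random) -> tuple[float, float, float]:
    """Uniform point on the unit sphere: normalize a standard 3D Gaussian vector."""
    while True:
        x, y, z = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 0.0:  # guard against the (practically impossible) zero vector
            return x / norm, y / norm, z / norm


def latitude_longitude(x: float, y: float, z: float) -> tuple[float, float]:
    """Latitude phi in [-pi/2, pi/2] and longitude lam in [-pi, pi] of a unit vector."""
    phi = math.asin(max(-1.0, min(1.0, z)))  # clamp rounding just outside [-1, 1]
    lam = math.atan2(y, x)
    return phi, lam


def thin_set_samples(n: int, eps: float, seed: int = 0) -> tuple[list[float], list[float]]:
    """Longitudes of points in the ring |phi| < eps, latitudes of points in the lune |lam| < eps."""
    rng = random.Random(seed)
    ring_lams: list[float] = []
    lune_phis: list[float] = []
    for _ in range(n):
        phi, lam = latitude_longitude(*uniform_sphere_point(rng))
        if abs(phi) < eps:
            ring_lams.append(lam)
        if abs(lam) < eps:
            lune_phis.append(phi)
    return ring_lams, lune_phis


ring_lams, lune_phis = thin_set_samples(n=200_000, eps=0.05)
mean_abs_lam = sum(abs(v) for v in ring_lams) / len(ring_lams)
mean_abs_phi = sum(abs(v) for v in lune_phis) / len(lune_phis)
# A uniform longitude on [-pi, pi] has E|lambda| = pi/2.
print(f"ring: E|lambda| ~ {mean_abs_lam:.3f}, uniform predicts {math.pi / 2:.3f}")
# Density cos(phi)/2 on [-pi/2, pi/2] has E|phi| = pi/2 - 1; a uniform density would give pi/4.
print(f"lune: E|phi| ~ {mean_abs_phi:.3f}, cos(phi)/2 predicts {math.pi / 2 - 1:.3f}, uniform {math.pi / 4:.3f}")
```

Because latitude and longitude are independent, shrinking `eps` changes only the noise. The ring keeps producing a uniform longitude, and the lune keeps producing the $\frac{1}{2}\cos\varphi$ latitude.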

Explanation and implications

In case (1) above, the conditional probability that the longitude $\lambda$ lies in a set $E$ given that $\varphi = 0$ can be written $P(\lambda \in E \mid \varphi = 0)$. Elementary probability theory suggests this can be computed as $P(\lambda \in E \text{ and } \varphi = 0)/P(\varphi = 0)$, but that expression is not well-defined since $P(\varphi = 0) = 0$.

Measure theory provides a way to define a conditional probability, using the limit of events $R_{ab} = \{\varphi : a < \varphi < b\}$ as $a \to 0^-$ and $b \to 0^+$. These are horizontal rings (curved surface zones of spherical segments) consisting of all points with latitude between $a$ and $b$. The resolution of the paradox is to notice that in case (2), $P(\varphi \in F \mid \lambda = 0)$ is defined using a limit of the events $L_{cd} = \{\lambda : c < \lambda < d\}$ as $c \to 0^-$ and $d \to 0^+$. These are lunes (vertical wedges) consisting of all points whose longitude lies between $c$ and $d$. So although $P(\lambda \in E \mid \varphi = 0)$ and $P(\varphi \in F \mid \lambda = 0)$ each provide a probability distribution on a great circle, one of them is defined using limits of rings, and the other using limits of lunes. Since rings and lunes have different shapes, it should be less surprising that $P(\lambda \in E \mid \varphi = 0)$ and $P(\varphi \in F \mid \lambda = 0)$ have different distributions.
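Both limits can be worked out directly from the independent densities given in the puzzle: $\frac{1}{2\pi}$ for longitude on $[-\pi, \pi]$ and $\frac{1}{2}\cos\varphi$ for latitude. The sketch below uses symmetric shrinking sets $R_{-\varepsilon,\varepsilon}$ and $L_{-\varepsilon,\varepsilon}$, which is an illustrative assumption. Here $|E|$ denotes the length of $E \subseteq [-\pi, \pi]$, and the denominators are $P(-\varepsilon < \varphi < \varepsilon) = \sin\varepsilon$ and $P(-\varepsilon < \lambda < \varepsilon) = \varepsilon/\pi$.

```latex
\begin{aligned}
P(\lambda \in E \mid \varphi \in R_{-\varepsilon,\varepsilon})
  &= \frac{P(\lambda \in E)\, P(-\varepsilon < \varphi < \varepsilon)}{P(-\varepsilon < \varphi < \varepsilon)}
   = \frac{|E|}{2\pi}
  &&\text{for every } \varepsilon > 0, \\
P(\varphi \in F \mid \lambda \in L_{-\varepsilon,\varepsilon})
  &= \frac{P(\varphi \in F)\, P(-\varepsilon < \lambda < \varepsilon)}{P(-\varepsilon < \lambda < \varepsilon)}
   = \int_F \tfrac{1}{2} \cos\varphi \, d\varphi
  &&\text{for every } \varepsilon > 0.
\end{aligned}
```

Both ratios are constant in $\varepsilon$, so the limits are the uniform density and the $\frac{1}{2}\cos\varphi$ density respectively. Geometrically, a thin ring has the same thickness all along the equator. A lune, by contrast, has width proportional to $\cos\varphi$: it is widest at the equator and pinches to zero at the poles, so it weights latitudes by $\cos\varphi$.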

Mathematical explication

Measure-theoretic perspective

To understand the problem, we need to recognize that the distribution of a continuous random variable is described by a density $f$ only with respect to some measure $\mu$. Both are needed for a full description of the probability distribution. Equivalently, we need to fully define the space on which $f$ is defined. Let $\Phi$ and $\Lambda$ denote two random variables taking values in $\Omega_1 = \left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$ and $\Omega_2 = [-\pi, \pi]$, respectively. An event $\{\Phi = \varphi, \Lambda = \lambda\}$ gives a point on the sphere $S(r)$ with radius $r$. We define the coordinate transform

$$
\begin{aligned}
  x &= r \cos \varphi \cos \lambda \\
  y &= r \cos \varphi \sin \lambda \\
  z &= r \sin \varphi
\end{aligned}
$$

for which we obtain the volume element

$$
\omega_r(\varphi, \lambda) = \left\| \frac{\partial (x,y,z)}{\partial \varphi} \times \frac{\partial (x,y,z)}{\partial \lambda} \right\| = r^2 \cos \varphi \ .
$$

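Furthermore, if either $\varphi$ or $\lambda$ is fixed, we get the volume elements

$$
\begin{aligned}
\omega_r(\lambda) &= \left\| \frac{\partial (x,y,z)}{\partial \varphi} \right\| = r \ , \quad \text{respectively} \\
\omega_r(\varphi) &= \left\| \frac{\partial (x,y,z)}{\partial \lambda} \right\| = r \cos \varphi \ .
\end{aligned}
$$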
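The three volume elements can be checked numerically by differentiating the coordinate transform with central finite differences. This is a sketch: the step `h = 1e-6`, the sample point and the helper names are arbitrary choices made for illustration.

```python
import math


def embed(r: float, phi: float, lam: float) -> tuple[float, float, float]:
    """The coordinate transform (phi, lam) -> (x, y, z) on the sphere S(r)."""
    return (r * math.cos(phi) * math.cos(lam),
            r * math.cos(phi) * math.sin(lam),
            r * math.sin(phi))


def central_difference(r: float, phi: float, lam: float, wrt: str, h: float = 1e-6):
    """Approximate d(x, y, z)/d(phi) or d(x, y, z)/d(lam) by a central finite difference."""
    dphi, dlam = (h, 0.0) if wrt == "phi" else (0.0, h)
    plus = embed(r, phi + dphi, lam + dlam)
    minus = embed(r, phi - dphi, lam - dlam)
    return tuple((p - m) / (2.0 * h) for p, m in zip(plus, minus))


def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def norm(v) -> float:
    """Euclidean length of a vector."""
    return math.sqrt(sum(c * c for c in v))


def volume_elements(r: float, phi: float, lam: float) -> tuple[float, float, float]:
    """Numerical (omega_r(phi, lam), omega_r(lam), omega_r(phi)) in the notation above."""
    d_phi = central_difference(r, phi, lam, "phi")  # tangent along a meridian (lam fixed)
    d_lam = central_difference(r, phi, lam, "lam")  # tangent along a parallel (phi fixed)
    return norm(cross(d_phi, d_lam)), norm(d_phi), norm(d_lam)


r, phi, lam = 2.0, 0.7, -1.3
area, omega_lam, omega_phi = volume_elements(r, phi, lam)
print(area, r ** 2 * math.cos(phi))   # omega_r(phi, lam) vs r^2 cos(phi)
print(omega_lam, r)                   # omega_r(lam) vs r
print(omega_phi, r * math.cos(phi))   # omega_r(phi) vs r cos(phi)
```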

Let

$$
\mu_{\Phi,\Lambda}(d\varphi, d\lambda) = f_{\Phi,\Lambda}(\varphi,\lambda) \, \omega_r(\varphi,\lambda) \, d\varphi \, d\lambda
$$

denote the joint measure on $\mathcal{B}(\Omega_1 \times \Omega_2)$, which has a density $f_{\Phi,\Lambda}$ with respect to $\omega_r(\varphi,\lambda) \, d\varphi \, d\lambda$, and let

$$
\begin{aligned}
\mu_\Phi(d\varphi) &= \int_{\lambda \in \Omega_2} \mu_{\Phi,\Lambda}(d\varphi, d\lambda) \ , \\
\mu_\Lambda(d\lambda) &= \int_{\varphi \in \Omega_1} \mu_{\Phi,\Lambda}(d\varphi, d\lambda) \ .
\end{aligned}
$$

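If we assume that the density $f_{\Phi,\Lambda}$ is uniform, then

$$
\begin{aligned}
\mu_{\Phi \mid \Lambda}(d\varphi \mid \lambda) &= \frac{\mu_{\Phi,\Lambda}(d\varphi, d\lambda)}{\mu_\Lambda(d\lambda)} = \frac{1}{2r} \, \omega_r(\varphi) \, d\varphi \ , \quad \text{and} \\
\mu_{\Lambda \mid \Phi}(d\lambda \mid \varphi) &= \frac{\mu_{\Phi,\Lambda}(d\varphi, d\lambda)}{\mu_\Phi(d\varphi)} = \frac{1}{2 \pi r} \, \omega_r(\lambda) \, d\lambda \ .
\end{aligned}
$$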
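To see where the constants come from, take "uniform" to mean $f_{\Phi,\Lambda} = \frac{1}{4\pi r^2}$, the reciprocal of the area of $S(r)$. This normalization is an assumption, since the text leaves it implicit. The marginals and conditionals then follow from the volume elements:

```latex
\begin{aligned}
\mu_{\Phi,\Lambda}(d\varphi, d\lambda) &= \frac{1}{4\pi r^2}\, r^2 \cos\varphi \, d\varphi \, d\lambda = \frac{\cos\varphi}{4\pi} \, d\varphi \, d\lambda, \\
\mu_\Phi(d\varphi) &= \left( \int_{-\pi}^{\pi} \frac{\cos\varphi}{4\pi} \, d\lambda \right) d\varphi = \frac{1}{2}\cos\varphi \, d\varphi, \\
\mu_\Lambda(d\lambda) &= \left( \int_{-\pi/2}^{\pi/2} \frac{\cos\varphi}{4\pi} \, d\varphi \right) d\lambda = \frac{1}{2\pi} \, d\lambda, \\
\mu_{\Phi \mid \Lambda}(d\varphi \mid \lambda) &= \frac{\cos\varphi / (4\pi)}{1/(2\pi)} \, d\varphi = \frac{1}{2}\cos\varphi \, d\varphi = \frac{1}{2r}\, \omega_r(\varphi)\, d\varphi, \\
\mu_{\Lambda \mid \Phi}(d\lambda \mid \varphi) &= \frac{\cos\varphi / (4\pi)}{\frac{1}{2}\cos\varphi} \, d\lambda = \frac{1}{2\pi} \, d\lambda = \frac{1}{2\pi r}\, \omega_r(\lambda)\, d\lambda.
\end{aligned}
```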

Hence, $\mu_{\Phi \mid \Lambda}$ has a uniform density with respect to $\omega_r(\varphi) \, d\varphi$ but not with respect to the Lebesgue measure. On the other hand, $\mu_{\Lambda \mid \Phi}$ has a uniform density with respect to both $\omega_r(\lambda) \, d\lambda$ and the Lebesgue measure. Relative to the Lebesgue measure, these are the densities $\frac{1}{2} \cos \varphi$ and $\frac{1}{2\pi}$ found in the great circle puzzle.

Proof of contradiction

Consider a random vector $(X, Y, Z)$ that is uniformly distributed on the unit sphere $S^2$. We begin by parametrizing the sphere with the usual spherical polar coordinates:

$$
\begin{aligned}
  x &= \cos(\varphi) \cos(\theta) \\
  y &= \cos(\varphi) \sin(\theta) \\
  z &= \sin(\varphi)
\end{aligned}
$$

where $-\frac{\pi}{2} \le \varphi \le \frac{\pi}{2}$ and $-\pi \le \theta \le \pi$. We can define random variables $\Phi$ and $\Theta$ as the values of $(X, Y, Z)$ under the inverse of this parametrization, or more formally using the arctan2 function:

$$
\begin{aligned}
    \Phi &= \arcsin(Z) \\
  \Theta &= \arctan_2\left(\frac{Y}{\sqrt{1 - Z^2}}, \frac{X}{\sqrt{1 - Z^2}}\right)
\end{aligned}
$$
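As a sanity check of the inverse map, a round trip through the two formulas should recover the angles. This holds away from the poles and from $\theta = \pm\pi$, where the parametrization is not one-to-one. The function names are hypothetical. Dividing both arguments of `atan2` by the positive number $\sqrt{1 - Z^2}$ does not change the angle, so that division matters only for matching the formula as written.

```python
import math


def to_cartesian(phi: float, theta: float) -> tuple[float, float, float]:
    """Spherical polar coordinates (phi, theta) -> point (x, y, z) on the unit sphere."""
    return (math.cos(phi) * math.cos(theta),
            math.cos(phi) * math.sin(theta),
            math.sin(phi))


def to_angles(x: float, y: float, z: float) -> tuple[float, float]:
    """Inverse map: Phi = arcsin(Z), Theta = arctan2(Y / sqrt(1 - Z^2), X / sqrt(1 - Z^2)).

    Undefined at the poles Z = +-1 (a set of probability zero), where sqrt(1 - Z^2) = 0.
    """
    z = max(-1.0, min(1.0, z))   # clamp rounding error before arcsin
    s = math.sqrt(1.0 - z * z)   # equals cos(Phi) >= 0
    if s == 0.0:
        raise ValueError("Theta is undefined at the poles")
    return math.asin(z), math.atan2(y / s, x / s)


phi, theta = to_angles(*to_cartesian(0.5, 2.0))
print(phi, theta)
```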

Using the formulas for the surface area of a spherical cap and of a spherical wedge, we can compute the area of the region $\{\Theta \le \theta, \Phi \le \varphi\}$. The region $\{\Phi \le \varphi\}$ is a cap of height $1 + \sin(\varphi)$ about the south pole, with area $2\pi(1 + \sin(\varphi))$. Restricting to $\Theta \le \theta$ keeps the fraction $(\theta + \pi)/(2\pi)$ of it, so

$$
\operatorname{Area}(\Theta \le \theta, \Phi \le \varphi) = (1 + \sin(\varphi)) (\theta + \pi)
$$

Since $(X, Y, Z)$ is uniformly distributed, the probability is proportional to the surface area (the unit sphere has total area $4\pi$). This gives the joint cumulative distribution function

$$
F_{\Phi,\Theta}(\varphi, \theta) = P(\Theta \le \theta, \Phi \le \varphi) = \frac{1}{4\pi}(1 + \sin(\varphi)) (\theta + \pi)
$$
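The closed-form CDF can be compared with an empirical CDF from simulated points. This sketch draws uniform points by normalizing Gaussian vectors, so the check does not rely on the parametrization being tested. The sample size, seed, test points and function names are arbitrary illustrative choices.

```python
import math
import random


def joint_cdf(phi: float, theta: float) -> float:
    """F(phi, theta) = P(Theta <= theta, Phi <= phi) = (1 + sin(phi)) (theta + pi) / (4 pi)."""
    return (1.0 + math.sin(phi)) * (theta + math.pi) / (4.0 * math.pi)


def sample_angles(n: int, seed: int = 1) -> list[tuple[float, float]]:
    """(Phi, Theta) for n uniform points on S^2, drawn by normalizing Gaussian vectors."""
    rng = random.Random(seed)
    angles = []
    while len(angles) < n:
        x, y, z = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        norm = math.sqrt(x * x + y * y + z * z)
        if norm == 0.0:
            continue
        # atan2(y, x) equals atan2(Y / sqrt(1 - Z^2), X / sqrt(1 - Z^2)): positive rescaling keeps the angle.
        angles.append((math.asin(max(-1.0, min(1.0, z / norm))), math.atan2(y, x)))
    return angles


def empirical_cdf(angles: list[tuple[float, float]], phi: float, theta: float) -> float:
    """Fraction of sampled points with Phi <= phi and Theta <= theta."""
    return sum(1 for p, t in angles if p <= phi and t <= theta) / len(angles)


samples = sample_angles(100_000)
for phi, theta in [(0.3, 1.0), (-0.4, -2.0), (1.0, 2.5)]:
    print(phi, theta, empirical_cdf(samples, phi, theta), joint_cdf(phi, theta))
```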

The joint probability density function is then given by

$$
f_{\Phi,\Theta}(\varphi, \theta) = \frac{\partial^2}{\partial \varphi \, \partial \theta} F_{\Phi,\Theta}(\varphi, \theta) = \frac{1}{4\pi} \cos(\varphi)
$$
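Written out, the CDF factorizes into a function of $\varphi$ times a function of $\theta$, and the mixed partial derivative follows term by term. This factorization is what makes $\Phi$ and $\Theta$ independent, with marginal densities $\frac{1}{2}\cos\varphi$ and $\frac{1}{2\pi}$:

```latex
\begin{aligned}
F_{\Phi,\Theta}(\varphi, \theta)
  &= \underbrace{\frac{1 + \sin\varphi}{2}}_{F_\Phi(\varphi)} \cdot \underbrace{\frac{\theta + \pi}{2\pi}}_{F_\Theta(\theta)}, \\
f_{\Phi,\Theta}(\varphi, \theta)
  &= \frac{\partial}{\partial \varphi}\left(\frac{1 + \sin\varphi}{2}\right) \cdot \frac{\partial}{\partial \theta}\left(\frac{\theta + \pi}{2\pi}\right)
   = \frac{\cos\varphi}{2} \cdot \frac{1}{2\pi}
   = \frac{1}{4\pi}\cos\varphi .
\end{aligned}
```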

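Note that $\Phi$ and $\Theta$ are independent random variables. For simplicity, we will not calculate the full conditional distribution on a great circle. We compute only the probability that the random vector lies in the wedge $\{0 < Y < X\}$, which meets the great circle $\{Z = 0\}$ in an arc of one eighth of its length. That is to say, we will attempt to calculate the conditional probability $P(A \mid B)$ with

$$
\begin{aligned}
A &= \left\{ 0 < \Theta < \frac{\pi}{4} \right\} = \{ 0 < X,\ 0 < Y < X \} \\
B &= \{ \Phi = 0 \} = \{ Z = 0 \}
\end{aligned}
$$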

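We attempt to evaluate the conditional probability as a limit of conditioning on the events

$$
B_\varepsilon = \{ -\varepsilon < \Phi < \varepsilon \} \ .
$$

Since $\Phi$ and $\Theta$ are independent, so are the events $A$ and $B_\varepsilon$. Therefore

$$
P(A \mid B) \mathrel{\stackrel{?}{=}} \lim_{\varepsilon \to 0} \frac{P(A \cap B_\varepsilon)}{P(B_\varepsilon)} =
  \lim_{\varepsilon \to 0} P(A) = P\left(0 < \Theta < \frac{\pi}{4}\right) = \frac{1}{8} \ .
$$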
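A Monte Carlo sketch of this first limit, conditioning on the thin band $B_\varepsilon$ for a small fixed $\varepsilon$, should give an estimate near $\frac{1}{8}$. The value `eps = 0.05`, the sample size, the seed and the function name are illustrative assumptions, and points are drawn by normalizing Gaussian vectors. Since $A$ and $B_\varepsilon$ are independent, the choice of `eps` affects only the noise, not the target.

```python
import math
import random


def estimate_band_conditional(eps: float, n: int, seed: int = 2) -> float:
    """Monte Carlo estimate of P(A | B_eps) with A = {0 < Y < X}, B_eps = {-eps < Phi < eps}."""
    rng = random.Random(seed)
    in_band = in_band_and_a = 0
    for _ in range(n):
        x, y, z = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        norm = math.sqrt(x * x + y * y + z * z)
        if norm == 0.0:
            continue
        # On the unit sphere, -eps < Phi < eps is the same as |Z| < sin(eps).
        if abs(z / norm) < math.sin(eps):
            in_band += 1
            # A = {0 < Theta < pi/4} = {0 < Y < X}; the test is unaffected by positive scaling.
            if 0.0 < y < x:
                in_band_and_a += 1
    if in_band == 0:
        raise ValueError("no samples fell in the band; increase n or eps")
    return in_band_and_a / in_band


band_estimate = estimate_band_conditional(eps=0.05, n=200_000)
print(band_estimate, 1 / 8)
```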

Now we repeat the process with a different parametrization of the sphere:

$$
\begin{aligned}
  x &=  \sin(\varphi) \\
  y &=  \cos(\varphi) \sin(\theta) \\
  z &= -\cos(\varphi) \cos(\theta)
\end{aligned}
$$

This is equivalent to the previous parametrization rotated by 90 degrees around the $y$ axis. Define new random variables

$$
\begin{aligned}
    \Phi' &= \arcsin(X) \\
  \Theta' &= \arctan_2\left(\frac{Y}{\sqrt{1 - X^2}}, \frac{-Z}{\sqrt{1 - X^2}}\right) \ .
\end{aligned}
$$

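Rotation is measure-preserving, so the density of $\Phi'$ and $\Theta'$ is the same:

$$
f_{\Phi',\Theta'}(\varphi, \theta) = \frac{1}{4\pi} \cos(\varphi) \ .
$$

The expressions for $A$ and $B$ are:

$$
\begin{aligned}
  A &= \left\{ 0 < \Theta < \frac{\pi}{4} \right\} \\
    &= \{ 0 < X,\ 0 < Y < X \} \\
    &= \left\{ 0 < \Theta' < \pi,\ 0 < \Phi' < \frac{\pi}{2},\ \sin(\Theta') < \tan(\Phi') \right\} \\
  B &= \{ \Phi = 0 \} \\
    &= \{ Z = 0 \} \\
    &= \left\{ \Theta' = -\frac{\pi}{2} \right\} \cup \left\{ \Theta' = \frac{\pi}{2} \right\} \ .
\end{aligned}
$$

In the new coordinates, $0 < X$ means $0 < \Phi'$. In that case $\cos(\Phi') > 0$, so $0 < Y$ means $0 < \Theta' < \pi$, and dividing $Y < X$ by $\cos(\Phi')$ gives $\sin(\Theta') < \tan(\Phi')$. Likewise, away from the probability-zero poles $X = \pm 1$, $Z = 0$ holds exactly when $\cos(\Theta') = 0$.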
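The three descriptions of $A$, and the description of $B$ in the primed coordinates, can be spot-checked on random points. This is a hypothetical test harness, not part of the original argument. It counts points where the descriptions of $A$ disagree; apart from boundary cases of probability zero, none are expected. It also prints $\Theta'$ for a few equator points, which should be $\pm\frac{\pi}{2}$.

```python
import math
import random


def primed_angles(x: float, y: float, z: float) -> tuple[float, float]:
    """Phi' = arcsin(X), Theta' = arctan2(Y / sqrt(1 - X^2), -Z / sqrt(1 - X^2)) for a unit vector."""
    s = math.sqrt(max(0.0, 1.0 - x * x))  # equals cos(Phi'); zero only at X = +-1
    if s == 0.0:
        raise ValueError("Theta' is undefined at X = +-1")
    return math.asin(max(-1.0, min(1.0, x))), math.atan2(y / s, -z / s)


def a_descriptions(x: float, y: float, z: float) -> tuple[bool, bool, bool]:
    """Membership in A computed from Theta, from Cartesian coordinates, and from (Phi', Theta')."""
    via_theta = 0.0 < math.atan2(y, x) < math.pi / 4
    via_xyz = 0.0 < x and 0.0 < y < x
    phi_p, theta_p = primed_angles(x, y, z)
    via_primed = (0.0 < theta_p < math.pi and 0.0 < phi_p < math.pi / 2
                  and math.sin(theta_p) < math.tan(phi_p))
    return via_theta, via_xyz, via_primed


rng = random.Random(3)
mismatches = 0
for _ in range(50_000):
    v = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        continue
    x, y, z = (c / norm for c in v)
    if len(set(a_descriptions(x, y, z))) != 1:
        mismatches += 1
print("points where the three descriptions of A disagree:", mismatches)

# B = {Z = 0}: equator points (away from X = +-1) should have Theta' = +-pi/2.
for t in (0.3, 1.0, 2.0, -0.7, -2.5):
    print(t, primed_angles(math.cos(t), math.sin(t), 0.0)[1])
```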

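We again attempt to evaluate the conditional probability as a limit of conditioning on the events

$$
B'_\varepsilon = \left\{ -\frac{\pi}{2} - \varepsilon < \Theta' < -\frac{\pi}{2} + \varepsilon \right\} \cup \left\{ \frac{\pi}{2} - \varepsilon < \Theta' < \frac{\pi}{2} + \varepsilon \right\} .
$$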

Since $\Theta'$ is uniform on $[-\pi, \pi]$, $P(B'_\varepsilon) = \frac{2\varepsilon}{\pi}$, and only the lune around $\Theta' = \frac{\pi}{2}$ meets $A$. Using L'Hôpital's rule and differentiation under the integral sign:

$$
\begin{aligned}
  P(A \mid B) &\mathrel{\stackrel{?}{=}} \lim_{\varepsilon \to 0} \frac{P(A \cap B'_\varepsilon)}{P(B'_\varepsilon)} \\
    &= \lim_{\varepsilon \to 0} \frac{\pi}{2\varepsilon} P\left( \frac{\pi}{2} - \varepsilon < \Theta' < \frac{\pi}{2} + \varepsilon,\ 0 < \Phi' < \frac{\pi}{2},\ \sin(\Theta') < \tan(\Phi') \right) \\
    &= \frac{\pi}{2} \lim_{\varepsilon \to 0} \frac{1}{\varepsilon} \int_{\pi/2 - \varepsilon}^{\pi/2 + \varepsilon} \int_0^{\pi/2} 1_{\sin(\theta) < \tan(\varphi)} \, f_{\Phi',\Theta'}(\varphi, \theta) \, \mathrm{d}\varphi \, \mathrm{d}\theta \\
    &= \pi \int_0^{\pi/2} 1_{1 < \tan(\varphi)} \, f_{\Phi',\Theta'}\left(\varphi, \frac{\pi}{2}\right) \mathrm{d}\varphi \\
    &= \pi \int_{\pi/4}^{\pi/2} \frac{1}{4\pi} \cos(\varphi) \, \mathrm{d}\varphi \\
    &= \frac{1}{4} \left( 1 - \frac{\sqrt{2}}{2} \right) \neq \frac{1}{8}
\end{aligned}
$$
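A Monte Carlo sketch that conditions on the thin lunes $B'_\varepsilon$ lands near $\frac{1}{4}\left(1 - \frac{\sqrt{2}}{2}\right) \approx 0.0732$, not $\frac{1}{8}$, even though $A$ is the same event. The value `eps = 0.05`, the sample size, the seed and the function name are illustrative assumptions, and the Gaussian vectors are used without normalization because both $\Theta'$ and $A$ are unaffected by positive rescaling.

```python
import math
import random


def estimate_lune_conditional(eps: float, n: int, seed: int = 4) -> float:
    """Monte Carlo estimate of P(A | B'_eps), A = {0 < Y < X}, B'_eps = thin lunes around Theta' = +-pi/2."""
    rng = random.Random(seed)
    in_lunes = in_lunes_and_a = 0
    for _ in range(n):
        x, y, z = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Theta' = atan2(Y / sqrt(1 - X^2), -Z / sqrt(1 - X^2)); dividing (y, -z) by the positive
        # factor |v| * sqrt(1 - X^2) leaves the angle unchanged, so atan2(y, -z) is Theta'.
        theta_p = math.atan2(y, -z)
        # B'_eps: Theta' within eps of pi/2 or of -pi/2 (valid for eps < pi/2).
        if abs(abs(theta_p) - math.pi / 2) < eps:
            in_lunes += 1
            if 0.0 < y < x:  # A is scale invariant, so no normalization is needed
                in_lunes_and_a += 1
    if in_lunes == 0:
        raise ValueError("no samples fell in the lunes; increase n or eps")
    return in_lunes_and_a / in_lunes


lune_estimate = estimate_lune_conditional(eps=0.05, n=300_000)
predicted = 0.25 * (1.0 - math.sqrt(2.0) / 2.0)  # about 0.0732
print(lune_estimate, predicted, 1 / 8)
```

The band $B_\varepsilon$ and the lunes $B'_\varepsilon$ both shrink to the same great circle $\{Z = 0\}$. Only their shape differs, which is the ring-versus-lune distinction from the explanation section.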

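This shows that a conditional distribution on a great circle cannot be obtained by naively conditioning on an event of probability zero. The limit depends on the family of events of positive probability used to approach it (the bands $B_\varepsilon$ versus the lunes $B'_\varepsilon$), as explained in the article on conditional probability, section "Conditioning on an event of probability zero".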
