Cox's theorem, named after the physicist Richard Threlkeld Cox, is a derivation of the laws of probability theory from a certain set of postulates. This derivation justifies the so-called "logical" interpretation of probability, as the laws of probability derived by Cox's theorem are applicable to any proposition. Logical (also known as objective Bayesian) probability is a type of Bayesian probability. Other forms of Bayesianism, such as the subjective interpretation, are given other justifications.

Cox's assumptions

Cox wanted his system to satisfy the following conditions:
# Divisibility and comparability – The plausibility of a proposition is a real number and is dependent on information we have related to the proposition.
# Common sense – Plausibilities should vary sensibly with the assessment of plausibilities in the model.
# Consistency – If the plausibility of a proposition can be derived in many ways, all the results must be equal.

The postulates as stated here are taken from Arnborg and Sjödin (Stefan Arnborg and Gunnar Sjödin, ''On the foundations of Bayesianism,'' Preprint: Nada, KTH (1999), http://www.stats.org.uk/cox-theorems/ArnborgSjodin2001.pdf; Stefan Arnborg and Gunnar Sjödin, ''A note on the foundations of Bayesianism,'' Preprint: Nada, KTH (2000a), http://www.stats.org.uk/bayesian/ArnborgSjodin1999.pdf; Stefan Arnborg and Gunnar Sjödin, "Bayes rules in finite models," in ''European Conference on Artificial Intelligence,'' Berlin (2000b), https://frontiersinai.com/ecai/ecai2000/pdf/p0571.pdf).

"Common sense" includes consistency with Aristotelian logic in the sense that logically equivalent propositions shall have the same plausibility.

The postulates as originally stated by Cox were not mathematically rigorous (although more so than the informal description above), as noted by Halpern (Joseph Y. Halpern, "A counterexample to theorems of Cox and Fine," ''Journal of AI research,'' 10, 67–85 (1999), http://www.jair.org/media/536/live-536-2054-jair.ps.Z; Joseph Y. Halpern, "Technical Addendum, Cox's theorem Revisited," ''Journal of AI research,'' 11, 429–435 (1999), http://www.jair.org/media/644/live-644-1840-jair.ps.Z). However, it appears to be possible to augment them with various mathematical assumptions, made either implicitly or explicitly by Cox, to produce a valid proof.

Cox's notation:
:The plausibility of a proposition $A$ given some related information $X$ is denoted by $A\mid X$.

Cox's postulates and functional equations are:
*The plausibility of the conjunction $AB$ of two propositions $A$, $B$, given some related information $X$, is determined by the plausibility of $A$ given $X$ and that of $B$ given $AX$.
:In the form of a functional equation:
::$AB\mid X=g(A\mid X,\,B\mid AX)$
:Because conjunction is associative in propositional logic, consistency with logic gives a functional equation saying that the function $g$ is an associative binary operation.
*Additionally, Cox postulates the function $g$ to be monotonic.
:All strictly increasing associative binary operations on the real numbers are isomorphic to multiplication of numbers in a subinterval of $[0,+\infty]$, which means that there is a monotonic function $w$ mapping plausibilities to $[0,+\infty]$ such that
::$w(AB\mid X)=w(A\mid X)\,w(B\mid AX)$

*In case $A$ given $X$ is certain, we have $AB\mid X=B\mid X$ and $B\mid AX=B\mid X$ due to the requirement of consistency. The general equation then leads to
::$w(B\mid X)=w(A\mid X)\,w(B\mid X)$
:This shall hold for any proposition $B$, which leads to
::$w(A\mid X)=1$

*In case $A$ given $X$ is impossible, we have $AB\mid X=A\mid X$ and $A\mid BX=A\mid X$ due to the requirement of consistency. The general equation (with the $A$ and $B$ factors switched) then leads to
::$w(A\mid X)=w(B\mid X)\,w(A\mid X)$
:This shall hold for any proposition $B$, which, without loss of generality, leads to a solution
::$w(A\mid X)=0$

:Due to the requirement of monotonicity, this means that $w$ maps plausibilities to the interval $[0,1]$.
*The plausibility of a proposition determines the plausibility of the proposition's negation.
:This postulates the existence of a function $f$ such that
::$w(\text{not }A\mid X)=f(w(A\mid X))$

:Because "a double negative is an affirmative", consistency with logic gives the functional equation
::$f(f(x))=x,$
:saying that the function $f$ is an involution, i.e., it is its own inverse.
*Furthermore, Cox postulates the function $f$ to be monotonic.
:The above functional equations and consistency with logic imply that
::$w(AB\mid X)=w(A\mid X)\,f(w(\text{not }B\mid AX))=w(A\mid X)\,f\left(\frac{w(A\,\text{not }B\mid X)}{w(A\mid X)}\right)$

:Since $AB$ is logically equivalent to $BA$, we also get
::$w(A\mid X)\,f\left(\frac{w(A\,\text{not }B\mid X)}{w(A\mid X)}\right)=w(B\mid X)\,f\left(\frac{w(B\,\text{not }A\mid X)}{w(B\mid X)}\right)$

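:If, in particular, $B=\text{not}(AD)$, then also $A\,\text{not }B=\text{not }B$ (because $\text{not }B=AD$ already implies $A$) and $B\,\text{not }A=\text{not }A$ (because $\text{not }A$ already implies $\text{not}(AD)$), and we get
::$w(A\,\text{not }B\mid X)=w(\text{not }B\mid X)=f(w(B\mid X))$
:and
::$w(B\,\text{not }A\mid X)=w(\text{not }A\mid X)=f(w(A\mid X))$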

:Abbreviating $w(A\mid X)=x$ and $w(B\mid X)=y$, we get the functional equation
::$x\,f\left(\frac{f(y)}{x}\right)=y\,f\left(\frac{f(x)}{y}\right)$
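The functional equations above can be spot-checked numerically. The sketch below is an illustration, not part of Cox's argument: it assumes the candidate solution $g(x,y)=xy$ for the conjunction rule on the $w$-scale and $f(x)=(1-x^m)^{1/m}$ for the negation rule, with $m>0$ an assumed parameter, and tests associativity of $g$, the involution $f(f(x))=x$, and $x\,f(f(y)/x)=y\,f(f(x)/y)$ at random points. The last equation is only evaluated where $f(y)\le x$ and $f(x)\le y$, which keeps both arguments of $f$ inside $[0,1]$; for this candidate both sides then equal $(x^m+y^m-1)^{1/m}$. All function names are hypothetical.

```python
import random
from typing import Callable


def g(x: float, y: float) -> float:
    # Candidate conjunction rule on the w-scale: w(AB|X) = w(A|X) * w(B|AX).
    return x * y


def make_f(m: float) -> Callable[[float], float]:
    # Candidate negation rule f(x) = (1 - x**m) ** (1/m), for an assumed m > 0.
    # max(0.0, ...) guards against tiny negative values caused by rounding.
    def f(x: float) -> float:
        return max(0.0, 1.0 - x ** m) ** (1.0 / m)
    return f


def satisfies_cox_equations(f: Callable[[float], float], trials: int = 10_000,
                            tol: float = 1e-9, seed: int = 0) -> bool:
    rng = random.Random(seed)
    for _ in range(trials):
        # Sample away from 0, where rounding error in f(f(x)) is amplified.
        x, y, z = (rng.uniform(0.05, 1.0) for _ in range(3))
        if abs(g(g(x, y), z) - g(x, g(y, z))) > tol:   # associativity of g
            return False
        if abs(f(f(x)) - x) > tol:                      # involution f(f(x)) = x
            return False
        # x f(f(y)/x) = y f(f(x)/y), only where both arguments of f lie in [0, 1].
        if f(y) <= x and f(x) <= y:
            if abs(x * f(f(y) / x) - y * f(f(x) / y)) > tol:
                return False
    return True
```

For `make_f(m)` with $m$ in 0.5, 1, 2 or 3 the check passes, while a non-involution such as `lambda x: 1.0 - x * x` fails immediately. Passing a randomized check is evidence, not a proof that these are the only solutions.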

Implications of Cox's postulates

The laws of probability derivable from these postulates are the following (Edwin Thompson Jaynes, ''Probability Theory: The Logic of Science,'' Cambridge University Press (2003); preprint version (1996) at ; Chapters 1 to 3 of published version at http://bayes.wustl.edu/etj/prob/book.pdf). Let $A\mid B$ be the plausibility of the proposition $A$ given $B$ satisfying Cox's postulates. Then there is a function $w$ mapping plausibilities to the interval $[0,1]$ and a positive number $m$ such that
# Certainty is represented by $w(A\mid B)=1$.
# $w^m(A\mid B)+w^m(\text{not }A\mid B)=1$.
# $w(AB\mid C)=w(A\mid C)\,w(B\mid AC)=w(B\mid C)\,w(A\mid BC)$.
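These three rules can be checked on a concrete model. A minimal sketch, assuming an ordinary probability model on a single fair die (the model, the events `A`, `B`, `C` and the helper names are illustrative, not from the source), defines $w=\Pr^{1/m}$ for several values of $m$ and confirms rules 1 to 3 in the form above. It illustrates why the result asserts only the existence of some exponent $m$: any positive power of a function obeying the product rule still obeys it.

```python
from fractions import Fraction
import math

# Illustrative model (assumption): one fair die, all outcomes equally likely.
OMEGA = frozenset(range(1, 7))
A = frozenset({2, 4, 6})        # "even"
B = frozenset({4, 5, 6})        # "greater than 3"
C = frozenset({1, 2, 3, 4, 5})  # "not a six"


def pr(a: frozenset, given: frozenset) -> Fraction:
    """Pr(a | given) by counting; `given` must be non-empty."""
    return Fraction(len(a & given), len(given))


def w(a: frozenset, given: frozenset, m: float) -> float:
    """A hypothetical w-scale plausibility built from Pr: w = Pr ** (1/m)."""
    return float(pr(a, given)) ** (1.0 / m)


def satisfies_w_laws(m: float, a: frozenset, b: frozenset, c: frozenset) -> bool:
    """Check rules 1-3 of the w-form; assumes a & c and b & c are non-empty."""
    rule1 = w(c, c, m) == 1.0                                   # certainty
    rule2 = math.isclose(w(a, c, m) ** m + w(OMEGA - a, c, m) ** m, 1.0)
    lhs = w(a & b, c, m)                                        # w(AB | C)
    rule3 = (math.isclose(lhs, w(a, c, m) * w(b, a & c, m))
             and math.isclose(lhs, w(b, c, m) * w(a, b & c, m)))
    return rule1 and rule2 and rule3


print([satisfies_w_laws(m, A, B, C) for m in (0.5, 1.0, 2.0, 3.0)])
# prints: [True, True, True, True]
```

The exponent matters for rule 2: with $m=2$, for example, $w(A\mid C)+w(\text{not }A\mid C)$ is not 1; only the squares add up to 1.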

It is important to note that the postulates imply only these general properties. We may recover the usual laws of probability by setting a new function, conventionally denoted $P$ or $\Pr$, equal to $w^m$. Then we obtain the laws of probability in a more familiar form:
# Certain truth is represented by $\Pr(A\mid B)=1$, and certain falsehood by $\Pr(A\mid B)=0$.
# $\Pr(A\mid B)+\Pr(\text{not }A\mid B)=1$.
# $\Pr(AB\mid C)=\Pr(A\mid C)\Pr(B\mid AC)=\Pr(B\mid C)\Pr(A\mid BC)$.
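Rules 2 and 3 already suffice to evaluate a disjunction, because "$A$ or $B$" is logically equivalent to $\text{not}(\text{not }A\;\text{not }B)$. A minimal sketch, assuming an illustrative fair-die model with exact rational arithmetic (the model and the function names are hypothetical), computes $\Pr(A\text{ or }B\mid C)=1-\Pr(\text{not }A\mid C)\Pr(\text{not }B\mid \text{not }A\;C)$ using only those two rules and compares it with direct counting.

```python
from fractions import Fraction

# Illustrative model (assumption): one fair die, all outcomes equally likely.
OMEGA = frozenset(range(1, 7))


def pr(a: frozenset, given: frozenset) -> Fraction:
    """Pr(a | given) by counting equally likely outcomes; `given` must be non-empty."""
    return Fraction(len(a & given), len(given))


def pr_or(a: frozenset, b: frozenset, c: frozenset) -> Fraction:
    """Pr(a or b | c) using only rule 2 (negation) and rule 3 (conjunction).

    De Morgan: "a or b" = not("not a" and "not b"), hence
    Pr(a or b | c) = 1 - Pr(not a | c) * Pr(not b | (not a) c).
    """
    not_a = OMEGA - a
    p_not_a = 1 - pr(a, c)                   # rule 2
    if p_not_a == 0:                         # "not a" is impossible given c
        return Fraction(1)
    p_not_b = 1 - pr(b, not_a & c)           # rule 2, given "not a" and c
    return 1 - p_not_a * p_not_b             # rule 3, then rule 2


A = frozenset({2, 4, 6})  # "even"
B = frozenset({4, 5, 6})  # "greater than 3"
print(pr_or(A, B, OMEGA), pr(A | B, OMEGA))  # prints: 2/3 2/3
```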

Rule 2 is a rule for negation, and rule 3 is a rule for conjunction. Given that any proposition containing conjunction, disjunction, and negation can be equivalently rephrased using conjunction and negation alone (for example, by using De Morgan's laws to eliminate the disjunctions in its conjunctive normal form), we can now handle any compound proposition.

The laws thus derived yield finite additivity of probability, but not countable additivity. The measure-theoretic formulation of Kolmogorov assumes that a probability measure is countably additive. This slightly stronger condition is necessary for certain results. An elementary example (in which this assumption merely simplifies the calculation rather than being necessary for it) is that, in a sequence of fair coin flips, the probability that the first heads appears on an even-numbered flip is $\tfrac13$.
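The first heads appears on flip $n$ with probability $(1/2)^n$, so the even-numbered flips contribute $\sum_{k\ge1}(1/2)^{2k}=\frac{1/4}{1-1/4}=\frac13$. Finite additivity already gives every partial sum exactly; only the passage to the limit uses countable additivity or a separate limiting argument. A sketch with exact rational arithmetic, assuming independent flips with heads probability `p_heads` (a hypothetical parameter, 1/2 for a fair coin), sums these disjoint events up to a finite horizon:

```python
from fractions import Fraction


def prob_first_heads_even(max_flips: int, p_heads: Fraction = Fraction(1, 2)) -> Fraction:
    """Probability that the first heads occurs on an even-numbered flip n <= max_flips.

    Finite additivity: the events "first heads at flip n" are disjoint, so their
    probabilities (1 - p)**(n - 1) * p are simply added for even n.
    """
    total = Fraction(0)
    for n in range(2, max_flips + 1, 2):
        total += (1 - p_heads) ** (n - 1) * p_heads
    return total
```

For a fair coin and a horizon of $2K$ flips the sum is exactly $\frac13(1-4^{-K})$, so the gap to $\frac13$ shrinks geometrically but does not vanish at any finite horizon.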

Interpretation and further discussion

Cox's theorem has come to be used as one of the justifications for the use of Bayesian probability theory. For example, in Jaynes it is discussed in detail in chapters 1 and 2 and is a cornerstone for the rest of the book. Probability is interpreted as a formal system, the natural extension of Aristotelian logic (in which every statement is either true or false) into the realm of reasoning in the presence of uncertainty.

It has been debated to what degree the theorem excludes alternative models for reasoning about uncertainty. For example, if certain "unintuitive" mathematical assumptions were dropped, then alternatives could be devised, such as the example provided by Halpern. However, Arnborg and Sjödin suggest additional "common sense" postulates, which would allow the assumptions to be relaxed in some cases while still ruling out the Halpern example. Other approaches were devised by Hardy and by Dupré and Tipler (Dupré, Maurice J. & Tipler, Frank J. (2009), "New Axioms for Rigorous Bayesian Probability," ''Bayesian Analysis'', 4(3): 599–606).

The original formulation of Cox's theorem is in , which is extended with additional results and more discussion in . Jaynes cites Abel for the first known use of the associativity functional equation. János Aczél provides a long proof of the "associativity equation" (pages 256–267). Jaynes reproduces the shorter proof by Cox in which differentiability is assumed. A guide to Cox's theorem by Van Horn aims to introduce the reader comprehensively to all these references.

Baoding Liu, the founder of uncertainty theory, criticizes Cox's theorem for presuming that the truth value of the conjunction $P\land Q$ is a twice differentiable function $f$ of the truth values of the two propositions $P$ and $Q$, i.e., $T(P\land Q)=f(T(P),T(Q))$. This assumption excludes uncertainty theory's "uncertain measure" from the start, because the function $f(x,y)=x\land y$ (the minimum of $x$ and $y$) used in uncertainty theory is not differentiable with respect to $x$ and $y$ wherever $x=y$. According to Liu, "there does not exist any evidence that the truth value of conjunction is completely determined by the truth values of individual propositions, let alone a twice differentiable function."
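The differentiability point can be made concrete. In the sketch below (the function names and the step size `h` are illustrative assumptions), one-sided difference quotients in $x$ are compared at the point $x=y=0.5$: for $\min$ they disagree (about 1 from the left, exactly 0 from the right), so no derivative exists there, while for the product $xy$ used on Cox's $w$-scale both are about $0.5$.

```python
def conj_min(x: float, y: float) -> float:
    """Conjunction of truth values in uncertainty theory: x ∧ y = min(x, y)."""
    return min(x, y)


def conj_product(x: float, y: float) -> float:
    """Product rule on Cox's w-scale: w(AB|X) = w(A|X) * w(B|AX)."""
    return x * y


def one_sided_slopes_in_x(f, x: float, y: float, h: float = 1e-6) -> tuple[float, float]:
    """Left and right difference quotients of f in its first argument at (x, y)."""
    left = (f(x, y) - f(x - h, y)) / h
    right = (f(x + h, y) - f(x, y)) / h
    return left, right


for name, f in (("min", conj_min), ("product", conj_product)):
    left, right = one_sided_slopes_in_x(f, 0.5, 0.5)
    print(f"{name}: left {left:.3f}, right {right:.3f}")
# prints:
# min: left 1.000, right 0.000
# product: left 0.500, right 0.500
```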
