In probability theory and statistics, given two jointly distributed random variables X and Y, the conditional probability distribution of Y given X is the probability distribution of Y when X is known to be a particular value; in some cases the conditional probabilities may be expressed as functions containing the unspecified value x of X as a parameter. When both X and Y are categorical variables, a conditional probability table is typically used to represent the conditional probability. The conditional distribution contrasts with the marginal distribution of a random variable, which is its distribution without reference to the value of the other variable.

If the conditional distribution of Y given X is a continuous distribution, then its probability density function is known as the conditional density function. The properties of a conditional distribution, such as the moments, are often referred to by corresponding names such as the conditional mean and conditional variance.

More generally, one can refer to the conditional distribution of a subset of a set of more than two variables; this conditional distribution is contingent on the values of all the remaining variables, and if more than one variable is included in the subset then this conditional distribution is the conditional joint distribution of the included variables.


Conditional discrete distributions

For discrete random variables, the conditional probability mass function of Y given X=x can be written according to its definition as:

:p_{Y\mid X}(y \mid x) \triangleq P(Y = y \mid X = x) = \frac{P(\{X = x\} \cap \{Y = y\})}{P(X = x)}.

Due to the occurrence of P(X=x) in the denominator, this is defined only for non-zero (hence strictly positive) P(X=x). The relation with the probability distribution of X given Y is:

:P(Y = y \mid X = x) \, P(X = x) = P(\{X = x\} \cap \{Y = y\}) = P(X = x \mid Y = y) \, P(Y = y).
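As a concrete illustration, the following minimal Python sketch computes a conditional pmf by renormalizing the slice of a joint probability table; the table and variable names are our own illustrative choices, not from the article:

```python
# A minimal sketch: conditional pmf from a joint probability table.
# The joint table below is an arbitrary illustrative example.
joint = {
    (0, 0): 0.1, (0, 1): 0.3,   # keys are (x, y) pairs
    (1, 0): 0.4, (1, 1): 0.2,
}

def conditional_pmf_of_y(joint, x):
    """Return p(y | X = x) by renormalizing the slice of the joint at X = x."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)  # marginal P(X = x)
    if p_x == 0:
        raise ValueError("P(X = x) must be strictly positive")
    return {y: p / p_x for (xi, y), p in joint.items() if xi == x}

print(conditional_pmf_of_y(joint, 1))  # {0: 0.666..., 1: 0.333...}
```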


Example

Consider the roll of a fair die and let X=1 if the number is even (i.e., 2, 4, or 6) and X=0 otherwise. Furthermore, let Y=1 if the number is prime (i.e., 2, 3, or 5) and Y=0 otherwise. Then the unconditional probability that X=1 is 3/6 = 1/2 (since there are six possible rolls of the die, of which three are even), whereas the probability that X=1 conditional on Y=1 is 1/3 (since there are three possible prime number rolls, namely 2, 3, and 5, of which one is even).
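The arithmetic can be checked by brute-force enumeration; this small Python sketch is ours, not part of the original article:

```python
# Enumerate the six equally likely die rolls and verify the example.
rolls = [1, 2, 3, 4, 5, 6]
X = {r: 1 if r % 2 == 0 else 0 for r in rolls}        # even indicator
Y = {r: 1 if r in (2, 3, 5) else 0 for r in rolls}    # prime indicator

p_x1 = sum(1 for r in rolls if X[r] == 1) / 6                      # P(X = 1) = 1/2
p_x1_given_y1 = (sum(1 for r in rolls if X[r] == 1 and Y[r] == 1)
                 / sum(1 for r in rolls if Y[r] == 1))             # P(X = 1 | Y = 1) = 1/3
print(p_x1, p_x1_given_y1)
```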


Conditional continuous distributions

Similarly for continuous random variables, the conditional probability density function of Y given the occurrence of the value x of X can be written as

:f_{Y\mid X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)},

where f_{X,Y}(x, y) gives the joint density of X and Y, while f_X(x) gives the marginal density for X. Also in this case it is necessary that f_X(x) > 0. The relation with the probability distribution of X given Y is given by:

:f_{Y\mid X}(y \mid x) \, f_X(x) = f_{X,Y}(x, y) = f_{X\mid Y}(x \mid y) \, f_Y(y).

The concept of the conditional distribution of a continuous random variable is not as intuitive as it might seem: Borel's paradox shows that conditional probability density functions need not be invariant under coordinate transformations.
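As a rough numerical sketch (our own construction, assuming a joint density evaluated on a grid), the conditional density at a fixed x is the joint-density slice divided by the marginal obtained by numerical integration:

```python
import numpy as np

# A minimal numerical sketch: conditional density f(y | x0) from a joint density.
# Example joint density (our own choice): a standard bivariate normal, rho = 0.5.
rho = 0.5
def f_xy(x, y):
    norm = 1.0 / (2 * np.pi * np.sqrt(1 - rho**2))
    return norm * np.exp(-(x**2 - 2*rho*x*y + y**2) / (2 * (1 - rho**2)))

ys = np.linspace(-5.0, 5.0, 2001)
dy = ys[1] - ys[0]
x0 = 1.0
joint_slice = f_xy(x0, ys)            # f(x0, y) along the line X = x0
f_x0 = joint_slice.sum() * dy         # marginal f_X(x0) by a Riemann sum
cond = joint_slice / f_x0             # f(y | x0); integrates to 1 over y

print(cond.sum() * dy)                # ~1.0
print(ys[np.argmax(cond)])            # peak near rho * x0 = 0.5
```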


Example

The graph shows a bivariate normal joint density for random variables X and Y. To see the distribution of Y conditional on X=70, one can first visualize the line X=70 in the X,Y plane, and then visualize the plane containing that line and perpendicular to the X,Y plane. The intersection of that plane with the joint normal density, once rescaled to give unit area under the intersection, is the relevant conditional density of Y:

:Y \mid X = 70 \ \sim\ \mathcal{N}\!\left(\mu_1 + \frac{\sigma_1}{\sigma_2}\rho(70 - \mu_2),\, (1 - \rho^2)\sigma_1^2\right).
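This formula can be spot-checked by simulation; the parameter values in the following sketch are arbitrary illustrative choices of ours:

```python
import numpy as np

rng = np.random.default_rng(0)
mu1, mu2, s1, s2, rho = 170.0, 65.0, 10.0, 5.0, 0.6   # illustrative parameters
cov = [[s1**2, rho*s1*s2], [rho*s1*s2, s2**2]]
y, x = rng.multivariate_normal([mu1, mu2], cov, size=1_000_000).T

# Condition on X being (approximately) 70 by keeping a thin slab around it.
sel = np.abs(x - 70.0) < 0.1
print(y[sel].mean())   # ~ mu1 + (s1/s2)*rho*(70 - mu2) = 176.0
print(y[sel].var())    # ~ (1 - rho**2) * s1**2 = 64.0
```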


Relation to independence

Random variables X, Y are independent if and only if the conditional distribution of Y given X is, for all possible realizations of X, equal to the unconditional distribution of Y. For discrete random variables this means P(Y = y \mid X = x) = P(Y = y) for all possible y and x with P(X = x) > 0. For continuous random variables X and Y, having a joint density function, it means f_{Y\mid X}(y \mid x) = f_Y(y) for all possible y and x with f_X(x) > 0.
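A quick numeric check of the discrete criterion, using an arbitrary factorizing table of our own (independent by construction):

```python
# Test P(Y = y | X = x) == P(Y = y) on a joint table that factorizes.
joint = {(x, y): px * py
         for x, px in [(0, 0.3), (1, 0.7)]
         for y, py in [(0, 0.4), (1, 0.6)]}

p_y = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in (0, 1)}
for x in (0, 1):
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    for y in (0, 1):
        assert abs(joint[(x, y)] / p_x - p_y[y]) < 1e-12  # conditional == marginal
print("independent:", True)
```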


Properties

Seen as a function of y for given x, P(Y = y \mid X = x) is a probability mass function and so the sum over all y (or integral if it is a conditional probability density) is 1. Seen as a function of x for given y, it is a likelihood function, so that the sum over all x need not be 1. Additionally, a marginal of a joint distribution can be expressed as the expectation of the corresponding conditional distribution. For instance, p_X(x) = E_{Y}\!\left[\, p_{X \mid Y}(x \mid Y) \,\right].
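The identity p_X(x) = E_Y[p_{X|Y}(x | Y)] can be verified on a small table; in the following Python sketch (the table is our own illustrative example) the marginal is computed both directly and as an expectation of the conditional:

```python
# A minimal sketch of p_X(x) = E_Y[ p_{X|Y}(x | Y) ] on a small joint table.
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}
xs, ys = (0, 1), (0, 1)

p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}            # marginal of Y
p_x_given_y = {(x, y): joint[(x, y)] / p_y[y] for x in xs for y in ys}

for x in xs:
    direct = sum(joint[(x, y)] for y in ys)                      # p_X(x) directly
    via_expectation = sum(p_x_given_y[(x, y)] * p_y[y] for y in ys)
    assert abs(direct - via_expectation) < 1e-12
print("marginal equals expectation of conditional")
```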


Measure-theoretic formulation

Let (\Omega, \mathcal{F}, P) be a probability space, and let \mathcal{G} \subseteq \mathcal{F} be a \sigma-field in \mathcal{F}. Given A \in \mathcal{F}, the Radon–Nikodym theorem implies that there is a \mathcal{G}-measurable random variable P(A \mid \mathcal{G}) : \Omega \to \mathbb{R}, called the conditional probability, such that

:\int_G P(A \mid \mathcal{G})(\omega) \, dP(\omega) = P(A \cap G)

for every G \in \mathcal{G}, and such a random variable is uniquely defined up to sets of probability zero. A conditional probability is called regular if \operatorname{P}(\cdot \mid \mathcal{G})(\omega) is a probability measure on (\Omega, \mathcal{F}) for almost every \omega \in \Omega.

Special cases:
* For the trivial sigma algebra \mathcal{G} = \{\emptyset, \Omega\}, the conditional probability is the constant function \operatorname{P}\!\left(A \mid \{\emptyset, \Omega\}\right) = \operatorname{P}(A).
* If A \in \mathcal{G}, then \operatorname{P}(A \mid \mathcal{G}) = \mathbf{1}_A, the indicator function (defined below).

Let X : \Omega \to E be an (E, \mathcal{E})-valued random variable. For each B \in \mathcal{E}, define

:\mu_{X \mid \mathcal{G}}(B \mid \mathcal{G}) = \mathrm{P}(X^{-1}(B) \mid \mathcal{G}).

For any \omega \in \Omega, the function \mu_{X \mid \mathcal{G}}(\cdot \mid \mathcal{G})(\omega) : \mathcal{E} \to \mathbb{R} is called the conditional probability distribution of X given \mathcal{G}. If it is a probability measure on (E, \mathcal{E}), then it is called regular. For a real-valued random variable (with respect to the Borel \sigma-field \mathcal{B}^1 on \mathbb{R}), every conditional probability distribution is regular (Billingsley 1995, p. 439). In this case,

:E[X \mid \mathcal{G}] = \int_{-\infty}^\infty x \, \mu(dx, \cdot)

almost surely.
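To make the abstract definition concrete, consider the standard special case (added here as an illustration, not taken from the article) in which \mathcal{G} is generated by a finite partition B_1, \dots, B_n of \Omega with P(B_i) > 0 for each i; the conditional probability then reduces to elementary conditional probabilities:

```latex
% Sketch: conditional probability given a sigma-field generated by a
% finite partition B_1, ..., B_n of Omega with P(B_i) > 0 for each i.
% This is a standard special case, added for illustration.
\[
  P(A \mid \mathcal{G})(\omega)
    \;=\; \sum_{i=1}^{n} \frac{P(A \cap B_i)}{P(B_i)} \, \mathbf{1}_{B_i}(\omega).
\]
% The defining property can be checked directly: for G = B_j,
\[
  \int_{B_j} P(A \mid \mathcal{G}) \, dP
    \;=\; \frac{P(A \cap B_j)}{P(B_j)} \, P(B_j)
    \;=\; P(A \cap B_j).
\]
```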


Relation to conditional expectation

For any event A \in \mathcal{F}, define the indicator function:

:\mathbf{1}_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A, \\ 0 & \text{if } \omega \notin A, \end{cases}

which is a random variable. Note that the expectation of this random variable is equal to the probability of ''A'' itself:

:\operatorname{E}(\mathbf{1}_A) = \operatorname{P}(A).

Given a \sigma-field \mathcal{G} \subseteq \mathcal{F}, the conditional probability \operatorname{P}(A \mid \mathcal{G}) is a version of the conditional expectation of the indicator function for A:

:\operatorname{P}(A \mid \mathcal{G}) = \operatorname{E}(\mathbf{1}_A \mid \mathcal{G}).

An expectation of a random variable with respect to a regular conditional probability is equal to its conditional expectation.
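A small Monte Carlo sketch (ours; the event and partition cell are arbitrary choices) illustrating that the expectation of an indicator recovers a probability, and that conditioning on a partition cell averages the indicator within that cell:

```python
import numpy as np

# E[1_A] approximates P(A); averaging 1_A over a partition cell B
# approximates P(A | B), a version of E[1_A | sigma(B)] on B.
rng = np.random.default_rng(1)
u = rng.uniform(0.0, 1.0, size=1_000_000)   # omega ~ Uniform(0, 1)

A = u < 0.3                                  # event A = [0, 0.3)
print(A.mean())                              # ~ P(A) = 0.3

B = u < 0.5                                  # partition cell B = [0, 0.5)
print(A[B].mean())                           # ~ P(A | B) = 0.3 / 0.5 = 0.6
```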


See also

* Conditioning (probability)
* Conditional probability
* Regular conditional probability
* Bayes' theorem


References


* Billingsley, Patrick (1995). ''Probability and Measure'' (3rd ed.). New York: John Wiley & Sons.