Given two random variables that are defined on the same probability space, the joint probability distribution is the corresponding probability distribution on all possible pairs of outputs. The joint distribution can just as well be considered for any given number of random variables. The joint distribution encodes the marginal distributions, i.e. the distributions of each of the individual random variables. It also encodes the conditional probability distributions, which deal with how the outputs of one random variable are distributed when given information on the outputs of the other random variable(s).

In the formal mathematical setup of measure theory, the joint distribution is given by the pushforward measure, by the map obtained by pairing together the given random variables, of the sample space's probability measure.

In the case of real-valued random variables, the joint distribution, as a particular multivariate distribution, may be expressed by a multivariate cumulative distribution function, or by a multivariate probability density function together with a multivariate probability mass function. In the special case of continuous random variables, it is sufficient to consider probability density functions, and in the case of discrete random variables, it is sufficient to consider probability mass functions.


Examples


Draws from an urn

Suppose each of two urns contains twice as many red balls as blue balls, and no others, and suppose one ball is randomly selected from each urn, with the two draws independent of each other. Let A and B be discrete random variables associated with the outcomes of the draw from the first urn and second urn respectively. The probability of drawing a red ball from either of the urns is 2/3, and the probability of drawing a blue ball is 1/3. The joint probability distribution is presented in the following table:

               A = Red    A = Blue    P(B)
    B = Red      4/9        2/9       2/3
    B = Blue     2/9        1/9       1/3
    P(A)         2/3        1/3        1

Each of the four inner cells shows the probability of a particular combination of results from the two draws; these probabilities are the joint distribution. In any one cell the probability of a particular combination occurring is (since the draws are independent) the product of the probability of the specified result for A and the probability of the specified result for B. The probabilities in these four cells sum to 1, as is always true for probability distributions. Moreover, the final row and the final column give the marginal probability distribution for A and the marginal probability distribution for B respectively. For example, the first cell of the final row gives the probability that A is red, regardless of which possibility for B in the column above it occurs, as the sum of that column: 2/3. Thus the marginal probability distribution for A gives A's probabilities ''unconditional'' on B, in a margin of the table.
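As a quick check of the arithmetic, here is a short Python sketch (not part of the original article) that builds the joint table for the two independent urn draws as the product of the marginals and then recovers the marginal for A by summing over B:

```python
from itertools import product

# Marginal distributions for the two independent urn draws
# (values from the example: 2/3 red, 1/3 blue in each urn).
p_A = {"red": 2 / 3, "blue": 1 / 3}
p_B = {"red": 2 / 3, "blue": 1 / 3}

# Joint distribution: because the draws are independent, each joint
# probability is the product of the corresponding marginals.
joint = {(a, b): p_A[a] * p_B[b] for a, b in product(p_A, p_B)}

# The four inner-cell probabilities sum to 1.
assert abs(sum(joint.values()) - 1) < 1e-12

# Recover the marginal for A by summing each column over all values of B.
marginal_A = {a: sum(joint[(a, b)] for b in p_B) for a in p_A}
print(joint)       # {('red', 'red'): 0.444..., ('red', 'blue'): 0.222..., ...}
print(marginal_A)  # {'red': 0.666..., 'blue': 0.333...}
```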


Coin flips

Consider the flip of two fair coins; let A and B be discrete random variables associated with the outcomes of the first and second coin flips respectively. Each coin flip is a Bernoulli trial and has a Bernoulli distribution. If a coin displays "heads" then the associated random variable takes the value 1, and it takes the value 0 otherwise. The probability of each of these outcomes is 1/2, so the marginal (unconditional) density functions are

:P(A)=1/2 \quad \text{for} \quad A\in \{0,1\};

:P(B)=1/2 \quad \text{for} \quad B\in \{0,1\}.

The joint probability mass function of A and B defines probabilities for each pair of outcomes. All possible outcomes are

: (A=0,B=0), \quad (A=0,B=1), \quad (A=1,B=0), \quad (A=1,B=1).

Since each outcome is equally likely, the joint probability mass function becomes

:P(A,B)=1/4 \quad \text{for} \quad A,B\in\{0,1\}.

Since the coin flips are independent, the joint probability mass function is the product of the marginals:

:P(A,B)=P(A)P(B) \quad \text{for} \quad A,B \in\{0,1\}.
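The same point can be checked by simulation; the following Python sketch (illustrative only, not from the article) estimates the joint probability mass function of two independent fair coin flips from random samples and compares it with the product of the marginals, 1/2 × 1/2 = 1/4:

```python
import random

random.seed(0)
n = 100_000

# Simulate n pairs of independent fair coin flips (1 = heads, 0 = tails).
counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
for _ in range(n):
    a, b = random.randint(0, 1), random.randint(0, 1)
    counts[(a, b)] += 1

# The empirical joint pmf should be close to 1/4 for every pair,
# i.e. the product of the marginal probabilities P(A=a) * P(B=b).
for pair, c in counts.items():
    print(pair, c / n)   # each roughly 0.25
```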


Rolling a die

Consider the roll of a fair die and let A=1 if the number is even (i.e. 2, 4, or 6) and A=0 otherwise. Furthermore, let B=1 if the number is prime (i.e. 2, 3, or 5) and B=0 otherwise. Then the joint distribution of A and B, expressed as a probability mass function, is

: \mathrm{P}(A=0,B=0)=P\{1\}=\frac{1}{6},\quad\quad \mathrm{P}(A=1,B=0)=P\{4,6\}=\frac{2}{6},

: \mathrm{P}(A=0,B=1)=P\{3,5\}=\frac{2}{6},\quad\quad \mathrm{P}(A=1,B=1)=P\{2\}=\frac{1}{6}.

These probabilities necessarily sum to 1, since the probability of ''some'' combination of A and B occurring is 1.
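A minimal Python sketch (illustrative, not from the source) that enumerates the six equally likely faces and tallies the pairs (A, B) reproduces these four probabilities:

```python
from collections import Counter
from fractions import Fraction

outcomes = range(1, 7)            # faces of a fair six-sided die
primes = {2, 3, 5}

# Map each equally likely face to the pair (A, B) and tally the counts.
pairs = Counter((int(k % 2 == 0), int(k in primes)) for k in outcomes)

# Divide by 6 to get the joint pmf; the four probabilities sum to 1.
joint = {pair: Fraction(c, 6) for pair, c in pairs.items()}
for pair in sorted(joint):
    print(f"P(A={pair[0]}, B={pair[1]}) = {joint[pair]}")
# P(A=0, B=0) = 1/6, P(A=0, B=1) = 1/3, P(A=1, B=0) = 1/3, P(A=1, B=1) = 1/6
assert sum(joint.values()) == 1
```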


Marginal probability distribution

If more than one random variable is defined in a random experiment, it is important to distinguish between the joint probability distribution of X and Y and the probability distribution of each variable individually. The individual probability distribution of a random variable is referred to as its marginal probability distribution. In general, the marginal probability distribution of X can be determined from the joint probability distribution of X and other random variables. If the joint probability density function of random variables X and Y is f_{X,Y}(x,y), the marginal probability density functions of X and Y, which define the marginal distributions, are given by

:f_X(x)= \int f_{X,Y}(x,y) \; dy,

:f_Y(y)= \int f_{X,Y}(x,y) \; dx,

where the first integral is over all points in the range of (X,Y) for which X=x and the second integral is over all points in the range of (X,Y) for which Y=y.
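As a numerical illustration (not from the source), the sketch below approximates the marginal density f_X(x) by integrating a concrete joint density over y on a grid; the choice of a bivariate normal with correlation 0.5 and the use of SciPy are arbitrary assumptions made for the example:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# A concrete joint density to marginalize: a bivariate normal with
# standard normal marginals and correlation 0.5 (an illustrative choice).
rho = 0.5
joint = multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]])

# f_X(x) = integral of f_{X,Y}(x, y) over y, approximated on a grid.
x = 0.7
y = np.linspace(-8, 8, 2001)
f_joint = joint.pdf(np.column_stack([np.full_like(y, x), y]))
f_X_numeric = np.trapz(f_joint, y)

# For this joint density the marginal of X is a standard normal.
print(f_X_numeric, norm.pdf(x))   # both approximately 0.312
```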


Joint cumulative distribution function

For a pair of random variables X,Y, the joint cumulative distribution function (CDF) F_{X,Y} is given by

:F_{X,Y}(x,y) = \operatorname{P}(X \leq x, Y \leq y),

where the right-hand side represents the probability that the random variable X takes on a value less than or equal to x and that Y takes on a value less than or equal to y.

For N random variables X_1,\ldots,X_N, the joint CDF F_{X_1,\ldots,X_N} is given by

:F_{X_1,\ldots,X_N}(x_1,\ldots,x_N) = \operatorname{P}(X_1 \leq x_1,\ldots,X_N \leq x_N).

Interpreting the N random variables as a random vector \mathbf{X} = (X_1,\ldots,X_N)^T yields a shorter notation:

:F_{\mathbf{X}}(\mathbf{x}) = \operatorname{P}(X_1 \leq x_1,\ldots,X_N \leq x_N).
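The definition translates directly into an empirical estimate from data. The Python sketch below (an illustration under assumed names; the helper joint_cdf and the uniform test data are hypothetical) counts the fraction of sample vectors that are componentwise less than or equal to the query point:

```python
import numpy as np

def joint_cdf(samples, point):
    """Empirical joint CDF: fraction of sample vectors that are
    componentwise <= the query point."""
    samples = np.asarray(samples)   # shape (n_samples, N)
    point = np.asarray(point)       # shape (N,)
    return np.mean(np.all(samples <= point, axis=1))

# Example: 100000 draws from two independent Uniform(0, 1) variables.
rng = np.random.default_rng(0)
xy = rng.random((100_000, 2))

# For independent uniforms, F_{X,Y}(x, y) = x * y.
print(joint_cdf(xy, [0.5, 0.8]))   # close to 0.4
```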


Joint density function or mass function


Discrete case

The joint probability mass function of two discrete random variables X, Y is

:p_{X,Y}(x,y) = \mathrm{P}(X=x\ \mathrm{and}\ Y=y),

or written in terms of conditional distributions

:p_{X,Y}(x,y) = \mathrm{P}(Y=y \mid X=x) \cdot \mathrm{P}(X=x) = \mathrm{P}(X=x \mid Y=y) \cdot \mathrm{P}(Y=y),

where \mathrm{P}(Y=y \mid X=x) is the probability of Y = y given that X = x.

The generalization of the preceding two-variable case is the joint probability distribution of n discrete random variables X_1, X_2, \dots, X_n, which is

:p_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = \mathrm{P}(X_1=x_1,\ldots,X_n=x_n),

or equivalently

: \begin{align} p_{X_1,\ldots,X_n}(x_1,\ldots,x_n) & = \mathrm{P}(X_1=x_1) \cdot \mathrm{P}(X_2=x_2\mid X_1=x_1) \\ & \quad \cdot \mathrm{P}(X_3=x_3\mid X_1=x_1,X_2=x_2) \\ & \quad \cdots \\ & \quad \cdot \mathrm{P}(X_n=x_n\mid X_1=x_1,X_2=x_2,\dots,X_{n-1}=x_{n-1}). \end{align}

This identity is known as the chain rule of probability.

Since these are probabilities, in the two-variable case

:\sum_i \sum_j \mathrm{P}(X=x_i\ \mathrm{and}\ Y=y_j) = 1,

which generalizes for n discrete random variables X_1, X_2, \dots , X_n to

:\sum_{i} \sum_{j} \dots \sum_{k} \mathrm{P}(X_1=x_{1i},X_2=x_{2j}, \dots, X_n=x_{nk}) = 1.
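The factorization into a marginal and a conditional can be checked numerically; the following Python sketch (with made-up joint probabilities, purely for illustration) derives P(X=x) and P(Y=y | X=x) from a small joint table and verifies the chain rule identity:

```python
# Check P(X=x, Y=y) = P(Y=y | X=x) * P(X=x) on a small hand-made joint pmf.
joint = {
    (0, 0): 0.10, (0, 1): 0.30,
    (1, 0): 0.25, (1, 1): 0.35,
}

# Marginal of X and conditional of Y given X, both derived from the joint.
p_X = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1)}
p_Y_given_X = {(y, x): joint[(x, y)] / p_X[x] for (x, y) in joint}

# The chain-rule factorization holds for every cell of the table.
for (x, y), p in joint.items():
    assert abs(p - p_Y_given_X[(y, x)] * p_X[x]) < 1e-12

print(p_X)   # {0: 0.4, 1: 0.6}
```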


Continuous case

The joint probability density function f_{X,Y}(x,y) for two continuous random variables is defined as the derivative of the joint cumulative distribution function (see the preceding section):

:f_{X,Y}(x,y) = \frac{\partial^2 F_{X,Y}(x,y)}{\partial x\, \partial y}.

This is equal to

:f_{X,Y}(x,y) = f_{Y\mid X}(y\mid x)\, f_X(x) = f_{X\mid Y}(x\mid y)\, f_Y(y),

where f_{Y\mid X}(y\mid x) and f_{X\mid Y}(x\mid y) are the conditional distributions of Y given X=x and of X given Y=y respectively, and f_X(x) and f_Y(y) are the marginal distributions of X and Y respectively.
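As an illustration (not from the source), the following Python sketch checks this factorization for a bivariate normal joint density, where the conditional of Y given X = x is known to be normal with mean ρx and variance 1 − ρ²; the correlation value and the use of SciPy are arbitrary choices for the example:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# Bivariate normal with standard normal marginals and correlation rho.
# The joint density should factor as f_{X,Y}(x, y) = f_{Y|X}(y | x) * f_X(x).
rho = 0.6
joint = multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]])

x, y = 0.3, -1.1
lhs = joint.pdf([x, y])
rhs = norm(loc=rho * x, scale=np.sqrt(1 - rho**2)).pdf(y) * norm().pdf(x)
print(lhs, rhs)   # the two values agree
```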