A random variable (also called random quantity, aleatory variable, or stochastic variable) is a mathematical formalization of a quantity or object which depends on random events.
The term 'random variable' in its mathematical definition refers to neither randomness nor variability but instead is a mathematical function in which
* the domain is the set of possible outcomes in a sample space (e.g. the set $\{H, T\}$, whose elements are the possible upper sides of a flipped coin, heads $H$ or tails $T$, as the result of tossing a coin); and
* the range is a measurable space (e.g. corresponding to the domain above, the range might be the set $\{-1, 1\}$ if, say, heads $H$ is mapped to -1 and tails $T$ is mapped to 1). Typically, the range of a random variable is a subset of the real numbers.

Informally, randomness typically represents some fundamental element of chance, such as in the roll of a die; it may also represent uncertainty, such as measurement error. However, the interpretation of probability is philosophically complicated, and even in specific cases is not always straightforward. The purely mathematical analysis of random variables is independent of such interpretational difficulties, and can be based upon a rigorous axiomatic setup.
In the formal mathematical language of measure theory, a random variable is defined as a measurable function from a probability measure space (called the ''sample space'') to a measurable space. This allows consideration of the pushforward measure, which is called the ''distribution'' of the random variable; the distribution is thus a probability measure on the set of all possible values of the random variable. It is possible for two random variables to have identical distributions but to differ in significant ways; for instance, they may be independent.
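As a small illustration of the last point, the following Python sketch builds a finite probability space, pushes its measure forward through a random variable, and compares two random variables that share a distribution. The fair-coin probabilities and the names `P`, `X`, `Y` and `pushforward` are assumptions made for this example only.

```python
from fractions import Fraction

# Assumed probability space: a fair coin with outcomes "H" and "T".
P = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

def pushforward(rv, prob):
    """Return the distribution of rv: total probability of each value it takes."""
    dist = {}
    for outcome, p in prob.items():
        value = rv(outcome)
        dist[value] = dist.get(value, Fraction(0)) + p
    return dist

def X(outcome):
    """Maps heads to -1 and tails to 1."""
    return -1 if outcome == "H" else 1

def Y(outcome):
    """The mirror image of X on the same sample space."""
    return -X(outcome)

dist_X = pushforward(X, P)
dist_Y = pushforward(Y, P)

# Same distribution, yet the two functions disagree on every outcome.
same_distribution = dist_X == dist_Y
never_equal = all(X(w) != Y(w) for w in P)
```

In this sketch `X` and `Y` are identically distributed but far from independent, since the value of one determines the value of the other.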
It is common to consider the special cases of discrete random variables and absolutely continuous random variables, corresponding to whether a random variable is valued in a countable subset or in an interval of real numbers. There are other important possibilities, especially in the theory of stochastic processes, wherein it is natural to consider random sequences or random functions. Sometimes a ''random variable'' is taken to be automatically valued in the real numbers, with more general random quantities instead being called ''random elements''.
According to George Mackey, Pafnuty Chebyshev was the first person "to think systematically in terms of random variables".
Definition
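A random variable $X$ is a measurable function $X \colon \Omega \to E$ from a sample space $\Omega$, as a set of possible outcomes, to a measurable space $E$. The technical axiomatic definition requires the sample space $\Omega$ to belong to a probability triple $(\Omega, \mathcal{F}, \operatorname{P})$ (see the measure-theoretic definition). A random variable is often denoted by capital Roman letters such as $X, Y, Z, T$.
The probability that $X$ takes on a value in a measurable set $S \subseteq E$ is written as
: $\operatorname{P}(X \in S) = \operatorname{P}(\{\omega \in \Omega \mid X(\omega) \in S\})$.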
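On a finite sample space this definition can be evaluated directly: the probability that a random variable lands in a set is the total probability of the outcomes it maps into that set. The sketch below assumes a fair six-sided die as the sample space; the random variable `X` and the helper `prob_in` are hypothetical names used only for illustration.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die.
omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}

def X(w):
    """Hypothetical random variable: remainder of the roll modulo 3."""
    return w % 3

def prob_in(rv, prob, S):
    """P(rv in S): sum the probabilities of outcomes whose image lies in S."""
    return sum((p for w, p in prob.items() if rv(w) in S), Fraction(0))

p_zero = prob_in(X, P, {0})        # preimage is {3, 6}
p_nonzero = prob_in(X, P, {1, 2})  # preimage is {1, 2, 4, 5}
```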
Standard case
In many cases, $X$ is real-valued, i.e. $E = \mathbb{R}$. In some contexts, the term random element (see extensions) is used to denote a random variable not of this form.
When the image (or range) of $X$ is finite or countably infinite, the random variable is called a discrete random variable and its distribution is a discrete probability distribution, i.e. can be described by a probability mass function that assigns a probability to each value in the image of $X$. If the image is uncountably infinite (usually an interval) then $X$ is called a continuous random variable. In the special case that it is absolutely continuous, its distribution can be described by a probability density function, which assigns probabilities to intervals; in particular, each individual point must necessarily have probability zero for an absolutely continuous random variable. Not all continuous random variables are absolutely continuous.
Any random variable can be described by its cumulative distribution function, which describes the probability that the random variable will be less than or equal to a certain value.
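For a discrete random variable the cumulative distribution function is a running sum of the probability mass function. The sketch below assumes a fair six-sided die as the underlying PMF; the function name `cdf` is chosen for this example.

```python
from fractions import Fraction

# Assumed PMF: a fair six-sided die.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def cdf(x, pmf):
    """F(x) = P(X <= x) for a discrete random variable given by its PMF."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))

below_support = cdf(0, pmf)    # no mass at or below 0
at_three = cdf(3, pmf)         # mass of the values 1, 2, 3
between = cdf(3.5, pmf)        # unchanged until the next value, 4
```

The function stays constant between the possible values and jumps at each of them, which is the typical shape of a discrete CDF.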
Extensions
The term "random variable" in statistics is traditionally limited to the real-valued case ($E = \mathbb{R}$). In this case, the structure of the real numbers makes it possible to define quantities such as the expected value and variance of a random variable, its cumulative distribution function, and the moments of its distribution.
However, the definition above is valid for any measurable space $E$ of values. Thus one can consider random elements of other sets $E$, such as random Boolean values, categorical values, complex numbers, vectors, matrices, sequences, trees, sets, shapes, manifolds, and functions. One may then specifically refer to a ''random variable of type $E$'', or an ''$E$-valued random variable''.
This more general concept of a random element is particularly useful in disciplines such as graph theory, machine learning, natural language processing, and other fields in discrete mathematics and computer science, where one is often interested in modeling the random variation of non-numerical data structures. In some cases, it is nonetheless convenient to represent each element of $E$ using one or more real numbers. In this case, a random element may optionally be represented as a vector of real-valued random variables (all defined on the same underlying probability space $\Omega$, which allows the different random variables to covary). For example:
*A random word may be represented as a random integer that serves as an index into the vocabulary of possible words. Alternatively, it can be represented as a random indicator vector, whose length equals the size of the vocabulary, where the only values of positive probability are $(1\ 0\ 0\ 0\ \cdots)$, $(0\ 1\ 0\ 0\ \cdots)$, $(0\ 0\ 1\ 0\ \cdots)$, and so on, and the position of the 1 indicates the word.
*A random sentence of given length $N$ may be represented as a vector of $N$ random words.
*A random graph on $N$ given vertices may be represented as an $N \times N$ matrix of random variables, whose values specify the adjacency matrix of the random graph.
*A random function $F$ may be represented as a collection of random variables $F(x)$, giving the function's values at the various points $x$ in the function's domain. The $F(x)$ are ordinary real-valued random variables provided that the function is real-valued. For example, a stochastic process is a random function of time, a random vector is a random function of some index set such as $1, 2, \ldots, n$, and a random field is a random function on any set (typically time, space, or a discrete set).
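The first three representations in the list above can be sketched in a few lines of Python. The vocabulary, the uniform choice of words, and the independent-edge model for the graph are assumptions made only for this illustration, and all function names are hypothetical.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical vocabulary, used only for illustration.
vocabulary = ["cat", "dog", "fish", "bird"]

def random_word_index(rng, vocab):
    """Draw a random word as an integer index into the vocabulary."""
    return rng.randrange(len(vocab))

def one_hot(index, size):
    """Indicator vector: 1 at the word's position, 0 elsewhere."""
    return [1 if i == index else 0 for i in range(size)]

def random_sentence(rng, vocab, length):
    """A random sentence as a list of `length` random word indices."""
    return [random_word_index(rng, vocab) for _ in range(length)]

def random_adjacency(rng, n, p):
    """Symmetric 0/1 adjacency matrix; each edge is present with probability p."""
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            A[i][j] = A[j][i] = 1 if rng.random() < p else 0
    return A

idx = random_word_index(rng, vocabulary)
vec = one_hot(idx, len(vocabulary))
sentence = random_sentence(rng, vocabulary, 5)
A = random_adjacency(rng, 4, 0.5)
```

Because every entry is produced by the same generator `rng`, all of these real-valued components live on one underlying source of randomness, which is what lets them covary.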
Distribution functions
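If a random variable $X \colon \Omega \to \mathbb{R}$ defined on the probability space $(\Omega, \mathcal{F}, \operatorname{P})$ is given, we can ask questions like "How likely is it that the value of $X$ is equal to 2?". This is the same as the probability of the event $\{\omega : X(\omega) = 2\}$, which is often written as $\operatorname{P}(X = 2)$ or $p_X(2)$ for short.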
Recording all these probabilities of outputs of a random variable $X$ yields the probability distribution of $X$. The probability distribution "forgets" about the particular probability space used to define $X$ and only records the probabilities of various output values of $X$. Such a probability distribution, if $X$ is real-valued, can always be captured by its cumulative distribution function
: $F_X(x) = \operatorname{P}(X \le x)$
and sometimes also using a probability density function, $f_X$. In measure-theoretic terms, we use the random variable $X$ to "push-forward" the measure $\operatorname{P}$ on $\Omega$ to a measure $p_X$ on $\mathbb{R}$. The measure $p_X$ is called the "(probability) distribution of $X$" or the "law of $X$". The density $f_X = dp_X/d\mu$ is the Radon–Nikodym derivative of $p_X$ with respect to some reference measure $\mu$ on $\mathbb{R}$ (often, this reference measure is the Lebesgue measure in the case of continuous random variables, or the counting measure in the case of discrete random variables).
The underlying probability space $\Omega$ is a technical device used to guarantee the existence of random variables, sometimes to construct them, and to define notions such as correlation and dependence or independence based on a joint distribution of two or more random variables on the same probability space. In practice, one often disposes of the space $\Omega$ altogether and just puts a measure on $\mathbb{R}$ that assigns measure 1 to the whole real line, i.e., one works with probability distributions instead of random variables. See the article on quantile functions for fuller development.
Examples
Discrete random variable
Consider an experiment where a person is chosen at random. An example of a random variable may be the person's height. Mathematically, the random variable is interpreted as a function which maps the person to their height. Associated with the random variable is a probability distribution that allows the computation of the probability that the height is in any subset of possible values, such as the probability that the height is between 180 and 190 cm, or the probability that the height is either less than 150 or more than 200 cm.
Another random variable may be the person's number of children; this is a discrete random variable with non-negative integer values. It allows the computation of probabilities for individual integer values (the probability mass function, PMF) or for sets of values, including infinite sets. For example, the event of interest may be "an even number of children". For both finite and infinite event sets, their probabilities can be found by adding up the PMF values of the elements; that is, the probability of an even number of children is the infinite sum $\operatorname{PMF}(0) + \operatorname{PMF}(2) + \operatorname{PMF}(4) + \cdots$.
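Such an infinite sum can be approximated numerically by truncating it once the remaining terms are negligible. The sketch below uses a Poisson distribution as a stand-in PMF; this model, the mean `lam = 2.0`, and the function names are assumptions for illustration and are not part of the example above.

```python
import math

def poisson_pmf(k, lam):
    """PMF of a Poisson distribution, used here only as a stand-in model."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def prob_even(lam, terms=50):
    """Approximate PMF(0) + PMF(2) + PMF(4) + ... by truncating the sum."""
    return sum(poisson_pmf(k, lam) for k in range(0, 2 * terms, 2))

lam = 2.0  # hypothetical mean number of children
approx = prob_even(lam)
exact = (1 + math.exp(-2 * lam)) / 2  # closed form that holds for the Poisson case
```

The truncation point is kept modest on purpose: very large factorials cannot be converted to floating point, and the discarded tail is already far below machine precision for small `lam`.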
In examples such as these, the sample space is often suppressed, since it is mathematically hard to describe, and the possible values of the random variables are then treated as a sample space. But when two random variables are measured on the same sample space of outcomes, such as the height and number of children being computed on the same random persons, it is easier to track their relationship if it is acknowledged that both height and number of children come from the same random person, for example so that questions of whether such random variables are correlated or not can be posed.
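If $\{a_n\}, \{b_n\}$ are countable sets of real numbers, $b_n > 0$ and $\sum_n b_n = 1$, then $F(x) = \sum_n b_n \delta_{a_n}(x)$ is a discrete distribution function. Here $\delta_t(x) = 0$ for $x < t$, $\delta_t(x) = 1$ for $x \ge t$. Taking for instance an enumeration of all rational numbers as $\{a_n\}$, one gets a discrete function that is not necessarily a step function (piecewise constant).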
Coin toss
The possible outcomes for one coin toss can be described by the sample space $\Omega = \{\text{heads}, \text{tails}\}$. We can introduce a real-valued random variable $Y$ that models a one-dollar payoff for a successful bet on heads as follows:
: $Y(\omega) = \begin{cases} 1, & \text{if } \omega = \text{heads}, \\ 0, & \text{if } \omega = \text{tails}. \end{cases}$
If the coin is a fair coin, ''Y'' has a probability mass function $f_Y$ given by:
: $f_Y(y) = \begin{cases} \tfrac{1}{2}, & \text{if } y = 1, \\ \tfrac{1}{2}, & \text{if } y = 0. \end{cases}$
Dice roll
A random variable can also be used to describe the process of rolling dice and the possible outcomes. The most obvious representation for the two-dice case is to take the set of pairs of numbers $n_1$ and $n_2$ from $\{1, 2, 3, 4, 5, 6\}$ (representing the numbers on the two dice) as the sample space. The total number rolled (the sum of the numbers in each pair) is then a random variable ''X'' given by the function that maps the pair to the sum:
: $X((n_1, n_2)) = n_1 + n_2$
and (if the dice are fair) has a probability mass function $f_X$ given by:
: $f_X(S) = \dfrac{\min(S - 1,\, 13 - S)}{36}, \text{ for } S \in \{2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12\}$
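The two-dice probability mass function can be checked by enumerating the sample space. The sketch below assumes fair dice, so that all 36 ordered pairs are equally likely, and compares the enumerated PMF with the closed form min(S − 1, 13 − S)/36; the variable names are chosen for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Sample space: all 36 ordered pairs (n1, n2), assumed equally likely.
counts = Counter(n1 + n2 for n1, n2 in product(faces, repeat=2))
pmf = {s: Fraction(c, 36) for s, c in counts.items()}

# Closed form for the same PMF, for totals 2 through 12.
closed_form = {s: Fraction(min(s - 1, 13 - s), 36) for s in range(2, 13)}

matches = pmf == closed_form
```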
Continuous random variable
Formally, a continuous random variable is a random variable whose cumulative distribution function is continuous everywhere. The function has no jumps ("gaps"), which would correspond to numbers that occur with positive probability. Instead, continuous random variables almost never take an exact prescribed value ''c'' (formally, $\forall c \in \mathbb{R}:\ \Pr(X = c) = 0$), but there is a positive probability that their values will lie in particular intervals, which can be arbitrarily small. Continuous random variables usually admit probability density functions (PDF), which characterize their CDF and probability measures; such distributions are also called absolutely continuous; but some continuous distributions are singular, or mixes of an absolutely continuous part and a singular part.
An example of a continuous random variable would be one based on a spinner that can choose a horizontal direction. Then the values taken by the random variable are directions. We could represent these directions by North, West, East, South, Southeast, etc. However, it is commonly more convenient to map the sample space to a random variable which takes real-number values. This can be done, for example, by mapping a direction to a bearing in degrees clockwise from North. The random variable then takes values which are real numbers from the interval [0, 360), with all parts of the range being "equally likely". In this case, ''X'' = the angle spun. Any real number has probability zero of being selected, but a positive probability can be assigned to any ''range'' of values. For example, the probability of choosing a number in [0, 180] is 1/2. Instead of speaking of a probability mass function, we say that the probability ''density'' of ''X'' is 1/360. The probability of a subset of [0, 360) can be calculated by multiplying the measure of the set by 1/360. In general, the probability of a set for a given continuous random variable can be calculated by integrating the density over the given set.
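The spinner calculation can be sketched in Python in two ways: by multiplying an interval's length by the constant density, and by numerically integrating the density over the interval. The function names and the midpoint Riemann sum are choices made for this illustration.

```python
FULL_TURN = 360.0
DENSITY = 1 / FULL_TURN  # constant density on [0, 360)

def prob_interval(a, b):
    """P(a <= X <= b) for the spinner angle: interval length divided by 360."""
    lo, hi = max(a, 0.0), min(b, FULL_TURN)
    return max(hi - lo, 0.0) / FULL_TURN

def prob_by_integration(a, b, steps=100_000):
    """Same probability via a midpoint Riemann sum of the density over [a, b]."""
    width = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * width
        total += (DENSITY if 0.0 <= x < FULL_TURN else 0.0) * width
    return total

half = prob_interval(0, 180)      # half of the circle
point = prob_interval(90, 90)     # a single angle has zero length
numeric = prob_by_integration(0, 180)
```

For a non-uniform density only the integration route would remain valid, which is why the general rule is stated in terms of integrating the density.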
More formally, given any
interval , a random variable