In probability and statistics, a probability mass function (sometimes called ''probability function'' or ''frequency function'') is a function that gives the probability that a discrete random variable is exactly equal to some value. Sometimes it is also known as the discrete probability density function. The probability mass function is often the primary means of defining a discrete probability distribution, and such functions exist for either scalar or multivariate random variables whose domain is discrete. A probability mass function differs from a continuous probability density function (PDF) in that the latter is associated with continuous rather than discrete random variables. A continuous PDF must be integrated over an interval to yield a probability. The value of the random variable having the largest probability mass is called the mode.

Formal definition

A probability mass function is the probability distribution of a discrete random variable: it provides the possible values and their associated probabilities. It is the function

p_X \colon \mathbb{R} \to [0,1]

defined by

p_X(x) = P(X = x)

for

-\infty < x < \infty

, where P is a probability measure.

p_X(x)

can also be simplified as

p(x)

. The probabilities associated with all (hypothetical) values must be non-negative and sum up to 1,

\sum_x p_X(x) = 1

and

p_X(x)\geq 0.

Thinking of probability as mass helps to avoid mistakes, since physical mass is conserved, as is the total probability for all hypothetical outcomes

x

.
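The two conditions above (non-negativity and unit total mass) can be checked mechanically for a finite table of probabilities. The following is a minimal sketch, not a definitive implementation; the helper name `is_valid_pmf` and the two example tables are assumptions introduced only for illustration. Exact fractions are used so that the sum can be compared with 1 without rounding error.

```python
from fractions import Fraction


def is_valid_pmf(pmf):
    """Return True if every mass is non-negative and the masses sum to exactly 1.

    `pmf` is a dict mapping each possible value x to its mass p_X(x).
    """
    masses = list(pmf.values())
    return all(m >= 0 for m in masses) and sum(masses) == 1


# Hypothetical example: a fair six-sided die, each face with mass 1/6.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

# Hypothetical counterexample: the masses sum to 3/4, so this is not a PMF.
not_a_pmf = {0: Fraction(1, 2), 1: Fraction(1, 4)}

print(is_valid_pmf(fair_die))   # True
print(is_valid_pmf(not_a_pmf))  # False
```

With floating-point masses, an exact comparison with 1 would usually be replaced by a tolerance check such as `math.isclose`.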

Measure theoretic formulation

A probability mass function of a discrete random variable

X

can be seen as a special case of two more general measure theoretic constructions: the distribution of

X

and the probability density function of

X

with respect to the counting measure. We make this more precise below. Suppose that

(A, \mathcal A, P)

is a probability space and that

(B, \mathcal B)

is a measurable space whose underlying σ-algebra is discrete, so in particular contains singleton sets of

B

. In this setting, a random variable

X \colon A \to B

is discrete provided its image is countable. The pushforward measure

X_*(P)

—called the distribution of

X

in this context—is a probability measure on

B

whose restriction to singleton sets induces the probability mass function (as mentioned in the previous section)

f_X \colon B \to \mathbb R

since

f_X(b) = P(X^{-1}(b)) = P(X = b)

for each

b \in B

. Now suppose that

(B, \mathcal B, \mu)

is a

measure space

equipped with the counting measure

\mu

. The probability density function

f

of

X

with respect to the counting measure, if it exists, is the Radon–Nikodym derivative of the pushforward measure of

X

(with respect to the counting measure), so

f = d X_*P / d \mu

and

f

is a function from

B

to the non-negative reals. As a consequence, for any

b \in B

we have

P(X=b) = P(X^{-1}(b)) = X_*(P)(b) = \int_{\{b\}} f \, d\mu = f(b),

demonstrating that

f

is in fact a probability mass function. When there is a natural order among the potential outcomes

x

, it may be convenient to assign numerical values to them (or ''n''-tuples in case of a discrete multivariate random variable) and to consider also values not in the

image of

X

. That is,

f_X

may be defined for all

real numbers and

f_X(x)=0

for all

x \notin X(S)

as shown in the figure. The image of

X

has a

countable

subset on which the total probability mass

\sum_x f_X(x)

is one. Consequently, the probability mass function is zero for all but a countable number of values of

x

. The discontinuity of probability mass functions is related to the fact that the cumulative distribution function of a discrete random variable is also discontinuous. If

X

is a discrete random variable, then

P(X = x) = 1

means that the event

(X = x)

is certain (it is true in 100% of the occurrences); on the contrary,

P(X = x) = 0

means that the event

(X = x)

is impossible. This statement does not hold for a continuous random variable

X

, for which

P(X = x) = 0

for any possible

x

. Discretization

is the process of converting a continuous random variable into a discrete one.
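As a rough illustration of the points above, the sketch below extends a probability mass function to all real numbers (zero outside the image of the random variable) and builds the cumulative distribution function from it. The fair-die example and the names `SUPPORT`, `pmf` and `cdf` are assumptions made for this sketch only.

```python
from fractions import Fraction

# Hypothetical discrete random variable: the face shown by a fair die.
SUPPORT = {face: Fraction(1, 6) for face in range(1, 7)}


def pmf(x):
    """f_X(x), defined for every real x; zero outside the image of X."""
    return SUPPORT.get(x, Fraction(0))


def cdf(x):
    """F_X(x) = P(X <= x): the sum of the masses at support points <= x."""
    return sum((mass for value, mass in SUPPORT.items() if value <= x), Fraction(0))


print(pmf(3))    # 1/6
print(pmf(2.5))  # 0

# The CDF is a step function: it jumps by exactly the mass at a support point.
print(cdf(3) - cdf(2.999))  # 1/6
```

The jump of size 1/6 at x = 3 is the discontinuity of the cumulative distribution function mentioned in the text.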

Examples

Finite

There are three major associated distributions: the Bernoulli distribution, the binomial distribution, and the geometric distribution.

* Bernoulli distribution, ber(p), is used to model an experiment with only two possible outcomes. The two outcomes are often encoded as 1 and 0.

p_X(x) = \begin{cases}
p, & \text{if } x \text{ is 1} \\
1-p, & \text{if } x \text{ is 0}
\end{cases}

An example of the Bernoulli distribution is tossing a coin. Suppose that

S

is the sample space of all outcomes of a single toss of a fair coin, and

X

is the random variable defined on

S

assigning 0 to the category "tails" and 1 to the category "heads". Since the coin is fair, the probability mass function is

p_X(x) = \begin{cases}
\frac{1}{2}, &x = 0,\\
\frac{1}{2}, &x = 1,\\
0, &x \notin \{0, 1\}.
\end{cases}

* Binomial distribution models the number of successes when someone draws n times with replacement. Each draw or experiment is independent, with two possible outcomes. The associated probability mass function is

\binom{n}{k} p^k (1-p)^{n-k}

. An example of the binomial distribution is the probability of getting exactly one 6 when someone rolls a fair die three times.

* Geometric distribution describes the number of trials needed to get one success. Its probability mass function is

p_X(k) = (1-p)^{k-1} p

. An example is tossing a coin until the first "heads" appears.

p

denotes the probability of the outcome "heads", and

k

denotes the number of necessary coin tosses.

Other distributions that can be modeled using a probability mass function are the categorical distribution (also known as the generalized Bernoulli distribution) and the multinomial distribution.

* If the discrete distribution has two or more categories, one of which may occur in a single trial (draw), it is a categorical distribution, whether or not the categories have a natural ordering.
* An example of a multivariate discrete distribution, and of its probability mass function, is provided by the multinomial distribution. Here the multiple random variables are the numbers of successes in each of the categories after a given number of trials, and each non-zero probability mass gives the probability of a certain combination of numbers of successes in the various categories.
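The Bernoulli, binomial and geometric probability mass functions described above can be written down directly. The following is a small sketch using exact fractions; the function names are assumptions chosen for this illustration, and each function returns 0 outside the support of its distribution.

```python
from fractions import Fraction
from math import comb


def bernoulli_pmf(x, p):
    """Mass p at x = 1, mass 1 - p at x = 0, and zero elsewhere."""
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return 0


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    if not (isinstance(k, int) and 0 <= k <= n):
        return 0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def geometric_pmf(k, p):
    """Probability that the first success occurs on trial k (k = 1, 2, ...)."""
    if not (isinstance(k, int) and k >= 1):
        return 0
    return (1 - p) ** (k - 1) * p


# Fair coin: "heads" is encoded as 1.
print(bernoulli_pmf(1, Fraction(1, 2)))    # 1/2

# Exactly one 6 in three rolls of a fair die.
print(binomial_pmf(1, 3, Fraction(1, 6)))  # 25/72

# First "heads" of a fair coin appears on the third toss.
print(geometric_pmf(3, Fraction(1, 2)))    # 1/8
```

The die example evaluates to 3 · (1/6) · (5/6)² = 75/216 = 25/72.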

Infinite

The following exponentially declining distribution is an example of a distribution with an infinite number of possible outcomes—all the positive integers:

\text{Pr}(X=i) = \frac{1}{2^i} \qquad \text{for } i = 1, 2, 3, \dots

Despite the infinite number of possible outcomes, the total probability mass is 1/2 + 1/4 + 1/8 + ⋯ = 1, satisfying the unit total probability requirement for a probability distribution.
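The convergence of the total mass to 1 can be observed numerically from the partial sums. The sketch below is illustrative only; the names `pmf` and `partial_sum` are assumptions. It prints the mass accumulated by the first n outcomes together with the mass still missing, which equals 1/2^n and shrinks to zero.

```python
from fractions import Fraction


def pmf(i):
    """Pr(X = i) = 1 / 2**i for i = 1, 2, 3, ...; zero otherwise."""
    if isinstance(i, int) and i >= 1:
        return Fraction(1, 2**i)
    return Fraction(0)


def partial_sum(n):
    """Total mass of the outcomes 1, 2, ..., n."""
    return sum((pmf(i) for i in range(1, n + 1)), Fraction(0))


for n in (1, 2, 3, 10):
    accumulated = partial_sum(n)
    # Each line shows n, the accumulated mass, and the remaining mass 1/2**n.
    print(n, accumulated, 1 - accumulated)
```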

Multivariate case

Two or more discrete random variables have a joint probability mass function, which gives the probability of each possible combination of realizations for the random variables.
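As a small illustration, a joint probability mass function over two variables can be stored as a table keyed by pairs of values, and each marginal probability mass function is recovered by summing over the other variable. The experiment below (two tosses of a fair coin) and the names `joint` and `marginal` are assumptions made for this sketch.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Hypothetical experiment: two tosses of a fair coin (1 = heads, 0 = tails).
# X is the result of the first toss; Y is the total number of heads.
joint = defaultdict(Fraction)
for first, second in product((0, 1), repeat=2):
    # Each of the four equally likely toss sequences contributes mass 1/4.
    joint[(first, first + second)] += Fraction(1, 4)


def marginal(joint_pmf, axis):
    """Sum the joint masses over every coordinate except `axis`."""
    out = defaultdict(Fraction)
    for outcome, mass in joint_pmf.items():
        out[outcome[axis]] += mass
    return dict(out)


print(dict(joint))         # joint masses p_{X,Y}(x, y)
print(marginal(joint, 0))  # marginal PMF of X
print(marginal(joint, 1))  # marginal PMF of Y
```

Here X and Y are not independent: the pair (0, 2) has joint mass zero even though both marginal masses are positive.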