In information theory, the information content, self-information, surprisal, or Shannon information is a basic quantity derived from the probability of a particular event occurring from a random variable. It can be thought of as an alternative way of expressing probability, much like odds or log-odds, but which has particular mathematical advantages in the setting of information theory.
The Shannon information can be interpreted as quantifying the level of "surprise" of a particular outcome. As it is such a basic quantity, it also appears in several other settings, such as the length of a message needed to transmit the event given an optimal
source coding of the random variable.
The Shannon information is closely related to ''entropy'', which is the expected value of the self-information of a random variable, quantifying how surprising the random variable is "on average". This is the average amount of self-information an observer would expect to gain about a random variable when measuring it.
The information content can be expressed in various units of information, of which the most common is the "bit" (more formally called the ''shannon''), as explained below.
The term 'perplexity' has been used in language modelling to quantify the uncertainty inherent in a set of prospective events.
Definition
Claude Shannon's definition of self-information was chosen to meet several axioms:
# An event with probability 100% is perfectly unsurprising and yields no information.
# The less probable an event is, the more surprising it is and the more information it yields.
# If two independent events are measured separately, the total amount of information is the sum of the self-informations of the individual events.
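Stated symbolically, with <math>\operatorname{I}(p)</math> as a shorthand for the information content of an event with probability <math>p</math> (a notational convenience used only in this restatement), the three axioms read:
<math display="block">\operatorname{I}(1) = 0, \qquad p_1 < p_2 \implies \operatorname{I}(p_1) > \operatorname{I}(p_2), \qquad \operatorname{I}(p_1 p_2) = \operatorname{I}(p_1) + \operatorname{I}(p_2) \text{ for independent events.}</math>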
The detailed derivation is below, but it can be shown that there is a unique function of probability that meets these three axioms, up to a multiplicative scaling factor. Broadly, given a real number <math>b > 1</math> and an event <math>x</math> with probability <math>P</math>, the information content is defined as follows:
<math display="block">\operatorname{I}(x) := -\log_b\left[\Pr(x)\right] = -\log_b(P).</math>
The base ''b'' corresponds to the scaling factor above. Different choices of ''b'' correspond to different units of information: when <math>b = 2</math>, the unit is the shannon (symbol Sh), often called a 'bit'; when <math>b = e</math>, the unit is the natural unit of information (symbol nat); and when <math>b = 10</math>, the unit is the hartley (symbol Hart).
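For illustration, consider a single toss of a fair coin, for which the outcome "heads" has probability <math>P = \tfrac{1}{2}</math>. Evaluating the definition in each of the three bases gives
<math display="block">-\log_2\tfrac{1}{2} = 1 \text{ Sh}, \qquad -\ln\tfrac{1}{2} = \ln 2 \approx 0.693 \text{ nat}, \qquad -\log_{10}\tfrac{1}{2} = \log_{10} 2 \approx 0.301 \text{ Hart},</math>
so the same outcome carries 1 shannon, about 0.693 nats, or about 0.301 hartleys of information, depending only on the choice of base.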
Formally, given a discrete random variable <math>X</math> with probability mass function <math>p_X(x)</math>, the self-information of measuring <math>X</math> as outcome <math>x</math> is defined as
<math display="block">\operatorname{I}_X(x) := -\log\left[p_X(x)\right] = \log\left(\frac{1}{p_X(x)}\right).</math>
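As a minimal numerical sketch of the definition (the helper name self_information is a convenience chosen here, not standard terminology), the quantity can be computed for any base as follows:
<syntaxhighlight lang="python">
import math

def self_information(p, base=2):
    """Return -log_base(p), the information content of an outcome
    with probability p.

    base=2 gives shannons (bits), base=math.e gives nats,
    and base=10 gives hartleys."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in the interval (0, 1]")
    return -math.log(p, base)

# A fair coin toss: 1 shannon, ~0.693 nat, ~0.301 hartley.
print(self_information(0.5))           # 1.0
print(self_information(0.5, math.e))   # ~0.6931
print(self_information(0.5, 10))       # ~0.3010

# Additivity for independent events (axiom 3): two independent fair
# coin tosses have joint probability 1/4 and carry 2 shannons.
print(self_information(0.25))          # 2.0
</syntaxhighlight>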
The use of the notation <math>\operatorname{I}_X(x)</math> for self-information above is not universal. Since the notation <math>\operatorname{I}(X;Y)</math> is also often used for the related quantity of mutual information, many authors use a lowercase <math>h_X(x)</math> for self-entropy instead, mirroring the use of the capital <math>H(X)</math> for the entropy.
Properties
Monotonically decreasing function of probability
For a given probability space, the measurement of rarer events is intuitively more "surprising", and yields more information content, than the measurement of more "common" events. Thus, self-information is a strictly decreasing monotonic function of the probability, sometimes called an "antitonic" function.
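One way to verify this directly from the base-<math>b</math> definition above is to differentiate with respect to the probability:
<math display="block">\frac{\mathrm{d}}{\mathrm{d}p}\left(-\log_b p\right) = -\frac{1}{p \ln b} < 0 \quad \text{for } p \in (0, 1] \text{ and } b > 1,</math>
so the information content strictly decreases as the probability of the event increases.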
While standard probabilities are represented by real numbers in the interval