In information theory, the information content, self-information, surprisal, or Shannon information is a basic quantity derived from the probability of a particular event occurring from a random variable. It can be thought of as an alternative way of expressing probability, much like odds or log-odds, but which has particular mathematical advantages in the setting of information theory.
The Shannon information can be interpreted as quantifying the level of "surprise" of a particular outcome. As it is such a basic quantity, it also appears in several other settings, such as the length of a message needed to transmit the event given an optimal source coding of the random variable.
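To make the source-coding connection concrete, here is a minimal sketch, not taken from the article itself: for a dyadic distribution, an optimal prefix code (such as one produced by Huffman coding) assigns each outcome a codeword whose length in bits equals its self-information, &minus;log2 ''p''. The particular distribution and codewords below are illustrative assumptions.
<syntaxhighlight lang="python">
import math

# Illustrative (assumed) dyadic distribution and a matching optimal prefix code.
pmf  = {"a": 1/2, "b": 1/4, "c": 1/8, "d": 1/8}
code = {"a": "0", "b": "10", "c": "110", "d": "111"}

for symbol, p in pmf.items():
    bits = -math.log2(p)              # self-information of the outcome, in bits
    assert len(code[symbol]) == bits  # optimal codeword length equals -log2(p)
    print(symbol, p, code[symbol], bits)
</syntaxhighlight>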
The Shannon information is closely related to ''entropy'', which is the expected value of the self-information of a random variable, quantifying how surprising the random variable is "on average". This is the average amount of self-information an observer would expect to gain about a random variable when measuring it.
The information content can be expressed in various units of information, of which the most common is the "bit" (more correctly called the ''shannon''), as explained below.
Definition
Claude Shannon's definition of self-information was chosen to meet several axioms:
# An event with probability 100% is perfectly unsurprising and yields no information.
# The less probable an event is, the more surprising it is and the more information it yields.
# If two independent events are measured separately, the total amount of information is the sum of the self-informations of the individual events.
The detailed derivation is below, but it can be shown that there is a unique function of probability that meets these three axioms, up to a multiplicative scaling factor. Broadly, given a real number ''b'' > 1 and an event ''x'' with probability ''P'', the information content is defined as follows:
:<math>\operatorname{I}(x) := -\log_b[\Pr(x)] = -\log_b(P).</math>
The base ''b'' corresponds to the scaling factor above. Different choices of ''b'' correspond to different units of information: when ''b'' = 2, the unit is the shannon (symbol Sh), often called a 'bit'; when ''b'' = e, the unit is the natural unit of information (symbol nat); and when ''b'' = 10, the unit is the hartley (symbol Hart).
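As an illustration (a sketch, not part of the article), the following Python snippet evaluates &minus;log<sub>''b''</sub> ''P'' for the three common bases and numerically checks two of the axioms above; the probabilities used are arbitrary.
<syntaxhighlight lang="python">
import math

def information_content(p, base=2):
    """Self-information -log_b(p) of an event with probability p (0 < p <= 1)."""
    return -math.log(p, base)

p = 0.25
print(information_content(p, 2))        # 2.0 shannons (bits)
print(information_content(p, math.e))   # about 1.386 nats
print(information_content(p, 10))       # about 0.602 hartleys

# Axiom checks: a certain event carries no information, and for two
# independent events the information of the joint event is the sum.
assert information_content(1.0) == 0.0
p_x, p_y = 0.5, 0.125
assert math.isclose(information_content(p_x * p_y),
                    information_content(p_x) + information_content(p_y))
</syntaxhighlight>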
Formally, given a random variable <math>X</math> with probability mass function <math>p_X(x)</math>, the self-information of measuring <math>X</math> as outcome <math>x</math> is defined as
:<math>\operatorname{I}_X(x) := -\log[p_X(x)] = \log\left(\frac{1}{p_X(x)}\right).</math>
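Tying this definition back to the entropy mentioned earlier, the sketch below (an illustration under an assumed distribution, not taken from the article) computes <math>\operatorname{I}_X(x)</math> for every outcome of a fair six-sided die and averages it, recovering the entropy <math>H(X) = \log_2 6 \approx 2.585</math> bits.
<syntaxhighlight lang="python">
import math

# Assumed example: probability mass function of a fair six-sided die.
pmf = {face: 1/6 for face in range(1, 7)}

# Self-information of each outcome, in bits (base-2 logarithm).
self_info = {x: -math.log2(p) for x, p in pmf.items()}
print(self_info[1])  # about 2.585 bits for every face

# Entropy is the expected self-information: H(X) = sum over x of p(x) * I_X(x).
entropy = sum(p * self_info[x] for x, p in pmf.items())
print(entropy, math.log2(6))  # both about 2.585 bits
</syntaxhighlight>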
The use of the notation <math>I_X(x)</math> for self-information above is not universal. Since the notation <math>I(X;Y)</math> is also often used for the related quantity of mutual information, many authors use a lowercase <math>h_X(x)</math> for self-entropy instead, mirroring the use of the capital <math>H(X)</math> for the entropy.
Properties
Monotonically decreasing function of probability
For a given probability space, measurements of rarer events are intuitively more "surprising", and yield more information content, than more common values. Thus, self-information is a strictly decreasing monotonic function of the probability, sometimes called an "antitonic" function.
While standard probabilities are represented by real numbers in the interval <math>[0, 1]</math>, self-informations are represented by extended real numbers in the interval <math>[0, \infty]</math>: an event that is certain carries no information, and the self-information grows without bound as the probability of the event approaches zero.
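As a quick numerical illustration of this antitone behaviour (a sketch with arbitrarily chosen probabilities, not part of the article), decreasing the probability strictly increases the information content, with 0 bits at probability 1 and an infinite value at probability 0.
<syntaxhighlight lang="python">
import math

def information_bits(p):
    """Self-information in bits; returns +inf for the impossible event p = 0."""
    return math.inf if p == 0.0 else -math.log2(p)

# Arbitrary decreasing probabilities: the information content strictly increases.
probs = [1.0, 0.5, 0.1, 0.01, 0.0]
infos = [information_bits(p) for p in probs]
print(infos)  # [-0.0, 1.0, about 3.32, about 6.64, inf]

assert all(a < b for a, b in zip(infos, infos[1:]))  # strictly increasing as p falls
</syntaxhighlight>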