In information theory, the binary entropy function, denoted \operatorname H(p) or \operatorname H_\text{b}(p), is defined as the entropy of a Bernoulli process with probability p of one of two values. It is a special case of \Eta(X), the entropy function. Mathematically, the Bernoulli trial is modelled as a random variable X that can take on only two values, 0 and 1, which are mutually exclusive and exhaustive. If \operatorname{Pr}(X=1) = p, then \operatorname{Pr}(X=0) = 1-p and the entropy of X (in shannons) is given by

:\operatorname H(X) = \operatorname H_\text{b}(p) = -p \log_2 p - (1 - p) \log_2 (1 - p),

where 0 \log_2 0 is taken to be 0. The logarithms in this formula are usually taken to base 2; see ''binary logarithm''.

When p=\tfrac 1 2, the binary entropy function attains its maximum value. This is the case of an unbiased coin flip. \operatorname H(p) is distinguished from the entropy function \Eta(X) in that the former takes a single real number as a parameter whereas the latter takes a distribution or random variable as a parameter. Sometimes the binary entropy function is also written as \operatorname H_2(p). However, it is different from, and should not be confused with, the Rényi entropy, which is denoted \Eta_2(X).
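
As a quick illustration, a minimal Python sketch of this definition (the function name is chosen here for clarity; the convention 0 \log_2 0 = 0 is handled explicitly):

```python
import math

def binary_entropy(p: float) -> float:
    """Binary entropy H_b(p) in shannons (bits), with 0*log2(0) taken as 0."""
    if p == 0.0 or p == 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

print(binary_entropy(0.5))   # 1.0  -- maximal, the unbiased coin flip
print(binary_entropy(0.25))  # ~0.811
print(binary_entropy(0.0))   # 0.0  -- no uncertainty
```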


Explanation

In terms of information theory, ''entropy'' is considered to be a measure of the uncertainty in a message. To put it intuitively, suppose p=0. At this probability, the event is certain never to occur, so there is no uncertainty at all, leading to an entropy of 0. If p=1, the result is again certain, so the entropy is 0 here as well. When p=1/2, the uncertainty is at a maximum; if one were to place a fair bet on the outcome in this case, there is no advantage to be gained from prior knowledge of the probabilities. In this case, the entropy attains its maximum value of 1 bit. Intermediate values fall between these cases; for instance, if p=1/4, there is still some uncertainty about the outcome, but one can predict it correctly more often than not, so the uncertainty measure, or entropy, is less than 1 full bit.
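
For instance, at p=\tfrac 1 4 the definition gives

:\operatorname H_\text{b}\left(\tfrac 1 4\right) = -\tfrac 1 4 \log_2 \tfrac 1 4 - \tfrac 3 4 \log_2 \tfrac 3 4 = \tfrac 1 2 + \tfrac 3 4 \log_2 \tfrac 4 3 \approx 0.811 bits.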


Derivative

The derivative of the binary entropy function may be expressed as the negative of the logit function:

:\frac{d}{dp} \operatorname H_\text{b}(p) = - \operatorname{logit}_2(p) = -\log_2\left( \frac{p}{1-p} \right).
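
As a rough numerical check (a sketch assuming base-2 logarithms throughout), a central finite difference of \operatorname H_\text{b} can be compared against the negated base-2 logit:

```python
import math

def binary_entropy(p: float) -> float:
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def neg_logit2(p: float) -> float:
    # -logit_2(p) = -log2(p / (1 - p))
    return -math.log2(p / (1 - p))

p, h = 0.3, 1e-6
finite_diff = (binary_entropy(p + h) - binary_entropy(p - h)) / (2 * h)
print(finite_diff, neg_logit2(p))  # both approximately 1.222
```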


Taylor series

The Taylor series of the binary entropy function in a neighborhood of 1/2 is

:\operatorname H_\text{b}(p) = 1 - \frac{1}{2\ln 2} \sum^{\infty}_{n=1} \frac{(1-2p)^{2n}}{n(2n-1)}

for 0\le p\le 1.
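
A short Python sketch (illustrative only) comparing partial sums of this series with the exact value; convergence is fastest near p = 1/2:

```python
import math

def binary_entropy(p: float) -> float:
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def taylor_binary_entropy(p: float, terms: int) -> float:
    # 1 - (1 / (2 ln 2)) * sum_{n=1}^{terms} (1 - 2p)^(2n) / (n (2n - 1))
    s = sum((1 - 2 * p) ** (2 * n) / (n * (2 * n - 1)) for n in range(1, terms + 1))
    return 1 - s / (2 * math.log(2))

p = 0.3
print(binary_entropy(p))             # ~0.88129
print(taylor_binary_entropy(p, 2))   # ~0.88150, already close this near 1/2
print(taylor_binary_entropy(p, 20))  # agrees with the exact value to many digits
```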


Bounds

The following bounds hold for 0 < p < 1:

:\ln(2) \cdot \log_2(p) \cdot \log_2(1-p) \leq \operatorname H_\text{b}(p) \leq \log_2(p) \cdot \log_2(1-p)

and

:4p(1-p) \leq \operatorname H_\text{b}(p) \leq (4p(1-p))^{1/\ln 4},

where \ln denotes the natural logarithm.
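
A small Python sketch (an illustrative check, using the upper-bound exponent 1/\ln 4 as written above) that evaluates both pairs of bounds at a few points:

```python
import math

def binary_entropy(p: float) -> float:
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.1, 0.25, 0.5, 0.9):
    h = binary_entropy(p)
    log_product = math.log2(p) * math.log2(1 - p)  # positive for 0 < p < 1
    lo1, hi1 = math.log(2) * log_product, log_product
    lo2, hi2 = 4 * p * (1 - p), (4 * p * (1 - p)) ** (1 / math.log(4))
    assert lo1 <= h <= hi1 and lo2 <= h <= hi2
    print(f"p={p}: {lo1:.3f} <= {h:.3f} <= {hi1:.3f} and {lo2:.3f} <= {h:.3f} <= {hi2:.3f}")
```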


See also

* Metric entropy
* Information theory
* Information entropy
* Quantities of information


References


Further reading

* MacKay, David J. C. ''Information Theory, Inference, and Learning Algorithms''. Cambridge: Cambridge University Press, 2003. ISBN 0-521-64298-1.