In information theory, information dimension is an information measure for random vectors in Euclidean space, based on the normalized entropy of finely quantized versions of the random vectors. This concept was first introduced by Alfréd Rényi in 1959. Simply speaking, it is a measure of the fractal dimension of a probability distribution. It characterizes the growth rate of the Shannon entropy given by successively finer discretizations of the space. In 2010, Wu and Verdú gave an operational characterization of Rényi information dimension as the fundamental limit of almost lossless data compression for analog sources under various regularity constraints of the encoder/decoder.


Definition and Properties

The entropy of a discrete random variable Z is
:\mathbb{H}_0(Z)=\sum_{z\in\operatorname{supp}(P_Z)}P_Z(z)\log_2\frac{1}{P_Z(z)}
where P_Z(z) is the probability measure of Z when Z=z, and \operatorname{supp}(P_Z) denotes the support set \{z : P_Z(z)>0\}.

Let X be an arbitrary real-valued random variable. Given a positive integer m, we create a new discrete random variable
:\langle X\rangle_m=\frac{\lfloor mX\rfloor}{m}
where \lfloor \cdot \rfloor is the floor operator, which converts a real number to the greatest integer not exceeding it. Then
:\underline{d}(X)=\liminf_{m\rightarrow\infty}\frac{\mathbb{H}_0(\langle X\rangle_m)}{\log_2 m}
and
:\bar{d}(X)=\limsup_{m\rightarrow\infty}\frac{\mathbb{H}_0(\langle X\rangle_m)}{\log_2 m}
are called the lower and upper information dimensions of X respectively. When \underline{d}(X)=\bar{d}(X), this common value is called the information dimension of X,
:d(X)=\lim_{m\rightarrow\infty}\frac{\mathbb{H}_0(\langle X\rangle_m)}{\log_2 m}

Some important properties of the information dimension d(X):
* If the mild condition \mathbb{H}_0(\lfloor X\rfloor)<\infty is fulfilled, then 0\leq\underline{d}(X)\leq\bar{d}(X)\leq1.
* For an n-dimensional random vector \vec{X}, the first property generalizes to 0\leq\underline{d}(\vec{X})\leq \bar{d}(\vec{X})\leq n.
* It is sufficient to calculate the upper and lower information dimensions by restricting to the exponential subsequence m=2^l.
* \underline{d}(X) and \bar{d}(X) are unchanged if rounding or ceiling functions are used instead of the floor in the quantization.
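A rough numerical illustration of this definition (a minimal sketch, not from the source; the helper names are illustrative): given samples from a distribution, one can estimate the ratio \mathbb{H}_0(\langle X\rangle_m)/\log_2 m along the subsequence m=2^l. For a half discrete, half continuous mixture the ratios slowly decrease toward the information dimension 0.5.

    import numpy as np

    def entropy_bits(values):
        """Empirical Shannon entropy (in bits) of a discrete sample."""
        _, counts = np.unique(values, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def information_dimension_estimates(samples, l_values=range(4, 13)):
        """Estimate H_0(<X>_m) / log2(m) along the subsequence m = 2^l."""
        return [entropy_bits(np.floor((2 ** l) * samples)) / l for l in l_values]

    rng = np.random.default_rng(0)
    n = 200_000
    # Mixture: atom at 0 with probability 0.5, otherwise uniform on [0, 1); true d(X) = 0.5
    x = np.where(rng.random(n) < 0.5, 0.0, rng.random(n))
    print(information_dimension_estimates(x))  # ratios decrease toward 0.5 as l grows

The convergence is slow because the constant term of \mathbb{H}_0(\langle X\rangle_m) is divided by \log_2 m; in practice the sample size also limits how large l can usefully be.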


''d''-Dimensional Entropy

If the information dimension d exists, one can define the d-dimensional entropy of this distribution by
:\mathbb{H}_{d(X)}(X)=\lim_{m\rightarrow\infty}\left(\mathbb{H}_0(\langle X \rangle_m)-d(X)\log_2 m\right)
provided the limit exists. If d=0, the zero-dimensional entropy equals the standard Shannon entropy \mathbb{H}_0(X). For integer dimension d=n\geq 1, the n-dimensional entropy is the n-fold integral defining the respective differential entropy.
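As a simple illustration: if X is uniformly distributed on [0,1), then \langle X\rangle_m is uniform over the m points \{0, 1/m, \dots, (m-1)/m\}, so \mathbb{H}_0(\langle X\rangle_m)=\log_2 m. Hence d(X)=1 and
:\mathbb{H}_1(X)=\lim_{m\rightarrow\infty}\left(\log_2 m-1\cdot\log_2 m\right)=0,
which equals the differential entropy of the uniform distribution on the unit interval.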


An Equivalent Definition of Information Dimension

In 1994, Kawabata and Dembo proposed a new way of measuring information based on the rate-distortion function of a random variable. The measure is defined as
:d_R(X)=-2\lim_{D\rightarrow 0}\frac{R(X,D)}{\log_2 D},
where R(X,D) is the rate-distortion function, defined as
:R(X,D)=\min_{\mathbb{E}[(X-\hat{X})^2]\leq D} I(X;\hat{X}),
that is, the minimum information that yields a D-close approximation of X. They further proved that this definition is equivalent to the definition of information dimension. Formally,
:d_R(X) = d(X).
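As a consistency check (using the standard Gaussian rate-distortion function under squared-error distortion, not a derivation from the source): if X is Gaussian with variance \sigma^2, then R(X,D)=\tfrac{1}{2}\log_2(\sigma^2/D) for D\leq\sigma^2, and
:d_R(X)=-2\lim_{D\rightarrow 0}\frac{\tfrac{1}{2}\log_2(\sigma^2/D)}{\log_2 D}=\lim_{D\rightarrow 0}\frac{\log_2 D-\log_2\sigma^2}{\log_2 D}=1,
which agrees with d(X)=1 for an absolutely continuous distribution.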


Dimensional-Rate Bias

Using the above definition of Rényi information dimension, a measure similar to the ''d''-dimensional entropy has been defined. This value b(X), named the dimensional-rate bias, is defined so as to capture the finite term of the rate-distortion function. Formally,
:R(X,D)=-\frac{d(X)}{2}\log_2\frac{2\pi e D}{d(X)}+b(X).
The dimensional-rate bias is equal to the ''d''-dimensional entropy for continuous, discrete, and discrete-continuous mixed distributions. Furthermore, it is calculable for a class of singular random variables, while the ''d''-dimensional entropy does not necessarily exist there. Finally, the dimensional-rate bias generalizes the Shannon entropy and the differential entropy, as one can find the mutual information I(X;Y) using the following formula:
:I(X;Y) = b(X)+b(Y)-b(X,Y).
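For instance, under the convention above (and again using the squared-error Gaussian rate-distortion function, which is not taken from the source), a Gaussian X with variance \sigma^2 has d(X)=1 and
:b(X)=R(X,D)+\frac{1}{2}\log_2(2\pi e D)=\frac{1}{2}\log_2(\sigma^2/D)+\frac{1}{2}\log_2(2\pi e D)=\frac{1}{2}\log_2(2\pi e\sigma^2)=h(X),
so the dimensional-rate bias reduces to the differential entropy, as expected for an absolutely continuous distribution.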


Discrete-Continuous Mixture Distributions

According to the Lebesgue decomposition theorem, a probability distribution can be uniquely represented by the mixture
:v=pP_{XD}+qP_{XC}+rP_{XS}
where p+q+r=1 and p,q,r\geq0; P_{XD} is a purely atomic probability measure (discrete part), P_{XC} is the absolutely continuous probability measure, and P_{XS} is a probability measure singular with respect to Lebesgue measure but with no atoms (singular part).

Let X be a random variable such that \mathbb{H}_0(\lfloor X \rfloor) < \infty, and assume the distribution of X can be represented as
:v=(1-\rho)P_{XD}+\rho P_{XC}
where P_{XD} is a discrete measure and P_{XC} is the absolutely continuous probability measure, with 0\leq\rho\leq1. Then
:d(X)=\rho
Moreover, given \mathbb{H}_0(P_{XD}) and the differential entropy h(P_{XC}), the ''d''-dimensional entropy is simply given by
:\mathbb{H}_\rho(X)=(1-\rho)\mathbb{H}_0(P_{XD})+\rho h(P_{XC})+\mathbb{H}_0(\rho)
where \mathbb{H}_0(\rho) is the Shannon entropy of a discrete random variable Z with P_Z(1)=\rho and P_Z(0)=1-\rho, that is,
:\mathbb{H}_0(\rho)=\rho\log_2\frac{1}{\rho}+(1-\rho)\log_2\frac{1}{1-\rho}
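For instance, if X takes the value 0 with probability 1/2 and is otherwise uniformly distributed on [0,1), then \rho=1/2 and d(X)=1/2; since the single atom has zero Shannon entropy and the uniform density on the unit interval has zero differential entropy,
:\mathbb{H}_{1/2}(X)=\tfrac{1}{2}\cdot 0+\tfrac{1}{2}\cdot 0+\mathbb{H}_0(\tfrac{1}{2})=1\text{ bit.}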


Example

Consider a signal which has a Gaussian probability distribution. We pass the signal through a half-wave rectifier which converts all negative values to 0 and maintains all other values. The half-wave rectifier can be characterized by the function
:f(x)= \begin{cases} x, & x\geq 0\\ 0, & x<0 \end{cases}
Then, at the output of the rectifier, the signal has a rectified Gaussian distribution. It is characterized by an atomic mass of weight 0.5 at zero, and has a Gaussian PDF for all x>0. With this mixture distribution, we can apply the formula above to obtain the information dimension d of the distribution and to calculate the d-dimensional entropy:
:d(X)=\rho=0.5
The normalized right part of the zero-mean Gaussian distribution has entropy h(P_{XC})=\frac{1}{2}\log_2(2\pi e\sigma^2)-1, hence
:\begin{align} \mathbb{H}_{0.5}(X)&=(1-0.5)\left(1\cdot\log_2 1\right)+0.5\,h(P_{XC})+\mathbb{H}_0(0.5)\\ &=0+\frac{1}{2}\left(\frac{1}{2}\log_2(2\pi e\sigma^2)-1\right)+1\\ &=\frac{1}{4}\log_2(2\pi e\sigma^2)+\frac{1}{2}\text{ bit(s)} \end{align}
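This calculation can be checked numerically. Below is a minimal sketch (not from the source) that draws samples of the rectified Gaussian with \sigma=1, quantizes them at scale m=2^l, and compares the estimate \mathbb{H}_0(\langle X\rangle_m)-0.5\log_2 m with the closed-form value \frac{1}{4}\log_2(2\pi e)+\frac{1}{2}\approx 1.52 bits.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 500_000
    x = np.maximum(rng.normal(0.0, 1.0, n), 0.0)      # half-wave rectified standard Gaussian

    target = 0.25 * np.log2(2 * np.pi * np.e) + 0.5   # closed-form 0.5-dimensional entropy, ~1.52 bits

    for l in (6, 8, 10, 12):
        m = 2 ** l
        _, counts = np.unique(np.floor(m * x), return_counts=True)
        p = counts / n
        h0 = -np.sum(p * np.log2(p))                  # empirical H_0(<X>_m)
        print(l, h0 / np.log2(m), h0 - 0.5 * np.log2(m), target)
        # first ratio drifts toward d(X) = 0.5; the second quantity approaches the target

The quantization scale should not be pushed too far for a fixed sample size, since the empirical entropy of the continuous part is biased downward once the bins become sparsely populated.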


Connection to Differential Entropy

It is shown that information dimension and differential entropy are tightly connected. Let X be a random variable with a continuous density f(x). Suppose we divide the range of X into bins of length \Delta. By the mean value theorem, there exists a value x_i within each bin such that
:f(x_i)\Delta=\int_{i\Delta}^{(i+1)\Delta}f(x)\;\mathrm{d}x
Consider the discretized random variable X^\Delta=x_i if i\Delta\leq X< (i+1)\Delta. The probability of each support point X^\Delta=x_i is
:P_{X^\Delta}(x_i)=\int_{i\Delta}^{(i+1)\Delta}f(x)\;\mathrm{d}x=f(x_i)\Delta
Let S = \operatorname{supp}(P_{X^\Delta}). The entropy of X^\Delta is
:\begin{align} \mathbb{H}_0(X^\Delta)&=-\sum_{x_i\in S} P_{X^\Delta}(x_i)\log_2 P_{X^\Delta}(x_i)\\ &=-\sum_{x_i\in S} f(x_i)\Delta\log_2\left(f(x_i)\Delta\right)\\ &=-\sum_{x_i\in S} \Delta f(x_i)\log_2 f(x_i)-\sum_{x_i\in S} f(x_i)\Delta \log_2\Delta\\ &=-\sum_{x_i\in S} \Delta f(x_i)\log_2 f(x_i)-\log_2\Delta, \end{align}
where the last step uses \sum_{x_i\in S} f(x_i)\Delta=1.

If we set \Delta=1/m and x_i=i/m, then we are performing exactly the same quantization as in the definition of information dimension. Since relabeling the events of a discrete random variable does not change its entropy, we have
:\mathbb{H}_0(X^{1/m})=\mathbb{H}_0(\langle X\rangle_m).
This yields
:\mathbb{H}_0(\langle X\rangle_m)=-\sum \frac{1}{m} f(x_i)\log_2 f(x_i)+\log_2 m
and when m is sufficiently large,
:-\sum \Delta f(x_i)\log_2 f(x_i) \approx \int f(x)\log_2 \frac{1}{f(x)}\,\mathrm{d}x,
which is the differential entropy h(X) of the continuous random variable. In particular, if f(x) is Riemann integrable, then
:h(X)=\lim_{m\rightarrow\infty}\left(\mathbb{H}_0(\langle X\rangle_m)-\log_2 m\right).
Comparing this with the d-dimensional entropy shows that the differential entropy is exactly the one-dimensional entropy:
:h(X)=\mathbb{H}_1(X).

In fact, this can be generalized to higher dimensions. Rényi showed that, if \vec{X} is a random vector in an n-dimensional Euclidean space \mathbb{R}^n with an absolutely continuous distribution, a probability density function f_{\vec{X}}(\vec{x}), and finite entropy of the integer part (\mathbb{H}_0(\lfloor \vec{X} \rfloor)<\infty), then d(\vec{X})=n and
:\mathbb{H}_n(\vec{X})=\int\cdots\int f_{\vec{X}}(\vec{x})\log_2\frac{1}{f_{\vec{X}}(\vec{x})}\,\mathrm{d}\vec{x},
if the integral exists.
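A quick numerical check of the relation h(X)=\lim_{m\rightarrow\infty}(\mathbb{H}_0(\langle X\rangle_m)-\log_2 m) (a minimal sketch, not from the source): for standard Gaussian samples the estimate should approach h(X)=\frac{1}{2}\log_2(2\pi e)\approx 2.05 bits.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 1.0, 500_000)          # standard Gaussian samples, sigma = 1

    for l in (4, 6, 8, 10):
        m = 2 ** l
        _, counts = np.unique(np.floor(m * x), return_counts=True)
        p = counts / counts.sum()
        h0 = -np.sum(p * np.log2(p))           # empirical H_0(<X>_m)
        print(l, h0 - np.log2(m))              # approaches 0.5*log2(2*pi*e), about 2.05 bits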


Lossless data compression

The information dimension of a distribution gives a theoretical limit on the rate at which a variable drawn from this distribution can be compressed. In the context of lossless data compression, we try to compress real numbers with fewer real numbers, both having infinite precision. The main objective of lossless data compression is to find efficient representations of source realizations x^n\in \mathcal{X}^n by y^n\in\mathcal{Y}^n. An (n,k)-code for \{X^n\} is a pair of mappings:
* encoder: f_n:\mathcal{X}^n\rightarrow \mathcal{Y}^k, which converts information from the source into symbols for communication or storage;
* decoder: g_n:\mathcal{Y}^k\rightarrow\mathcal{X}^n, the reverse process, converting code symbols back into a form that the recipient understands.
The block error probability is \mathcal{P}\{g_n(f_n(X^n))\neq X^n\}.

Define r(\epsilon) to be the infimum of r\geq0 such that there exists a sequence of (n,\lfloor rn\rfloor)-codes with \mathcal{P}\{g_n(f_n(X^n))\neq X^n\}\leq\epsilon for all sufficiently large n. So r(\epsilon) gives the ratio between the code length and the source length; it quantifies how good a specific encoder/decoder pair is.

The fundamental limits in lossless source coding are as follows. Consider a continuous encoder function f(x):\mathbb{R}^n\rightarrow \mathbb{R}^{\lfloor Rn\rfloor} with its continuous decoder function g(x):\mathbb{R}^{\lfloor Rn\rfloor}\rightarrow \mathbb{R}^n. If we impose no further regularity on f(x) and g(x), then, due to the rich structure of \mathbb{R}, the minimum \epsilon-achievable rate is R_0(\epsilon)=0 for all 0<\epsilon\leq1. This means that one can build an encoder/decoder pair with an arbitrarily large compression ratio.

In order to get nontrivial and meaningful conclusions, let R^*(\epsilon) be the minimum \epsilon-achievable rate for a linear encoder and Borel decoder. If the random variable X has a distribution which is a mixture of a discrete and a continuous part, then R^*(\epsilon)=d(X) for all 0<\epsilon\leq1. If instead we restrict the decoder to be a Lipschitz continuous function and \bar{d}(X)<\infty holds, then the minimum \epsilon-achievable rate satisfies R(\epsilon)\geq \bar{d}(X) for all 0<\epsilon\leq1.

The fundamental role of information dimension in lossless data compression extends beyond i.i.d. data. It is shown that for certain processes (e.g., moving-average processes) the rate of lossless compression is also equal to the information dimension rate. This result allows for further compression that would not be possible by considering only the marginal distribution of the process.
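Returning to the mixture result above (an illustration of that statement, not an additional claim from the source): the rectified Gaussian source of the earlier example has d(X)=0.5, so with a linear encoder and Borel decoder about n/2 real numbers suffice to represent n source samples with block error probability at most \epsilon, i.e. R^*(\epsilon)=0.5 for every 0<\epsilon\leq1. Intuitively, roughly half of the samples fall on the atom at zero, and with real-valued (infinite-precision) representations the discrete pattern of their positions costs a vanishing rate, while each remaining continuous sample requires essentially one real coefficient.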


See also

* Fractal dimension
* Correlation dimension
* Entropy (information theory)


References

* Rényi, A. (1959). "On the dimension and entropy of probability distributions". Acta Mathematica Academiae Scientiarum Hungaricae.
* Wu, Y.; Verdú, S. (2010). "Rényi Information Dimension: Fundamental Limits of Almost Lossless Analog Compression". IEEE Transactions on Information Theory.
* Kawabata, T.; Dembo, A. (September 1994). "The Rate-Distortion Dimension of Sets and Measures". IEEE Transactions on Information Theory. doi:10.1109/18.333868.