Conditional Mutual Information
In probability theory, particularly information theory, the conditional mutual information is, in its most basic form, the expected value of the mutual information of two random variables given the value of a third.
Definition
For random variables X, Y, and Z with support sets \mathcal{X}, \mathcal{Y} and \mathcal{Z}, we define the conditional mutual information as
:I(X;Y \mid Z) = \int_{\mathcal{Z}} D_{\mathrm{KL}}\!\left( P_{(X,Y)\mid Z} \,\|\, P_{X\mid Z} \otimes P_{Y\mid Z} \right) dP_Z.
This may be written in terms of the expectation operator: I(X;Y \mid Z) = \mathbb{E}_Z \left[ D_{\mathrm{KL}}\!\left( P_{(X,Y)\mid Z} \,\|\, P_{X\mid Z} \otimes P_{Y\mid Z} \right) \right]. Thus I(X;Y \mid Z) is the expected (with respect to Z) Kullback–Leibler divergence from the conditional joint distribution P_{(X,Y)\mid Z} to the product of the conditional marginals P_{X\mid Z} and P_{Y\mid Z}. Compare with the definition of mutual information.
In terms of PMFs for discrete distributions
For discrete random variables X, Y, and Z with support sets \mathcal{X}, \mathcal{Y} and \mathcal{Z}, the conditional mutual information I(X;Y \mid Z) is as follows:
:I(X;Y \mid Z) = \sum_{z \in \mathcal{Z}} p_Z(z) \sum_{y \in \mathcal{Y}} \sum_{x \in \mathcal{X}} p_{X,Y \mid Z}(x,y \mid z) \log \frac{p_{X,Y \mid Z}(x,y \mid z)}{p_{X \mid Z}(x \mid z)\, p_{Y \mid Z}(y \mid z)},
where the marginal, joint, a ...
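For intuition, a minimal Python sketch is added here (not part of the original article; the helper name and the XOR example are invented for illustration). It evaluates the discrete formula above directly from a fully specified joint PMF:
<syntaxhighlight lang="python">
import numpy as np

def conditional_mutual_information(p_xyz):
    """I(X;Y|Z) in nats, for a joint PMF given as a 3-D array p[x, y, z]."""
    p_z = p_xyz.sum(axis=(0, 1))   # marginal p_Z(z)
    p_xz = p_xyz.sum(axis=1)       # joint p_{X,Z}(x, z)
    p_yz = p_xyz.sum(axis=0)       # joint p_{Y,Z}(y, z)
    cmi = 0.0
    for x, y, z in np.ndindex(p_xyz.shape):
        pj = p_xyz[x, y, z]
        if pj > 0:
            # p(x,y|z) / (p(x|z) p(y|z)) = p(x,y,z) p(z) / (p(x,z) p(y,z))
            cmi += pj * np.log(pj * p_z[z] / (p_xz[x, z] * p_yz[y, z]))
    return cmi

# Example: X and Z independent uniform bits, Y = X XOR Z.
p = np.zeros((2, 2, 2))
for x in (0, 1):
    for z in (0, 1):
        p[x, x ^ z, z] = 0.25
print(conditional_mutual_information(p))  # log(2): given Z, Y reveals X exactly
</syntaxhighlight>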
Support (Measure Theory)
In mathematics, the support (sometimes topological support or spectrum) of a measure ''μ'' on a measurable topological space (''X'', Borel(''X'')) is a precise notion of where in the space ''X'' the measure "lives". It is defined to be the largest (closed) subset of ''X'' for which every open neighbourhood of every point of the set has positive measure.
Motivation
A (non-negative) measure \mu on a measurable space (X, \Sigma) is really a function \mu : \Sigma \to [0, +\infty]. Therefore, in terms of the usual definition of support, the support of \mu is a subset of the σ-algebra \Sigma:
:\operatorname{supp}(\mu) := \overline{\{ A \in \Sigma \mid \mu(A) \neq 0 \}},
where the overbar denotes set closure. However, this definition is somewhat unsatisfactory: we use the notion of closure, but we do not even have a topology on \Sigma. What we really want to know is where in the space X the measure \mu is non-zero. Consider two examples:
# Lebesgue measure \lambda on the real line \mathbb{R}. It se ...
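As a toy illustration of the "every neighbourhood has positive measure" criterion, consider a purely atomic measure on the real line. The following Python sketch (added here, not from the article; it truncates the atom list and tests only finitely many radii, so it is only an approximation of the criterion) tests candidate points:
<syntaxhighlight lang="python">
def in_support(x, atoms, eps_list=(1.0, 0.1, 0.01)):
    """Numerical test of the support criterion for a purely atomic measure:
    every eps-ball around x must carry positive mass."""
    return all(
        any(abs(x - a) < eps and w > 0 for a, w in atoms)
        for eps in eps_list
    )

# Atoms at 1/n with mass 2**-n: the support is {1/n : n >= 1} union {0}.
atoms = [(1.0 / n, 2.0 ** -n) for n in range(1, 200)]
print(in_support(0.5, atoms))   # True:  1/2 is an atom
print(in_support(0.0, atoms))   # True:  0 is a limit point of the atoms
print(in_support(-1.0, atoms))  # False: a ball around -1 misses every atom
</syntaxhighlight>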
Data Processing Inequality
The data processing inequality is an information theoretic concept which states that the information content of a signal cannot be increased via a local physical operation. This can be expressed concisely as 'post-processing cannot increase information'.
Definition
Let three random variables form the Markov chain X \rightarrow Y \rightarrow Z, implying that the conditional distribution of Z depends only on Y and is conditionally independent of X. Specifically, we have such a Markov chain if the joint probability mass function can be written as
:p(x,y,z) = p(x)\,p(y \mid x)\,p(z \mid y) = p(y)\,p(x \mid y)\,p(z \mid y).
In this setting, no processing of Y, deterministic or random, can increase the information that Y contains about X. Using the mutual information, this can be written as:
:I(X;Y) \geqslant I(X;Z),
with equality I(X;Y) = I(X;Z) if and only if I(X;Y \mid Z) = 0, i.e. Z and Y contain the same information about X, and X \rightarrow Z \rightarrow Y also forms a Markov chain.
Proof
One can ...
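A quick numerical sanity check of the inequality on a randomly generated chain X \rightarrow Y \rightarrow Z over small alphabets (a sketch added here with invented helper names, not the article's own material):
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def mutual_information(p_ab):
    """I(A;B) in nats from a joint PMF p[a, b]."""
    pa, pb = p_ab.sum(axis=1), p_ab.sum(axis=0)
    mask = p_ab > 0
    return float((p_ab[mask] * np.log(p_ab[mask] / np.outer(pa, pb)[mask])).sum())

def random_stochastic(rows, cols):
    """A random row-stochastic matrix, i.e. a random conditional PMF."""
    m = rng.random((rows, cols))
    return m / m.sum(axis=1, keepdims=True)

# Random Markov chain X -> Y -> Z: p(x,y,z) = p(x) p(y|x) p(z|y).
p_x = rng.dirichlet(np.ones(4))
p_y_given_x = random_stochastic(4, 5)
p_z_given_y = random_stochastic(5, 3)

p_xy = p_x[:, None] * p_y_given_x                 # joint p(x, y)
p_xz = np.einsum('xy,yz->xz', p_xy, p_z_given_y)  # joint p(x, z)

assert mutual_information(p_xy) >= mutual_information(p_xz)
print("I(X;Y) >= I(X;Z) holds on this random chain")
</syntaxhighlight>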
Interaction Information
The interaction information is a generalization of the mutual information for more than two variables. There are many names for interaction information, including ''amount of information'', ''information correlation'', ''co-information'', and simply ''mutual information''. Interaction information expresses the amount of information (redundancy or synergy) bound up in a set of variables, ''beyond'' that which is present in any subset of those variables. Unlike the mutual information, the interaction information can be either positive or negative. These functions, their negativity and minima have a direct interpretation in algebraic topology.
Definition
The conditional mutual information can be used to inductively define the interaction information for any finite number of variables as follows:
:I(X_1; \ldots; X_{n+1}) = I(X_1; \ldots; X_n) - I(X_1; \ldots; X_n \mid X_{n+1}),
where
:I(X_1; \ldots; X_n \mid X_{n+1}) = \mathbb{E}_{X_{n+1}} \big[ I(X_1; \ldots; X_n) \mid X_{n+1} \big].
Some authors define the interaction inf ...
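Under the inductive definition above, the three-variable case reads I(X_1;X_2;X_3) = I(X_1;X_2) - I(X_1;X_2 \mid X_3). A small Python sketch (added here; helper names are invented, and note that sign conventions differ across authors) exhibiting a negative value on the XOR distribution:
<syntaxhighlight lang="python">
import numpy as np

def mi(p):
    """I(X;Y) in nats from a joint PMF p[x, y]."""
    px, py = p.sum(axis=1), p.sum(axis=0)
    m = p > 0
    return float((p[m] * np.log(p[m] / np.outer(px, py)[m])).sum())

def cmi(p):
    """I(X;Y|Z) in nats from a joint PMF p[x, y, z]."""
    pz, pxz, pyz = p.sum(axis=(0, 1)), p.sum(axis=1), p.sum(axis=0)
    total = 0.0
    for x, y, z in np.ndindex(p.shape):
        if p[x, y, z] > 0:
            total += p[x, y, z] * np.log(p[x, y, z] * pz[z] / (pxz[x, z] * pyz[y, z]))
    return total

def interaction_information(p):
    """I(X;Y;Z) = I(X;Y) - I(X;Y|Z), the three-variable case of the recursion."""
    return mi(p.sum(axis=2)) - cmi(p)

# XOR: X, Y independent uniform bits, Z = X XOR Y (pure synergy).
p = np.zeros((2, 2, 2))
for x in (0, 1):
    for y in (0, 1):
        p[x, y, x ^ y] = 0.25
print(interaction_information(p))  # -log(2) < 0
</syntaxhighlight>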
Inequalities In Information Theory
Inequalities are very important in the study of information theory. There are a number of different contexts in which these inequalities appear.
Entropic inequalities
Consider a tuple X_1, X_2, \dots, X_n of n finitely (or at most countably) supported random variables on the same probability space. There are 2^n subsets, for which (joint) entropies can be computed. For example, when ''n'' = 2, we may consider the entropies H(X_1), H(X_2), and H(X_1, X_2). They satisfy the following inequalities (which together characterize the range of the marginal and joint entropies of two random variables):
* H(X_1) \ge 0
* H(X_2) \ge 0
* H(X_1) \le H(X_1, X_2)
* H(X_2) \le H(X_1, X_2)
* H(X_1, X_2) \le H(X_1) + H(X_2).
In fact, these can all be expressed as special cases of a single inequality involving the conditional mutual information, namely
:I(A;B \mid C) \ge 0,
where A, B, and C each denote the joint distribution of some arbitrary (possibly empty) subset of our collection of ra ...
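The five inequalities can be checked numerically on any joint PMF; a short Python sketch for a random two-variable distribution (added here for illustration; helper names are invented):
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)

def H(p):
    """Shannon entropy in nats of a PMF array of any shape."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

p12 = rng.dirichlet(np.ones(12)).reshape(3, 4)  # random joint PMF p(x1, x2)
p1, p2 = p12.sum(axis=1), p12.sum(axis=0)       # marginals

assert H(p1) >= 0 and H(p2) >= 0                # entropy is nonnegative
assert H(p1) <= H(p12) and H(p2) <= H(p12)      # joint dominates each marginal
assert H(p12) <= H(p1) + H(p2) + 1e-12          # subadditivity
print("all five inequalities hold")
</syntaxhighlight>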
Joint Entropy
In information theory, joint entropy is a measure of the uncertainty associated with a set of variables.
Definition
The joint Shannon entropy (in bits) of two discrete random variables X and Y with images \mathcal{X} and \mathcal{Y} is defined as
:\Eta(X,Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} P(x,y) \log_2 [P(x,y)],
where x and y are particular values of X and Y, respectively, P(x,y) is the joint probability of these values occurring together, and P(x,y) \log_2 [P(x,y)] is defined to be 0 if P(x,y) = 0.
For more than two random variables X_1, \ldots, X_n this expands to
:\Eta(X_1, \ldots, X_n) = -\sum_{x_1 \in \mathcal{X}_1} \cdots \sum_{x_n \in \mathcal{X}_n} P(x_1, \ldots, x_n) \log_2 [P(x_1, \ldots, x_n)],
where x_1, \ldots, x_n are particular values of X_1, \ldots, X_n, respectively, P(x_1, \ldots, x_n) is the probability of these values occurring together, and P(x_1, \ldots, x_n) \log_2 [P(x_1, \ldots, x_n)] is defined to be 0 if P(x_1, \ldots, x_n) = 0.
Properties
Nonnegativity
The joint entropy of a set of random variables is a nonnegative number.
:\Eta(X,Y) \geq 0
:\Eta(X_1, \ldots, X_n) \geq 0
Greater than individual entropies
The joint entropy of a set of variables is greater than or equ ...
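A direct Python implementation of the definition (a sketch added here; the function name is invented), with a degenerate example in which Y is an exact copy of X:
<syntaxhighlight lang="python">
import numpy as np

def joint_entropy(p, base=2):
    """H(X_1, ..., X_n) from a joint PMF array; 0 log 0 is treated as 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-(nz * np.log(nz)).sum() / np.log(base))

# Fair coin X with Y a copy of X: the joint entropy is 1 bit,
# equal to H(X), since Y adds no new uncertainty.
p_xy = np.array([[0.5, 0.0],
                 [0.0, 0.5]])
print(joint_entropy(p_xy))              # 1.0
print(joint_entropy(p_xy.sum(axis=1)))  # H(X) = 1.0
</syntaxhighlight>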
Lebesgue Integration
In mathematics, the integral of a non-negative function of a single variable can be regarded, in the simplest case, as the area between the graph of that function and the ''x''-axis. The Lebesgue integral, named after French mathematician Henri Lebesgue, extends the integral to a larger class of functions. It also extends the domains on which these functions can be defined. Long before the 20th century, mathematicians already understood that for non-negative functions with a smooth enough graph—such as continuous functions on closed bounded intervals—the ''area under the curve'' could be defined as the integral, and computed by approximating the region with polygons. However, as the need to consider more irregular functions arose—e.g., as a result of the limiting processes of mathematical analysis and the mathematical theory of probability—it became clear that more careful approximation techniques were needed to define a suitable integral. Also, one mi ...
Subset
In mathematics, set ''A'' is a subset of a set ''B'' if all elements of ''A'' are also elements of ''B''; ''B'' is then a superset of ''A''. It is possible for ''A'' and ''B'' to be equal; if they are unequal, then ''A'' is a proper subset of ''B''. The relationship of one set being a subset of another is called inclusion (or sometimes containment). ''A'' is a subset of ''B'' may also be expressed as ''B'' includes (or contains) ''A'' or ''A'' is included (or contained) in ''B''. A ''k''-subset is a subset with ''k'' elements. The subset relation defines a partial order on sets. In fact, the subsets of a given set form a Boolean algebra under the subset relation, in which the join and meet are given by union and intersection, and the subset relation itself is the Boolean inclusion relation.
Definition
If ''A'' and ''B'' are sets and every element of ''A'' is also an element of ''B'', then:
:*''A'' is a subset of ''B'', denoted by A \subseteq B, or equivalently,
:*''B'' ...
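Python's built-in set type exposes exactly these relations as comparison operators, which makes for a compact illustration (an aside added here, not part of the article):
<syntaxhighlight lang="python">
A = {1, 2}
B = {1, 2, 3}

print(A <= B)  # True:  A is a subset of B (inclusion)
print(A < B)   # True:  A is a proper subset of B (they are unequal)
print(B >= A)  # True:  B is a superset of A
print(A <= A)  # True:  every set is a subset of itself
</syntaxhighlight>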
Disintegration Theorem
In mathematics, the disintegration theorem is a result in measure theory and probability theory. It rigorously defines the idea of a non-trivial "restriction" of a measure to a measure zero subset of the measure space in question. It is related to the existence of conditional probability measures. In a sense, "disintegration" is the opposite process to the construction of a product measure.
Motivation
Consider the unit square S = [0, 1] \times [0, 1] in the Euclidean plane \mathbb{R}^2. Consider the probability measure μ defined on ''S'' by the restriction of two-dimensional Lebesgue measure λ2 to ''S''. That is, the probability of an event ''E'' ⊆ ''S'' is simply the area of ''E''. We assume ''E'' is a measurable subset of ''S''. Consider a one-dimensional subset of ''S'' such as the line segment L_x = \{x\} \times [0, 1]. L_x has μ-measure zero; every subset of L_x is a μ-null set; since the Lebesgue measure space is a complete measure space,
:E \subseteq L_x \implies \mu(E) = 0.
While true, t ...
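In this example the theorem yields a family of conditional measures \mu_x, one per fiber L_x, each uniform on L_x, with \mu(E) = \int_0^1 \mu_x(E \cap L_x) \, dx. A small numerical check of that identity for the toy event E = \{(x,y) : y \le x\} (an illustration added here, not from the article):
<syntaxhighlight lang="python">
import numpy as np

# Event E = {(x, y) : y <= x}, a triangle of area 1/2.
# Its fiber over x is E ∩ L_x = {x} x [0, x], so mu_x(E ∩ L_x) = x.
xs = np.linspace(0.0, 1.0, 100001)
fiber_mass = xs                    # mu_x(E ∩ L_x) = x
dx = xs[1] - xs[0]

area_via_disintegration = float((fiber_mass * dx).sum())
print(area_via_disintegration)     # ~0.5, matching mu(E) = area of E
</syntaxhighlight>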
Product Topology
In topology and related areas of mathematics, a product space is the Cartesian product of a family of topological spaces equipped with a natural topology called the product topology. This topology differs from another, perhaps more natural-seeming, topology called the box topology, which can also be given to a product space and which agrees with the product topology when the product is over only finitely many spaces. However, the product topology is "correct" in that it makes the product space a categorical product of its factors, whereas the box topology is too fine; in that sense the product topology is the natural topology on the Cartesian product.
Definition
Throughout, I will be some non-empty index set and for every index i \in I, let X_i be a topological space. Denote the Cartesian product of the sets X_i by
:X := \prod_{i \in I} X_i,
and for every index i \in I, denote the i-th canonical projection by
:p_i : \prod_{j \in I} X_j \to X_i, \quad \left(x_j\right)_{j \in I} \mapsto x_i ...
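For finite spaces the (finite) product topology can be built explicitly: the sets U \times V with U and V open form a basis, and the open sets are exactly the unions of basis elements. A small Python sketch of this construction (added here for intuition only; the function names are invented, and Sierpiński space is used as the factor):
<syntaxhighlight lang="python">
from itertools import product, combinations

def product_topology(tau_x, tau_y):
    """Product topology on X x Y from finite topologies given as
    collections of frozensets of points. Basis: all U x V with U, V open."""
    basis = {frozenset(product(u, v)) for u in tau_x for v in tau_y}
    # Close the basis under unions (everything is finite, so enumerate).
    opens = set()
    members = list(basis)
    for r in range(len(members) + 1):
        for combo in combinations(members, r):
            opens.add(frozenset().union(*combo))
    return opens

# Sierpinski space {0, 1} with opens {}, {1}, {0, 1}, squared.
sierpinski = {frozenset(), frozenset({1}), frozenset({0, 1})}
tau = product_topology(sierpinski, sierpinski)
print(len(tau))  # 6 open sets in the product topology
</syntaxhighlight>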
Conditional Probability Distribution
In probability theory and statistics, given two jointly distributed random variables X and Y, the conditional probability distribution of Y given X is the probability distribution of Y when X is known to be a particular value; in some cases the conditional probabilities may be expressed as functions containing the unspecified value x of X as a parameter. When both X and Y are categorical variables, a conditional probability table is typically used to represent the conditional probability. The conditional distribution contrasts with the marginal distribution of a random variable, which is its distribution without reference to the value of the other variable. If the conditional distribution of Y given X is a continuous distribution, then its probability density function is known as the conditional density function. The properties of a conditional distribution, such as the moments, are often referred to by corresponding names such as the conditional mean and conditional varianc ...
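For categorical variables, the conditional probability table mentioned above is simply the joint table with each row renormalized by the corresponding marginal; a minimal Python sketch (added here; the function name is invented):
<syntaxhighlight lang="python">
import numpy as np

def conditional_pmf(p_xy):
    """p(y | x) for each x, from a joint PMF table p[x, y].
    Rows with zero marginal probability are left as all zeros."""
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p(x)
    return np.divide(p_xy, p_x, out=np.zeros_like(p_xy), where=p_x > 0)

# Joint table for two binary variables.
p_xy = np.array([[0.30, 0.10],
                 [0.20, 0.40]])
print(conditional_pmf(p_xy))
# Row x holds the distribution of Y given X = x,
# e.g. p(Y=0 | X=0) = 0.30 / 0.40 = 0.75.
</syntaxhighlight>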