
The law of total variance is a fundamental result in probability theory that expresses the variance of a random variable Y in terms of its conditional variances and conditional means given another random variable X. Informally, it states that the overall variability of Y can be split into an "unexplained" component (the average of within-group variances) and an "explained" component (the variance of group means). Formally, if X and Y are random variables on the same probability space, and Y has finite variance, then

\operatorname{Var}(Y) = \operatorname{E}\bigl[\operatorname{Var}(Y \mid X)\bigr] + \operatorname{Var}\bigl(\operatorname{E}[Y \mid X]\bigr).

This identity is also known as the variance decomposition formula, the conditional variance formula, the law of iterated variances, or colloquially as Eve's law, in parallel to the "Adam's law" naming for the law of total expectation. In actuarial science (particularly in credibility theory), the two terms \operatorname{E}[\operatorname{Var}(Y \mid X)] and \operatorname{Var}(\operatorname{E}[Y \mid X]) are called the expected value of the process variance (EVPV) and the variance of the hypothetical means (VHM), respectively.


Explanation

Let Y be a random variable and X another random variable on the same probability space. The law of total variance can be understood by noting:

# \operatorname{Var}(Y \mid X) measures how much Y varies around its conditional mean \operatorname{E}[Y \mid X].
# Taking the expectation of this conditional variance across all values of X gives \operatorname{E}[\operatorname{Var}(Y \mid X)], often termed the "unexplained" or within-group part.
# The variance of the conditional mean, \operatorname{Var}(\operatorname{E}[Y \mid X]), measures how much these conditional means differ (i.e. the "explained" or between-group part).

Adding these components yields the total variance \operatorname{Var}(Y), mirroring how analysis of variance partitions variation.


Examples


Example 1 (Exam Scores)

Suppose five students take an exam scored 0–100. Let Y = student's score and let X indicate whether the student is international or domestic, with three international and two domestic students (group sizes consistent with the stated variances):

* International: \operatorname{E}[Y \mid X=\text{intl}] = 50, \operatorname{Var}(Y \mid X=\text{intl}) \approx 1266.7.
* Domestic: \operatorname{E}[Y \mid X=\text{dom}] = 50, \operatorname{Var}(Y \mid X=\text{dom}) = 100.

Both groups share the same mean (50), so the explained variance \operatorname{Var}(\operatorname{E}[Y \mid X]) is 0, and the total variance equals the average of the within-group variances weighted by group size: (3 \cdot 1266.7 + 2 \cdot 100)/5 = 800.
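The decomposition can be checked numerically. A minimal sketch, assuming one concrete set of scores consistent with the stated group means and variances (20, 30, 100 for the international students; 40, 60 for the domestic students):

```python
# Numeric check of the law of total variance for the exam example.
# The individual scores are an assumption chosen to reproduce the
# stated group means (50, 50) and variances (~1266.7, 100).
groups = {
    "international": [20, 30, 100],
    "domestic": [40, 60],
}

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):  # population variance
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

all_scores = [s for g in groups.values() for s in g]
n = len(all_scores)
grand = mean(all_scores)

# E[Var(Y|X)]: within-group variances weighted by group size
within = sum(len(g) * var(g) for g in groups.values()) / n
# Var(E[Y|X]): variance of the group means, same weights
between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values()) / n

total = var(all_scores)
print(within, between, total)  # within + between == total
```

Because both group means equal the grand mean, the "explained" term is exactly zero here and the total variance of 800 is entirely within-group.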


Example 2 (Mixture of Two Gaussians)

Let X be a coin flip taking the value Heads with probability h and Tails with probability 1 - h. Given Heads, Y ~ Normal(\mu_h, \sigma_h^2); given Tails, Y ~ Normal(\mu_t, \sigma_t^2). Then

\operatorname{E}\bigl[\operatorname{Var}(Y \mid X)\bigr] = h\,\sigma_h^2 + (1 - h)\,\sigma_t^2,
\qquad
\operatorname{Var}\bigl(\operatorname{E}[Y \mid X]\bigr) = h\,(1 - h)\,(\mu_h - \mu_t)^2,

so

\operatorname{Var}(Y) = h\,\sigma_h^2 + (1 - h)\,\sigma_t^2 + h\,(1 - h)\,(\mu_h - \mu_t)^2.
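A Monte Carlo sketch of the mixture formula; the parameter values below are arbitrary illustrations:

```python
import random

# Monte Carlo check of the two-Gaussian mixture decomposition.
# Parameter values are arbitrary choices for illustration.
h, mu_h, sigma_h, mu_t, sigma_t = 0.3, 2.0, 1.0, -1.0, 0.5

random.seed(0)
samples = []
for _ in range(200_000):
    if random.random() < h:
        samples.append(random.gauss(mu_h, sigma_h))
    else:
        samples.append(random.gauss(mu_t, sigma_t))

m = sum(samples) / len(samples)
empirical_var = sum((y - m) ** 2 for y in samples) / len(samples)

# Closed form from the law of total variance
theoretical_var = (h * sigma_h**2 + (1 - h) * sigma_t**2
                   + h * (1 - h) * (mu_h - mu_t)**2)
print(empirical_var, theoretical_var)  # both near 2.365
```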


Example 3 (Dice and Coins)

Consider a two-stage experiment:

# Roll a fair die (values 1–6) to choose one of six biased coins, with coin i having heads probability p_i.
# Flip the chosen coin; let Y = 1 if Heads, 0 if Tails.

Then \operatorname{E}[Y \mid X=i] = p_i and \operatorname{Var}(Y \mid X=i) = p_i(1-p_i). The overall variance of Y becomes

\operatorname{Var}(Y) = \operatorname{E}\bigl[p_X(1 - p_X)\bigr] + \operatorname{Var}\bigl(p_X\bigr),

with X uniform on \{1, \ldots, 6\}.
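An exact arithmetic check, assuming for illustration the heads probabilities p_i = i/10:

```python
from fractions import Fraction

# Exact check for the dice-and-coins example.  The heads probabilities
# p_1..p_6 are an assumption; any six values in [0, 1] would work.
p = [Fraction(k, 10) for k in (1, 2, 3, 4, 5, 6)]  # p_i = i/10
w = Fraction(1, 6)  # each coin chosen with probability 1/6

# E[Var(Y|X)] = E[p_X (1 - p_X)]
within = sum(w * pi * (1 - pi) for pi in p)

# Var(E[Y|X]) = Var(p_X)
mean_p = sum(w * pi for pi in p)
between = sum(w * (pi - mean_p) ** 2 for pi in p)

# Direct computation: unconditionally, Y is Bernoulli(E[p_X])
total = mean_p * (1 - mean_p)
print(within + between == total)  # True
```

Using `Fraction` makes the identity hold exactly rather than up to floating-point error.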


Proof


Discrete/Finite Proof

Let (X_i, Y_i), i = 1, \ldots, n, be observed pairs, and define \overline{Y} = \frac{1}{n}\sum_{i=1}^n Y_i. Then

\operatorname{Var}(Y) = \frac{1}{n}\sum_{i=1}^n \bigl(Y_i - \overline{Y}\bigr)^2
= \frac{1}{n}\sum_{i=1}^n \Bigl[(Y_i - \overline{Y}_{X_i}) + (\overline{Y}_{X_i} - \overline{Y})\Bigr]^2,

where \overline{Y}_{X_i} = \operatorname{E}[Y \mid X = X_i] is the mean of Y within the group sharing the value X_i. Expanding the square and noting that the cross term cancels in summation yields

\operatorname{Var}(Y) = \operatorname{E}\bigl[\operatorname{Var}(Y \mid X)\bigr] + \operatorname{Var}\bigl(\operatorname{E}[Y \mid X]\bigr).


General Case

Using \operatorname{Var}(Y) = \operatorname{E}[Y^2] - \operatorname{E}[Y]^2 and the law of total expectation:

\operatorname{E}[Y^2] = \operatorname{E}\bigl[\operatorname{E}(Y^2 \mid X)\bigr] = \operatorname{E}\bigl[\operatorname{Var}(Y \mid X) + \operatorname{E}[Y \mid X]^2\bigr].

Subtracting \operatorname{E}[Y]^2 = \bigl(\operatorname{E}[\operatorname{E}(Y \mid X)]\bigr)^2 and regrouping gives

\operatorname{Var}(Y) = \operatorname{E}\bigl[\operatorname{Var}(Y \mid X)\bigr] + \operatorname{Var}\bigl(\operatorname{E}[Y \mid X]\bigr).


Applications


Analysis of Variance (ANOVA)

In a one-way analysis of variance (ANOVA), the total sum of squares (proportional to \operatorname{Var}(Y)) is split into a "between-group" sum of squares, corresponding to \operatorname{Var}(\operatorname{E}[Y \mid X]), plus a "within-group" sum of squares, corresponding to \operatorname{E}[\operatorname{Var}(Y \mid X)]. The F-test examines whether the explained component is sufficiently large to indicate that X has a significant effect on Y.
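A minimal sketch of the sum-of-squares split and the resulting F statistic; the sample data are an arbitrary illustration:

```python
# One-way ANOVA sum-of-squares decomposition.  The three groups of
# observations are arbitrary illustrative data.
groups = [
    [23.0, 25.0, 21.0, 24.0],
    [30.0, 31.0, 29.0, 33.0],
    [22.0, 24.0, 26.0, 23.0],
]

n = sum(len(g) for g in groups)
k = len(groups)
grand = sum(sum(g) for g in groups) / n

# Between-group and within-group sums of squares
ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
ss_within = sum(sum((y - sum(g) / len(g)) ** 2 for y in g) for g in groups)
ss_total = sum((y - grand) ** 2 for g in groups for y in g)

# Sample-level law of total variance: SST = SSB + SSW
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
print(ss_total, ss_between + ss_within, f_stat)
```

A large F statistic (here driven by the second group's higher mean) indicates that the between-group component dominates the within-group noise.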


Regression and R²

In linear regression and related models, if \hat{Y} = \operatorname{E}[Y \mid X], the fraction of variance explained is

R^2 = \frac{\operatorname{Var}(\hat{Y})}{\operatorname{Var}(Y)} = \frac{\operatorname{Var}(\operatorname{E}[Y \mid X])}{\operatorname{Var}(Y)} = 1 - \frac{\operatorname{E}[\operatorname{Var}(Y \mid X)]}{\operatorname{Var}(Y)}.

In the simple linear case (one predictor), R^2 also equals the square of the Pearson correlation coefficient between X and Y.
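A Monte Carlo sketch of this equivalence in a simple linear model; the model Y = 2X + noise is an arbitrary illustration:

```python
import random

# In the model Y = 2X + eps with unit-variance noise, E[Y|X] = 2X and
# Var(Y|X) = 1, so Var(E[Y|X])/Var(Y) = 4/5 = 0.8, which should match
# the squared Pearson correlation.
random.seed(1)
n = 50_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [2.0 * x + random.gauss(0, 1) for x in xs]

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((a - m) ** 2 for a in v) / len(v)

yhat = [2.0 * x for x in xs]  # the true conditional mean E[Y|X]
r2_decomp = var(yhat) / var(ys)  # Var(E[Y|X]) / Var(Y)

mx, my = mean(xs), mean(ys)
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
rho = cov / (var(xs) ** 0.5 * var(ys) ** 0.5)
print(r2_decomp, rho ** 2)  # both close to 0.8
```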


Machine Learning and Bayesian Inference

In many Bayesian and ensemble methods, one decomposes prediction uncertainty via the law of total variance. For a Bayesian neural network with random parameters \theta:

\operatorname{Var}(Y) = \operatorname{E}\bigl[\operatorname{Var}(Y \mid \theta)\bigr] + \operatorname{Var}\bigl(\operatorname{E}[Y \mid \theta]\bigr),

where the first term is often referred to as "aleatoric" (within-model) uncertainty and the second as "epistemic" (between-model) uncertainty.
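A minimal sketch of this split for a model ensemble; each member stands in for a sampled parameter vector \theta, and the predictive means and variances below are illustrative numbers:

```python
# Decomposing ensemble prediction uncertainty at a single input.
# Each (mean, variance) pair is one member's predictive distribution;
# the values are illustrative assumptions.
members = [
    # (E[Y | theta_k], Var(Y | theta_k))
    (1.9, 0.30),
    (2.1, 0.25),
    (2.4, 0.35),
    (1.8, 0.28),
]

k = len(members)

# Aleatoric: average within-model variance, E[Var(Y | theta)]
aleatoric = sum(v for _, v in members) / k

# Epistemic: spread of the members' means, Var(E[Y | theta])
mean_pred = sum(m for m, _ in members) / k
epistemic = sum((m - mean_pred) ** 2 for m, _ in members) / k

total_var = aleatoric + epistemic
print(aleatoric, epistemic, total_var)
```

Even when every member is individually confident (small within-model variance), disagreement between members keeps the epistemic term, and hence the total predictive variance, large.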


Actuarial Science

Credibility theory uses the same partitioning: the expected value of the process variance (EVPV), \operatorname{E}[\operatorname{Var}(Y \mid X)], and the variance of the hypothetical means (VHM), \operatorname{Var}(\operatorname{E}[Y \mid X]). The ratio of explained to total variance determines how much "credibility" to give to individual risk classifications.


Information Theory

For jointly Gaussian (X, Y) with correlation \rho, the explained-variance fraction \operatorname{Var}(\operatorname{E}[Y \mid X]) / \operatorname{Var}(Y) equals \rho^2 and relates directly to the mutual information I(Y; X) = -\tfrac{1}{2}\ln(1 - \rho^2).C. G. Bowsher & P. S. Swain (2012). "Identifying sources of variation and the flow of information in biochemical networks," ''PNAS'' 109 (20): E1320–E1328. In non-Gaussian settings, a high explained-variance ratio still indicates that X carries substantial information about Y.


Generalizations

The law of total variance generalizes to multiple or nested conditionings. For example, with two conditioning variables X_1 and X_2:

\operatorname{Var}(Y) = \operatorname{E}\bigl[\operatorname{Var}(Y \mid X_1, X_2)\bigr] + \operatorname{E}\bigl[\operatorname{Var}(\operatorname{E}[Y \mid X_1, X_2] \mid X_1)\bigr] + \operatorname{Var}\bigl(\operatorname{E}[Y \mid X_1]\bigr).

More generally, the law of total cumulance extends this approach to higher moments.
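The three-term decomposition can be checked numerically on a randomly generated discrete joint distribution of (X_1, X_2, Y); the particular distribution is arbitrary:

```python
import itertools
import random

# Numeric check of the nested variance decomposition on a random
# discrete joint pmf over (x1, x2, y).
random.seed(7)
support = list(itertools.product([0, 1], [0, 1], [0, 1, 2]))  # (x1, x2, y)
weights = [random.random() for _ in support]
z = sum(weights)
pmf = {s: w / z for s, w in zip(support, weights)}

def marginal(cond):
    # cond: list of (index, value) constraints on the outcome tuple
    return sum(p for s, p in pmf.items() if all(s[i] == v for i, v in cond))

def e_y(cond):  # E[Y | constraints]
    return sum(s[2] * p for s, p in pmf.items()
               if all(s[i] == v for i, v in cond)) / marginal(cond)

def var_y(cond):  # Var(Y | constraints)
    m = e_y(cond)
    return sum((s[2] - m) ** 2 * p for s, p in pmf.items()
               if all(s[i] == v for i, v in cond)) / marginal(cond)

# Term 1: E[Var(Y | X1, X2)]
t1 = sum(marginal([(0, a), (1, b)]) * var_y([(0, a), (1, b)])
         for a in (0, 1) for b in (0, 1))

# Term 2: E[Var(E[Y | X1, X2] | X1)]
t2 = 0.0
for a in (0, 1):
    inner_mean = e_y([(0, a)])  # equals E[Y | X1 = a] by the tower rule
    t2 += sum(marginal([(0, a), (1, b)])
              * (e_y([(0, a), (1, b)]) - inner_mean) ** 2
              for b in (0, 1))

# Term 3: Var(E[Y | X1])
ey = e_y([])
t3 = sum(marginal([(0, a)]) * (e_y([(0, a)]) - ey) ** 2 for a in (0, 1))

print(var_y([]), t1 + t2 + t3)  # the two sides agree
```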


See also

* Law of total expectation (Adam's law)
* Law of total covariance
* Law of total cumulance
* Analysis of variance
* Conditional expectation
* R-squared
* Fraction of variance unexplained
* Variance decomposition


References

* Bowsher, C. G. & Swain, P. S. (2012). "Identifying sources of variation and the flow of information in biochemical networks". ''PNAS'' 109 (20): E1320–E1328. doi:10.1073/pnas.1118365109.