
Squared deviations from the mean (SDM) result from squaring deviations. In probability theory and statistics, the definition of ''variance'' is either the expected value of the SDM (when considering a theoretical distribution) or its average value (for actual experimental data). Computations for ''analysis of variance'' involve the partitioning of a sum of SDM.


Background

An understanding of the computations involved is greatly enhanced by a study of the statistical value
: \operatorname(X^2), where \operatorname is the expected value operator.
For a random variable X with mean \mu and variance \sigma^2,
: \sigma^2 = \operatorname(X^2) - \mu^2 (Mood & Graybill).
Therefore,
: \operatorname(X^2) = \sigma^2 + \mu^2.
From the above, the following can be derived for a sample X_1, \ldots, X_n of n independent observations with this mean and variance:
: \operatorname\left( \sum_{i=1}^n X_i^2 \right) = n\sigma^2 + n\mu^2,
: \operatorname\left( \left(\sum_{i=1}^n X_i\right)^2 \right) = n\sigma^2 + n^2\mu^2.
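The identities above can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original article; the choice of a normal distribution, the values of \mu, \sigma and ''n'', and the use of NumPy are illustrative assumptions.

<syntaxhighlight lang="python">
# Minimal Monte Carlo check (illustrative assumptions: normal data, mu = 3, sigma = 2)
# of E(X^2) = sigma^2 + mu^2, E(sum X_i^2) = n*sigma^2 + n*mu^2,
# and E((sum X_i)^2) = n*sigma^2 + n^2*mu^2.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 3.0, 2.0, 10, 200_000

samples = rng.normal(mu, sigma, size=(reps, n))  # reps independent samples of size n

print(np.mean(samples[:, 0] ** 2), sigma**2 + mu**2)                    # both ~ 13
print(np.mean(np.sum(samples**2, axis=1)), n*sigma**2 + n*mu**2)        # both ~ 130
print(np.mean(np.sum(samples, axis=1) ** 2), n*sigma**2 + n**2*mu**2)   # both ~ 940
</syntaxhighlight>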


Sample variance

The sum of squared deviations needed to calculate sample variance (before deciding whether to divide by ''n'' or ''n'' − 1) is most easily calculated as
: S = \sum x^2 - \frac{\left(\sum x\right)^2}{n}.
From the two derived expectations above, the expected value of this sum is
: \operatorname(S) = n\sigma^2 + n\mu^2 - \frac{n\sigma^2 + n^2\mu^2}{n},
which implies
: \operatorname(S) = (n - 1)\sigma^2.
This effectively proves the use of the divisor ''n'' − 1 in the calculation of an unbiased sample estimate of \sigma^2.
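As a brief illustration (not from the original text; the function name and the code itself are assumptions, though the five data values match the worked example below), the shortcut formula and the ''n'' − 1 divisor look like this:

<syntaxhighlight lang="python">
# Sketch of S = sum(x^2) - (sum(x))^2 / n and the unbiased estimate S / (n - 1).
def sum_squared_deviations(x):
    n = len(x)
    return sum(v * v for v in x) - sum(x) ** 2 / n

data = [1, 2, 3, 4, 6]          # the five observations of the example below
S = sum_squared_deviations(data)
print(S)                        # ~ 14.8, the total squared deviations
print(S / (len(data) - 1))      # ~ 3.7, unbiased estimate of sigma^2
</syntaxhighlight>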


Partition — analysis of variance

In the situation where data are available for ''k'' different treatment groups having size n_i, where ''i'' varies from 1 to ''k'', it is assumed that the expected mean of each group is
: \operatorname(\mu_i) = \mu + T_i
and that the variance of each treatment group is unchanged from the population variance \sigma^2.

Under the null hypothesis that the treatments have no effect, each of the T_i will be zero.

It is now possible to calculate three sums of squares:

;Individual
:I = \sum x^2
:\operatorname(I) = n\sigma^2 + n\mu^2

;Treatments
:T = \sum_{i=1}^k \left(\left(\sum x\right)^2 / n_i\right), where the inner sum runs over the observations in group ''i''
:\operatorname(T) = k\sigma^2 + \sum_{i=1}^k n_i(\mu + T_i)^2
:\operatorname(T) = k\sigma^2 + n\mu^2 + 2\mu \sum_{i=1}^k (n_i T_i) + \sum_{i=1}^k n_i T_i^2

Under the null hypothesis that the treatments cause no differences and all the T_i are zero, the expectation simplifies to
:\operatorname(T) = k\sigma^2 + n\mu^2.

;Combination
:C = \left(\sum x\right)^2/n
:\operatorname(C) = \sigma^2 + n\mu^2
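These expectations can also be illustrated by simulation under the null hypothesis. The sketch below is not from the original article; the normal distribution, group sizes and parameter values are assumptions chosen only for illustration.

<syntaxhighlight lang="python">
# Monte Carlo check of E(I) = n*sigma^2 + n*mu^2, E(T) = k*sigma^2 + n*mu^2
# and E(C) = sigma^2 + n*mu^2 when all treatment effects T_i are zero.
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 3.0, 2.0
sizes = [3, 2]                       # group sizes n_i; here k = 2 and n = 5
n, k, reps = sum(sizes), len(sizes), 100_000

I_tot = T_tot = C_tot = 0.0
for _ in range(reps):
    groups = [rng.normal(mu, sigma, m) for m in sizes]   # null hypothesis: no treatment effect
    x = np.concatenate(groups)
    I_tot += np.sum(x ** 2)
    T_tot += sum(g.sum() ** 2 / len(g) for g in groups)
    C_tot += x.sum() ** 2 / n

print(I_tot / reps, n * sigma**2 + n * mu**2)   # both ~ 65
print(T_tot / reps, k * sigma**2 + n * mu**2)   # both ~ 53
print(C_tot / reps, sigma**2 + n * mu**2)       # both ~ 49
</syntaxhighlight>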


Sums of squared deviations

Under the null hypothesis, the difference of any pair of ''I'', ''T'', and ''C'' does not contain any dependency on \mu, only \sigma^2.

:\operatorname(I - C) = (n - 1)\sigma^2 (total squared deviations, a.k.a. ''total sum of squares'')
:\operatorname(T - C) = (k - 1)\sigma^2 (treatment squared deviations, a.k.a. ''explained sum of squares'')
:\operatorname(I - T) = (n - k)\sigma^2 (residual squared deviations, a.k.a. ''residual sum of squares'')

The constants (''n'' − 1), (''k'' − 1), and (''n'' − ''k'') are normally referred to as the number of degrees of freedom.


Example

In a very simple example, five observations arise from two treatments. The first treatment gives the three values 1, 2, and 3, and the second treatment gives the two values 4 and 6.

:I = \frac{1^2}{1} + \frac{2^2}{1} + \frac{3^2}{1} + \frac{4^2}{1} + \frac{6^2}{1} = 66
:T = \frac{(1 + 2 + 3)^2}{3} + \frac{(4 + 6)^2}{2} = 12 + 50 = 62
:C = \frac{(1 + 2 + 3 + 4 + 6)^2}{5} = 256/5 = 51.2

Giving
: Total squared deviations = 66 − 51.2 = 14.8 with 4 degrees of freedom.
: Treatment squared deviations = 62 − 51.2 = 10.8 with 1 degree of freedom.
: Residual squared deviations = 66 − 62 = 4 with 3 degrees of freedom.
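The same arithmetic can be reproduced in a few lines of code; this sketch is illustrative rather than part of the original article, and the variable names are chosen freely.

<syntaxhighlight lang="python">
# Reproduce the worked example: two treatment groups, five observations in total.
groups = [[1, 2, 3], [4, 6]]
all_x = [v for g in groups for v in g]
n, k = len(all_x), len(groups)

I = sum(v * v for v in all_x)                      # 66
T = sum(sum(g) ** 2 / len(g) for g in groups)      # 62.0
C = sum(all_x) ** 2 / n                            # 51.2

print(I - C, n - 1)   # ~ 14.8 total squared deviations, 4 degrees of freedom
print(T - C, k - 1)   # ~ 10.8 treatment squared deviations, 1 degree of freedom
print(I - T, n - k)   # 4.0 residual squared deviations, 3 degrees of freedom
</syntaxhighlight>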


Two-way analysis of variance

Main article: Two-way analysis of variance


See also

* Absolute deviation
* Algorithms for calculating variance
* Errors and residuals
* Least squares
* Mean squared error
* Residual sum of squares
* Variance decomposition


References

* Mood, A. M. & Graybill, F. A. ''An Introduction to the Theory of Statistics''. McGraw-Hill.