Expected Mean Squares

In statistics, expected mean squares (EMS) are the expected values of certain statistics arising in partitions of sums of squares in the analysis of variance (ANOVA). They can be used for ascertaining which statistic should appear in the denominator in an F-test for testing a null hypothesis that a particular effect is absent.


Definition

When the total corrected sum of squares in an ANOVA is partitioned into several components, each attributed to the effect of a particular predictor variable, each of the sums of squares in that partition is a random variable that has an expected value. That expected value divided by the corresponding number of degrees of freedom is the expected mean square for that predictor variable.
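As a concrete check, here is a minimal Monte Carlo sketch in Python (with illustrative parameters, not from the source) that approximates these expected values by averaging the mean squares over many simulated one-way layouts; when all group means are equal, both mean squares have expectation \sigma^2.

    import numpy as np

    rng = np.random.default_rng(0)
    k, n, sigma = 4, 10, 2.0          # k groups, n observations per group, error s.d.
    reps = 20000

    ms_between = np.empty(reps)
    ms_within = np.empty(reps)
    for r in range(reps):
        y = rng.normal(0.0, sigma, size=(k, n))   # all group means equal (null true)
        grand = y.mean()
        group = y.mean(axis=1)
        ms_between[r] = n * ((group - grand) ** 2).sum() / (k - 1)        # df = k - 1
        ms_within[r] = ((y - group[:, None]) ** 2).sum() / (k * (n - 1))  # df = k(n - 1)

    # Both averages should be close to sigma**2 = 4.0
    print(ms_between.mean(), ms_within.mean())

If the group means are instead unequal, the average of ms_between approaches \sigma^2 + n \sum_i (\mu_i - \bar\mu)^2/(k-1) while ms_within still estimates \sigma^2; comparing expected mean squares in this way is what identifies a valid F-test denominator.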


Example

The following example is from ''Longitudinal Data Analysis'' by Donald Hedeker and Robert D. Gibbons. Each of ''s'' treatments (one of which may be a placebo) is administered to a sample of (capital) N_h randomly chosen patients, on whom certain measurements Y_{hij} are observed at each of (lower-case) ''n'' specified times, for h=1,\ldots,s, \quad i=1,\ldots,N_h (thus the numbers of patients receiving different treatments may differ), and j=1,\ldots,n. We assume the sets of patients receiving different treatments are disjoint, so patients are nested within treatments and not crossed with treatments. We have

: Y_{hij} = \mu + \gamma_h + \tau_j + (\gamma\tau)_{hj} + \pi_{i(h)} + \varepsilon_{hij}

where
*\mu = grand mean, (fixed)
*\gamma_h = effect of treatment h, (fixed)
*\tau_j = effect of time j, (fixed)
*(\gamma\tau)_{hj} = interaction effect of treatment h and time j, (fixed)
*\pi_{i(h)} = individual difference effect for patient i nested within treatment h, (random)
*\varepsilon_{hij} = error for patient i in treatment h at time j, (random)
*\sigma_\pi^2 = variance of the random effect of patients nested within treatments,
*\sigma_\varepsilon^2 = error variance.

The total corrected sum of squares is

: \sum_{hij} (Y_{hij} - \overline Y)^2 \quad\text{where}\quad \overline Y = \frac{1}{Nn} \sum_{hij} Y_{hij}.

The ANOVA table below partitions the sum of squares (where N = \sum_h N_h):

Source of variation           | Sum of squares                          | Degrees of freedom | Expected mean square
Treatments                    | \text{SS}_\text{Tr}                     | s-1                | \sigma_\varepsilon^2 + n\sigma_\pi^2 + D_\text{Tr}
Time                          | \text{SS}_\text{T}                      | n-1                | \sigma_\varepsilon^2 + D_\text{T}
Treatment × time interaction  | \text{SS}_{\text{Tr}\times\text{T}}    | (s-1)(n-1)         | \sigma_\varepsilon^2 + D_{\text{Tr}\times\text{T}}
Patients within treatments    | \text{SS}_\text{patients}               | N-s                | \sigma_\varepsilon^2 + n\sigma_\pi^2
Error                         | \text{SS}_\text{E}                      | (N-s)(n-1)         | \sigma_\varepsilon^2

Here each D term is a quadratic function of the corresponding fixed effects that equals zero exactly when those effects are all equal; for example, D_\text{Tr} = 0 when all the treatment effects \gamma_h coincide.
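As an illustration (not from Hedeker and Gibbons, and with assumed parameter values), the following Python sketch simulates data from this model without the interaction term and computes two of the mean squares in the table:

    import numpy as np

    rng = np.random.default_rng(1)
    s, n = 3, 4                                # treatments, time points
    N_h = [5, 6, 7]                            # patients per treatment (may differ)
    N = sum(N_h)
    mu, sigma_pi, sigma_eps = 10.0, 1.5, 1.0   # assumed parameter values
    gamma = np.array([0.0, 0.5, -0.5])         # illustrative treatment effects
    tau = np.linspace(-0.3, 0.3, n)            # illustrative time effects

    # Y[h] holds the N_h[h]-by-n measurements for treatment h
    Y = []
    for h in range(s):
        pi = rng.normal(0, sigma_pi, size=(N_h[h], 1))   # patient effects nested in h
        eps = rng.normal(0, sigma_eps, size=(N_h[h], n))
        Y.append(mu + gamma[h] + tau + pi + eps)

    grand = np.concatenate(Y).mean()
    treat_mean = [Y[h].mean() for h in range(s)]

    # Two rows of the partition
    ss_tr = n * sum(N_h[h] * (treat_mean[h] - grand) ** 2 for h in range(s))
    ss_patients = n * sum(((Y[h].mean(axis=1) - treat_mean[h]) ** 2).sum()
                          for h in range(s))

    ms_tr = ss_tr / (s - 1)              # EMS: sigma_eps^2 + n*sigma_pi^2 + D_Tr
    ms_patients = ss_patients / (N - s)  # EMS: sigma_eps^2 + n*sigma_pi^2
    print(ms_tr, ms_patients)

Averaged over many replications, ms_patients approaches \sigma_\varepsilon^2 + n\sigma_\pi^2 = 1 + 4 \times 2.25 = 10 under these assumed values, matching the expected mean square in the table.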


Use in F-tests

A null hypothesis of interest is that there is no difference between the effects of different treatments, and thus no difference among the treatment means. This may be expressed by saying D_\text{Tr} = 0 (with the notation used in the table above). Under this null hypothesis, the expected mean square for the effect of treatments is \sigma_\varepsilon^2 + n \sigma_\pi^2.

The numerator in the F-statistic for testing this hypothesis is the mean square due to differences among treatments, i.e. \text{MS}_\text{Tr} = \text{SS}_\text{Tr}/(s-1). The denominator, however, is not \text{SS}_\text{E}/\big((N-s)(n-1)\big). The reason is that the random variable below, although it has an F-distribution under the null hypothesis, is not observable (it is not a statistic) because its value depends on the unobservable parameters \sigma_\pi^2 and \sigma_\varepsilon^2:

: \frac{\text{SS}_\text{Tr}\big/\big((s-1)(\sigma_\varepsilon^2 + n\sigma_\pi^2)\big)}{\text{SS}_\text{E}\big/\big((N-s)(n-1)\,\sigma_\varepsilon^2\big)} \ne \frac{\text{MS}_\text{Tr}}{\text{MS}_\text{E}}

Instead, one uses as the test statistic the following random variable that is not defined in terms of \text{SS}_\text{E}:

: F = \frac{\text{SS}_\text{Tr}/(s-1)}{\text{SS}_\text{patients}/(N-s)} = \frac{\text{MS}_\text{Tr}}{\text{MS}_\text{patients}}.

This works because, under the null hypothesis, \text{MS}_\text{Tr} and \text{MS}_\text{patients} have the same expected value, \sigma_\varepsilon^2 + n \sigma_\pi^2, so the unknown scale cancels and F has an F-distribution with s-1 and N-s degrees of freedom.
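As a sketch of carrying out the test (with made-up mean squares and sample sizes, not values from the source), the F-statistic and its p-value can be computed with scipy:

    from scipy import stats

    s, N = 3, 18                      # treatments and total patients, for example
    ms_tr, ms_patients = 12.8, 3.1    # hypothetical mean squares from an ANOVA table

    F = ms_tr / ms_patients           # MS_Tr / MS_patients, not MS_Tr / MS_E
    p = stats.f.sf(F, s - 1, N - s)   # upper-tail p-value of F(s-1, N-s)
    print(F, p)

The degrees of freedom passed to the F-distribution are those of the two mean squares in the ratio: s-1 for treatments and N-s for patients within treatments.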


Notes and references

*Donald Hedeker, Robert D. Gibbons. ''Longitudinal Data Analysis''. Wiley-Interscience, 2006. pp. 21–24.