In statistics, one-way analysis of variance (or one-way ANOVA) is a technique to compare whether two or more samples' means are significantly different (using the F-distribution). This analysis of variance technique requires a numeric response variable "Y" and a single explanatory variable "X", hence "one-way".
The ANOVA tests the null hypothesis, which states that samples in all groups are drawn from populations with the same mean values. To do this, two estimates are made of the population variance. These estimates rely on various assumptions (see below). The ANOVA produces an F-statistic, the ratio of the variance calculated among the means to the variance within the samples. If the group means are drawn from populations with the same mean values, the variance between the group means should be lower than the variance of the samples, following the central limit theorem. A higher ratio therefore implies that the samples were drawn from populations with different mean values.
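As a minimal illustration (not part of the original text), the following Python sketch computes this F-ratio with SciPy's f_oneway; the three samples are invented for the example:

```python
import numpy as np
from scipy import stats

# Three illustrative samples (invented data).
g1 = np.array([6.2, 5.8, 6.0, 6.4, 5.9])
g2 = np.array([7.1, 6.9, 7.3, 7.0, 7.2])
g3 = np.array([6.5, 6.6, 6.3, 6.7, 6.4])

# F is the ratio of the between-group to the within-group variance estimate;
# a large F (small p) is evidence against the null of equal population means.
f_stat, p_value = stats.f_oneway(g1, g2, g3)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```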
Typically, however, the one-way ANOVA is used to test for differences among at least three groups, since the two-group case can be covered by a t-test (Gosset, 1908). When there are only two means to compare, the t-test and the F-test are equivalent; the relation between ANOVA and ''t'' is given by ''F'' = ''t''². An extension of one-way ANOVA is two-way analysis of variance, which examines the influence of two different categorical independent variables on one dependent variable.
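The ''F'' = ''t''² equivalence is easy to verify numerically; a short sketch, assuming two invented normal samples:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=20)  # two illustrative samples
b = rng.normal(0.5, 1.0, size=20)

t_stat, t_p = stats.ttest_ind(a, b)  # pooled-variance two-sample t-test
f_stat, f_p = stats.f_oneway(a, b)   # one-way ANOVA on the same two groups

print(np.isclose(t_stat**2, f_stat))  # True: F = t^2
print(np.isclose(t_p, f_p))           # True: the p-values agree as well
```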
Assumptions
The results of a one-way ANOVA can be considered reliable as long as the following assumptions are met:
* Response variable residuals are normally distributed (or approximately normally distributed).
* Variances of populations are equal.
* Responses for a given group are independent and identically distributed normal random variables (not a simple random sample (SRS)).
If data are ordinal, a non-parametric alternative to this test should be used, such as the Kruskal–Wallis one-way analysis of variance. If the variances are not known to be equal, a generalization of the two-sample Welch's t-test can be used.
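A sketch of how these assumptions might be screened in Python with SciPy (the data are invented; within a group, the residual is simply the deviation from the group mean, so group-wise normality checks serve as a proxy; SciPy has no one-line Welch ANOVA, so only the rank-based fallback is shown):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
groups = [rng.normal(mu, 1.0, size=12) for mu in (5.0, 6.0, 6.5)]  # illustrative data

# Normality of each group's residuals (deviations from the group mean).
for g in groups:
    print("Shapiro-Wilk p =", stats.shapiro(g - g.mean()).pvalue)

# Homogeneity of variance across groups.
print("Levene p =", stats.levene(*groups).pvalue)

# If assumptions look doubtful, fall back to the rank-based Kruskal-Wallis test.
print("Kruskal-Wallis p =", stats.kruskal(*groups).pvalue)
```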
Departures from population normality
ANOVA is a relatively robust procedure with respect to violations of the normality assumption.
The one-way ANOVA can be generalized to the factorial and multivariate layouts, as well as to the analysis of covariance.
It is often stated in popular literature that none of these ''F''-tests are robust when there are severe violations of the assumption that each population follows the normal distribution, particularly for small alpha levels and unbalanced layouts. Furthermore, it is also claimed that if the underlying assumption of homoscedasticity is violated, the Type I error properties degenerate much more severely.
However, this is a misconception, based on work done in the 1950s and earlier. The first comprehensive investigation of the issue by Monte Carlo simulation was Donaldson (1966). He showed that under the usual departures (positive skew, unequal variances) "the ''F''-test is conservative", and so it is less likely than it should be to find that a variable is significant. However, as either the sample size or the number of cells increases, "the power curves seem to converge to that based on the normal distribution". Tiku (1971) found that "the non-normal theory power of ''F'' is found to differ from the normal theory power by a correction term which decreases sharply with increasing sample size." The problem of non-normality, especially in large samples, is far less serious than popular articles would suggest.
The current view is that "Monte-Carlo studies were used extensively with normal distribution-based tests to determine how sensitive they are to violations of the assumption of normal distribution of the analyzed variables in the population. The general conclusion from these studies is that the consequences of such violations are less severe than previously thought. Although these conclusions should not entirely discourage anyone from being concerned about the normality assumption, they have increased the overall popularity of the distribution-dependent statistical tests in all areas of research."
For nonparametric alternatives in the factorial layout, see Sawilowsky. For more discussion see
ANOVA on ranks.
The case of fixed effects, fully randomized experiment, unbalanced data
The model
The normal linear model describes treatment groups with probability
distributions which are identically bell-shaped (normal) curves with
different means. Thus fitting the models requires only the means of
each treatment group and a variance calculation (an average variance
within the treatment groups is used). Calculations of the means and
the variance are performed as part of the hypothesis test.
The commonly used normal linear models for a completely
randomized experiment are:
: \(y_{ij} = \mu_j + \varepsilon_{ij}\)
(the means model)
or
: \(y_{ij} = \mu + \tau_j + \varepsilon_{ij}\)
(the effects model)
where
: \(i\) is an index over experimental units
: \(j\) is an index over treatment groups
: \(I_j\) is the number of experimental units in the jth treatment group
: \(I = \sum_j I_j\) is the total number of experimental units
: \(y_{ij}\) are observations
: \(\mu_j\) is the mean of the observations for the jth treatment group
: \(\mu\) is the grand mean of the observations
: \(\tau_j\) is the jth treatment effect, a deviation from the grand mean
: \(\sum_j \tau_j = 0\)
: \(\mu_j = \mu + \tau_j\)
: \(\varepsilon \sim N(0, \sigma^2)\), \(\varepsilon_{ij}\) are normally distributed zero-mean random errors.
The index \(i\) over the experimental units can be interpreted several
ways. In some experiments, the same experimental unit is subject to
a range of treatments; \(i\) may point to a particular unit. In others,
each treatment group has a distinct set of experimental units; \(i\) may
simply be an index into the \(j\)-th list.
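To make the effects model concrete, here is a small simulation sketch (all parameter values invented) that draws data from \(y_{ij} = \mu + \tau_j + \varepsilon_{ij}\):

```python
import numpy as np

rng = np.random.default_rng(2)

mu = 8.0                             # grand mean
tau = np.array([-1.0, 0.25, 0.75])   # treatment effects; they sum to zero
sigma = 1.5                          # common error standard deviation
n_per_group = 10

# Draw y_ij = mu + tau_j + eps_ij for each treatment group j.
groups = [mu + t + rng.normal(0.0, sigma, size=n_per_group) for t in tau]
for j, g in enumerate(groups, start=1):
    print(f"group {j}: sample mean = {g.mean():.2f} (model mean = {mu + tau[j - 1]:.2f})")
```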
The data and statistical summaries of the data
One form of organizing experimental observations is with groups in columns:

| | Group 1 | Group 2 | ... | Group \(J\) | Grand |
| Observations | \(y_{11}, y_{21}, \dotsc\) | \(y_{12}, y_{22}, \dotsc\) | ... | \(y_{1J}, y_{2J}, \dotsc\) | |
| Number of units | \(I_1\) | \(I_2\) | ... | \(I_J\) | \(I = \sum_j I_j\) |
| Sum | \(T_1 = \sum_i y_{i1}\) | \(T_2\) | ... | \(T_J\) | \(T = \sum_j T_j\) |
| Sum of squares | \(\sum_i y_{i1}^2\) | \(\sum_i y_{i2}^2\) | ... | \(\sum_i y_{iJ}^2\) | \(\sum_{ij} y_{ij}^2\) |
| Mean | \(m_1 = T_1 / I_1\) | \(m_2\) | ... | \(m_J\) | \(m = T / I\) |
| Variance | \(s_1^2\) | \(s_2^2\) | ... | \(s_J^2\) | \(s^2\) |

Comparing model to summaries: \(\mu = m\) and \(\mu_j = m_j\). The grand mean and grand variance are computed from the grand sums, not from group means and variances.
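A sketch of these summaries in NumPy (invented, unbalanced data), computing the grand mean and grand variance from the grand sums:

```python
import numpy as np

# Illustrative groups in "columns" (invented data, unbalanced on purpose).
groups = [np.array([3., 4., 5.]), np.array([6., 7., 8., 9.]), np.array([2., 3.])]

I_j = np.array([len(g) for g in groups])    # units per group
T_j = np.array([g.sum() for g in groups])   # group sums
m_j = T_j / I_j                             # group means

I = I_j.sum()                               # grand count
T = T_j.sum()                               # grand sum
m = T / I                                   # grand mean, from the grand sums
sum_sq = sum((g**2).sum() for g in groups)  # grand sum of squares
grand_var = (sum_sq - T**2 / I) / (I - 1)   # grand variance, from the grand sums

print(m_j, m, grand_var)
```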
The hypothesis test
Given the summary statistics, the calculations of the hypothesis test
are shown in tabular form. While two columns of SS are shown for their
explanatory value, only one column is required to display results.

| Source | Sums of squares | Sums of squares (computational) | Degrees of freedom, DF | Mean square, MS | \(F\) |
| Treatments | \(\sum_j I_j (m_j - m)^2\) | \(\sum_j T_j^2 / I_j - T^2 / I\) | \(J - 1\) | \(SS_{\mathrm{Treatment}} / DF_{\mathrm{Treatment}}\) | \(MS_{\mathrm{Treatment}} / MS_{\mathrm{Error}}\) |
| Error | \(\sum_j (I_j - 1) s_j^2\) | \(\sum_{ij} y_{ij}^2 - \sum_j T_j^2 / I_j\) | \(I - J\) | \(SS_{\mathrm{Error}} / DF_{\mathrm{Error}}\) | |
| Total | \(\sum_{ij} (y_{ij} - m)^2\) | \(\sum_{ij} y_{ij}^2 - T^2 / I\) | \(I - 1\) | | |

\(MS_{\mathrm{Error}}\) is the estimate of variance corresponding to \(\sigma^2\) of the model.
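The two SS columns are algebraically equivalent, which a short sketch can confirm (the same invented data as above, reproduced here so the snippet stands alone):

```python
import numpy as np

groups = [np.array([3., 4., 5.]), np.array([6., 7., 8., 9.]), np.array([2., 3.])]
I_j = np.array([len(g) for g in groups])
T_j = np.array([g.sum() for g in groups])
m_j = T_j / I_j
I, T = I_j.sum(), T_j.sum()
m = T / I
sum_sq = sum((g**2).sum() for g in groups)

# Treatment SS: definitional form vs computational form.
ss_treat_def = (I_j * (m_j - m) ** 2).sum()
ss_treat_comp = (T_j**2 / I_j).sum() - T**2 / I

# Error SS: definitional form (via group variances) vs computational form.
s2_j = np.array([g.var(ddof=1) for g in groups])
ss_err_def = ((I_j - 1) * s2_j).sum()
ss_err_comp = sum_sq - (T_j**2 / I_j).sum()

print(np.isclose(ss_treat_def, ss_treat_comp))  # True
print(np.isclose(ss_err_def, ss_err_comp))      # True
```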
Analysis summary
The core ANOVA analysis consists of a series of calculations. The
data is collected in tabular form. Then
* Each treatment group is summarized by the number of experimental units, two sums, a mean and a variance. The treatment group summaries are combined to provide totals for the number of units and the sums. The grand mean and grand variance are computed from the grand sums. The treatment and grand means are used in the model.
* The three DFs and SSs are calculated from the summaries. Then the MSs are calculated and a ratio determines F.
* A computer typically determines a p-value from F, which indicates whether treatments produce significantly different results. If the result is significant, then the model provisionally has validity (an end-to-end sketch of these calculations follows this list).
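A compact end-to-end sketch of this procedure (the helper one_way_anova is illustrative, not a library function):

```python
import numpy as np
from scipy import stats

def one_way_anova(groups):
    """Fixed-effects one-way ANOVA from raw group observations."""
    I_j = np.array([len(g) for g in groups])
    m_j = np.array([np.mean(g) for g in groups])
    I, J = I_j.sum(), len(groups)
    m = sum(np.sum(g) for g in groups) / I  # grand mean from the grand sum

    ss_treat = (I_j * (m_j - m) ** 2).sum()  # between-group SS
    ss_err = sum(((g - gm) ** 2).sum() for g, gm in zip(groups, m_j))  # within-group SS

    df_treat, df_err = J - 1, I - J
    F = (ss_treat / df_treat) / (ss_err / df_err)  # ratio of mean squares
    p = stats.f.sf(F, df_treat, df_err)            # upper-tail p-value
    return F, p

# Works for unbalanced groups too (invented data):
groups = [np.array([3., 4., 5.]), np.array([6., 7., 8., 9.]), np.array([2., 3.])]
print(one_way_anova(groups))
print(stats.f_oneway(*groups))  # cross-check against SciPy
```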
If the experiment is balanced, all of the \(I_j\) terms are equal so the SS equations simplify.
In a more complex experiment, where the experimental units (or
environmental effects) are not homogeneous, row statistics are also
used in the analysis. The model includes terms dependent on \(i\). Determining the extra terms reduces the number of degrees of freedom available.
Example
Consider an experiment to study the effect of three different levels of a factor on a response (e.g. three levels of a fertilizer on plant growth). If we had 6 observations for each level, we could write the outcome of the experiment in a table like this, where \(a_1\), \(a_2\), and \(a_3\) are the three levels of the factor being studied.
| \(a_1\) | \(a_2\) | \(a_3\) |
| 6 | 8 | 13 |
| 8 | 12 | 9 |
| 4 | 9 | 11 |
| 5 | 11 | 8 |
| 3 | 6 | 7 |
| 4 | 8 | 12 |
The null hypothesis, denoted \(H_0\), for the overall ''F''-test for this experiment would be that all three levels of the factor produce the same response, on average. To calculate the ''F''-ratio:
Step 1: Calculate the mean within each group:
: \(\bar Y_1 = \frac{6 + 8 + 4 + 5 + 3 + 4}{6} = 5, \quad \bar Y_2 = \frac{8 + 12 + 9 + 11 + 6 + 8}{6} = 9, \quad \bar Y_3 = \frac{13 + 9 + 11 + 8 + 7 + 12}{6} = 10\)
Step 2: Calculate the overall mean:
: \(\bar Y = \frac{\bar Y_1 + \bar Y_2 + \bar Y_3}{a} = \frac{5 + 9 + 10}{3} = 8\)
: where ''a'' is the number of groups.
Step 3: Calculate the "between-group" sum of squared differences:
: \(S_B = n(\bar Y_1 - \bar Y)^2 + n(\bar Y_2 - \bar Y)^2 + n(\bar Y_3 - \bar Y)^2 = 6(5 - 8)^2 + 6(9 - 8)^2 + 6(10 - 8)^2 = 84\)
where ''n'' is the number of data values per group.
The between-group degrees of freedom is one less than the number of groups
: \(f_b = a - 1 = 3 - 1 = 2\)
so the between-group mean square value is
: \(MS_B = S_B / f_b = 84 / 2 = 42\)
Step 4: Calculate the "within-group" sum of squares. Begin by centering the data in each group:

| \(a_1\) | \(a_2\) | \(a_3\) |
| 6 - 5 = 1 | 8 - 9 = -1 | 13 - 10 = 3 |
| 8 - 5 = 3 | 12 - 9 = 3 | 9 - 10 = -1 |
| 4 - 5 = -1 | 9 - 9 = 0 | 11 - 10 = 1 |
| 5 - 5 = 0 | 11 - 9 = 2 | 8 - 10 = -2 |
| 3 - 5 = -2 | 6 - 9 = -3 | 7 - 10 = -3 |
| 4 - 5 = -1 | 8 - 9 = -1 | 12 - 10 = 2 |
The within-group sum of squares is the sum of squares of all 18 values in this table
: \(S_W = (1 + 9 + 1 + 0 + 4 + 1) + (1 + 9 + 0 + 4 + 9 + 1) + (9 + 1 + 1 + 4 + 9 + 4) = 16 + 24 + 28 = 68\)
The within-group degrees of freedom is
: \(f_W = a(n - 1) = 3(6 - 1) = 15\)
Thus the within-group mean square value is
: \(MS_W = S_W / f_W = 68 / 15 \approx 4.5\)
Step 5: The ''F''-ratio is
: \(F = \frac{MS_B}{MS_W} = \frac{42}{4.5} \approx 9.3\)
The critical value is the number that the test statistic must exceed to reject the test. In this case, \(F_{\mathrm{crit}}(2, 15) = 3.68\) at \(\alpha = 0.05\). Since \(F = 9.3 > 3.68\), the results are significant at the 5% significance level. One would reject the null hypothesis, concluding that there is strong evidence that the expected values in the three groups differ. The ''p''-value for this test is 0.002.
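Assuming the example data as reconstructed in the table above, the calculation can be checked with SciPy:

```python
import numpy as np
from scipy import stats

# The three groups from the example table above.
a1 = np.array([6., 8., 4., 5., 3., 4.])
a2 = np.array([8., 12., 9., 11., 6., 8.])
a3 = np.array([13., 9., 11., 8., 7., 12.])

f_stat, p_value = stats.f_oneway(a1, a2, a3)
print(f"F = {f_stat:.1f}")                          # ~9.3
print(f"p = {p_value:.3f}")                         # ~0.002
print(f"F_crit = {stats.f.ppf(0.95, 2, 15):.2f}")   # ~3.68 at alpha = 0.05
```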
After performing the ''F''-test, it is common to carry out some "post-hoc" analysis of the group means. In this case, the first two group means differ by 4 units, the first and third group means differ by 5 units, and the second and third group means differ by only 1 unit. The standard error of each of these differences is \(\sqrt{4.5/6 + 4.5/6} = 1.2\). Thus the first group is strongly different from the other groups, as the mean difference is more than 3 times the standard error, so we can be highly confident that the population mean of the first group differs from the population means of the other groups. However, there is no evidence that the second and third groups have different population means from each other, as their mean difference of one unit is comparable to the standard error.
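A short sketch of this post-hoc comparison, using the quantities derived above:

```python
import numpy as np

means = np.array([5.0, 9.0, 10.0])  # group means from the example
ms_within = 68 / 15                 # within-group mean square
n = 6                               # observations per group

# Standard error of the difference between two group means.
se_diff = np.sqrt(ms_within / n + ms_within / n)
print(f"SE of a difference = {se_diff:.1f}")  # ~1.2

# Pairwise mean differences in units of that standard error.
for i, j in [(0, 1), (0, 2), (1, 2)]:
    d = means[j] - means[i]
    print(f"group {i + 1} vs {j + 1}: diff = {d:.0f}, diff/SE = {d / se_diff:.1f}")
```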
Note ''F''(''x'', ''y'') denotes an
''F''-distribution cumulative distribution function with ''x'' degrees of freedom in the numerator and ''y'' degrees of freedom in the denominator.
See also
* Analysis of variance
* F test (''Includes a one-way ANOVA example'')
* Mixed model
* Multivariate analysis of variance (MANOVA)
* Repeated measures ANOVA
* Two-way ANOVA
* Welch's t-test
Further reading
* George Casella (18 April 2008). ''Statistical design''. Springer. ISBN 978-0-387-75965-4. https://www.springer.com/statistics/statistical+theory+and+methods/book/978-0-387-75964-7