An ''F''-test is a statistical test that compares variances. It is used to determine if the variances of two samples, or if the ratios of variances among multiple samples, are significantly different. The test calculates a statistic, represented by the random variable ''F'', and checks if it follows an ''F''-distribution. This check is valid if the null hypothesis is true and standard assumptions about the errors (ε) in the data hold. ''F''-tests are frequently used to compare different statistical models and find the one that best describes the population the data came from. When models are created using the least squares method, the resulting ''F''-tests are often called "exact" ''F''-tests. The ''F''-statistic was developed by Ronald Fisher in the 1920s as the variance ratio and was later named in his honor by George W. Snedecor.


Common examples

Common examples of the use of ''F''-tests include the study of the following cases:
* The hypothesis that the means of a given set of normally distributed populations, all having the same standard deviation, are equal. This is perhaps the best-known ''F''-test, and plays an important role in the analysis of variance (ANOVA).
** The ''F''-test of analysis of variance (ANOVA) relies on three assumptions:
**# Normality
**# Homogeneity of variance
**# Independence of errors and random sampling
* The hypothesis that a proposed regression model fits the data well. See Lack-of-fit sum of squares.
* The hypothesis that a data set in a regression analysis follows the simpler of two proposed linear models that are nested within each other.
* Multiple-comparison testing, which is conducted using the data from an already completed ''F''-test when that test rejects the null hypothesis, i.e. when the factor under study appears to have an impact on the dependent variable (a minimal code sketch follows this list).
** "''a priori'' comparisons" / "planned comparisons": a particular set of comparisons specified before the data are examined.
** "pairwise comparisons": all possible comparisons.
*** e.g. Fisher's least significant difference (LSD) test, Tukey's honestly significant difference (HSD) test, the Newman–Keuls test, Duncan's test
** "''a posteriori'' comparisons" / "''post hoc'' comparisons" / "exploratory comparisons": comparisons chosen after examining the data.
*** e.g. Scheffé's method
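As an illustration of this workflow, the following is a minimal Python sketch (the data, group sizes, and the 5% threshold are assumptions made for the example, not part of the article) that runs the omnibus ANOVA ''F''-test with SciPy and, on rejection, follows up with Tukey's HSD pairwise comparisons:

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

# Three hypothetical treatment groups (illustrative data only).
rng = np.random.default_rng(0)
a = rng.normal(10.0, 2.0, size=30)
b = rng.normal(10.5, 2.0, size=30)
c = rng.normal(13.0, 2.0, size=30)

# Omnibus one-way ANOVA F-test first.
f_stat, p_value = stats.f_oneway(a, b, c)
print(f"ANOVA: F = {f_stat:.3f}, p = {p_value:.4f}")

# If the omnibus test rejects, follow up with pairwise comparisons,
# e.g. Tukey's HSD, which controls the family-wise error rate.
if p_value < 0.05:
    print(stats.tukey_hsd(a, b, c))
</syntaxhighlight>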


''F''-test of the equality of two variances

The ''F''-test is sensitive to non-normality. In the analysis of variance (ANOVA), alternative tests include Levene's test, Bartlett's test, and the Brown–Forsythe test. However, when any of these tests are conducted to test the underlying assumption of homoscedasticity (''i.e.'' homogeneity of variance), as a preliminary step to testing for mean effects, there is an increase in the experiment-wise Type I error rate.
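As a concrete sketch, the classic variance-ratio test can be computed by hand (to our knowledge SciPy does not ship it as a ready-made function, so the helper below is our own illustration, and the data are made up), alongside the more robust alternatives named above:

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

def variance_ratio_f_test(x, y):
    """Two-sided F-test for equality of two variances.

    Hand-rolled for illustration; valid only when both samples
    are (close to) normally distributed.
    """
    x, y = np.asarray(x), np.asarray(y)
    f = np.var(x, ddof=1) / np.var(y, ddof=1)   # ratio of sample variances
    dfn, dfd = len(x) - 1, len(y) - 1           # numerator / denominator df
    # Two-sided p-value: double the smaller tail probability.
    p = min(1.0, 2 * min(stats.f.cdf(f, dfn, dfd), stats.f.sf(f, dfn, dfd)))
    return f, p

rng = np.random.default_rng(1)
x = rng.normal(0, 1.0, size=40)
y = rng.normal(0, 1.5, size=35)
print(variance_ratio_f_test(x, y))

# More robust alternatives mentioned above:
print(stats.levene(x, y))    # Levene's test
print(stats.bartlett(x, y))  # Bartlett's test
</syntaxhighlight>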


Formula and calculation

Most ''F''-tests arise by considering a decomposition of the variability in a collection of data in terms of sums of squares. The test statistic in an ''F''-test is the ratio of two scaled sums of squares reflecting different sources of variability. These sums of squares are constructed so that the statistic tends to be greater when the null hypothesis is not true. In order for the statistic to follow the ''F''-distribution under the null hypothesis, the sums of squares should be statistically independent, and each should follow a scaled χ²-distribution. The latter condition is guaranteed if the data values are independent and normally distributed with a common variance.
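This is the general construction: if S_1 and S_2 are independent sums of squares with S_i/\sigma^2 \sim \chi^2_{d_i} under the null hypothesis, then the ratio of the scaled sums of squares
:F = \frac{S_1/d_1}{S_2/d_2}
follows the ''F''-distribution with (d_1, d_2) degrees of freedom, since the common variance \sigma^2 cancels. Each ''F''-test below is a special case of this ratio.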


One-way analysis of variance

The formula for the one-way ANOVA ''F''-test statistic is
:F = \frac{\text{explained variance}}{\text{unexplained variance}},
or
:F = \frac{\text{between-group variability}}{\text{within-group variability}}.

The "explained variance", or "between-group variability", is
:\sum_{i=1}^{K} n_i(\bar{Y}_{i\cdot} - \bar{Y})^2/(K-1),
where \bar{Y}_{i\cdot} denotes the sample mean in the ''i''-th group, n_i is the number of observations in the ''i''-th group, \bar{Y} denotes the overall mean of the data, and K denotes the number of groups.

The "unexplained variance", or "within-group variability", is
:\sum_{i=1}^{K}\sum_{j=1}^{n_i} \left( Y_{ij}-\bar{Y}_{i\cdot} \right)^2/(N-K),
where Y_{ij} is the ''j''-th observation in the ''i''-th out of K groups and N is the overall sample size. This ''F''-statistic follows the ''F''-distribution with degrees of freedom d_1=K-1 and d_2=N-K under the null hypothesis. The statistic will be large if the between-group variability is large relative to the within-group variability, which is unlikely to happen if the population means of the groups all have the same value.

The result of the test is determined by comparing the calculated ''F'' value to the critical ''F'' value at a chosen significance level (e.g. 5%). The ''F'' table contains the critical values of the ''F''-distribution under the assumption of a true null hypothesis; the critical value is the threshold that the ''F''-statistic exceeds only a controlled percentage of the time (e.g. 5%) when the null hypothesis is true. To locate the critical value, identify the row and column of the table corresponding to the two degrees of freedom and the significance level being tested.

If the ''F''-statistic is less than the critical value:
* Fail to reject the null hypothesis.
* There are no significant differences among the sample means; the observed differences could reasonably be caused by random chance alone.
* The result is not statistically significant.

If the ''F''-statistic is greater than the critical value:
* Reject the null hypothesis.
* There are significant differences among the sample means; the observed differences could not reasonably be caused by random chance alone.
* The result is statistically significant.

Note that when there are only two groups for the one-way ANOVA ''F''-test, F = t^2, where ''t'' is the Student's ''t'' statistic.


Advantages

* Multi-group comparison efficiency: facilitates simultaneous comparison of multiple groups, which is particularly efficient when more than two groups are involved.
* Clarity in variance comparison: offers a straightforward interpretation of variance differences among groups, contributing to a clear understanding of the observed data patterns.
* Versatility across disciplines: broadly applicable across diverse fields, including the social sciences, natural sciences, and engineering.


Disadvantages

* Sensitivity to assumptions: the ''F''-test is highly sensitive to certain assumptions, such as homogeneity of variance and normality, which can affect the accuracy of test results.
* Limited scope: the ''F''-test is tailored for comparing variances between groups, making it less suitable for analyses beyond this specific scope.
* Interpretation challenges: the ''F''-test does not pinpoint which specific group pairs have distinct variances. Careful interpretation is necessary, and additional post hoc tests are often essential for a more detailed understanding of group-wise differences.


Multiple-comparison ANOVA problems

The ''F''-test in one-way analysis of variance (ANOVA) is used to assess whether the expected values of a quantitative variable within several pre-defined groups differ from each other. For example, suppose that a medical trial compares four treatments. The ANOVA ''F''-test can be used to assess whether any of the treatments are on average superior, or inferior, to the others versus the null hypothesis that all four treatments yield the same mean response. This is an example of an "omnibus" test, meaning that a single test is performed to detect any of several possible differences. Alternatively, we could carry out pairwise tests among the treatments (for instance, in the medical trial example with four treatments we could carry out six tests among pairs of treatments). The advantage of the ANOVA ''F''-test is that we do not need to pre-specify which treatments are to be compared, and we do not need to adjust for making multiple comparisons. The disadvantage of the ANOVA ''F''-test is that if we reject the null hypothesis, we do not know which treatments can be said to be significantly different from the others, nor, if the ''F''-test is performed at level α, can we state that the treatment pair with the greatest mean difference is significantly different at level α.


Regression problems

Consider two models, 1 and 2, where model 1 is 'nested' within model 2. Model 1 is the restricted model, and model 2 is the unrestricted one. That is, model 1 has p_1 parameters, and model 2 has p_2 parameters, where p_1 < p_2, and for any choice of parameters in model 1, the same regression curve can be achieved by some choice of the parameters of model 2.

One common context in this regard is that of deciding whether a model fits the data significantly better than does a naive model, in which the only explanatory term is the intercept term, so that all predicted values for the dependent variable are set equal to that variable's sample mean. The naive model is the restricted model, since the coefficients of all potential explanatory variables are restricted to equal zero. Another common context is deciding whether there is a structural break in the data: here the restricted model uses all data in one regression, while the unrestricted model uses separate regressions for two different subsets of the data. This use of the ''F''-test is known as the Chow test.

The model with more parameters will always be able to fit the data at least as well as the model with fewer parameters. Thus typically model 2 will give a better (i.e. lower error) fit to the data than model 1. But one often wants to determine whether model 2 gives a ''significantly'' better fit to the data. One approach to this problem is to use an ''F''-test. If there are ''n'' data points to estimate the parameters of both models from, then one can calculate the ''F'' statistic, given by
:F = \frac{\left(\frac{\text{RSS}_1 - \text{RSS}_2}{p_2 - p_1}\right)}{\left(\frac{\text{RSS}_2}{n - p_2}\right)} = \frac{\text{RSS}_1 - \text{RSS}_2}{\text{RSS}_2} \cdot \frac{n - p_2}{p_2 - p_1},
where RSS_i is the residual sum of squares of model ''i''. If the regression model has been calculated with weights, then replace RSS_i with χ², the weighted sum of squared residuals. Under the null hypothesis that model 2 does not provide a significantly better fit than model 1, ''F'' will have an ''F'' distribution with (p_2 − p_1, n − p_2) degrees of freedom. The null hypothesis is rejected if the ''F'' calculated from the data is greater than the critical value of the ''F''-distribution for some desired false-rejection probability (e.g. 0.05). Since ''F'' is a monotone function of the likelihood ratio statistic, the ''F''-test is a likelihood ratio test.


See also

* Goodness of fit




External links


* Table of ''F''-test critical values