
In statistics, the correlation ratio is a measure of the curvilinear relationship between the statistical dispersion within individual categories and the dispersion across the whole population or sample. The measure is defined as the ''ratio'' of two standard deviations representing these types of variation. The context here is the same as that of the intraclass correlation coefficient, whose value is the square of the correlation ratio.


Definition

Suppose each observation is y_{xi}, where ''x'' indicates the category that the observation is in and ''i'' is the label of the particular observation. Let n_x be the number of observations in category ''x'' and
:\overline{y}_x = \frac{\sum_i y_{xi}}{n_x} and \overline{y} = \frac{\sum_x n_x \overline{y}_x}{\sum_x n_x},
where \overline{y}_x is the mean of the category ''x'' and \overline{y} is the mean of the whole population. The correlation ratio η (eta) is defined to satisfy
:\eta^2 = \frac{\sum_x n_x (\overline{y}_x - \overline{y})^2}{\sum_{x,i} (y_{xi} - \overline{y})^2}
which can be written as
:\eta^2 = \frac{{\sigma_{\overline{y}}}^2}{{\sigma_y}^2}, \text{ where } {\sigma_{\overline{y}}}^2 = \frac{\sum_x n_x (\overline{y}_x - \overline{y})^2}{\sum_x n_x} \text{ and } {\sigma_y}^2 = \frac{\sum_{x,i} (y_{xi} - \overline{y})^2}{\sum_x n_x},
i.e. the weighted variance of the category means divided by the variance of all samples. If the relationship between values of ''x'' and values of \overline{y}_x is linear (which is certainly true when there are only two possibilities for ''x''), this will give the same result as the square of Pearson's correlation coefficient; otherwise the correlation ratio will be larger in magnitude. It can therefore be used for judging non-linear relationships.
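The definition can be checked numerically. The following is a minimal sketch, not a reference implementation: the function names `eta_squared` and `pearson_r` and the sample data are assumptions made for illustration, and categories are coded as numbers only so that Pearson's coefficient can be computed alongside.

```python
from collections import defaultdict
from math import sqrt


def eta_squared(categories, values):
    """Between-category sum of squares divided by the total sum of squares."""
    groups = defaultdict(list)
    for x, y in zip(categories, values):
        groups[x].append(y)

    grand_mean = sum(values) / len(values)

    # Numerator: sum over categories of n_x * (mean_x - grand_mean)^2
    between = sum(
        len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2
        for ys in groups.values()
    )
    # Denominator: sum over all observations of (y_xi - grand_mean)^2
    total = sum((y - grand_mean) ** 2 for y in values)
    if total == 0:
        raise ValueError("eta is undefined when all observations are equal")
    return between / total


def pearson_r(xs, ys):
    """Pearson's correlation coefficient, computed directly from its definition."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in xs))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in ys))
    return cov / (spread_x * spread_y)


# Two categories (hypothetical data): eta^2 should agree with r^2.
x_two = [0, 0, 0, 1, 1, 1, 1]
y_two = [2.0, 3.0, 4.0, 6.0, 7.0, 9.0, 10.0]
print(eta_squared(x_two, y_two), pearson_r(x_two, y_two) ** 2)

# Three categories with a non-linear pattern: eta^2 exceeds r^2.
x_three = [1, 1, 2, 2, 3, 3]
y_three = [0.0, 1.0, 5.0, 6.0, 0.0, 1.0]
print(eta_squared(x_three, y_three), pearson_r(x_three, y_three) ** 2)
```

In the second data set the category means rise and then fall, so Pearson's coefficient is zero while the correlation ratio remains large.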


Range

The correlation ratio \eta takes values between 0 and 1. The limit \eta=0 represents the special case of no dispersion among the means of the different categories, while \eta=1 refers to no dispersion within the respective categories. \eta is undefined when all data points of the complete population take the same value.
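The two limits and the undefined case can be illustrated with small made-up data sets. This is a sketch under stated assumptions: the helper `correlation_ratio`, its convention of returning `None` when η is undefined, and all the sample values are hypothetical choices for this illustration.

```python
def correlation_ratio(groups):
    """Return eta for a mapping of category -> observations, or None if undefined."""
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = sum(all_values) / len(all_values)

    # Total sum of squares around the overall mean
    total = sum((y - grand_mean) ** 2 for y in all_values)
    if total == 0:
        return None  # every data point has the same value

    # Weighted sum of squares of the category means around the overall mean
    between = sum(
        len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2
        for ys in groups.values()
    )
    return (between / total) ** 0.5


# All category means equal 2, so there is no dispersion among the means.
same_means = {"a": [1.0, 3.0], "b": [0.0, 4.0], "c": [2.0, 2.0]}
# Every category is constant, so there is no dispersion within categories.
no_within = {"a": [5.0, 5.0], "b": [7.0, 7.0, 7.0]}
# Every observation is identical, so the denominator vanishes.
constant = {"a": [4.0, 4.0], "b": [4.0]}

eta_zero = correlation_ratio(same_means)
eta_one = correlation_ratio(no_within)
eta_undefined = correlation_ratio(constant)
print(eta_zero, eta_one, eta_undefined)
```

The second value may differ from 1 by floating-point rounding, so comparisons against the limits are best made with a tolerance.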


Example

Suppose there is a distribution of test scores in three topics (categories):
*Algebra: 45, 70, 29, 15 and 21 (5 scores)
*Geometry: 40, 20, 30 and 42 (4 scores)
*Statistics: 65, 95, 80, 70, 85 and 73 (6 scores).
Then the subject averages are 36, 33 and 78, with an overall average of 52. The sums of squares of the differences from the subject averages are 1952 for Algebra, 308 for Geometry and 600 for Statistics, adding to 2860. The overall sum of squares of the differences from the overall average is 9640. The difference of 6780 between these is also the weighted sum of the squares of the differences between the subject averages and the overall average:
:5 (36-52)^2 + 4 (33-52)^2 + 6 (78-52)^2 = 6780.
This gives
:\eta^2 = \frac{6780}{9640} = 0.7033\ldots
suggesting that most of the overall dispersion is a result of differences between topics, rather than within topics. Taking the square root gives
:\eta = \sqrt{\frac{6780}{9640}} = 0.8386\ldots.
For \eta = 1 the overall sample dispersion is purely due to dispersion among the categories and not at all due to dispersion within the individual categories. For quick comprehension, imagine all Algebra, Geometry, and Statistics scores being the same within each topic, e.g. 5 times 36, 4 times 33, 6 times 78. The limit \eta = 0 refers to the case in which dispersion among the categories contributes nothing to the overall dispersion. The trivial requirement for this extreme is that all category means are the same.
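The arithmetic of this example can be reproduced in a few lines. The sketch below uses only the scores given above; the variable names are chosen for this illustration.

```python
from math import sqrt

scores = {
    "Algebra": [45, 70, 29, 15, 21],
    "Geometry": [40, 20, 30, 42],
    "Statistics": [65, 95, 80, 70, 85, 73],
}

all_scores = [s for group in scores.values() for s in group]
overall_mean = sum(all_scores) / len(all_scores)
subject_means = {name: sum(group) / len(group) for name, group in scores.items()}

# Sum of squared differences from each subject's own average
within = sum(
    (s - subject_means[name]) ** 2
    for name, group in scores.items()
    for s in group
)
# Weighted sum of squared differences between subject averages and the overall average
between = sum(
    len(group) * (subject_means[name] - overall_mean) ** 2
    for name, group in scores.items()
)
# Sum of squared differences of every score from the overall average
total = sum((s - overall_mean) ** 2 for s in all_scores)

eta_squared = between / total
eta = sqrt(eta_squared)

print(subject_means, overall_mean)
print(within, between, total)
print(round(eta_squared, 4), round(eta, 4))
```

The within-topic and between-topic sums of squares add up to the total, which is why the difference of 6780 quoted in the example equals the weighted sum over the subject averages.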


Pearson vs. Fisher

The correlation ratio was introduced by Karl Pearson as part of analysis of variance. Ronald Fisher commented:
"As a descriptive statistic the utility of the correlation ratio is extremely limited. It will be noticed that the number of degrees of freedom in the numerator of \eta^2 depends on the number of the arrays"
to which Egon Pearson (Karl's son) responded by saying
"Again, a long-established method such as the use of the correlation ratio [§45 The "Correlation Ratio" η] is passed over in a few words without adequate description, which is perhaps hardly fair to the student who is given no opportunity of judging its scope for himself." (Pearson E.S. (1926) "Review of Statistical Methods for Research Workers (R. A. Fisher)", ''Science Progress'', 20, 733-734 (excerpt))

