In statistics, the coefficient of multiple correlation is a measure of how well a given variable can be predicted using a linear function of a set of other variables. It is the correlation between the variable's values and the best predictions that can be computed linearly from the predictive variables.
The coefficient of multiple correlation takes values between 0 and 1. Higher values indicate higher predictability of the dependent variable from the independent variables, with a value of 1 indicating that the predictions are exactly correct and a value of 0 indicating that no linear combination of the independent variables is a better predictor than is the fixed mean of the dependent variable.
The coefficient of multiple correlation equals the square root of the coefficient of determination under the particular assumptions that an intercept is included and that the best possible linear predictors are used. The coefficient of determination, by contrast, is defined for more general cases, including those of nonlinear prediction and those in which the predicted values have not been derived from a model-fitting procedure.
Definition
The coefficient of multiple correlation, denoted ''R'', is a scalar that is defined as the Pearson correlation coefficient between the predicted and the actual values of the dependent variable in a linear regression model that includes an intercept.
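This definition can be checked numerically. The following is a minimal sketch (not from the source; the synthetic data, seed, and names are illustrative assumptions): it fits a least-squares regression with an intercept and takes the Pearson correlation between the fitted and observed values of the dependent variable.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))                              # two predictor variables
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Least-squares fit with an intercept column.
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ beta

# R is the Pearson correlation between predicted and actual values.
R = np.corrcoef(y_hat, y)[0, 1]
print(R)
```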
Computation
The square of the coefficient of multiple correlation can be computed using the vector <math>\mathbf{c} = (r_{x_1 y}, r_{x_2 y}, \dots, r_{x_N y})^\top</math> of correlations between the predictor variables <math>x_n</math> (independent variables) and the target variable <math>y</math> (dependent variable), and the correlation matrix <math>R_{xx}</math> of correlations between the predictor variables. It is given by
:: <math>R^2 = \mathbf{c}^\top \, R_{xx}^{-1} \, \mathbf{c},</math>
where <math>\mathbf{c}^\top</math> is the transpose of <math>\mathbf{c}</math>, and <math>R_{xx}^{-1}</math> is the inverse of the matrix
:: <math>R_{xx} = \begin{pmatrix} r_{x_1 x_1} & r_{x_1 x_2} & \dots & r_{x_1 x_N} \\ r_{x_2 x_1} & \ddots & & \vdots \\ \vdots & & \ddots & \\ r_{x_N x_1} & \dots & & r_{x_N x_N} \end{pmatrix}.</math>
If all the predictor variables are uncorrelated, the matrix <math>R_{xx}</math> is the identity matrix and <math>R^2</math> simply equals <math>\mathbf{c}^\top \mathbf{c}</math>, the sum of the squared correlations with the dependent variable. If the predictor variables are correlated among themselves, the inverse of the correlation matrix accounts for this.
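As an illustration of the formula above (a sketch under assumed synthetic data, not a prescribed implementation), the following snippet computes <math>R^2 = \mathbf{c}^\top R_{xx}^{-1} \mathbf{c}</math> directly from sample correlations:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.5, size=n)

# c: vector of correlations between each predictor and the target.
c = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

# R_xx: correlation matrix of the predictors among themselves.
R_xx = np.corrcoef(X, rowvar=False)

# If the predictors were uncorrelated, R_xx would be the identity
# and R^2 would reduce to the sum of squared correlations, c @ c.
R_squared = c @ np.linalg.inv(R_xx) @ c
print(R_squared)
```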
The squared coefficient of multiple correlation can also be computed as the fraction of variance of the dependent variable that is explained by the independent variables, which in turn is 1 minus the unexplained fraction. The unexplained fraction can be computed as the sum of squares of residuals, that is, the sum of the squares of the prediction errors, divided by the sum of squares of deviations of the values of the dependent variable from its expected value.
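A minimal sketch of this variance-decomposition route (again with assumed synthetic data) follows; it should agree with the correlation-based formula above up to floating-point error:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.5, size=n)

A = np.column_stack([np.ones(n), X])          # intercept included
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
residuals = y - A @ beta

ss_res = np.sum(residuals ** 2)               # unexplained sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)          # total sum of squares
R_squared = 1.0 - ss_res / ss_tot
print(R_squared)
```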
Properties
With more than two variables being related to each other, the value of the coefficient of multiple correlation depends on the choice of dependent variable: a regression of <math>y</math> on <math>x</math> and <math>z</math> will in general have a different <math>R</math> than will a regression of <math>z</math> on <math>x</math> and <math>y</math>. For example, suppose that in a particular sample the variable <math>z</math> is uncorrelated with both <math>x</math> and <math>y</math>, while <math>x</math> and <math>y</math> are linearly related to each other. Then a regression of <math>z</math> on <math>y</math> and <math>x</math> will yield an <math>R</math> of zero, while a regression of <math>y</math> on <math>x</math> and <math>z</math> will yield a strictly positive <math>R</math>. This follows since the correlation of <math>y</math> with its best predictor based on <math>x</math> and <math>z</math> is in all cases at least as large as the correlation of <math>y</math> with its best predictor based on <math>x</math> alone, and in this case with <math>z</math> providing no explanatory power it will be exactly as large.
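This asymmetry can be illustrated with a small simulation (a sketch only: the variables, sample size, and the helper ''multiple_R'' are assumptions, and sample correlations will be near rather than exactly zero):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=0.1, size=n)   # y linearly related to x
z = rng.normal(size=n)                        # z generated independently of both

def multiple_R(target, predictors):
    """Correlation between the target and its least-squares fitted values."""
    A = np.column_stack([np.ones(len(target))] + list(predictors))
    beta, *_ = np.linalg.lstsq(A, target, rcond=None)
    return np.corrcoef(A @ beta, target)[0, 1]

print(multiple_R(z, [x, y]))   # near 0: x and y cannot predict z
print(multiple_R(y, [x, z]))   # near 1: x predicts y well; z adds nothing
```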
Further reading
* Allison, Paul D. (1998). ''Multiple Regression: A Primer''. London: Sage Publications.
* Cohen, Jacob, et al. (2002). ''Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences''.
* Crown, William H. (1998). ''Statistical Models for the Social and Behavioral Sciences: Multiple Regression and Limited-Dependent Variable Models''.
* Edwards, Allen Louis (1985). ''Multiple Regression and the Analysis of Variance and Covariance''.
* Keith, Timothy (2006). ''Multiple Regression and Beyond''. Boston: Pearson Education.
* Kerlinger, Fred N.; Pedhazur, Elazar J. (1973). ''Multiple Regression in Behavioral Research''. New York: Holt, Rinehart and Winston.
* Stanton, Jeffrey M. (2001). "Galton, Pearson, and the Peas: A Brief History of Linear Regression for Statistics Instructors". ''Journal of Statistics Education'', 9 (3).