RV Coefficient

In statistics, the RV coefficient is a multivariate generalization of the ''squared'' Pearson correlation coefficient (squared because the RV coefficient takes values between 0 and 1). It measures the closeness of two sets of points that may each be represented in a matrix.

The major approaches within statistical multivariate data analysis can all be brought into a common framework in which the RV coefficient is maximised subject to relevant constraints. Specifically, these statistical methodologies include:
* principal component analysis
* canonical correlation analysis
* multivariate regression
* statistical classification (linear discrimination)

One application of the RV coefficient is in functional neuroimaging, where it can measure the similarity between two subjects' series of brain scans or between different scans of the same subject.


Definitions

The definition of the RV coefficient makes use of scalar-valued quantities that are called the "variance" and "covariance" of vector-valued random variables. Note that standard usage is to have matrices for the variances and covariances of vector random variables. Given these scalar-valued definitions, the RV coefficient is then just the correlation coefficient defined in the usual way.

Suppose that ''X'' and ''Y'' are matrices of centered random vectors (column vectors) with covariance matrix given by
:\Sigma_{XY}=\operatorname{E}(XY^\top) \,,
then the scalar-valued covariance (denoted by COVV) is defined by
:\operatorname{COVV}(X,Y)= \operatorname{Tr}(\Sigma_{XY}\Sigma_{YX}) \, ,
where \Sigma_{YX}=\Sigma_{XY}^\top. The scalar-valued variance (denoted by VAV) is defined correspondingly:
:\operatorname{VAV}(X)= \operatorname{Tr}(\Sigma_{XX}^2) \, .
With these definitions, the variance and covariance have certain additive properties in relation to the formation of new vector quantities by extending an existing vector with the elements of another.

Then the RV coefficient is defined by
:\mathrm{RV}(X,Y) = \frac{\operatorname{COVV}(X,Y)}{\sqrt{\operatorname{VAV}(X)\operatorname{VAV}(Y)}} \, .
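As an illustration, the definition can be turned into a sample estimate by replacing the expectations with empirical averages. The sketch below is only one way to do this and rests on assumptions that are not part of the text above: the data are stored as NumPy arrays with one observation per row, the columns are centered by subtracting their sample means, and the function name `rv_coefficient` is made up for this example.

```python
import numpy as np

def rv_coefficient(X, Y):
    """Sample RV coefficient of two data matrices with the same number of rows.

    Assumed layout: X has shape (n, p), Y has shape (n, q), rows are observations.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]

    # Center each column so that the sample covariances match E(X Y^T).
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Empirical covariance blocks; the 1/n factor cancels in the final ratio.
    S_xy = Xc.T @ Yc / n
    S_xx = Xc.T @ Xc / n
    S_yy = Yc.T @ Yc / n

    covv = np.trace(S_xy @ S_xy.T)   # COVV(X, Y) = Tr(S_xy S_yx)
    vav_x = np.trace(S_xx @ S_xx)    # VAV(X) = Tr(S_xx^2)
    vav_y = np.trace(S_yy @ S_yy)    # VAV(Y) = Tr(S_yy^2)
    return covv / np.sqrt(vav_x * vav_y)
```

When both matrices have a single column, the three traces reduce to the squared covariance and the squared variances, so the function returns the squared Pearson correlation of the two columns.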


Shortcoming of the coefficient and adjusted version

Even though the coefficient takes values between 0 and 1 by construction, it seldom attains values close to 1, because the denominator is often too large relative to the maximal attainable value of the numerator. Given known diagonal blocks \Sigma_{XX} and \Sigma_{YY} of dimensions p\times p and q\times q respectively, and assuming that p \le q without loss of generality, it has been proved that the maximal attainable numerator is \operatorname{Tr}(\Lambda_X \Pi \Lambda_Y \Pi^\top), where \Lambda_X (resp. \Lambda_Y) denotes the diagonal matrix of the eigenvalues of \Sigma_{XX} (resp. \Sigma_{YY}) sorted decreasingly from the upper leftmost corner to the lower rightmost corner, and \Pi is the p \times q matrix (I_p \ 0_{p\times(q-p)}).

In light of this, Mordant and Segers proposed an adjusted version of the RV coefficient in which the denominator is the maximal value attainable by the numerator. It reads
:\overline{\operatorname{RV}}(X,Y) = \frac{\operatorname{Tr}(\Sigma_{XY}\Sigma_{YX})}{\operatorname{Tr}(\Lambda_X \Pi \Lambda_Y \Pi^\top)} = \frac{\operatorname{Tr}(\Sigma_{XY}\Sigma_{YX})}{\sum_{j=1}^{\min(p,q)} (\Lambda_X)_{j,j} (\Lambda_Y)_{j,j}} \, .
By the Cauchy–Schwarz inequality, this denominator never exceeds \sqrt{\operatorname{Tr}(\Sigma_{XX}^2)\operatorname{Tr}(\Sigma_{YY}^2)}, so the adjusted coefficient is at least as large as the RV coefficient, and the difference is clearly visible in practice.
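The adjusted coefficient can be sketched in the same way from sample covariance blocks. This is a minimal illustration, not the authors' implementation. It assumes data matrices with one observation per row, and it reads the denominator as the sum of products of the leading min(p, q) eigenvalues of the two diagonal blocks, sorted in decreasing order. The function name `adjusted_rv` is hypothetical.

```python
import numpy as np

def adjusted_rv(X, Y):
    """Sample version of the adjusted RV coefficient (illustrative sketch).

    Assumed layout: X has shape (n, p), Y has shape (n, q), rows are observations.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]

    # Center the columns and form the empirical covariance blocks.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    S_xy = Xc.T @ Yc / n
    S_xx = Xc.T @ Xc / n
    S_yy = Yc.T @ Yc / n

    # Numerator: the same trace as in the unadjusted coefficient.
    numerator = np.trace(S_xy @ S_xy.T)

    # Eigenvalues of the symmetric diagonal blocks, sorted decreasingly.
    lam_x = np.sort(np.linalg.eigvalsh(S_xx))[::-1]
    lam_y = np.sort(np.linalg.eigvalsh(S_yy))[::-1]

    # Denominator: pair the largest eigenvalues, keeping min(p, q) of them.
    k = min(lam_x.size, lam_y.size)
    denominator = np.sum(lam_x[:k] * lam_y[:k])
    return numerator / denominator
```

For two single-column inputs the denominator is the product of the two variances, so the result coincides with the squared Pearson correlation, just as for the unadjusted coefficient.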


See also

* Congruence coefficient
* Distance correlation

