In statistics, the RV coefficient
is a
multivariate generalization of the ''squared''
Pearson correlation coefficient
(because the RV coefficient takes values between 0 and 1).
It measures the closeness of two sets of points that may each be represented in a matrix.
The major approaches within
statistical multivariate data analysis can all be brought into a common framework in which the RV coefficient is maximised subject to relevant constraints. Specifically, these statistical methodologies include:
:* principal component analysis
:* canonical correlation analysis
:* multivariate regression
:* statistical classification (linear discrimination).
One application of the RV coefficient is in functional neuroimaging, where it can measure the similarity between two subjects' series of brain scans or between different scans of the same subject.
Definitions
The definition of the RV-coefficient makes use of ideas
concerning the definition of scalar-valued quantities which are called the "variance" and "covariance" of vector-valued
random variables. Note that standard usage is to have matrices for the variances and covariances of vector random variables.
Given these innovative definitions, the RV-coefficient is then just the correlation coefficient defined in the usual way.
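Suppose that ''X'' and ''Y'' are matrices of centered random vectors (column vectors) with covariance matrix given by
:<math>\Sigma_{XY} = \operatorname{E}(X^\top Y),</math>
then the scalar-valued covariance (denoted by COVV) is defined by
:<math>\operatorname{COVV}(X,Y) = \operatorname{Tr}(\Sigma_{XY}\Sigma_{YX}).</math>
The scalar-valued variance is defined correspondingly:
:<math>\operatorname{VAV}(X) = \operatorname{Tr}(\Sigma_{XX}^2).</math>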
With these definitions, the variance and covariance have certain additive properties in relation to the formation of new vector quantities by extending an existing vector with the elements of another.
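As a sketch of what these additive properties look like, assume that the scalar-valued covariance of ''X'' and ''Y'' is the trace of the product of their cross-covariance matrix with its transpose (its squared Frobenius norm), and that the scalar-valued variance is the scalar-valued covariance of a vector with itself. If ''X'' is extended with the elements of another vector ''Z'' (a symbol introduced here only for illustration), the cross-covariance blocks stack and their squared norms add:
:<math>\operatorname{COVV}((X,Z),Y) = \operatorname{COVV}(X,Y) + \operatorname{COVV}(Z,Y),</math>
:<math>\operatorname{VAV}((X,Z)) = \operatorname{VAV}(X) + \operatorname{VAV}(Z) + 2\operatorname{COVV}(X,Z).</math>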
Then the RV-coefficient is defined by
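:<math>\mathrm{RV}(X,Y) = \frac{\operatorname{COVV}(X,Y)}{\sqrt{\operatorname{VAV}(X)\,\operatorname{VAV}(Y)}}.</math>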
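The definition can be illustrated with a minimal NumPy sketch of a sample version of the coefficient. It assumes that the rows of each data matrix are observations, that both matrices have the same number of rows, and that the expectations are replaced by sums over observations (the normalising constant cancels in the ratio). The function name <code>rv_coefficient</code> is hypothetical and not taken from any library.

```python
import numpy as np

def rv_coefficient(X, Y):
    """Sample RV coefficient of two data matrices with matching rows.

    Hypothetical helper for illustration: rows are observations,
    columns are variables. 1-D inputs are treated as one column.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    if Y.ndim == 1:
        Y = Y[:, None]
    if X.shape[0] != Y.shape[0]:
        raise ValueError("X and Y must have the same number of rows")

    # Center each column, as the definition assumes centered vectors.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Unnormalised sample covariance blocks; the 1/n factors cancel below.
    S_xy = Xc.T @ Yc
    S_xx = Xc.T @ Xc
    S_yy = Yc.T @ Yc

    covv = np.trace(S_xy @ S_xy.T)   # scalar-valued covariance
    vav_x = np.trace(S_xx @ S_xx)    # scalar-valued variance of X
    vav_y = np.trace(S_yy @ S_yy)    # scalar-valued variance of Y
    return covv / np.sqrt(vav_x * vav_y)

# Usage: with one column each, the value matches the squared Pearson r.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 0.5 * x + rng.normal(size=100)
print(rv_coefficient(x, y), np.corrcoef(x, y)[0, 1] ** 2)
```

With a single column on each side the two printed values agree up to rounding, which is the sense in which the coefficient generalizes the squared Pearson correlation coefficient.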
Shortcoming of the coefficient and adjusted version
Even though the coefficient takes values between 0 and 1 by construction, it seldom attains values close to 1, as the denominator is often too large with respect to the maximal attainable value of the numerator.
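Given known diagonal blocks <math>\Sigma_{XX}</math> and <math>\Sigma_{YY}</math> of dimensions <math>p \times p</math> and <math>q \times q</math> respectively, assuming that <math>p \le q</math> without loss of generality, it has been proved that the maximal attainable numerator is <math>\operatorname{Tr}(\Lambda_X \Pi \Lambda_Y \Pi^\top),</math> where <math>\Lambda_X</math> (resp. <math>\Lambda_Y</math>) denotes the diagonal matrix of the eigenvalues of <math>\Sigma_{XX}</math> (resp. <math>\Sigma_{YY}</math>) sorted decreasingly from the upper leftmost corner to the lower rightmost corner and <math>\Pi</math> is the <math>p \times q</math> matrix <math>(I_p \ 0_{p \times (q-p)})</math>.
In light of this, Mordant and Segers proposed an adjusted version of the RV coefficient in which the denominator is the maximal value attainable by the numerator. It reads
:<math>\overline{\operatorname{RV}}(X,Y) = \frac{\operatorname{Tr}(\Sigma_{XY}\Sigma_{YX})}{\operatorname{Tr}(\Lambda_X \Pi \Lambda_Y \Pi^\top)}.</math>
The impact of this adjustment is clearly visible in practice.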
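The adjustment can be sketched in NumPy under one stated assumption: that the maximal attainable numerator equals the sum, over the smaller of the two dimensions, of the products of the decreasingly sorted eigenvalues of the two diagonal blocks. The function name <code>adjusted_rv</code> and its argument names are hypothetical, chosen for illustration only.

```python
import numpy as np

def adjusted_rv(S_xx, S_yy, S_xy):
    """Adjusted RV coefficient from covariance blocks (illustrative sketch).

    S_xx: (p, p) covariance of X; S_yy: (q, q) covariance of Y;
    S_xy: (p, q) cross-covariance. Assumes the maximal numerator is the
    sum of products of the decreasingly sorted eigenvalues.
    """
    S_xx = np.asarray(S_xx, dtype=float)
    S_yy = np.asarray(S_yy, dtype=float)
    S_xy = np.asarray(S_xy, dtype=float)

    # Eigenvalues of the symmetric diagonal blocks, largest first.
    lam_x = np.sort(np.linalg.eigvalsh(S_xx))[::-1]
    lam_y = np.sort(np.linalg.eigvalsh(S_yy))[::-1]

    # Pair the leading eigenvalues up to the smaller dimension.
    k = min(lam_x.size, lam_y.size)
    max_numerator = np.sum(lam_x[:k] * lam_y[:k])

    numerator = np.trace(S_xy @ S_xy.T)
    return numerator / max_numerator

# Usage: compare the plain and adjusted coefficients on simulated data.
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 2))
Y = X @ rng.normal(size=(2, 3)) + rng.normal(size=(n, 3))
Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
S_xx, S_yy, S_xy = Xc.T @ Xc / n, Yc.T @ Yc / n, Xc.T @ Yc / n
plain = np.trace(S_xy @ S_xy.T) / np.sqrt(
    np.trace(S_xx @ S_xx) * np.trace(S_yy @ S_yy)
)
print(plain, adjusted_rv(S_xx, S_yy, S_xy))
```

Under this assumption the adjusted value is never smaller than the plain coefficient, because by the Cauchy–Schwarz inequality the sum of eigenvalue products is at most the square root of the product of the two scalar-valued variances.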
See also
* Congruence coefficient
* Distance correlation