The Mantel test, named after Nathan Mantel, is a statistical test of the correlation between two matrices. The matrices must be of the same dimension; in most applications, they are matrices of interrelations between the same vectors of objects. The test was first published by Nathan Mantel, a biostatistician at the National Institutes of Health, in 1967. Accounts of it can be found in advanced statistics books (e.g., Sokal & Rohlf 1995).
Usage
The test is commonly used in ecology, where the data are usually estimates of the "distance" between objects such as species of organisms. For example, one matrix might contain estimates of the genetic distances (i.e., the amount of difference between two different genomes) between all possible pairs of species in the study, obtained by the methods of molecular systematics, while the other might contain estimates of the geographical distance between the ranges of each species to every other species. In this case, the hypothesis being tested is whether the variation in genetics for these organisms is correlated with the variation in geographical distance.
Method
If there are ''n'' objects, and the matrix is symmetrical (so the distance from object ''a'' to object ''b'' is the same as the distance from ''b'' to ''a''), such a matrix contains ''n''(''n'' − 1)/2 distances. Because these distances are not independent of each other (changing the "position" of one object would change ''n'' − 1 of them, namely the distance from that object to each of the others), we cannot assess the relationship between the two matrices by simply evaluating the correlation coefficient between the two sets of distances and testing its statistical significance. The Mantel test deals with this problem.
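The counting argument above can be checked with a short sketch. The matrix `D` and the helper `upper_triangle` are hypothetical names introduced only for illustration; the values are made up and carry no empirical meaning.

```python
from itertools import combinations


def upper_triangle(matrix):
    """Return the distances above the diagonal of a symmetric matrix."""
    n = len(matrix)
    return [matrix[i][j] for i, j in combinations(range(n), 2)]


# Hypothetical symmetric distance matrix for n = 4 objects (made-up values).
D = [
    [0.0, 1.0, 2.0, 3.0],
    [1.0, 0.0, 1.5, 2.5],
    [2.0, 1.5, 0.0, 1.0],
    [3.0, 2.5, 1.0, 0.0],
]

n = len(D)
distances = upper_triangle(D)

# Pairs that involve object 0: these are the distances that would change
# if object 0 were moved, which is why the distances are not independent.
pairs_with_first = [pair for pair in combinations(range(n), 2) if 0 in pair]

print(len(distances))         # n(n - 1)/2 distinct distances
print(len(pairs_with_first))  # n - 1 distances share object 0
```

With ''n'' = 4 this gives 6 distances in total, 3 of which depend on any single object.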
The procedure adopted is a kind of randomization or permutation test. The correlation between the two sets of distances is calculated, and this is both the measure of correlation reported and the test statistic on which the test is based. In principle, any correlation coefficient could be used, but normally the Pearson product-moment correlation coefficient is used.
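As a sketch of the statistic when the Pearson coefficient is chosen, the correlation can be written over the distinct pairs of objects. The symbols below are assumptions introduced for illustration: x and y denote the entries of the two matrices, m the number of distinct pairs, and s the sample standard deviations.

```latex
% m = n(n-1)/2 distinct pairs (i < j) of the n objects
r = \frac{1}{m - 1} \sum_{i < j}
    \left( \frac{x_{ij} - \bar{x}}{s_x} \right)
    \left( \frac{y_{ij} - \bar{y}}{s_y} \right),
\qquad m = \frac{n(n-1)}{2}
```

Here the means and standard deviations are taken over the m off-diagonal entries only, so the zero diagonal of a distance matrix does not enter the calculation.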
In contrast to the ordinary use of the correlation coefficient, to assess the significance of any apparent departure from a zero correlation, the rows and columns of one of the matrices are subjected to random permutations many times (the same permutation being applied to the rows and to the columns), with the correlation being recalculated after each permutation. The significance of the observed correlation is the proportion of such permutations that lead to a correlation coefficient higher than the observed one.
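The permutation procedure can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function names, the default of 999 permutations, the fixed seed, and the example matrices are all assumptions. It also departs slightly from the description above by counting permutations that give a correlation at least as high as the observed one and adding one to numerator and denominator, a common convention that keeps the estimated p-value above zero.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(matrix, order=None):
    """Distances above the diagonal, after relabelling objects by `order`."""
    n = len(matrix)
    if order is None:
        order = list(range(n))
    # Using the same `order` for rows and columns permutes them together.
    return [matrix[order[i]][order[j]] for i, j in combinations(range(n), 2)]


def mantel_test(a, b, permutations=999, seed=0):
    """One-sided Mantel test; returns (observed correlation, p-value)."""
    rng = random.Random(seed)
    y = upper_triangle(b)
    observed = pearson(upper_triangle(a), y)
    order = list(range(len(a)))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        if pearson(upper_triangle(a, order), y) >= observed:
            hits += 1
    # Assumed convention: count the observed arrangement as one permutation.
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical data: five objects placed on a line, and a second matrix
# that is an exact linear function of the first (made-up values).
positions = [0.0, 1.0, 3.0, 6.0, 10.0]
geo = [[abs(p - q) for q in positions] for p in positions]
gen = [[0.0 if i == j else 2.0 * geo[i][j] + 1.0 for j in range(5)]
       for i in range(5)]

r, p = mantel_test(geo, gen)
print(r, p)
```

Because the second matrix is constructed as a linear function of the first, the observed correlation is 1 up to rounding error, and few random relabellings can match it.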
The reasoning is that if the null hypothesis of there being no relation between the two matrices is true, then permuting the rows and columns of the matrix should be equally likely to produce a larger or a smaller coefficient. In addition to overcoming the problems arising from the statistical dependence of elements within each of the two matrices, use of the permutation test means that no reliance is being placed on assumptions about the statistical distributions of elements in the matrices.
Many statistical packages include routines for carrying out the Mantel test.
Criticism
The various papers introducing the Mantel test (and its extension, the partial Mantel test) lack a clear statistical framework that fully specifies the null and alternative hypotheses. This may convey the wrong idea that these tests are universal. For example, the Mantel and partial Mantel tests can be flawed in the presence of spatial autocorrelation and return erroneously low p-values; see, e.g., Guillot and Rousset (2013).
See also
* Non-parametric statistics
* Sørensen–Dice coefficient
External links
The Mantel test in ecology