UniFrac

UniFrac, a shortened version of "unique fraction metric", is a distance metric used for comparing biological communities. It differs from dissimilarity measures such as Bray-Curtis dissimilarity in that it accounts for the relatedness of community members by incorporating phylogenetic distances between observed organisms into the computation. Both weighted (quantitative) and unweighted (qualitative) variants of UniFrac are widely used in microbial ecology: the former accounts for the abundance of observed organisms, while the latter considers only their presence or absence. The method was devised by Catherine Lozupone in 2005, when she was working with Rob Knight of the University of Colorado at Boulder.


Research methods

The distance is calculated between pairs of samples, where each sample represents an organismal community. All taxa found in one or both samples are placed on a phylogenetic tree. A branch leading to taxa from both samples is marked as "shared", and a branch leading to taxa that appear in only one sample is marked as "unshared". The distance between the two samples is then calculated as:

$$\left( \frac{\text{sum of unshared branch lengths}}{\text{sum of all tree branch lengths}} \right) = \text{fraction of total unshared branch lengths}$$

This definition satisfies the requirements of a distance metric: it is non-negative, zero only when entities are identical, symmetric, and conforms to the triangle inequality.

If there are several samples, a distance matrix can be created by making a tree for each pair of samples and calculating their UniFrac measure. Standard multivariate statistical methods such as data clustering and principal co-ordinates analysis can then be applied to this matrix.

The statistical significance of the UniFrac distance between two samples can be determined using Monte Carlo simulations. Randomizing the sample classification of each taxon on the tree (leaving the branch structure unchanged) and recomputing the distance many times yields a distribution of UniFrac values. From this, a p-value can be assigned to the actual distance between the samples.

Additionally, there is a weighted version of the UniFrac metric that accounts for the relative abundance of each taxon within the communities. This is commonly used in metagenomic studies, where the number of metagenomic reads can be in the tens of thousands and it is appropriate to "bin" these reads into operational taxonomic units (OTUs), which can then be treated as taxa within the UniFrac framework.

In 2012, a generalized version of UniFrac, which unifies the weighted and unweighted UniFrac distances in a single framework, was proposed. The authors argued that the weighted and unweighted UniFrac distances place too much emphasis on either abundant lineages or rare lineages, respectively, leading to “loss of power when the important composition change occurs in moderately abundant lineages”. The generalized UniFrac distance aims to address this limitation by down-weighting the emphasis on abundant or rare lineages.
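The calculations described in this section can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy tree, the two samples, and the function names are hypothetical and do not come from any UniFrac software. The weighted function uses one common unnormalized formulation (each branch length multiplied by the absolute difference in the relative abundance beneath it), and the permutation routine is only a rough stand-in for the Monte Carlo procedure, shuffling which taxon carries which pair of counts while leaving the tree fixed.

```python
import random

# Hypothetical toy tree: node -> (parent, length of the branch to the parent).
# Topology: root -> n1 -> {A, B}, and root -> C.
TREE = {
    "n1": ("root", 1.0),
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "C": ("root", 2.0),
}

# Hypothetical samples: taxon -> observed count (samples must be non-empty).
SAMPLE_1 = {"A": 3, "B": 1}
SAMPLE_2 = {"B": 2, "C": 2}


def branch_totals(tree, counts):
    """Sum the counts of all observed taxa lying below each branch."""
    totals = {node: 0.0 for node in tree}
    for taxon, count in counts.items():
        node = taxon
        while node in tree:  # stop at the root, which has no entry
            totals[node] += count
            node = tree[node][0]
    return totals


def unweighted_unifrac(tree, sample_a, sample_b):
    """Unshared branch length divided by all branch length seen in either sample."""
    in_a = branch_totals(tree, sample_a)
    in_b = branch_totals(tree, sample_b)
    unshared = total = 0.0
    for node, (_, length) in tree.items():
        has_a, has_b = in_a[node] > 0, in_b[node] > 0
        if has_a or has_b:
            total += length
            if has_a != has_b:  # branch leads to taxa from one sample only
                unshared += length
    return unshared / total if total else 0.0


def weighted_unifrac(tree, sample_a, sample_b):
    """Unnormalized form: branch length times difference in relative abundance."""
    in_a = branch_totals(tree, sample_a)
    in_b = branch_totals(tree, sample_b)
    sum_a, sum_b = sum(sample_a.values()), sum(sample_b.values())
    return sum(
        length * abs(in_a[node] / sum_a - in_b[node] / sum_b)
        for node, (_, length) in tree.items()
    )


def permutation_p_value(tree, sample_a, sample_b, n_iter=999, seed=0):
    """Shuffle the (count_a, count_b) pairs among taxa; the tree is unchanged."""
    rng = random.Random(seed)
    observed = unweighted_unifrac(tree, sample_a, sample_b)
    taxa = sorted(set(sample_a) | set(sample_b))
    pairs = [(sample_a.get(t, 0), sample_b.get(t, 0)) for t in taxa]
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pairs)
        perm_a = {t: a for t, (a, _) in zip(taxa, pairs) if a}
        perm_b = {t: b for t, (_, b) in zip(taxa, pairs) if b}
        if unweighted_unifrac(tree, perm_a, perm_b) >= observed:
            hits += 1
    return (hits + 1) / (n_iter + 1)


print(unweighted_unifrac(TREE, SAMPLE_1, SAMPLE_2))  # 0.6
print(weighted_unifrac(TREE, SAMPLE_1, SAMPLE_2))    # 2.5
```

In this toy case the branches to A (length 1) and C (length 2) are unshared out of a total branch length of 5, giving an unweighted distance of 3/5. With only three taxa the permutation p-value is not informative; the routine is included only to show the shape of the procedure.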


External links


UniFrac online

Knight Lab website

Description of UniFrac, with worked examples