Dice-Sørensen Coefficient
The Dice-Sørensen coefficient (see below for other names) is a statistic used to gauge the similarity of two samples. It was independently developed by the botanists Lee Raymond Dice and Thorvald Sørensen, who published in 1945 and 1948 respectively.

Name

The index is known by several other names, especially Sørensen–Dice index, Sørensen index and Dice's coefficient. Other variations include the "similarity coefficient" or "index", such as Dice similarity coefficient (DSC). Common alternate spellings for Sørensen are ''Sorenson'', ''Soerenson'' and ''Sörenson'', and all three can also be seen with the ''–sen'' ending (the Danish letter ø is phonetically equivalent to the German/Swedish ö, which can be written as oe in ASCII). Other names include:
* F1 score
* Czekanowski's binary (non-quantitative) index
* Measure of genetic similarity
* Zijdenbos similarity index, referring to a 1994 paper of Zijdenbos et al.

Formula

Sørensen's original formula was intended to ...
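The excerpt is cut off before the formula itself, but the coefficient is conventionally defined for two sets ''X'' and ''Y'' as DSC = 2|X ∩ Y| / (|X| + |Y|). Below is a minimal Python sketch under that definition; the function names and the bigram-based string comparison are illustrative choices, not a standard API.

```python
def dice_coefficient(x: set, y: set) -> float:
    """Dice-Sørensen coefficient of two finite sets: 2|X ∩ Y| / (|X| + |Y|)."""
    if not x and not y:
        return 1.0  # convention chosen here: two empty samples count as identical
    return 2 * len(x & y) / (len(x) + len(y))

def bigrams(s: str) -> set:
    """Set of adjacent character pairs, a common way to apply the index to strings."""
    return {s[i:i + 2] for i in range(len(s) - 1)}

# "night" and "nacht" share only the bigram "ht": 2*1 / (4+4) = 0.25
print(dice_coefficient(bigrams("night"), bigrams("nacht")))  # 0.25
```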



Sample (statistics)
In statistics, quality assurance, and survey methodology, sampling is the selection of a subset (a statistical sample, termed sample for short) of individuals from within a statistical population to estimate characteristics of the whole population. The subset is meant to reflect the whole population, and statisticians attempt to collect samples that are representative of the population. Sampling has lower costs and faster data collection compared to recording data from the entire population (in many cases, collecting the whole population is impossible, like getting the sizes of all stars in the universe), and thus it can provide insights in cases where it is infeasible to measure an entire population. Each observation measures one or more properties (such as weight, location, colour or mass) of independent objects or individuals. In survey sampling, weights can be applied to the data to adjust for the sample design, particularly in stratified sampling ...
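The passage above is mostly definitional, but the core idea, estimating a characteristic of a population from a random subset, fits in a few lines of Python. This is a minimal sketch assuming a hypothetical numeric population; it is not tied to any particular survey design.

```python
import random

population = list(range(1, 10_001))        # hypothetical population of values
sample = random.sample(population, k=100)  # simple random sample without replacement

# The sample mean estimates the population mean at a fraction of the cost
estimate = sum(sample) / len(sample)
print(f"estimated mean: {estimate:.1f}, "
      f"true mean: {sum(population) / len(population):.1f}")
```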



Euclidean Distance
In mathematics, the Euclidean distance between two points in Euclidean space is the length of the line segment between them. It can be calculated from the Cartesian coordinates of the points using the Pythagorean theorem, and therefore is occasionally called the Pythagorean distance. These names come from the ancient Greek mathematicians Euclid and Pythagoras. In the Greek deductive geometry exemplified by Euclid's ''Elements'', distances were not represented as numbers but as line segments of the same length, which were considered "equal". The notion of distance is inherent in the compass tool used to draw a circle, whose points all have the same distance from a common center point. The connection from the Pythagorean theorem to distance calculation was not made until the 18th century. The distance between two objects that are not points is usually defined to be the smallest distance among pairs of points from the two objects. Formulas are known for computing distances b ...
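As a quick illustration of the Pythagorean formulation: for points given as coordinate sequences, the distance is the square root of the summed squared coordinate differences. A short Python sketch (the standard library's math.dist computes the same thing):

```python
import math

def euclidean_distance(p, q):
    """Length of the line segment between points p and q (Pythagorean theorem)."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# 3-4-5 right triangle: the hypotenuse has length 5
print(euclidean_distance((0, 0), (3, 4)))  # 5.0
print(math.dist((0, 0), (3, 4)))           # 5.0, standard-library equivalent
```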


Tversky Index
The Tversky index, named after Amos Tversky, is an asymmetric similarity measure on sets that compares a variant to a prototype. The Tversky index can be seen as a generalization of the Sørensen–Dice coefficient and the Jaccard index. For sets ''X'' and ''Y'' the Tversky index is a number between 0 and 1 given by

:S(X, Y) = \frac{|X \cap Y|}{|X \cap Y| + \alpha|X \setminus Y| + \beta|Y \setminus X|}

Here, X \setminus Y denotes the relative complement of Y in X. Further, \alpha, \beta \ge 0 are parameters of the Tversky index. Setting \alpha = \beta = 1 produces the Jaccard index; setting \alpha = \beta = 0.5 produces the Sørensen–Dice coefficient. If we consider ''X'' to be the prototype and ''Y'' to be the variant, then \alpha corresponds to the weight of the prototype and \beta corresponds to the weight of the variant ...
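A small Python sketch of the index as reconstructed above (function name illustrative); the two example calls check the stated special cases against the Jaccard and Sørensen–Dice values.

```python
def tversky_index(x: set, y: set, alpha: float, beta: float) -> float:
    """Tversky index: |X∩Y| / (|X∩Y| + α|X\\Y| + β|Y\\X|)."""
    inter = len(x & y)
    return inter / (inter + alpha * len(x - y) + beta * len(y - x))

a, b = {1, 2, 3, 4}, {3, 4, 5}
print(tversky_index(a, b, 1.0, 1.0))  # 0.4  = Jaccard: |A∩B|/|A∪B| = 2/5
print(tversky_index(a, b, 0.5, 0.5))  # ~0.571 = Sørensen–Dice: 2*2/(4+3)
```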


Overlap Coefficient
The overlap coefficient, or Szymkiewicz–Simpson coefficient, is a similarity measure that measures the overlap between two finite sets. It is related to the Jaccard index and is defined as the size of the intersection divided by the size of the smaller of the two sets:

:\operatorname{overlap}(A,B) = \frac{|A \cap B|}{\min(|A|, |B|)}

Note that 0 \leq \operatorname{overlap}(A,B) \leq 1. If set ''A'' is a subset of ''B'' or the converse, then the overlap coefficient is equal to 1.

See also
* Jaccard index
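A minimal Python sketch of this definition (function name illustrative); the second call illustrates the subset property noted above.

```python
def overlap_coefficient(a: set, b: set) -> float:
    """Szymkiewicz–Simpson overlap: |A ∩ B| / min(|A|, |B|)."""
    return len(a & b) / min(len(a), len(b))

print(overlap_coefficient({1, 2, 3}, {2, 3, 4, 5}))  # 2/3 ≈ 0.667
print(overlap_coefficient({1, 2}, {1, 2, 3}))        # 1.0 — A is a subset of B
```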




Morisita's Overlap Index
Morisita's overlap index, named after Masaaki Morisita, is a statistical measure of dispersion of individuals in a population. It is used to compare overlap among samples (Morisita 1959). This formula is based on the assumption that increasing the size of the samples will increase the diversity because it will include different habitats (i.e. different faunas).

Formula:

:C_D = \frac{2 \sum_{i=1}^S x_i y_i}{(D_x + D_y) X Y}

: ''x''''i'' is the number of times species ''i'' is represented in the total ''X'' from one sample.
: ''y''''i'' is the number of times species ''i'' is represented in the total ''Y'' from another sample.
: ''D''''x'' and ''D''''y'' are the Simpson's index values for the ''x'' and ''y'' samples respectively.
: ''S'' is the number of unique species.

''C''''D'' = 0 if the two samples do not overlap in terms of species, and ''C''''D'' = 1 if the species occur in the same proportions in both samples.

Horn's modification of the index is (Horn 1966):

:C_H = \frac{2 \sum_{i=1}^S x_i y_i}{\left(\frac{\sum_{i=1}^S x_i^2}{X^2} + \frac{\sum_{i=1}^S y_i^2}{Y^2}\right) X Y} \,.

Note, not to be confused with M ...
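The following Python sketch implements Horn's modification as reconstructed above, taking per-species count lists for the two samples; the function name and example data are illustrative.

```python
def morisita_horn(x, y):
    """Horn's modification of Morisita's overlap index.

    x[i], y[i] are the counts of species i in the two samples
    (equal-length sequences over the same species list).
    """
    X, Y = sum(x), sum(y)
    num = 2 * sum(xi * yi for xi, yi in zip(x, y))
    den = (sum(xi * xi for xi in x) / X**2
           + sum(yi * yi for yi in y) / Y**2) * X * Y
    return num / den

print(morisita_horn([10, 5, 0], [10, 5, 0]))  # 1.0 — identical proportions
print(morisita_horn([10, 0, 0], [0, 0, 10]))  # 0.0 — no species shared
```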


Mantel Test
The Mantel test, named after Nathan Mantel, is a statistical test of the correlation between two matrices. The matrices must be of the same dimension; in most applications, they are matrices of interrelations between the same vectors of objects. The test was first published in 1967 by Mantel, a biostatistician at the National Institutes of Health. Accounts of it can be found in advanced statistics books (e.g., Sokal & Rohlf 1995).

Usage

The test is commonly used in ecology, where the data are usually estimates of the "distance" between objects such as species of organisms. For example, one matrix might contain estimates of the genetic distances (i.e., the amount of difference between two different genomes) between all possible pairs of species in the study, obtained by the methods of molecular systematics, while the other might contain estimates of the geographical distance between the ranges of each species to every other species. In this case, the hypothesis being test ...
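The test is usually carried out as a permutation test: the correlation between the unrolled upper triangles of the two matrices is compared with its distribution when the rows and columns of one matrix are permuted together. A sketch in Python with NumPy, assuming symmetric distance matrices as input; the function name and the one-sided p-value convention are illustrative choices, not a reference implementation.

```python
import numpy as np

def mantel_test(d1, d2, permutations=999, rng=None):
    """Permutation Mantel test on two symmetric n x n distance matrices.

    Returns (observed Pearson r, one-sided permutation p-value).
    """
    rng = rng or np.random.default_rng()
    n = d1.shape[0]
    iu = np.triu_indices(n, k=1)                   # upper triangle, no diagonal
    r_obs = np.corrcoef(d1[iu], d2[iu])[0, 1]
    count = 0
    for _ in range(permutations):
        p = rng.permutation(n)                     # permute objects, not cells
        r = np.corrcoef(d1[iu], d2[p][:, p][iu])[0, 1]
        if r >= r_obs:
            count += 1
    return r_obs, (count + 1) / (permutations + 1)
```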


Hamming Distance
In information theory, the Hamming distance between two strings or vectors of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number of ''substitutions'' required to change one string into the other, or equivalently, the minimum number of ''errors'' that could have transformed one string into the other. In a more general context, the Hamming distance is one of several string metrics for measuring the edit distance between two sequences. It is named after the American mathematician Richard Hamming. A major application is in coding theory, more specifically in block codes, in which the equal-length strings are vectors over a finite field.

Definition

The Hamming distance between two equal-length strings of symbols is the number of positions at which the corresponding symbols are different.

Examples

The symbols may be letters, bits, or decimal digits, am ...
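A direct Python sketch of the definition, counting mismatched positions in two equal-length strings; the examples use letters and bits, as the excerpt mentions.

```python
def hamming_distance(s, t) -> int:
    """Number of positions at which corresponding symbols differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance requires equal-length inputs")
    return sum(a != b for a, b in zip(s, t))

print(hamming_distance("karolin", "kathrin"))  # 3 (r/t, o/h, l/r)
print(hamming_distance("1011101", "1001001"))  # 2
```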



Correlation
In statistics, correlation or dependence is any statistical relationship, whether causal or not, between two random variables or bivariate data. Although in the broadest sense, "correlation" may indicate any type of association, in statistics it usually refers to the degree to which a pair of variables are '' linearly'' related. Familiar examples of dependent phenomena include the correlation between the height of parents and their offspring, and the correlation between the price of a good and the quantity the consumers are willing to purchase, as depicted in the demand curve. Correlations are useful because they can indicate a predictive relationship that can be exploited in practice. For example, an electrical utility may produce less power on a mild day based on the correlation between electricity demand and weather. In this example, there is a causal relationship, because extreme weather causes people to use more electricity for heating or cooling. However, in g ...
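The linear association the passage refers to is most often quantified by Pearson's correlation coefficient, r = cov(X, Y) / (σ_X σ_Y), which ranges from -1 to 1. A minimal Python sketch of the sample version (Python 3.10+ also ships statistics.correlation for this):

```python
import statistics

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  #  1.0 — perfect positive linear fit
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 — perfect negative linear fit
```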


Hellinger Distance
In probability and statistics, the Hellinger distance (closely related to, although different from, the Bhattacharyya distance) is used to quantify the similarity between two probability distributions. It is a type of ''f''-divergence. The Hellinger distance is defined in terms of the Hellinger integral, which was introduced by Ernst Hellinger in 1909. It is sometimes called the Jeffreys distance.

Definition

Measure theory

To define the Hellinger distance in terms of measure theory, let P and Q denote two probability measures on a measure space \mathcal{X} that are absolutely continuous with respect to an auxiliary measure \lambda. Such a measure always exists, e.g. \lambda = (P + Q). The square of the Hellinger distance between P and Q is defined as the quantity

:H^2(P,Q) = \frac{1}{2} \int_{\mathcal{X}} \left(\sqrt{p(x)} - \sqrt{q(x)}\right)^2 \lambda(dx).

Here, P(dx) = p(x)\lambda(dx) and Q(dx) = q(x)\lambda(dx), i.e. p and q are the Radon–Nikodym derivatives of ''P'' and ''Q'' respect ...
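For discrete distributions given as probability vectors over the same support, the definition above reduces to H(P,Q) = sqrt((1/2) Σ (√p_i − √q_i)²). A minimal Python sketch of that discrete case:

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions
    given as probability vectors over the same support."""
    return math.sqrt(0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2
                               for pi, qi in zip(p, q)))

print(hellinger([0.5, 0.5], [0.5, 0.5]))  # 0.0 — identical distributions
print(hellinger([1.0, 0.0], [0.0, 1.0]))  # 1.0 — disjoint supports (maximum)
```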



Hugo Steinhaus
Hugo Dyonizy Steinhaus ( , ; 14 January 1887 – 25 February 1972) was a Polish mathematician and educator. Steinhaus obtained his PhD under David Hilbert at Göttingen University in 1911 and later became a professor at the Jan Kazimierz University in Lwów (now Lviv, Ukraine), where he helped establish what later became known as the Lwów School of Mathematics. He is credited with "discovering" mathematician Stefan Banach, with whom he made a notable contribution to functional analysis through the Banach–Steinhaus theorem. After World War II Steinhaus played an important part in the establishment of the mathematics department at Wrocław University and in the revival of Polish mathematics from the destruction of the war. Author of around 170 scientific articles and books, Steinhaus left his mark on many branches of mathematics, such as functional analysis, geometry, mathematical logic, and trigonometry. Notably he is regarded as one of the early founde ...




Bray–Curtis Dissimilarity
In ecology and biology, the Bray–Curtis dissimilarity is a statistic used to quantify the dissimilarity in species composition between two different sites, based on counts at each site. It is named after J. Roger Bray and John T. Curtis who first presented it in a paper in 1957. The Bray–Curtis dissimilarity BC_{jk} between two sites j and k is

:BC_{jk} = 1 - \frac{2C_{jk}}{S_j + S_k} = 1 - \frac{2 \sum_{i=1}^p \min(N_{ij}, N_{ik})}{\sum_{i=1}^p N_{ij} + \sum_{i=1}^p N_{ik}}

where N_{ij} is the number of specimens of species i at site j, N_{ik} is the number of specimens of species i at site k, and p is the total number of species in the samples. In the alternative shorthand notation C_{jk} is the sum of the lesser counts of each species found at both sites. S_j and S_k are the total number of specimens counted at sites j and k respectively. The index can be simplified to 1 - 2C/2 = 1 - C when the abundances at each site are expressed as proportions, though the two forms of the equation only produce matching results when the total number of specimens counted at both sites are the same. Further treatment can be found in Legen ...
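A short Python sketch of the shorthand form above, taking per-species counts at the two sites; the function name and example counts are illustrative.

```python
def bray_curtis(counts_j, counts_k):
    """Bray–Curtis dissimilarity from per-species counts at two sites:
    1 - 2*C_jk / (S_j + S_k), with C_jk the sum of the lesser counts."""
    c = sum(min(nj, nk) for nj, nk in zip(counts_j, counts_k))
    return 1 - 2 * c / (sum(counts_j) + sum(counts_k))

# C = 6 + 0 + 4 = 10, S_j = 17, S_k = 16, so 1 - 20/33 ≈ 0.394
print(bray_curtis([6, 7, 4], [10, 0, 6]))
print(bray_curtis([5, 5], [5, 5]))  # 0.0 — identical composition
```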