Rank Product

The rank product is a biologically motivated rank test for the detection of differentially expressed genes in replicated microarray experiments. It is a simple non-parametric statistical method based on ranks of fold changes. In addition to its use in expression profiling, it can be used to combine ranked lists in various application domains, including proteomics, metabolomics, statistical meta-analysis, and general feature selection.


Calculation of the rank product

Given ''n'' genes and ''k'' replicates, let r_{g,i} be the rank of gene ''g'' in the ''i''-th replicate. Compute the rank product via the geometric mean:
: RP(g) = \left(\prod_{i=1}^{k} r_{g,i}\right)^{1/k}
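
As an illustration, the geometric mean of ranks can be computed as in the following minimal Python sketch. The function name `rank_product`, the layout of `ranks` (one list per replicate, indexed by gene), and the example values are assumptions made for this illustration, not part of the original method description.

```python
import math

def rank_product(ranks):
    """Return the rank product (geometric mean of ranks) for each gene.

    ranks[i][g] is the rank (1 = top) of gene g in replicate i.
    """
    k = len(ranks)        # number of replicates
    n = len(ranks[0])     # number of genes
    # Average the logs instead of multiplying the ranks directly, which
    # avoids very large intermediate products when n and k are large.
    return [
        math.exp(sum(math.log(ranks[i][g]) for i in range(k)) / k)
        for g in range(n)
    ]

# Hypothetical example: 4 genes ranked in 3 replicates.
ranks = [
    [1, 2, 3, 4],
    [2, 1, 4, 3],
    [1, 3, 2, 4],
]
rp = rank_product(ranks)
```

In this example gene 0 has ranks 1, 2 and 1, so its rank product is the cube root of 2; genes that are consistently ranked near the top get the smallest values.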


Determination of significance levels

Simple permutation-based estimation is used to determine how likely it is that a given RP value or better is observed in a random experiment.
# Generate ''p'' permutations of ''k'' rank lists of length ''n''.
# Calculate the rank products of the ''n'' genes in the ''p'' permutations.
# Count how many times the rank products of the genes in the permutations are smaller than or equal to the observed rank product. Set ''c'' to this value.
# Calculate the average expected value for the rank product: \mathrm{E}_{\mathrm{RP}}(g) = c/p.
# Calculate the percentage of false positives as
: \mathrm{pfp}(g) = \mathrm{E}_{\mathrm{RP}}(g) / \mathrm{rank}(g)
where \mathrm{rank}(g) is the rank of gene ''g'' in a list of all ''n'' genes sorted by increasing \mathrm{RP}.
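
The steps above can be sketched in Python roughly as follows. This is one reading of the procedure, not a reference implementation: the function name `permutation_pfp`, its parameters, and the example data are hypothetical, each permutation is taken to be ''k'' independent random orderings of the ranks 1 to ''n'', and ties between observed rank products are broken arbitrarily by sort order. Raw integer products are compared instead of geometric means, which gives the same ordering because the ''k''-th root is monotone.

```python
import bisect
import math
import random

def permutation_pfp(ranks, p=1000, seed=0):
    """Estimate the expected count c/p and the pfp value for every gene.

    ranks[i][g] is the rank (1 = top) of gene g in replicate i.
    """
    rng = random.Random(seed)
    k, n = len(ranks), len(ranks[0])

    # Observed statistic per gene (integer product of its k ranks).
    observed = [math.prod(rep[g] for rep in ranks) for g in range(n)]

    # Steps 1-2: p random experiments, each made of k shuffled rank lists,
    # and the rank product of every gene in each of them.
    null_products = []
    for _ in range(p):
        permuted = [rng.sample(range(1, n + 1), n) for _ in range(k)]
        null_products.extend(
            math.prod(rep[g] for rep in permuted) for g in range(n)
        )
    null_products.sort()

    # Steps 3-4: c = number of permuted products <= observed product,
    # then the expected value c / p.
    expected = [bisect.bisect_right(null_products, obs) / p for obs in observed]

    # Step 5: divide by the gene's position in the list sorted by
    # increasing observed rank product.
    order = sorted(range(n), key=lambda g: observed[g])
    pfp = [0.0] * n
    for position, g in enumerate(order, start=1):
        pfp[g] = expected[g] / position
    return expected, pfp

# Hypothetical example: 5 genes, 3 replicates; gene 0 is always ranked first.
ranks = [
    [1, 2, 3, 4, 5],
    [1, 3, 2, 5, 4],
    [1, 2, 4, 3, 5],
]
expected, pfp = permutation_pfp(ranks, p=2000, seed=1)
```

Sorting the pooled permuted products once and using binary search keeps the counting step cheap, but the number of permutations needed for stable estimates in the extreme tail still grows quickly with ''n''.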


Exact probability distribution and accurate approximation

If ''n'' is large, permutation re-sampling requires a computationally demanding number of permutations to obtain reliable estimates of the ''p''-values for the most differentially expressed genes. Eisinga, Breitling and Heskes (2013) provide the exact probability mass distribution of the rank product statistic. Calculating the exact ''p''-values offers a substantial improvement over the permutation approximation, most significantly for the part of the distribution that rank product analysis is most interested in, i.e., the thin right tail. However, computing the exact statistical significance of large rank products may take an unacceptably long time. Heskes, Eisinga and Breitling (2014) provide a method to determine accurate approximate ''p''-values of the rank product statistic in a computationally fast manner.
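
To give a feel for what an exact calculation involves, the sketch below enumerates the null distribution of the product of ranks by brute force. It is not the algorithm of the cited papers; it assumes that, under the null hypothesis, a gene's rank in each replicate is uniform on 1 to ''n'' and independent across replicates, and the function names `exact_null_distribution` and `exact_p_value` are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def exact_null_distribution(n, k):
    """Exact probability of each value of the product of k ranks.

    Assumes each rank is uniform on 1..n and independent across replicates.
    """
    counts = Counter({1: 1})  # number of rank tuples giving each product
    for _ in range(k):
        extended = Counter()
        for product, count in counts.items():
            for r in range(1, n + 1):
                extended[product * r] += count
        counts = extended
    total = n ** k  # number of equally likely rank tuples
    return {value: Fraction(c, total) for value, c in sorted(counts.items())}

def exact_p_value(rho, n, k):
    """Probability that the product of ranks is <= the observed product rho."""
    dist = exact_null_distribution(n, k)
    return sum(prob for value, prob in dist.items() if value <= rho)

# Small example: n = 3 genes, k = 2 replicates, observed product 2.
p_small = exact_p_value(2, 3, 2)
```

For ''n'' = 3 and ''k'' = 2 there are nine equally likely rank pairs, three of which (1·1, 1·2 and 2·1) have a product of at most 2, so `p_small` equals 1/3. The number of distinct products to track grows rapidly with ''n'' and ''k'', which is why this naive enumeration is only practical for small inputs.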


See also

* Ranking
* Schulze method
* Comparison of electoral systems
* Arrow's impossibility theorem


References

* Breitling, R., Armengaud, P., Amtmann, A., and Herzyk, P. (2004) Rank Products: A simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments, FEBS Letters, 573:83–92
* Heskes, T., Eisinga, R., and Breitling, R. (2014) A fast algorithm for determining bounds and accurate approximate ''p''-values of the rank product statistic for replicate experiments, BMC Bioinformatics, 15(1):367, doi:10.1186/preaccept-1857144210135244, PMID 25413493, PMC 4245829