
Recursive partitioning is a statistical method for multivariable analysis.
Recursive partitioning creates a decision tree that strives to correctly classify members of the population by splitting it into sub-populations based on several dichotomous independent variables. The process is termed recursive because each sub-population may in turn be split an indefinite number of times until the splitting process terminates after a particular stopping criterion is reached.
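The splitting idea can be sketched in a few lines of Python. The sketch below is only illustrative, with hypothetical data and function names, and is not any specific published algorithm; methods such as CART and C4.5 add refinements including pruning, handling of continuous and multi-valued predictors, and different impurity or information measures.
<syntaxhighlight lang="python">
# Minimal sketch of recursive partitioning over dichotomous (0/1) predictors.
# Each row is (features_dict, label); all names and data here are hypothetical.

def gini(rows):
    """Gini impurity of a set of (features, label) rows with 0/1 labels."""
    if not rows:
        return 0.0
    p = sum(label for _, label in rows) / len(rows)
    return 2 * p * (1 - p)

def partition(rows, feature):
    """Split rows into the sub-populations where the dichotomous feature is 1 or 0."""
    yes = [r for r in rows if r[0][feature] == 1]
    no = [r for r in rows if r[0][feature] == 0]
    return yes, no

def build_tree(rows, features, min_size=5):
    """Recursively choose the split that most reduces impurity; stop (the
    stopping criterion) when a node is too small or no split improves it."""
    best_gain, best_feature = 0.0, None
    if len(rows) >= min_size:
        current = gini(rows)
        for f in features:
            yes, no = partition(rows, f)
            if not yes or not no:
                continue
            weighted = (len(yes) * gini(yes) + len(no) * gini(no)) / len(rows)
            if current - weighted > best_gain:
                best_gain, best_feature = current - weighted, f
    if best_feature is None:
        # Leaf: predict the majority class of this sub-population.
        return {"predict": int(sum(label for _, label in rows) >= len(rows) / 2)}
    yes, no = partition(rows, best_feature)
    return {"split_on": best_feature,
            "yes": build_tree(yes, features, min_size),
            "no": build_tree(no, features, min_size)}

# Hypothetical patients: ({finding: 0/1, ...}, disease 0/1).
patients = [({"x": 1, "y": 0}, 1), ({"x": 0, "y": 1}, 1),
            ({"x": 0, "y": 0}, 0), ({"x": 1, "y": 1}, 1),
            ({"x": 0, "y": 0}, 0), ({"x": 0, "y": 0}, 0)]
tree = build_tree(patients, features=["x", "y"], min_size=2)
</syntaxhighlight>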
Recursive partitioning methods have been developed since the 1980s. Well-known methods of recursive partitioning include Ross Quinlan's ID3 algorithm and its successors, C4.5 and C5.0, and Classification and Regression Trees (CART). Ensemble learning methods such as random forests help to overcome a common criticism of these methods – their vulnerability to overfitting of the data – by employing different algorithms and combining their output in some way.
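As a rough illustration of the ensemble idea, the following sketch (assuming scikit-learn and NumPy are available, with synthetic data standing in for real findings) compares a single decision tree with a random forest by cross-validation; the forest combines the votes of many trees grown on bootstrap samples, which typically reduces overfitting.
<syntaxhighlight lang="python">
# Sketch only: synthetic dichotomous findings with a noisy outcome rule.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 10))            # hypothetical 0/1 findings
y = (X[:, 0] | (X[:, 1] & X[:, 2])).astype(int)   # hypothetical outcome rule
y ^= (rng.random(500) < 0.1).astype(int)          # flip ~10% of labels as noise

tree = DecisionTreeClassifier(random_state=0)     # single, fully grown tree
forest = RandomForestClassifier(n_estimators=200, random_state=0)
print("single tree:", cross_val_score(tree, X, y, cv=5).mean())
print("forest:     ", cross_val_score(forest, X, y, cv=5).mean())
</syntaxhighlight>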
This article focuses on recursive partitioning for medical diagnostic tests, but the technique has far wider applications. See decision tree.
As compared to regression analysis, which creates a formula that health care providers can use to calculate the probability that a patient has a disease, recursive partitioning creates a rule such as 'If a patient has finding x, y, or z, they probably have disease q'.
A variation is 'Cox linear recursive partitioning'.
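The rule-like character of a fitted tree can be shown with a small sketch (hypothetical findings and labels; scikit-learn's export_text is assumed to be available): the printed splits read as nested if/else statements of exactly this 'If a patient has finding x ...' form.
<syntaxhighlight lang="python">
# Sketch: a fitted tree reads off as an explicit rule rather than a formula.
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical dichotomous findings x, y, z and a disease label q.
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0], [1, 1, 0], [0, 0, 0]]
q = [1, 1, 1, 0, 1, 0]

rule_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, q)
print(export_text(rule_tree, feature_names=["finding_x", "finding_y", "finding_z"]))
# Prints nested conditions such as "finding_x <= 0.50" ending in "class: 0" or
# "class: 1", i.e. a readable rule of the form "if the patient has finding x,
# y, or z, they probably have disease q".
</syntaxhighlight>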
Advantages and disadvantages
Compared to other multivariable methods, recursive partitioning has advantages and disadvantages.
* Advantages are:
** Generates clinically more intuitive models that do not require the user to perform calculations.
** Allows varying prioritization of misclassifications in order to create a decision rule that has more sensitivity or specificity (see the sketch after this list).
** May be more accurate.
* Disadvantages are:
** Does not work well for continuous variables.
** May overfit data.
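As an illustration of the weighting of misclassifications mentioned above, costs can be set so that missed cases matter more than false alarms, pushing the resulting rule toward higher sensitivity. A minimal sketch, assuming scikit-learn and hypothetical data:
<syntaxhighlight lang="python">
# Sketch: weight missed cases (false negatives) more heavily than false alarms
# so the grown rule favors sensitivity over specificity. Data are hypothetical.
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import recall_score

X = [[1, 0], [1, 1], [0, 1], [0, 0], [1, 0], [0, 0]]
y = [1, 1, 0, 0, 0, 0]                     # disease present (1) or absent (0)

# A missed case is penalized five times as heavily as a false positive.
sensitive_tree = DecisionTreeClassifier(class_weight={1: 5, 0: 1}, max_depth=2)
sensitive_tree.fit(X, y)
print("sensitivity (recall):", recall_score(y, sensitive_tree.predict(X)))
</syntaxhighlight>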
Examples
Examples are available of using recursive partitioning in research on diagnostic tests.
Goldman used recursive partitioning to prioritize sensitivity in the diagnosis of myocardial infarction among patients with chest pain in the emergency room.
See also
* Decision tree learning
References
{{reflist}}
Statistical classification
Biostatistics