An alternating decision tree (ADTree) is a machine learning method for classification. It generalizes decision trees and has connections to boosting.
An ADTree consists of an alternation of decision nodes, which specify a predicate condition, and prediction nodes, which contain a single number. An instance is classified by an ADTree by following all paths for which all decision nodes are true, and summing any prediction nodes that are traversed.
History
ADTrees were introduced by Yoav Freund and Llew Mason. However, the algorithm as presented had several typographical errors. Clarifications and optimizations were later presented by Bernhard Pfahringer, Geoffrey Holmes and Richard Kirkby.
Implementations are available in Weka and JBoost.
Motivation
Original boosting algorithms typically used either decision stumps or decision trees as weak hypotheses. As an example, boosting decision stumps creates a set of T weighted decision stumps (where T is the number of boosting iterations), which then vote on the final classification according to their weights. Individual decision stumps are weighted according to their ability to classify the data.
Boosting a simple learner results in an unstructured set of T hypotheses, making it difficult to infer correlations between attributes. Alternating decision trees introduce structure to the set of hypotheses by requiring that they build off a hypothesis that was produced in an earlier iteration. The resulting set of hypotheses can be visualized in a tree based on the relationship between a hypothesis and its "parent."
Another important feature of boosted algorithms is that the data is given a different distribution at each iteration. Instances that are misclassified are given a larger weight while accurately classified instances are given reduced weight.
Alternating decision tree structure
An alternating decision tree consists of decision nodes and prediction nodes. Decision nodes specify a predicate condition. Prediction nodes contain a single number. ADTrees always have prediction nodes as both root and leaves. An instance is classified by an ADTree by following all paths for which all decision nodes are true and summing any prediction nodes that are traversed. This is different from binary classification trees such as CART (classification and regression tree) or C4.5, in which an instance follows only one path through the tree.
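To make the multi-path scoring concrete, the following is a minimal Python sketch of an ADTree's node structure and classification. It is not taken from Weka or JBoost; the class names, the feature char_freq_bang and its threshold are illustrative assumptions.

class PredictionNode:
    def __init__(self, value, children=None):
        self.value = value               # number contributed to the score
        self.children = children or []   # zero or more attached DecisionNodes

class DecisionNode:
    def __init__(self, predicate, if_true, if_false):
        self.predicate = predicate   # function: instance -> bool
        self.if_true = if_true       # PredictionNode followed when the predicate holds
        self.if_false = if_false     # PredictionNode followed otherwise

def score(node, instance):
    # Sum this prediction node, then recurse into whichever branch of each
    # attached decision node the instance satisfies, so every applicable
    # root-to-leaf path contributes to the total.
    total = node.value
    for decision in node.children:
        branch = decision.if_true if decision.predicate(instance) else decision.if_false
        total += score(branch, instance)
    return total

# Hypothetical two-node tree: positive scores vote for the positive class.
root = PredictionNode(0.1, children=[
    DecisionNode(lambda x: x["char_freq_bang"] > 0.08,
                 PredictionNode(0.7), PredictionNode(-0.4))])
print(score(root, {"char_freq_bang": 0.2}))   # about 0.8  -> positive class
print(score(root, {"char_freq_bang": 0.0}))   # about -0.3 -> negative class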
Example
The following tree was constructed using JBoost on the spambase dataset (available from the UCI Machine Learning Repository).
In this example, spam is coded as 1 and regular email is coded as -1.
An instance is scored by summing all of the prediction nodes through which it passes; when the resulting score is positive, the instance is classified as spam. The magnitude of the value is a measure of confidence in the prediction. The original authors list three potential levels of interpretation for the set of attributes identified by an ADTree:
* Individual nodes can be evaluated for their own predictive ability.
* Sets of nodes on the same path may be interpreted as having a joint effect.
* The tree can be interpreted as a whole.
Care must be taken when interpreting individual nodes, as the scores reflect a re-weighting of the data in each iteration.
Description of the algorithm
The inputs to the alternating decision tree algorithm are:
* A set of inputs (x_1, y_1), ..., (x_m, y_m) where x_i is a vector of attributes and y_i is either -1 or 1. Inputs are also called instances.
* A set of weights w_i corresponding to each instance.
The fundamental element of the ADTree algorithm is the rule. A single rule consists of a precondition, a condition, and two scores. A condition is a predicate of the form "attribute <comparison> value." A precondition is simply a logical conjunction of conditions.
Evaluation of a rule involves a pair of nested if statements:
if (precondition)
    if (condition)
        return score_one
    else
        return score_two
    end if
else
    return 0
end if
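In Python, this evaluation could be sketched as below; the Rule class and the representation of preconditions and conditions as boolean functions of an instance are illustrative assumptions, not the original implementation.

class Rule:
    def __init__(self, precondition, condition, score_one, score_two):
        self.precondition = precondition   # function: instance -> bool
        self.condition = condition         # function: instance -> bool
        self.score_one = score_one         # returned when precondition and condition hold
        self.score_two = score_two         # returned when only the precondition holds

    def evaluate(self, instance):
        # Mirrors the nested if statements above: a rule whose precondition
        # fails contributes nothing to the score.
        if self.precondition(instance):
            return self.score_one if self.condition(instance) else self.score_two
        return 0.0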
Several auxiliary functions are also required by the algorithm:
* W_+(c) returns the sum of the weights of all positively labeled examples that satisfy predicate c.
* W_-(c) returns the sum of the weights of all negatively labeled examples that satisfy predicate c.
* W(c) returns the sum of the weights of all examples that satisfy predicate c.
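As a minimal sketch, these sums might be written as follows, assuming examples are stored as (x, y, w) triples with y in {-1, +1}; the function names are ours:

def W_plus(c, examples):
    # Sum of the weights of positively labeled examples satisfying predicate c.
    return sum(w for x, y, w in examples if y == 1 and c(x))

def W_minus(c, examples):
    # Sum of the weights of negatively labeled examples satisfying predicate c.
    return sum(w for x, y, w in examples if y == -1 and c(x))

def W(c, examples):
    # Sum of the weights of all examples satisfying predicate c.
    return sum(w for x, y, w in examples if c(x))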
The algorithm is as follows:
function ad_tree
input: a set of m training instances

    w_i = 1/m for all i
    a = (1/2) ln( W_+(true) / W_-(true) )
    R_0 = a rule with scores a and 0, precondition "true" and condition "true"
    P = {true}
    C = the set of all possible conditions
    for j = 1 ... T
        choose p in P and c in C to minimize
            z = 2 ( sqrt( W_+(p ∧ c) W_-(p ∧ c) ) + sqrt( W_+(p ∧ ¬c) W_-(p ∧ ¬c) ) ) + W(¬p)
        P += { p ∧ c, p ∧ ¬c }
        a_1 = (1/2) ln( ( W_+(p ∧ c) + 1 ) / ( W_-(p ∧ c) + 1 ) )
        a_2 = (1/2) ln( ( W_+(p ∧ ¬c) + 1 ) / ( W_-(p ∧ ¬c) + 1 ) )
        R_j = a new rule with precondition p, condition c, and scores a_1 and a_2
        w_i = w_i e^( -y_i R_j(x_i) ) for all i
    end for
    return the set of rules R_j
The set of preconditions grows by two in each iteration, and it is possible to derive the tree structure of a set of rules by making note of the precondition that is used in each successive rule.
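Combining the pieces, the following is a simplified Python sketch of this procedure under stated assumptions: it reuses the Rule class and the W_plus/W_minus/W helpers sketched above, represents conditions as boolean functions of an instance, performs the search over (p, c) pairs by exhaustive enumeration, and omits the guards a real implementation would need (for example against zero weight sums inside the logarithms).

import math

def ad_tree(instances, conditions, T):
    # instances: list of (x, y) pairs with y in {-1, +1}
    # conditions: candidate predicates, functions x -> bool
    # T: number of boosting iterations
    m = len(instances)
    examples = [(x, y, 1.0 / m) for x, y in instances]      # w_i = 1/m

    true_pred = lambda x: True
    a = 0.5 * math.log(W_plus(true_pred, examples) / W_minus(true_pred, examples))
    rules = [Rule(true_pred, true_pred, a, 0.0)]            # R_0
    preconditions = [true_pred]

    for _ in range(T):
        def z(p, c):
            # z = 2(sqrt(W+(p^c) W-(p^c)) + sqrt(W+(p^~c) W-(p^~c))) + W(~p)
            pc = lambda x: p(x) and c(x)
            pnc = lambda x: p(x) and not c(x)
            return (2 * (math.sqrt(W_plus(pc, examples) * W_minus(pc, examples))
                         + math.sqrt(W_plus(pnc, examples) * W_minus(pnc, examples)))
                    + W(lambda x: not p(x), examples))

        p, c = min(((p, c) for p in preconditions for c in conditions),
                   key=lambda pair: z(*pair))
        pc = lambda x, p=p, c=c: p(x) and c(x)
        pnc = lambda x, p=p, c=c: p(x) and not c(x)

        a1 = 0.5 * math.log((W_plus(pc, examples) + 1) / (W_minus(pc, examples) + 1))
        a2 = 0.5 * math.log((W_plus(pnc, examples) + 1) / (W_minus(pnc, examples) + 1))
        rule = Rule(p, c, a1, a2)
        rules.append(rule)
        preconditions += [pc, pnc]                          # the two new preconditions

        # Re-weighting step: w_i <- w_i * exp(-y_i * R_j(x_i)).
        examples = [(x, y, w * math.exp(-y * rule.evaluate(x)))
                    for x, y, w in examples]
    return rules

def classify(rules, x):
    # Sign gives the class; magnitude is a measure of confidence.
    return sum(rule.evaluate(x) for rule in rules)

The exhaustive minimization over every (precondition, condition) pair is the expensive step; the optimizations by Pfahringer, Holmes and Kirkby mentioned above target exactly this search.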
Empirical results
Figure 6 in the original paper demonstrates that ADTrees are typically as robust as boosted decision trees and boosted decision stumps. Typically, equivalent accuracy can be achieved with a much simpler tree structure than recursive partitioning algorithms.
External links
* An introduction to Boosting and ADTrees (has many graphical examples of alternating decision trees in practice).
* JBoost, software implementing ADTrees.