Classifier chains is a

machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...

method for problem transformation in

multi-label classification In machine learning, multi-label classification or multi-output classification is a variant of the classification problem where multiple nonexclusive labels may be assigned to each instance. Multi-label classification is a generalization of mult ...

. It combines the computational efficiency of the binary relevance method while still being able to take the label dependencies into account for

classification Classification is a process related to categorization, the process in which ideas and objects are recognized, differentiated and understood. Classification is the grouping of related facts into classes. It may also refer to: Business, organizat ...

Problem transformation

Several problem transformation methods exist. One of them is the Binary Relevance method (BR). Given a set of labels

\mathit\,

and a data set with instances of the form

\mathit\,

where

\mathit\,

is a

feature vector In machine learning and pattern recognition, a feature is an individual measurable property or characteristic of a phenomenon. Choosing informative, discriminating and independent features is a crucial element of effective algorithms in pattern r ...

and

Y \subseteq L

is a set of labels assigned to the instance. BR transforms the data set into

\left \vert L \right \vert

data sets and learns

\left \vert L \right \vert

binary classifiers

H: X \rightarrow \

for each label

l \in L

. During this process the information about dependencies between labels is not preserved. This can lead to a situation where a set of labels is assigned to an instance although these labels never co-occur together in the data set. Thus, information about label co-occurrence can help to assign correct label combinations. Loss of this information can in some cases lead to a decrease in classification performance. Another approach, which takes into account label correlations, is the Label Powerset method (LP). Each combination of labels in a data set is considered to be a single label. After transformation a single-label classifier

H: X \rightarrow \mathcal(L)

is trained where

\mathcal(L)

is the power set of all labels in

\mathit

. The main drawback of this approach is that the number of label combinations grows exponentially with the number of labels. For example, a multi-label data set with 10 labels can have up to

2^ = 1024

label combinations. This increases the run-time of classification. The Classifier Chains method is based on the BR method and it is efficient even on a big number of labels. Furthermore, it considers dependencies between labels.

Method description

For a given set of labels

\mathit\,

the Classifier Chain model (CC) learns

\left \vert L \right \vert

classifiers as in the Binary Relevance method. All classifiers are linked in a chain through feature space. Given a data set where the

i

-th instance has the form

\mathit\,

where

\mathit\,

is a subset of labels,

\mathit\,

is a set of features. The data set is transformed in

\left \vert L \right \vert

data sets where instances of the

j

-th data set has the form

((x_, l_, ..., l_), l_), l_ \in \

. If the

j

-th label was assigned to the instance then

\mathit\,

1

, otherwise it is

0

. Thus, classifiers build a chain where each of them learns binary classification of a single label. The features given to each classifier are extended with binary values that indicate which of previous labels were assigned to the instance. By classifying new instances the labels are again predicted by building a chain of classifiers. The classification begins with the first classifier

\mathit\,

and proceeds to the last one

\mathit\,

by passing label information between classifiers through the feature space. Hence, the inter-label dependency is preserved. However, the result can vary for different order of chains. For example, if a label often co-occur with some other label, then only instances of the label which comes later in the chain will have information about the other one in its feature vector. In order to solve this problem and increase accuracy it is possible to use

ensemble Ensemble may refer to: Art * Architectural ensemble * ''Ensemble'' (album), Kendji Girac 2015 album * Ensemble (band), a project of Olivier Alary * Ensemble cast (drama, comedy) * Ensemble (musical theatre), also known as the chorus * ''En ...

of classifiers. In Ensemble of Classifier Chains (ECC) several CC classifiers can be trained with random order of chains (i.e. random order of labels) on a random subset of data set. Labels of a new instance are predicted by each classifier separately. After that, the total number of predictions or "votes" is counted for each label. The label is accepted if it was predicted by a percentage of classifiers that is bigger than some threshold value.

Adaptations

There is also regressor chains, which themselves can resemble

vector autoregression Vector autoregression (VAR) is a statistical model used to capture the relationship between multiple quantities as they change over time. VAR is a type of stochastic process model. VAR models generalize the single-variable (univariate) autoregres ...

models if the order of the chain makes sure temporal order is respected.

References

{{reflist

External links

Better Classifier Chains for Multi-label Classification
Presentation on Classifier Chains by Jesse Read and Fernando Pérez Cruz Classification algorithms