In statistics, Ward's method is a criterion applied in hierarchical cluster analysis. Ward's minimum variance method is a special case of the objective function approach originally presented by Joe H. Ward, Jr. Ward suggested a general agglomerative hierarchical clustering procedure, where the criterion for choosing the pair of clusters to merge at each step is based on the optimal value of an objective function. This objective function could be "any function that reflects the investigator's purpose." Many of the standard clustering procedures are contained in this very general class. To illustrate the procedure, Ward used the example where the objective function is the error sum of squares, and this example is known as ''Ward's method'' or, more precisely, ''Ward's minimum variance method''.
The nearest-neighbor chain algorithm can be used to find the same clustering defined by Ward's method, in time proportional to the size of the input distance matrix and space linear in the number of points being clustered.
The minimum variance criterion
Ward's minimum variance criterion minimizes the total within-cluster variance. To implement this method, at each step find the pair of clusters that leads to the minimum increase in total within-cluster variance after merging. This increase is a weighted squared distance between cluster centers. At the initial step, all clusters are singletons (clusters containing a single point). To apply a recursive algorithm under this objective function, the initial distance between individual objects must be (proportional to) squared Euclidean distance.
The initial cluster distances in Ward's minimum variance method are therefore defined to be the squared Euclidean distance between points:
: <math>d_{ij} = d(\{X_i\}, \{X_j\}) = \|X_i - X_j\|^2.</math>
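Written out explicitly (a standard identity stated here for concreteness; the notation <math>n_A, n_B</math> for cluster sizes and <math>\mu_A, \mu_B</math> for centroids is not from the article itself), the increase in total within-cluster variance caused by merging clusters <math>A</math> and <math>B</math> is
: <math>\Delta(A, B) = \frac{n_A n_B}{n_A + n_B} \, \|\mu_A - \mu_B\|^2.</math>
For two singletons <math>\{X_i\}</math> and <math>\{X_j\}</math> this reduces to <math>\tfrac{1}{2}\|X_i - X_j\|^2</math>, which is why distances proportional to squared Euclidean distance are the natural starting point.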
Note: In software that implements Ward's method, it is important to check whether the function arguments should specify Euclidean distances or squared Euclidean distances.
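As a concrete illustration of this caveat, the following sketch uses SciPy (an assumed choice of tool; the article does not prescribe any particular library). scipy.cluster.hierarchy.linkage with method='ward' accepts either raw observation vectors or a condensed matrix of plain (unsquared) Euclidean distances, with the squaring handled internally, so both calls below should produce the same hierarchy:
<syntaxhighlight lang="python">
# Minimal sketch: Ward linkage from observations vs. from Euclidean distances.
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))                      # 10 points in 3 dimensions

Z_from_points = linkage(X, method='ward')         # raw observations
Z_from_dists = linkage(pdist(X), method='ward')   # Euclidean (not squared) distances

print(np.allclose(Z_from_points, Z_from_dists))   # expected: True
</syntaxhighlight>
Passing already-squared distances in place of pdist(X) would silently change the result, which is exactly the argument-convention issue the note above warns about.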
Lance–Williams algorithms
Ward's minimum variance method can be defined and implemented recursively by a Lance–Williams algorithm. The Lance–Williams algorithms are an infinite family of agglomerative hierarchical clustering algorithms which are represented by a recursive formula for updating cluster distances at each step (each time a pair of clusters is merged). At each step, it is necessary to optimize the objective function (find the optimal pair of clusters to merge). The recursive formula simplifies finding the optimal pair.
Suppose that clusters <math>C_i</math> and <math>C_j</math> were next to be merged. At this point all of the current pairwise cluster distances are known. The recursive formula gives the updated cluster distances following the pending merge of clusters <math>C_i</math> and <math>C_j</math>. Let
* <math>d_{ij}</math>, <math>d_{ik}</math>, and <math>d_{jk}</math> be the pairwise distances between clusters <math>C_i</math>, <math>C_j</math>, and <math>C_k</math>, respectively,
* <math>d_{(ij)k}</math> be the distance between the new cluster <math>C_i \cup C_j</math> and <math>C_k</math>.
An algorithm belongs to the Lance–Williams family if the updated cluster distance <math>d_{(ij)k}</math> can be computed recursively by
: <math>d_{(ij)k} = \alpha_i d_{ik} + \alpha_j d_{jk} + \beta d_{ij} + \gamma \,|d_{ik} - d_{jk}|,</math>
where <math>\alpha_i</math>, <math>\alpha_j</math>, <math>\beta</math>, and <math>\gamma</math> are parameters, which may depend on cluster sizes, that together with the cluster distance function <math>d_{ij}</math> determine the clustering algorithm. Several standard clustering algorithms such as single linkage, complete linkage, and the group average method have a recursive formula of the above type. A table of parameters for standard methods is given by several authors.
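To make the recursion concrete, here is a minimal sketch of the update as a standalone function; the function and argument names are illustrative choices, not taken from any particular library. The closing comments show how single and complete linkage fall out of particular coefficient choices.
<syntaxhighlight lang="python">
# Minimal sketch of the generic Lance-Williams update (illustrative names).
def lance_williams_update(d_ik, d_jk, d_ij, alpha_i, alpha_j, beta, gamma):
    """Distance from the merged cluster C_i ∪ C_j to another cluster C_k."""
    return (alpha_i * d_ik + alpha_j * d_jk
            + beta * d_ij + gamma * abs(d_ik - d_jk))

# Example coefficient choices:
#   single linkage:   alpha_i = alpha_j = 1/2, beta = 0, gamma = -1/2  ->  min(d_ik, d_jk)
#   complete linkage: alpha_i = alpha_j = 1/2, beta = 0, gamma = +1/2  ->  max(d_ik, d_jk)
</syntaxhighlight>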
Ward's minimum variance method can be implemented by the Lance–Williams formula. For disjoint clusters <math>C_i</math>, <math>C_j</math>, and <math>C_k</math> with sizes <math>n_i</math>, <math>n_j</math>, and <math>n_k</math> respectively:
: <math>d(C_i \cup C_j, C_k) = \frac{(n_i + n_k)\, d(C_i, C_k) + (n_j + n_k)\, d(C_j, C_k) - n_k\, d(C_i, C_j)}{n_i + n_j + n_k}.</math>
Hence Ward's method can be implemented as a Lance–Williams algorithm with
: <math>\alpha_i = \frac{n_i + n_k}{n_i + n_j + n_k}, \qquad \alpha_j = \frac{n_j + n_k}{n_i + n_j + n_k}, \qquad \beta = \frac{-n_k}{n_i + n_j + n_k}, \qquad \gamma = 0.</math>
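Putting the pieces together, the sketch below is a naive O(n³) illustration of Ward's method driven entirely by the Lance–Williams recurrence with the coefficients above. The structure and names are mine, the initial distances are the squared Euclidean distances between points as required, and an efficient implementation would instead use something like the nearest-neighbor chain algorithm mentioned earlier.
<syntaxhighlight lang="python">
# Naive O(n^3) sketch of Ward's method via the Lance-Williams recurrence.
# All names are illustrative; this is not an optimized or library implementation.
import numpy as np

def ward_linkage_naive(X):
    """Agglomerative Ward clustering of the rows of X (an (n, d) array).

    Returns a list of merges (cluster_a, cluster_b, merge_distance), where
    clusters 0..n-1 are the original points and each merge creates a new id.
    Merge distances are left in squared-Euclidean units, as in the recurrence.
    """
    n = len(X)
    # Initial cluster distances: squared Euclidean distances between points.
    dist = {(i, j): float(np.sum((X[i] - X[j]) ** 2))
            for i in range(n) for j in range(i + 1, n)}
    size = {i: 1 for i in range(n)}
    active = set(range(n))
    merges = []
    next_id = n

    while len(active) > 1:
        # Find the closest pair of currently active clusters.
        (i, j), d_ij = min(((pair, d) for pair, d in dist.items()
                            if pair[0] in active and pair[1] in active),
                           key=lambda item: item[1])
        merges.append((i, j, d_ij))

        # Ward's Lance-Williams update for every other active cluster k.
        for k in active - {i, j}:
            d_ik = dist[tuple(sorted((i, k)))]
            d_jk = dist[tuple(sorted((j, k)))]
            n_i, n_j, n_k = size[i], size[j], size[k]
            total = n_i + n_j + n_k
            dist[tuple(sorted((k, next_id)))] = (
                (n_i + n_k) * d_ik + (n_j + n_k) * d_jk - n_k * d_ij) / total

        size[next_id] = size[i] + size[j]
        active -= {i, j}
        active.add(next_id)
        next_id += 1

    return merges
</syntaxhighlight>
With consistent tie-breaking, the sequence of merges should agree with standard implementations of Ward's method; only the reported merge heights differ, since libraries often rescale them (for example to Euclidean rather than squared-Euclidean units).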
Variations
The popularity of Ward's method has led to variations of it. For instance, Ward<sub>''p''</sub> introduces the use of cluster-specific feature weights, following the intuitive idea that features could have different degrees of relevance at different clusters.
Further reading
* Everitt, B. S., Landau, S. and Leese, M. (2001), ''Cluster Analysis, 4th Edition'', Oxford University Press, Inc., New York; Arnold, London. {{ISBN|0340761199}}
* Hartigan, J. A. (1975), ''Clustering Algorithms'', New York: Wiley.
* Jain, A. K. and Dubes, R. C. (1988), ''Algorithms for Clustering Data'', New Jersey: Prentice–Hall.
* Kaufman, L. and Rousseeuw, P. J. (1990), ''Finding Groups in Data: An Introduction to Cluster Analysis'', New York: Wiley.