Hoshen–Kopelman Algorithm
   HOME

TheInfoList



OR:

The Hoshen–Kopelman algorithm is a simple and efficient
algorithm In mathematics and computer science, an algorithm () is a finite sequence of Rigour#Mathematics, mathematically rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algo ...
for labeling
clusters may refer to: Science and technology Astronomy * Cluster (spacecraft), constellation of four European Space Agency spacecraft * Cluster II (spacecraft), a European Space Agency mission to study the magnetosphere * Asteroid cluster, a small ...
on a grid, where the grid is a regular
network Network, networking and networked may refer to: Science and technology * Network theory, the study of graphs as a representation of relations between discrete objects * Network science, an academic field that studies complex networks Mathematics ...
of cells, with the cells being either occupied or unoccupied. This algorithm is based on a well-known union-finding algorithm. The algorithm was originally described by Joseph Hoshen and
Raoul Kopelman Raoul Kopelman (October 21, 1933 – July 20, 2023) was a scientist, inventor, and the Richard Smalley Distinguished University Professor of Chemistry, Physics, Applied Physics, Biophysics, Biomedical Engineering and Chemical Biology at the Unive ...
in their 1976 paper "Percolation and Cluster Distribution. I. Cluster Multiple Labeling Technique and Critical Concentration Algorithm".


Percolation theory

Percolation theory In statistical physics and mathematics, percolation theory describes the behavior of a network when nodes or links are added. This is a geometric type of phase transition, since at a critical fraction of addition the network of small, disconnected ...
is the study of the behavior and
statistics Statistics (from German language, German: ', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a s ...
of
clusters may refer to: Science and technology Astronomy * Cluster (spacecraft), constellation of four European Space Agency spacecraft * Cluster II (spacecraft), a European Space Agency mission to study the magnetosphere * Asteroid cluster, a small ...
on lattices. Suppose we have a large square lattice where each cell can be occupied with the
probability Probability is a branch of mathematics and statistics concerning events and numerical descriptions of how likely they are to occur. The probability of an event is a number between 0 and 1; the larger the probability, the more likely an e ...
p and can be empty with the probability 1 – ''p''. Each group of neighboring occupied cells forms a cluster. Neighbors are defined as cells having a common side but not those sharing only a corner i.e. we consider the
4-connected neighborhood In image processing, pixel connectivity is the way in which pixels in 2-dimensional (or hypervoxels in n-dimensional) images relate to their neighbors. Formulation In order to specify a set of connectivities, the dimension and the width of t ...
that is top, bottom, left and right. Each occupied cell is independent of the status of its neighborhood. The number of clusters, the size of each cluster and their distribution are important topics in percolation theory.


Hoshen–Kopelman algorithm for cluster finding

In this algorithm, we scan through a grid looking for occupied cells and labeling them with cluster labels. The scanning process is called a
raster scan A raster scan, or raster scanning, is the rectangular pattern of image capture and reconstruction in television. By analogy, the term is used for raster graphics, the pattern of image storage and transmission used in most computer bitmap image s ...
. The algorithm begins with scanning the grid cell by cell and checking whether the cell is occupied or not. If the cell is occupied, then it must be labeled with a cluster label. This cluster label is assigned based on the neighbors of that cell. (For this we are going to use Union-Find Algorithm which is explained in the next section.) If the cell doesn’t have any occupied neighbors, then a new label is assigned to the cell.


Union-find algorithm

This algorithm is used to represent disjoint sets. Calling the function union(x,y) places items x and y into the same set. A second function find(x) returns a representative member of the set to which x belongs. The representative member of the set containing x is the label we will apply to the cluster to which x belongs. A key to the efficiency of the Union-Find Algorithm is that the find operation improves the underlying forest data structure that represents the sets, making future find queries more efficient.


Pseudocode

During the
raster scan A raster scan, or raster scanning, is the rectangular pattern of image capture and reconstruction in television. By analogy, the term is used for raster graphics, the pattern of image storage and transmission used in most computer bitmap image s ...
of the grid, whenever an occupied cell is encountered, neighboring cells are scanned to check whether any of them have already been scanned. If we find already scanned neighbors, the union operation is performed, to specify that these neighboring cells are in fact members of the same set. Then thefind operation is performed to find a representative member of that set with which the current cell will be labeled. On the other hand, if the current cell has no neighbors, it is assigned a new, previously unused, label. The entire grid is processed in this way. Following
pseudocode In computer science, pseudocode is a description of the steps in an algorithm using a mix of conventions of programming languages (like assignment operator, conditional operator, loop) with informal, usually self-explanatory, notation of actio ...
is referred fro
Tobin Fricke's
implementation of the same algorithm. On completion, the cluster labels may be found in labels. Not shown is the second raster scan of the grid needed to produce the example output. In that scan, the value at label ,y/code> is replaced by find(label ,y. Raster Scan and Labeling on the Grid largest_label = 0; label = zeros _columns, n_rows labels = :n_columns*n_rows/* Array containing integers from 0 to the size of the image. */ for x in 0 to n_columns Union void union(int x, int y) Find int find(int x)


Example

Consider the following example. The dark cells in the grid in Figure (c) represent that they are occupied and the white ones are empty. So by running H–K algorithm on this input we would get the output as shown in Figure (d) with all the clusters labeled. The algorithm processes the input grid, cell by cell, as follows: Let's say that grid is a two-dimensional array. * Starting from cell grid 0] i.e. the first cell. The cell is occupied and it doesn't have cells to the left or above so we will label this cell with a new label say 1. * grid 1] and grid 2] both are unoccupied so they are not labeled. * grid 3] is occupied so check cell to the left which is unoccupied so we increment the current label value and assign the label to the cell as 2. * grid 4], grid 5] and grid 0] are unoccupied so they are not labeled. * grid 1] is occupied so check cell to the left and above, both the cells are unoccupied so assign a new label 3. * grid 2] is occupied so check cell to the left and above, only the cell to the left is occupied so assign the label of a cell on the left to this cell 3. * grid 3] is occupied so check cell to the left and above, both the cells are occupied, so merge the two clusters and assign the cluster label of the cell above to the cell on the left and to this cell i.e. 2. (Merging using union algorithm will label all the cells with label 3 to 2) * grid 4] is occupied so check cell to the left and above, only the cell to the left is occupied so assign the label of a cell on the left to this cell 2. * grid 5] , grid 0] and grid 1] are unoccupied so they are not labeled. * grid 2] is occupied so check cell to the left and above, only cell above is occupied so assign the label of the cell above to this cell 2. * grid 3] , grid 4] and grid 5] are unoccupied so they are not labeled. * grid 0] is occupied so check cell above which is unoccupied so we increment the current label value and assign the label to the cell as 4. * grid 1] is occupied so check cell to the left and above, only the cell to the left is occupied so assign the label of the cell on the left to this cell 4. * grid 2] is unoccupied so it is not labeled. * grid 3] is occupied so check cell to the left and above, both the cells are unoccupied so assign a new label 5. * grid 4] is occupied so check cell to the left and above, only the cell to the left is occupied so assign the label of the cell on the left to this cell 5. * grid 5] , grid 0] and grid 1] are unoccupied so they are not labeled. * grid 2] is occupied so check cell to the left and above, both the cells are unoccupied so assign a new label 6. * grid 3] is occupied so check cell to the left and above, both, cell to the left and above are occupied so merge the two clusters and assign the cluster label of the cell above to the cell on the left and to this cell i.e. 5. (Merging using union algorithm will label all the cells with label 6 to 5). * grid 4] is unoccupied so it is not labeled. * grid 5] is occupied so check cell to the left and above, both the cells are unoccupied so assign a new label 7. * grid 0] , grid 1] , grid 2] and grid 3] are unoccupied so they are not labeled. * grid 4] is occupied so check cell to the left and above, both the cells are unoccupied so assign a new label 8. * grid 5] is occupied so check cell to the left and above, both, cell to the left and above are occupied so merge the two clusters and assign the cluster label of the cell above to the cell on the left and to this cell i.e. 7. (Merging using union algorithm will label all the cells with label 8 to 7).


Applications

* Determination of Nodal Domain Area and Nodal Line Lengths * Connectivity (graph theory), Nodal Connectivity Information * Modeling of
electrical conduction Electrical resistivity (also called volume resistivity or specific electrical resistance) is a fundamental specific property of a material that measures its electrical resistance or how strongly it resists electric current. A low resistivity i ...


See also

* K-means clustering algorithm * Fuzzy clustering algorithm * Gaussian (
Expectation Maximization Expectation, or expectations, as well as expectancy or expectancies, may refer to: Science * Expectancy effect, including observer-expectancy effects and subject-expectancy effects such as the placebo effect * Expectancy theory of motivation * ...
) clustering algorithm * Clustering Methods * C-means Clustering Algorithm *
Connected-component labeling Connected-component labeling (CCL), connected-component analysis (CCA), blob extraction, region labeling, blob discovery, or region extraction is an algorithmic application of graph theory, where subsets of connected components are uniquely labele ...


References

{{DEFAULTSORT:Hoshen-Kopelman algorithm Cluster analysis algorithms