
The Jenks optimization method, also called the Jenks natural breaks classification method, is a data clustering method designed to determine the best arrangement of values into different classes. This is done by seeking to minimize each class's average deviation from the class mean, while maximizing each class's deviation from the means of the other classes. In other words, the method seeks to reduce the variance within classes and maximize the variance between classes (Jenks, George F. 1967. "The Data Model Concept in Statistical Mapping", International Yearbook of Cartography 7: 186–190; McMaster, Robert, "In Memoriam: George F. Jenks (1916–1996)", Cartography and Geographic Information Science 24(1), pp. 56–59). The Jenks optimization method is directly related to Otsu's method and Fisher's discriminant analysis.


History


George Frederick Jenks

George Frederick Jenks was a 20th-century American cartographer. After receiving his Ph.D. in agricultural geography from Syracuse University in 1947, Jenks began his career under the tutelage of Richard Harrison, cartographer for ''Time'' and ''Fortune'' magazines (McMaster, Robert and McMaster, Susanna. 2002. "A History of Twentieth-Century American Academic Cartography", Cartography and Geographic Information Science 29(3), pp. 312–315). He joined the faculty of the University of Kansas in 1949 and began to build the cartography program. During his 37-year tenure at KU, Jenks developed the cartography program into one of three programs renowned for their graduate education in the field, the others being the University of Wisconsin and the University of Washington. Much of his time was spent developing and promoting improved cartographic training techniques and programs. He also spent significant time investigating three-dimensional maps, eye-movement research, thematic map communication, and geostatistics (CSUN Cartography Specialty Group, Winter 1997 Newsletter).


Background and development

Jenks was a cartographer by profession. His work with statistics grew out of a desire to make choropleth maps more visually accurate for the viewer. In his paper, ''The Data Model Concept in Statistical Mapping'', he claims that by visualizing data in a three-dimensional model, cartographers could devise a “systematic and rational method for preparing choroplethic maps”. Jenks used the analogy of a “blanket of error” to describe the need to use elements other than the mean to generalize data. The three-dimensional models were created to help Jenks visualize the difference between data classes. His aim was to generalize the data using as few planes as possible and maintain a constant “blanket of error”.


Description of method

The method requires an iterative process. That is, calculations must be repeated using different breaks in the dataset to determine which set of breaks has the smallest in-class variance. The process is started by dividing the ordered data into classes in some way, which may be arbitrary. There are two steps that must be repeated:
# Calculate the sum of squared deviations from the class means (SDCM).
# Choose a new way of dividing the data into classes, perhaps by moving one or more data points from one class to a different one.
New class deviations are then calculated, and the process is repeated until the sum of the within-class deviations reaches a minimal value (ESRI FAQ, "What is the Jenks Optimization method").
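The iterative procedure can be sketched roughly as follows. This is a minimal illustration and not Jenks's original implementation: the function names `sdcm` and `iterative_breaks`, the equal-size starting division, and the rule of shifting one class boundary by a single position at a time are all assumptions made for the example. A local search of this kind stops at the first division it cannot improve, which is not necessarily the division with the lowest possible SDCM.

```python
from typing import List, Sequence


def sdcm(values: Sequence[float], breaks: Sequence[int]) -> float:
    """Sum of squared deviations from the class means.

    `values` must be sorted. `breaks` holds the start index of every class
    after the first, so consecutive bounds delimit one class each.
    """
    bounds = [0, *breaks, len(values)]
    total = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        members = values[lo:hi]
        mean = sum(members) / len(members)
        total += sum((v - mean) ** 2 for v in members)
    return total


def iterative_breaks(data: Sequence[float], n_classes: int) -> List[int]:
    """Hypothetical local search: shift one boundary while SDCM decreases."""
    values = sorted(data)
    n = len(values)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(data)")
    # Arbitrary starting division (an assumption): classes of similar size.
    breaks = [i * n // n_classes for i in range(1, n_classes)]
    best = sdcm(values, breaks)
    improved = True
    while improved:
        improved = False
        for i in range(len(breaks)):
            for step in (-1, 1):
                candidate = list(breaks)
                candidate[i] += step
                lower = candidate[i - 1] if i > 0 else 0
                upper = candidate[i + 1] if i + 1 < len(candidate) else n
                # Skip moves that would leave a class empty.
                if not lower < candidate[i] < upper:
                    continue
                score = sdcm(values, candidate)
                # Accept only a strict improvement (small tolerance for rounding).
                if score < best - 1e-12:
                    breaks, best, improved = candidate, score, True
    return breaks
```

For example, `iterative_breaks([100, 1, 2, 101, 3, 4, 102, 5], 2)` sorts the data, starts with the boundary at index 4, and moves it to index 5, separating the values 1 to 5 from the values 100 to 102.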
Alternatively, all break combinations may be examined, SDCM calculated for each combination, and the combination with the lowest SDCM selected. Since all break combinations are examined, this guarantees that the one with the lowest SDCM is found. Finally, the sum of squared deviations from the mean of the complete data set (SDAM) and the goodness of variance fit (GVF) may be calculated. GVF is defined as (SDAM - SDCM) / SDAM. GVF ranges from 0 (worst fit) to 1 (perfect fit).
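The exhaustive approach and the GVF calculation can be illustrated with a short sketch. The names `sum_sq_dev` and `exhaustive_breaks` are hypothetical, and returning a GVF of 1.0 when every value is identical (SDAM of zero) is a convention chosen here to avoid division by zero, not something the method defines. Because the number of break combinations grows combinatorially with the size of the data set, this brute-force form is only practical for small inputs.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def sum_sq_dev(members: Sequence[float]) -> float:
    """Sum of squared deviations of `members` from their own mean."""
    mean = sum(members) / len(members)
    return sum((v - mean) ** 2 for v in members)


def exhaustive_breaks(
    data: Sequence[float], n_classes: int
) -> Tuple[List[List[float]], float]:
    """Try every break combination; return the best classes and their GVF."""
    values = sorted(data)
    n = len(values)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(data)")

    sdam = sum_sq_dev(values)  # deviations from the mean of the whole data set
    best_sdcm = float("inf")
    best_cut: Tuple[int, ...] = ()

    # Each cut is a set of indices at which a new class starts.
    for cut in combinations(range(1, n), n_classes - 1):
        bounds = (0, *cut, n)
        score = sum(
            sum_sq_dev(values[lo:hi]) for lo, hi in zip(bounds, bounds[1:])
        )
        if score < best_sdcm:
            best_sdcm, best_cut = score, cut

    bounds = (0, *best_cut, n)
    classes = [values[lo:hi] for lo, hi in zip(bounds, bounds[1:])]
    # Assumed convention: identical values are treated as a perfect fit.
    gvf = (sdam - best_sdcm) / sdam if sdam > 0 else 1.0
    return classes, gvf
```

With the data `[4, 5, 9, 10]` and two classes, SDAM is 26 and the best division is `[4, 5]` and `[9, 10]` with an SDCM of 1, so GVF is 25/26, or about 0.96.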


Use in cartography

Jenks' goal in developing this method was to create a map that was absolutely accurate in terms of the representation of the data's spatial attributes. By following this process, Jenks claims, the “blanket of error” can be uniformly distributed across the mapped surface. He developed the method with the intention of using relatively few data classes, fewer than seven, because that was the limit when using monochromatic shading on a choroplethic map. The Jenks classification method is commonly used in thematic maps, especially choropleth maps, as one of several available classification methods. When making choropleth maps, the Jenks classification method can be advantageous because, if there are clusters in the data values, it will identify them. In fact, in current versions of ArcGIS software from Esri, Jenks is the default classification method. However, the Jenks classification is not recommended for data that have a low variance. The natural breaks identified by the iterative process are used to provide a more meaningful visualization of map data.


Alternative methods

Other methods of data classification include Head/tail Breaks, Natural Breaks (without Jenks Optimization), Equal Interval, Quantile, and Standard Deviation.


See also

* k-means clustering, a generalization for multivariate data (Jenks natural breaks optimization appears to be one-dimensional k-means).




External links

* Volunteered Geographic Information, Daniel Lewis: Jenks Natural Breaks Algorithm with an implementation in Python
* Object Vision wiki: Fisher's Natural Breaks Classification, an O(k*n*log(n)) algorithm



