The Jenks optimization method, also called the Jenks natural breaks classification method, is a data clustering method designed to determine the best arrangement of values into different classes. This is done by seeking to minimize each class's average deviation from the class mean, while maximizing each class's deviation from the means of the other classes. In other words, the method seeks to reduce the variance within classes and maximize the variance between classes.
[Jenks, George F. 1967. "The Data Model Concept in Statistical Mapping", International Yearbook of Cartography 7: 186–190.][McMaster, Robert, "In Memoriam: George F. Jenks (1916–1996)". Cartography and Geographic Information Science. 24(1) p.56-59.]
The Jenks optimization method is directly related to Otsu's method and Fisher's discriminant analysis.
History
George Frederick Jenks
George Frederick Jenks was a 20th-century American cartographer. After graduating with a Ph.D. in agricultural geography from Syracuse University in 1947, Jenks began his career under the tutelage of Richard Harrison, cartographer for ''Time'' and ''Fortune'' magazines.
[McMaster, Robert and McMaster, Susanna. 2002. “A History of Twentieth-Century American Academic Cartography”, Cartography and Geographic Information Science. 29(3) p.312-315.] He joined the faculty of the University of Kansas in 1949 and began to build the cartography program. During his 37-year tenure at KU, Jenks developed the cartography program into one of three programs renowned for their graduate education in the field, the others being those at the University of Wisconsin and the University of Washington. Much of his time was spent developing and promoting improved cartographic training techniques and programs. He also spent significant time investigating three-dimensional maps, eye-movement research, thematic map communication, and geostatistics. [CSUN Cartography Specialty Group, Winter 1997 Newsletter]
Background and development
Jenks was a cartographer by profession. His work with statistics grew out of a desire to make choropleth maps more visually accurate for the viewer. In his paper ''The Data Model Concept in Statistical Mapping'', he claims that by visualizing data in a three-dimensional model, cartographers could devise a “systematic and rational method for preparing choroplethic maps”.
Jenks used the analogy of a “blanket of error” to describe the need to use elements other than the mean to generalize data. The three-dimensional models were created to help Jenks visualize the difference between data classes. His aim was to generalize the data using as few planes as possible while maintaining a constant “blanket of error”.
Description of method
The method requires an iterative process. That is, calculations must be repeated using different breaks in the dataset to determine which set of breaks has the smallest within-class variance. The process starts by dividing the ordered data into classes in some way, which may be arbitrary. Two steps are then repeated:
#Calculate the sum of squared deviations from the class means (SDCM).
#Choose a new way of dividing the data into classes, perhaps by moving one or more data points from one class to a different one.
New class deviations are then calculated, and the process is repeated until the sum of the within-class deviations reaches a minimal value. [ESRI FAQ, What is the Jenks Optimization method]
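The two repeated steps can be sketched in Python as a simple local search. This is only an illustration under stated assumptions, not Jenks's original procedure: the function names `sdcm` and `iterate_breaks`, the equal-count starting division, and the rule of moving a single boundary by one position at a time are all choices made for this sketch. A search of this kind stops when no single move lowers SDCM, so it may settle on a local rather than a global minimum.

```python
def sdcm(values, bounds):
    """Sum of squared deviations from the class means (SDCM).

    `values` must be sorted. `bounds` lists the start index of every class
    after the first, so [2, 5] means values[0:2], values[2:5], values[5:].
    """
    edges = [0, *bounds, len(values)]
    total = 0.0
    for lo, hi in zip(edges, edges[1:]):
        cls = values[lo:hi]
        mean = sum(cls) / len(cls)
        total += sum((v - mean) ** 2 for v in cls)
    return total


def iterate_breaks(values, k):
    """Hypothetical helper: refine k classes by moving one boundary at a time."""
    values = sorted(values)
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")

    # Arbitrary starting division: classes of roughly equal size.
    bounds = [i * n // k for i in range(1, k)]
    best = sdcm(values, bounds)          # step 1: calculate SDCM

    improved = True
    while improved:
        improved = False
        for i in range(len(bounds)):
            for step in (-1, 1):
                # Step 2: try a new division by shifting one data point
                # across boundary i.
                trial = bounds.copy()
                trial[i] += step
                lo = trial[i - 1] if i > 0 else 0
                hi = trial[i + 1] if i + 1 < len(trial) else n
                if not lo < trial[i] < hi:
                    continue             # the move would empty a class
                score = sdcm(values, trial)
                if score < best:         # keep only strict improvements
                    bounds, best, improved = trial, score, True
    return bounds, best
```

For the values 1, 2, 3, 4, 20, 21 and two classes, this sketch moves the boundary from index 3 to index 4, which lowers SDCM from 184 to 5.5.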
Alternatively, all break combinations may be examined, SDCM calculated for each combination, and the combination with the lowest SDCM selected. Since all break combinations are examined, this guarantees that the one with the lowest SDCM is found.
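A minimal Python sketch of this exhaustive search follows. The function name `exhaustive_breaks` and the index-based representation of breaks are assumptions made for illustration. With n values and k classes there are C(n-1, k-1) combinations to score, so this approach is practical only for small inputs.

```python
from itertools import combinations


def exhaustive_breaks(values, k):
    """Hypothetical helper: score every way to cut sorted values into k classes."""
    values = sorted(values)
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")

    best_bounds, best_score = None, float("inf")
    # A break combination is a choice of k-1 cut positions among the
    # n-1 gaps between consecutive sorted values.
    for bounds in combinations(range(1, n), k - 1):
        edges = (0, *bounds, n)
        score = 0.0
        for lo, hi in zip(edges, edges[1:]):
            cls = values[lo:hi]
            mean = sum(cls) / len(cls)
            score += sum((v - mean) ** 2 for v in cls)   # this class's share of SDCM
        if score < best_score:
            best_bounds, best_score = bounds, score
    return list(best_bounds), best_score
```

For nine values and three classes the loop scores 28 combinations. For the values 1, 2, 3, 4, 5, 100, 101, 200, 201 it returns cut positions 5 and 7, that is, the classes 1 to 5, 100 to 101, and 200 to 201.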
Finally, the sum of squared deviations from the mean of the complete data set (SDAM) and the goodness of variance fit (GVF) may be calculated.
GVF is defined as (SDAM - SDCM) / SDAM. GVF ranges from 0 (worst fit) to 1 (perfect fit).
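The GVF definition translates directly into code. The sketch below assumes the classes are given as cut positions into the sorted values; the function name `gvf` and that representation are illustrative choices, not part of the published method. When all values are identical, SDAM is zero and the ratio is undefined, so the sketch raises an error in that case.

```python
def gvf(values, bounds):
    """Goodness of variance fit: (SDAM - SDCM) / SDAM.

    `bounds` lists the start index of every class after the first,
    counted in the sorted values.
    """
    values = sorted(values)

    def squared_deviations(xs):
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs)

    # SDAM: squared deviations from the mean of the complete data set.
    sdam = squared_deviations(values)
    if sdam == 0:
        raise ValueError("all values are identical, so GVF is undefined")

    # SDCM: squared deviations from each class's own mean, summed over classes.
    edges = [0, *bounds, len(values)]
    sdcm = sum(squared_deviations(values[lo:hi])
               for lo, hi in zip(edges, edges[1:]))
    return (sdam - sdcm) / sdam
```

Splitting the values 1, 2, 3, 4, 20, 21 after the fourth value gives SDAM = 437.5 and SDCM = 5.5, for a GVF of about 0.987. A single class containing all values gives 0, and one class per value gives 1.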
Use in cartography
Jenks’ goal in developing this method was to create a map that was absolutely accurate in terms of the representation of the data's spatial attributes. By following this process, Jenks claims, the “blanket of error” can be uniformly distributed across the mapped surface. He developed the method with the intention of using relatively few data classes, fewer than seven, because that was the limit when using monochromatic shading on a choroplethic map.

The Jenks classification method is commonly used in thematic maps, especially choropleth maps, as one of several available classification methods. When making choropleth maps, the method can be advantageous because it identifies clusters in the data values where they exist. In current versions of ArcGIS software from Esri, Jenks is the default classification method. However, the Jenks classification is not recommended for data that have a low variance. The "natural breaks" identified by the iterative process are intended to give a more meaningful visualization of the mapped data.
Alternative methods
Other methods of data classification include Head/tail Breaks, Natural Breaks (without Jenks Optimization), Equal Interval, Quantile, and Standard Deviation.
See also
* k-means clustering, a generalization for multivariate data (Jenks natural breaks optimization seems to be one-dimensional k-means).
External links
* Volunteered Geographic Information, Daniel Lewis: Jenks Natural Breaks Algorithm with an implementation in Python
* Object Vision wiki: Fisher's Natural Breaks Classification, an O(k*n*log(n)) algorithm