
A choropleth map is a type of statistical thematic map that uses pseudocolor, i.e., color corresponding with an aggregate summary of a geographic characteristic within spatial enumeration units, such as population density or per-capita income. Choropleth maps provide an easy way to visualize how a variable varies across a geographic area or show the level of variability within a region. A heat map or isarithmic map is similar but uses regions drawn according to the pattern of the variable, rather than the ''a priori'' geographic areas of choropleth maps. The choropleth is likely the most common type of thematic map because published statistical data (from government or other sources) is generally aggregated into well-known geographic units, such as countries, states, provinces, and counties, and such maps are thus relatively easy to create using GIS, spreadsheets, or other software tools.


History

The earliest known choropleth map was created in 1826 by Baron Pierre Charles Dupin, depicting the availability of basic education in France by department. More "''cartes teintées''" ("tinted maps") were soon produced in France to visualize other "moral statistics" on education, disease, crime, and living conditions. Choropleth maps quickly gained popularity in several countries due to the increasing availability of demographic data compiled from national censuses, starting with a series of choropleth maps published in the official reports of the 1841 Census of Ireland. When chromolithography became widely available after 1850, color was increasingly added to choropleth maps. The term "choropleth map" was introduced in 1938 by the geographer John Kirtland Wright, and was in common usage among cartographers by the 1940s (John Kirtland Wright (1938), "Problems in Population Mapping", in ''Notes on statistical mapping, with special reference to the mapping of population phenomena'', p. 12). Also in 1938, Glenn Trewartha reintroduced them as "ratio maps," but this term did not survive.


Structure

A choropleth map brings together two datasets: spatial data representing a partition of geographic space into distinct ''districts'', and ''statistical data'' representing a variable aggregated within each district. There are two common conceptual models of how these interact in a choropleth map: in one view, which may be called "district dominant," the districts (often existing governmental units) are the focus, in which a variety of attributes are collected, including the variable being mapped. In the other view, which may be called "variable dominant," the focus is on the variable as a geographic phenomenon (say, the Latino population), with a real-world distribution, and the partitioning of it into districts is merely a convenient measurement technique.


Geometry: aggregation districts

In a choropleth map, the districts are usually previously defined entities such as governmental or administrative units (e.g., counties, provinces, countries), or districts created specifically for statistical aggregation (e.g., census tracts), and thus have no expectation of correlation with the geography of the variable. That is, boundaries of the colored districts may or may not coincide with the location of changes in the geographic distribution being studied. This is in direct contrast to chorochromatic and isarithmic maps, in which region boundaries are defined by patterns in the geographic distribution of the subject phenomenon.

Using pre-defined aggregation regions has a number of advantages, including: easier compilation and mapping of the variable (especially in the age of GIS and the Internet with its many sources of data), recognizability of the districts, and the applicability of the information to further inquiry and policy tied to the individual districts. A prime example is elections, in which the vote total for each district determines its elected representative.

However, pre-defined regions can cause a number of problems, generally because the constant color applied to each aggregation district makes it look homogeneous, masking an unknown degree of variation of the variable within the district. For example, a city may include neighborhoods of low, moderate, and high family income, but be colored with one constant "moderate" color. Thus, real-world patterns may not conform to the regional unit symbolized. Because of this, issues such as the ecological fallacy and the modifiable areal unit problem (MAUP) can lead to major misinterpretations of the data depicted, and other techniques are preferable if one can obtain the necessary data.

These issues can be somewhat mitigated by using smaller districts, because they show finer variations in the mapped variable, and their smaller visual size and increased number reduce the likelihood that the map user makes judgments about the variation within a single district. However, smaller districts can make the map overly complex, especially if there is not a meaningful geographic pattern in the variable (i.e., the map looks like randomly scattered colors). Although representing specific data in large regions can be misleading, the familiar district shapes can make the map clearer and easier to interpret and remember. The choice of regions will ultimately depend on the map's intended audience and purpose.

Alternatively, the dasymetric technique can sometimes be employed to refine the region boundaries to more closely match actual changes in the subject phenomenon. Because of these issues, for many variables, one may prefer an isarithmic map (for a quantitative variable) or a chorochromatic map (for a qualitative variable), in which the region boundaries are based on the data itself. However, in many cases such detailed information is simply not available, and the choropleth map is the only feasible option.
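The effect of district boundaries on the mapped values can be illustrated with a minimal sketch. The incomes and the two zoning schemes below are invented for illustration, and the function name `district_means` is hypothetical; only the arithmetic matters.

```python
from statistics import mean

# Hypothetical incomes (in $1000s) of six neighborhoods along a street,
# invented purely for illustration.
incomes = [20, 25, 30, 90, 95, 100]

def district_means(values, zoning):
    """Average the values inside each district.

    `zoning` is a list of (start, end) index pairs, one per district,
    using Python's half-open slice convention.
    """
    return [mean(values[start:end]) for start, end in zoning]

# Two ways of partitioning the same six neighborhoods into two districts.
zoning_a = [(0, 3), (3, 6)]  # split in the middle
zoning_b = [(0, 1), (1, 6)]  # one small district, one large district

means_a = district_means(incomes, zoning_a)
means_b = district_means(incomes, zoning_b)

print(means_a)
print(means_b)
```

The underlying data are identical in both cases, yet the two zonings yield different district values, and in the second zoning a single value stands in for neighborhoods ranging from 25 to 100.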


Property: aggregate statistical summaries

The variable to be mapped may come from a wide variety of disciplines in the human or natural world, although human topics (e.g., demographics, economics, agriculture) are generally more common because of the role of governmental units in human activity, which often leads to the original collection of the statistical data. The variable can also be in any of Stevens' levels of measurement: nominal, ordinal, interval, or ratio, although quantitative (interval/ratio) variables are more commonly used in choropleth maps than qualitative (nominal/ordinal) variables. The level of measurement of the individual datum may differ from that of the aggregate summary statistic. For example, a census may ask each individual for his or her "primary spoken language" (nominal), but this may be summarized over all of the individuals in a county as "percent primarily speaking Spanish" (ratio) or as "predominant primary language" (nominal).

Broadly speaking, a choropleth map may represent two types of variables, a distinction common to physics and chemistry as well as geostatistics and spatial analysis:

* A spatially ''extensive'' variable (sometimes called a ''global property'') is one that can apply only to the entire district, commonly in the form of total counts or amounts of a phenomenon (akin to mass or weight in physics). Extensive variables are said to be ''accumulative'' over space; for example, if the population of the United Kingdom is 65 million, it is not possible that the populations of England, Wales, Scotland, and Northern Ireland could also each be 65 million. Instead, their populations must sum (accumulate) to the total population of the collective entity. While it is possible to map an extensive variable in a choropleth map, this is almost universally discouraged because patterns can be easily misinterpreted. For example, if a choropleth map assigned a particular shade of red to total populations between 60 and 70 million, a situation in which the United Kingdom (as a single district) has 65 million inhabitants would be indistinguishable from a situation in which the four constituent countries each had 65 million inhabitants, even though these are vastly different geographic realities. Another source of interpretation error is that if a large district and a small district have the same value (and thus the same color), the larger one will naturally look like more. Other types of thematic maps, especially proportional symbols and cartograms, are designed to represent extensive variables and are generally preferred.
* A spatially ''intensive'' variable, also known as a ''field'', ''statistical surface'', or ''localized variable'', represents a property that could be measured at any location (a point or small area, depending on its nature) in space, independent of any boundaries, although its variation over a district can be summarized as a single value. Common intensive variables include densities, proportions, rates of change, mean allotments (e.g., GDP per capita), and descriptive statistics (e.g., mean, median, standard deviation). Intensive variables are said to be ''distributive'' over space; for example, if the population ''density'' of the United Kingdom is 250 people per square kilometer, then it would be reasonable to estimate (in the absence of any other data) that the most likely (if not actually correct) density of each of the four constituent countries is also 250/km². Traditionally in cartography, the predominant conceptual model for this kind of phenomenon has been the ''statistical surface'', in which the variable is imagined as a third-dimension "height" that varies continuously above the two-dimensional space. In geographic information science, the more common conceptualization is the field, adopted from physics and usually modeled as a scalar function of location.

Choropleth maps are better suited to intensive variables than extensive ones; if a map user sees the United Kingdom filled with a color for "100-200 people per square km," estimating that Wales and England each have 100-200 people per square km may not be accurate, but it is possible and a reasonable estimate.
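The difference between accumulative and distributive behavior can be checked with a short sketch. The three sub-districts and their figures are invented for illustration and do not describe any real place.

```python
# Hypothetical sub-districts of one parent district (numbers invented).
districts = {
    "A": {"population": 600_000, "area_km2": 2_000},
    "B": {"population": 300_000, "area_km2": 3_000},
    "C": {"population": 100_000, "area_km2": 5_000},
}

# Extensive variables accumulate: the parent's total is the sum of the parts.
total_population = sum(d["population"] for d in districts.values())
total_area = sum(d["area_km2"] for d in districts.values())

# Intensive variable: population density of each sub-district.
densities = {
    name: d["population"] / d["area_km2"] for name, d in districts.items()
}

# The parent's density is NOT the sum of the sub-district densities ...
parent_density = total_population / total_area
naive_sum = sum(densities.values())

# ... it is their area-weighted mean.
area_weighted_mean = (
    sum(densities[name] * d["area_km2"] for name, d in districts.items())
    / total_area
)

print(total_population, parent_density, naive_sum, area_weighted_mean)
```

Here the populations add up to the parent total, while the parent density equals the area-weighted mean of the sub-district densities rather than their sum.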


Normalization

Normalization is the technique of deriving a spatially intensive variable from one or more spatially extensive variables, so that it can be appropriately used in a choropleth map. It is similar, but not identical, to the technique of normalization or standardization in statistics. Typically, it is accomplished by computing the ratio between two spatially extensive variables (T. Slocum, R. McMaster, F. Kessler, H. Howard (2009). ''Thematic Cartography and Geovisualization'', Third Edn, page 252. Pearson Prentice Hall: Upper Saddle River, NJ). Although any such ratio will result in an intensive variable, only a few are especially meaningful and commonly used in choropleth maps:

* ''Density'' = total / area. Example: population density.
* ''Proportion'' = subgroup total / grand total. Example: wealthy households as a percentage of all households.
* ''Mean allocation'' = total amount / total individuals. Example: gross domestic product per capita (total GDP / total population).
* ''Rate of change'' = total at later time / total at earlier time. Example: annual population growth rate.

These are not equivalent, nor is one better than another. Rather, they tell different aspects of a geographic narrative. For example, a choropleth map of the population density of the Latino population in Texas visualizes a narrative about the spatial clustering and distribution of that group, while a map of the percent Latino visualizes a narrative of composition and predominance. Failure to employ a proper normalization will lead to an inappropriate and potentially misleading map.
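The four ratios listed above can be written as one-line functions. The sketch below is illustrative only: the function names and the sample county figures are hypothetical and do not come from any real dataset.

```python
def density(total, area):
    """Density = total / area (e.g., people per square kilometer)."""
    return total / area

def proportion(subgroup_total, grand_total):
    """Proportion = subgroup total / grand total (multiply by 100 for percent)."""
    return subgroup_total / grand_total

def mean_allocation(total_amount, total_individuals):
    """Mean allocation = total amount / total individuals (e.g., GDP per capita)."""
    return total_amount / total_individuals

def rate_of_change(later_total, earlier_total):
    """Rate of change = total at later time / total at earlier time."""
    return later_total / earlier_total

# Hypothetical county, numbers invented for illustration.
county = {
    "population": 50_000,
    "population_last_year": 48_000,
    "area_km2": 250,
    "households": 16_000,
    "wealthy_households": 1_200,
    "gdp": 2_000_000_000,
}

pop_density = density(county["population"], county["area_km2"])
wealthy_share = proportion(county["wealthy_households"], county["households"])
gdp_per_capita = mean_allocation(county["gdp"], county["population"])
growth_ratio = rate_of_change(county["population"], county["population_last_year"])

print(pop_density, wealthy_share, gdp_per_capita, growth_ratio)
```

Each result is an intensive value for the district, so it no longer depends directly on how large or populous the district happens to be.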


Classification

Every choropleth map has a strategy for mapping values to colors. A ''classified'' choropleth map separates the range of values into classes, with all of the districts in each class being assigned the same color. An ''unclassed'' map (sometimes called ''n-class'') directly assigns a color proportional to the value of each district. Starting with Dupin's 1826 map, classified choropleth maps have been far more common. It is likely that this was originally due to the greater simplicity of applying a limited set of tints; only in the age of computerized cartography have unclassed choropleth maps even been feasible, and until recently, they were still not easy to create in most mapping software. Waldo R. Tobler, in formally introducing the unclassed scheme in 1973, asserted that it was a more accurate depiction of the original data, and stated that the primary argument in favor of classification, that it is more readable, needed to be tested. The debate and experiments that followed came to the general conclusion that the primary advantage of unclassed choropleth maps, in addition to Tobler's assertion of raw accuracy, was that they allowed readers to see subtle variations in the variable, without leading them to believe that the districts that fell into the same class had identical values. Thus, readers are better able to see the general patterns in the geographic phenomenon, but not the specific values. The primary argument in favor of classed choropleth maps is that they are easier for readers to process, because there are fewer distinct shades to recognize, which reduces cognitive load and allows readers to precisely match the colors in the map to the values listed in the legend.

Classification is performed by establishing a ''classification rule'', a series of thresholds that partitions the quantitative range of variable values into a series of ordered classes. For example, if a dataset of annual median income by U.S. county includes values between US$20,000 and $150,000, it could be broken into three classes at thresholds of $45,000 and $83,000. To avoid confusion, any classification rule should be mutually exclusive and collectively exhaustive, meaning that any possible value falls into exactly one class. For example, if a rule establishes a threshold at the value 6.5, it needs to be clear about whether a district with a value of exactly 6.5 will be classified into the lower or upper class (i.e., whether the definition of the lower class is <6.5 or ≤6.5 and whether the upper class is >6.5 or ≥6.5).

A variety of types of classification rules have been developed for choropleth maps:

* ''Exogenous'' rules import thresholds without regard for patterns in the data at hand.
** ''Established'' rules are those already in common use due to past scientific research or official policy. An example would be using government tax brackets or a standard poverty threshold when classifying income levels.
** ''Ad hoc'' or ''common sense'' strategies are essentially invented by the cartographer using thresholds that have some intuitive sense. An example would be classifying incomes according to what the cartographer believes to be "rich," "middle class," and "poor." These strategies are generally not advised unless no other method is feasible.
* ''Endogenous'' rules are based on patterns in the dataset itself.
** ''Natural breaks'' rules look for natural clusters in the data, in which large numbers of districts have similar values with large gaps between them. If this is the case, such clusters are probably geographically meaningful.
*** The Jenks natural breaks optimization is a heuristic algorithm for automatically identifying such clusters if they exist; it is essentially a one-dimensional form of the k-means clustering algorithm (Jenks, George F. 1967. "The Data Model Concept in Statistical Mapping", ''International Yearbook of Cartography'' 7: 186–190). If natural clusters do not exist, the breaks it generates are often recognized as a good compromise between the other methods, and it is commonly the default classifier used in GIS software.
** ''Equal intervals'' or an ''arithmetic progression'' divides the range of values so that each class has an equal range of values: (''max'' - ''min'')/''n''. For example, the income range above ($20,000 - $150,000) would be divided into four classes at $52,500, $85,000, and $117,500.
*** A ''standard deviation'' rule also generates equal ranges of value, but rather than starting with the minimum and maximum values, it starts at the arithmetic mean of the data and establishes a break at each multiple of a constant number of standard deviations above and below the mean.
** ''Quantiles'' divides the dataset so each class has an equal number of districts. For example, if the 3,141 counties of the United States were divided into four quantile classes (i.e., quartiles), then the first class would include the 785 poorest counties, the second class the next 785, and so on. Adjustments may need to be made when the number of districts does not divide evenly, or when identical values straddle the threshold.
** A ''geometric progression'' rule divides the range of values so the ratio of thresholds is constant (rather than their interval as in an arithmetic progression). For example, the income range above would be divided using a ratio of 2 with thresholds at $40,000 and $80,000. This type of rule is commonly used when the frequency distribution of the data has a very high positive skew, especially if it is geometric or exponential.
** A ''nested means'' or ''head/tail breaks'' rule is an algorithm that recursively divides the data set by setting a threshold at the arithmetic mean, then subdividing each of the two created classes at their respective means, and so on. Thus, the number of classes is not arbitrary, but must be a power of two (2, 4, 8, etc.). It has been suggested that this also works well for highly skewed distributions.

Because calculated thresholds can often be at precise values that are not easily interpretable by map readers (e.g., $74,326.9734), it is common to create a ''modified classification rule'' by rounding threshold values to a nearby simple number. A common example is a modified geometric progression that subdivides powers of ten, such as 1, 2.5, 5, 10, 25, 50, 100, ... or 1, 3, 10, 30, 100, ...
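Several of the endogenous rules above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function names are hypothetical, the quantile rule makes no special adjustment for ties, and Jenks optimization and the standard deviation rule are omitted. The `classify` helper assumes the convention that a value exactly on a threshold belongs to the upper class (lower class < threshold, upper class ≥ threshold), which keeps the rule mutually exclusive and collectively exhaustive.

```python
from bisect import bisect_right

def equal_interval_breaks(lo, hi, n_classes):
    """Thresholds that split [lo, hi] into n_classes equal-width classes."""
    width = (hi - lo) / n_classes
    return [lo + width * i for i in range(1, n_classes)]

def geometric_breaks(lo, hi, ratio):
    """Thresholds lo*ratio, lo*ratio**2, ... below hi (needs lo > 0, ratio > 1)."""
    breaks = []
    threshold = lo * ratio
    while threshold < hi:
        breaks.append(threshold)
        threshold *= ratio
    return breaks

def quantile_breaks(values, n_classes):
    """Thresholds giving each class roughly the same number of districts."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // n_classes] for i in range(1, n_classes)]

def nested_means_breaks(values, depth):
    """Split at the mean, then split each half at its own mean, `depth` times."""
    if depth == 0 or len(values) < 2:
        return []
    m = sum(values) / len(values)
    lower = [v for v in values if v < m]
    upper = [v for v in values if v >= m]
    return (nested_means_breaks(lower, depth - 1)
            + [m]
            + nested_means_breaks(upper, depth - 1))

def classify(value, breaks):
    """Index of the class for `value`; a value equal to a threshold goes up."""
    return bisect_right(breaks, value)

# The income range used in the text: $20,000 to $150,000.
print(equal_interval_breaks(20_000, 150_000, 4))
print(geometric_breaks(20_000, 150_000, 2))
print(classify(45_000, [45_000, 83_000]))
```

Using `bisect_right` on a sorted threshold list means every value lands in exactly one class, and the tie-breaking choice is made in one place rather than scattered through comparisons.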


Color progression

The final element of a choropleth map is the set of colors used to represent the different values of the variable. There are a variety of different approaches to this task, but the primary principle is that any order in the variable (e.g., low to high quantitative values) should be reflected in the perceived order of the colors (e.g., light to dark), as this will allow map readers to intuitively make "more vs. less" judgements and see trends and patterns with minimal reference to the legend. A second general guideline, at least for classified maps, is that the colors should be easily distinguishable, so the colors on the map can be unambiguously matched to those in the legend to determine the represented values. This requirement limits the number of classes that can be included; tests have shown that when value alone is used (e.g., light to dark, whether gray or any single hue), it is difficult to practically use more than seven classes. If differences in hue and/or saturation are incorporated, that limit increases significantly to as many as 10-12 classes. The need for color discrimination is further affected by color vision deficiencies; for example, color schemes that use red and green to distinguish values will not be useful for a significant portion of the population.

The most common types of color progressions used in choropleth (and other thematic) maps include:

* A ''sequential progression'' represents variable values as color value (lightness).
** A ''grayscale progression'' uses only shades of gray.
** A ''single-hue progression'' fades from a dark shade of the chosen color (or gray) to a very light or white shade of relatively the same hue. This is a common method used to map magnitude. The darkest shade represents the greatest number in the data set and the lightest shade represents the least.
** A ''partial-spectral progression'' uses a limited range of hues to add more contrast to the value contrast, enabling a larger number of classes to be used. Yellow is commonly used for the lighter end of the progression due to its natural apparent lightness. Common hue ranges are yellow-green-blue and yellow-orange-red.
* A ''divergent'' or ''bi-polar progression'' is essentially two sequential color progressions (of the types above) joined together with a common light color or white. They are normally used to represent positive and negative values or divergence from a central tendency, such as the mean of the variable being mapped. For example, a typical progression when mapping temperatures is from dark blue (for cold) to dark red (for hot) with white in the middle. These are often used when the two extremes are given value judgements, such as showing the "good" end as green and the "bad" end as red.
* A ''spectral progression'' uses a wide range of hues (possibly the entire color wheel) without intended differences in value. This is most commonly used when there is an order to the values, but it is not a "more vs. less" order, such as seasonality. It is frequently used by non-cartographers in situations where other color progressions would be much more effective.
* A ''qualitative progression'' uses a scattered set of hues in no particular order, with no intended difference in value. This is most commonly used with nominal categories in a qualitative choropleth map, such as "most prevalent religion."
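Single-hue and divergent progressions can be approximated by interpolating between anchor colors. The sketch below assumes simple linear interpolation in RGB, which is easy to follow but not perceptually uniform; carefully designed schemes are usually tuned by hand. All function names and the chosen anchor colors are hypothetical.

```python
def lerp_color(c0, c1, t):
    """Linearly interpolate between two RGB triples; t=0 gives c0, t=1 gives c1."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))

def sequential_ramp(light, dark, n_classes):
    """n_classes colors running from `light` to `dark` (n_classes >= 2)."""
    return [lerp_color(light, dark, i / (n_classes - 1)) for i in range(n_classes)]

def diverging_ramp(low, mid, high, n_per_side):
    """Two sequential ramps joined at a shared middle color."""
    lower = sequential_ramp(low, mid, n_per_side + 1)
    upper = sequential_ramp(mid, high, n_per_side + 1)
    return lower + upper[1:]  # drop the duplicated middle color

def to_hex(rgb):
    """Format an RGB triple as a #rrggbb string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)

WHITE = (255, 255, 255)
DARK_BLUE = (0, 0, 128)
DARK_RED = (128, 0, 0)

# Single-hue progression: lightest class first, darkest class last.
single_hue = sequential_ramp(WHITE, DARK_BLUE, 5)

# Divergent progression: dark blue (cold) through white to dark red (hot).
divergent = diverging_ramp(DARK_BLUE, WHITE, DARK_RED, 2)

print([to_hex(c) for c in single_hue])
print([to_hex(c) for c in divergent])
```

Ordering the ramp from light to dark keeps the perceived order of the colors aligned with the order of the classes, which is the primary principle stated above.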


Bivariate choropleth maps

It is possible to represent two (and sometimes three) variables simultaneously on a single choropleth map by representing each with a single-hue progression and blending the colors of each district. This technique was first published by the U.S. Census Bureau in the 1970s, and has been used many times since, to varying degrees of success. It is generally used to visualize the correlation and contrast between two variables hypothesized to be closely related, such as educational attainment and income. Contrasting but not complementary colors are generally used, so that their combination is intuitively recognized as "between" the two original colors, such as red+blue=purple. The technique works best when the geography of the variable has a high degree of spatial autocorrelation, so that there are large regions of similar colors with gradual changes between them; otherwise the map can look like a confusing mix of random colors. Bivariate maps have been found to be easier to use if the map includes a carefully designed legend and an explanation of the technique.
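One simple way to blend two single-hue progressions is to average their RGB values, so that a district high on both variables becomes purple when the hues are red and blue. This is only a sketch under that assumption; published bivariate schemes often use other blending methods, and the function names here are hypothetical.

```python
def ramp_color(hue, level, n_levels):
    """Color for class `level` (0 .. n_levels-1) on a white-to-`hue` ramp."""
    t = level / (n_levels - 1)
    return tuple(round(255 + (c - 255) * t) for c in hue)

def bivariate_color(level_a, level_b, n_levels,
                    hue_a=(255, 0, 0), hue_b=(0, 0, 255)):
    """Average the two single-hue colors channel by channel (assumed blend)."""
    a = ramp_color(hue_a, level_a, n_levels)
    b = ramp_color(hue_b, level_b, n_levels)
    return tuple((x + y) // 2 for x, y in zip(a, b))

# A 3x3 legend grid: rows are classes of variable A, columns of variable B.
n = 3
legend = [[bivariate_color(i, j, n) for j in range(n)] for i in range(n)]

for row in legend:
    print(row)
```

The resulting grid doubles as the legend: one axis shows the progression for each variable, and the diagonal shows districts where both are low or both are high.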


Legend

A choropleth map uses ''ad hoc'' symbols to represent the mapped variable. While the general strategy may be intuitive if a color progression is chosen that reflects the proper order, map readers cannot decipher the actual value of each district without a legend. A typical choropleth legend for a classed choropleth map includes a series of sample patches of the symbol for each class, with a text description of the corresponding range of values. On an unclassed choropleth map, it is common for the legend to show a smooth color gradient between the minimum and maximum values, with two or more points along it labeled with corresponding values. An alternative approach is the ''histogram legend'', which includes a histogram showing the frequency distribution of the mapped variable (i.e., the number of districts in each class). Each class may be represented by a single bar with its width determined by its minimum and maximum threshold values and its height calculated such that the box area is proportional to the number of districts included, then colored with the map symbol used for that class. Alternatively, the histogram may be divided into a large number of bars, such that each class includes one or more bars, symbolized according to its symbol in the map. This form of legend shows not only the threshold values for each class, but gives some context for the source of those values, especially for endogenous classification rules that are based on the frequency distribution, such as quantiles. However, they are not currently supported in GIS and mapping software, and must typically be constructed manually.
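The bar geometry of the single-bar-per-class histogram legend described above can be computed directly: each bar's width spans the class range, and its height is the district count divided by that width, so the bar area equals the count. The function name, the dictionary keys, and the sample values below are hypothetical, and a value equal to a threshold is assumed to fall in the upper class.

```python
from bisect import bisect_right

def histogram_legend_bars(values, breaks, lo, hi):
    """Return one bar per class with area proportional to its district count.

    `breaks` are the sorted class thresholds; `lo` and `hi` are the overall
    minimum and maximum shown in the legend.
    """
    edges = [lo] + list(breaks) + [hi]
    counts = [0] * (len(edges) - 1)
    for v in values:
        counts[bisect_right(breaks, v)] += 1

    bars = []
    for i, count in enumerate(counts):
        width = edges[i + 1] - edges[i]
        bars.append({
            "min": edges[i],
            "max": edges[i + 1],
            "count": count,
            "width": width,
            "height": count / width,  # so that width * height == count
        })
    return bars

# Hypothetical district values, invented for illustration.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
bars = histogram_legend_bars(values, breaks=[5, 10], lo=0, hi=20)

for bar in bars:
    print(bar)
```

Each bar would then be filled with the color of its class, so the legend shows both the thresholds and how many districts fall between them.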


See also

* Cartograms, which are often colored as choropleths
* Chorochromatic map
* Dasymetric map
* Dot distribution map
* Heat map
* MacChoro
* Proportional symbol map




External links


ColorBrewer – color advice for cartography

A choropleth map generator for the US