Address geocoding, or simply geocoding, is the process of taking a text-based description of a location, such as an address or the name of a place, and returning geographic coordinates, frequently a latitude/longitude pair, to identify a location on the Earth's surface. Reverse geocoding, on the other hand, converts geographic coordinates to a description of a location, usually the name of a place or an addressable location. Geocoding relies on a computer representation of address points and the street/road network, together with postal and administrative boundaries.
* Geocode (verb): to provide geographical coordinates corresponding to a location.
* Geocode (noun): a code that represents a geographic entity (location or object). In general it is a short, human-readable identifier, such as a nominal geocode like ISO 3166-1 alpha-2 or a grid geocode like Geohash.
* Geocoder (noun): a piece of software or a (web) service that implements a geocoding process, i.e. a set of interrelated components in the form of operations, algorithms, and data sources that work together to produce a spatial representation for descriptive locational references.
The geographic coordinates representing locations often vary greatly in positional accuracy. Examples include building centroids, land parcel centroids, interpolated locations based on thoroughfare ranges, street segment centroids, postal code centroids (e.g. ZIP codes, CEDEX), and administrative division centroids.


History

Geocoding, a subset of geographic information system (GIS) spatial analysis, has been a subject of interest since the early 1960s.


1960s

In 1960, the first operational GIS, named the Canada Geographic Information System (CGIS), was invented by Dr. Roger Tomlinson, who has since been acknowledged as the father of GIS. The CGIS was used to store and analyze data collected for the Canada Land Inventory, which mapped information about agriculture, wildlife, and forestry at a scale of 1:50,000 in order to regulate land capability for rural Canada. The CGIS lasted until the 1990s but was never available commercially.

On 1 July 1963, five-digit ZIP codes were introduced nationwide by the United States Post Office Department (USPOD). In 1983, nine-digit ZIP+4 codes were introduced as an extra identifier for locating addresses more accurately.

In 1964, the Harvard Laboratory for Computer Graphics and Spatial Analysis developed groundbreaking software, such as GRID and SYMAP, which became sources for the commercial development of GIS.

In 1967, a team at the Census Bureau, including the mathematician James Corbett and Donald Cooke, invented Dual Independent Map Encoding (DIME), the first modern vector mapping model, which encoded address ranges into street network files and incorporated the "percent along" geocoding algorithm. Still in use by platforms such as Google Maps and MapQuest, the "percent along" algorithm denotes where a matched address is located along a reference feature as a percentage of the reference feature's total length. DIME was intended for the use of the United States Census Bureau, and it involved accurately mapping block faces, digitizing nodes representing street intersections, and forming spatial relationships. New Haven, Connecticut, was the first city on Earth with a geocodable street network database.


1980s

In the late 1970s, two main public domain geocoding platforms were in development: GRASS GIS and MOSS. The early 1980s saw the rise of many more commercial vendors of geocoding software, namely Intergraph, ESRI, CARIS, ERDAS, and MapInfo Corporation. These platforms merged the 1960s approach of separating spatial information with the approach of organizing this spatial information into database structures. In 1986, Mapping Display and Analysis System (MIDAS) became the first desktop geocoding software, designed for the DOS operating system. Geocoding was elevated from the research department into the business world with the acquisition of MIDAS by MapInfo. MapInfo has since been acquired by Pitney Bowes, and has pioneered the merging of geocoding with business intelligence, allowing location intelligence to provide solutions for the public and private sectors.


1990s

By the end of the 20th century, geocoding had become more user-oriented, especially via open-source GIS software. Mapping applications and geospatial data had become more accessible over the Internet. Because the mail-out/mail-back technique was so successful in the 1980 census, the U.S. Census Bureau was able to put together a large geospatial database, using interpolated street geocoding. This database, along with the Census' nationwide coverage of households, allowed for the birth of TIGER (Topologically Integrated Geographic Encoding and Referencing). Containing address ranges instead of individual addresses, TIGER has since been implemented in nearly all geocoding software platforms used today. By the end of the 1990 census, TIGER "contained a latitude/longitude-coordinate for more than 30 million feature intersections and endpoints and nearly 145 million feature 'shape' points that defined the more than 42 million feature segments that outlined more than 12 million polygons." TIGER was the breakthrough for "big data" geospatial solutions.


2000s

The early 2000s saw the rise of Coding Accuracy Support System (CASS) address standardization. CASS certification is offered to all software vendors and advertising mailers who want the United States Postal Service (USPS) to assess the quality of their address-standardizing software. The annually renewed CASS certification is based on delivery point codes, ZIP codes, and ZIP+4 codes. Adopting CASS-certified software allows them to receive discounts on bulk mailing and shipping costs, and a certified database improves the accuracy and efficiency of those bulk mailings.

In the early 2000s, geocoding platforms were also able to support multiple datasets. In 2003, geocoding platforms were capable of merging postal codes with street data, updated monthly. This process became known as "conflation".

Beginning in 2005, geocoding platforms included parcel-centroid geocoding, which allowed for greater precision in geocoding an address. For example, parcel-centroid geocoding allowed a geocoder to determine the centroid of a specific building or lot of land. Platforms were now also able to determine the elevation of specific parcels. 2005 also saw the introduction of the Assessor's Parcel Number (APN). A jurisdiction's tax assessor was able to assign this number to parcels of real estate, which allowed for proper identification and record-keeping. An APN is important for geocoding an area covered by a gas or oil lease, and for indexing property tax information provided to the public.

In 2006, reverse geocoding and reverse APN lookup were introduced to geocoding platforms. This involved converting a numerical point location, given as a longitude and latitude, to a textual, readable address. 2008 and 2009 saw the growth of interactive, user-oriented geocoding platforms, namely MapQuest, Google Maps, Bing Maps, and Global Positioning Systems (GPS). These platforms were made even more accessible to the public with the simultaneous growth of the mobile industry, specifically smartphones.


2010s

The 2010s saw vendors fully support geocoding and reverse geocoding globally. Cloud-based geocoding application programming interfaces (APIs) and on-premises geocoding have allowed for a greater match rate, greater precision, and greater speed. Interest has also grown in using geocoding to inform business decisions by integrating the geocoding process with business intelligence. The future of geocoding also involves three-dimensional geocoding, indoor geocoding, and multiple language returns for the geocoding platforms.


Geocoding process

Geocoding is a task which involves multiple datasets and processes, all of which work together. Some of the components are provided by the user, while others are built into the geocoding software.


Input dataset

Input data are the descriptive, textual information (address or building name) which the user wants to turn into numerical, spatial data (latitude and longitude) through the process of geocoding. These are often included in a table with other attributes of the locations. Input data are classified into two categories:
* Relative input data: textual descriptions of a location which, alone, cannot specify a spatial representation of that location, because they depend geographically on other locations. An example of a relative geocode is "Across the street from the Empire State Building." The location being sought cannot be determined without identifying the Empire State Building. Geocoding platforms often do not support such relative locations, but advances are being made in this direction.
* Absolute input data: textual descriptions of a location which, alone, can output a spatial representation of that location. This data type outputs an absolute known location independently of other locations. For example, USPS ZIP codes; USPS ZIP+4 codes; complete and partial postal addresses; USPS PO boxes; rural routes; cities; counties; intersections; and named places can all be referenced in a data source absolutely.

To achieve the greatest accuracy, the geocodes in the input dataset need to be as correct as possible and formatted in standard ways. Thus, it is common to first go through a process of data cleansing, often called "address scrubbing," to find and correct any errors. This is especially important for databases in which participants enter their own location geocodes, frequently resulting in a variety of forms (e.g., "Pennsylvania," "PA," "Penn.") and misspellings.
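The address scrubbing step can be illustrated with a minimal sketch. It normalizes the state spellings mentioned above to a single code; the lookup table and the function name `scrub_state` are hypothetical, and a real scrubber would rely on a complete, authoritative list of variants.

```python
import re
from typing import Optional

# Hypothetical lookup table for illustration only; it covers just the
# variants mentioned in the text.
STATE_VARIANTS = {
    "pennsylvania": "PA",
    "penn": "PA",
    "pa": "PA",
}


def scrub_state(raw: str) -> Optional[str]:
    """Map a free-text state value to a two-letter code, or None if unknown."""
    # Ignore case, whitespace, and punctuation before the lookup.
    key = re.sub(r"[^a-z]", "", raw.lower())
    return STATE_VARIANTS.get(key)


for value in ["Pennsylvania", "PA", "Penn.", "Pensylvania"]:
    print(f"{value!r} -> {scrub_state(value)!r}")
```

Values that the table does not recognize, such as the misspelled last entry, come back as `None` so that they can be flagged for correction rather than silently guessed.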


Reference dataset

The second necessary dataset specifies the locations of geographic features in a common spatial reference system, usually stored in a GIS file format or spatial database. Examples include a point dataset of buildings, a line dataset of streets, or a polygon dataset of counties. The attributes of these features must include information that will match the geocodes in the input dataset, such as a name, unique id, or standard geocode such as the United States FIPS codes for geographic features. It is common for the reference dataset to include multiple attribute columns of geocodes for flexibility or handling of complex geocodes. For example, a street dataset intended to be used for street address geocoding must include not only the street name, but any directional suffixes or prefixes and the range of address numbers found on each segment.


Geocoder algorithm

The third component is software that matches each geocode in the input dataset to the attributes of a corresponding feature in the reference dataset. Once a match is made, the location of the reference feature can be attached to the input row. These algorithms are of two types:
* Direct match: The geocoder expects each input item to directly correspond to a single entire feature in the reference dataset, for example a country or ZIP code, or a street address matched to building point reference data. This kind of match is similar to a relational table join, except that geocoder algorithms usually incorporate some kind of uncertainty handling to recognize approximate matches (e.g., different capitalization or slight misspellings).
* Interpolated match: The geocode specifies not only a feature, but some location within that feature. The most common (and oldest) example is matching street addresses to street line data. First the geocoder parses the street address into its component parts (street name, number, directional prefix/suffix). The geocoder matches these components to a corresponding street segment with a number range that includes the input value. Then it calculates where the given number falls within the segment's range to estimate a location along the segment. As with the direct match, these algorithms usually have uncertainty handling to handle approximate matches (especially abbreviations such as "E" for "East" and "Dr" for "Drive").

The algorithm is rarely able to perfectly locate all of the input data; mismatches can occur due to misspelled or incomplete input data, imperfect (usually outdated) reference data, or unique regional geocoding systems that the algorithm does not recognize. Many geocoders provide a follow-up stage to manually review and correct suspect matches.
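A direct match with simple uncertainty handling might look like the following sketch. It tries an exact, case-insensitive lookup first and then falls back to the closest name by string similarity, using Python's standard `difflib` module. The reference table, its coordinates, the function name `direct_match`, and the 0.8 similarity cutoff are all invented for illustration; production geocoders use more elaborate matching.

```python
import difflib

# Hypothetical reference dataset: feature name -> (latitude, longitude).
# The coordinates are placeholders, not real locations.
REFERENCE = {
    "springfield": (10.0, 20.0),
    "shelbyville": (10.5, 20.5),
}


def direct_match(name, reference=REFERENCE, cutoff=0.8):
    """Return (matched_name, coordinates), or None when nothing is close enough."""
    key = name.strip().lower()
    if key in reference:
        # Exact match after normalizing case and surrounding whitespace.
        return key, reference[key]
    # Approximate match: best candidate whose similarity ratio meets the cutoff.
    close = difflib.get_close_matches(key, list(reference), n=1, cutoff=cutoff)
    if close:
        return close[0], reference[close[0]]
    return None


print(direct_match("SPRINGFIELD"))   # differs only in capitalization
print(direct_match("Springfeild"))   # slight misspelling
print(direct_match("Capital City"))  # no plausible candidate
```

Returning `None` instead of the nearest candidate at any cost leaves room for the manual review stage that many geocoders provide.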


Address interpolation

A simple method of geocoding is address interpolation. This method makes use of data from a street geographic information system where the street network is already mapped within the geographic coordinate space. Each street segment is attributed with address ranges (e.g. house numbers from one segment to the next). Geocoding takes an address and matches it to a street and a specific segment (such as a block, in towns that use the "block" convention). It then interpolates the position of the address within the range along the segment.


Example

Take for example: 742 Evergreen Terrace. Let's say that this segment (for instance, a block) of Evergreen Terrace runs from 700 to 799. Even-numbered addresses fall on the east side of Evergreen Terrace, with odd-numbered addresses on the west side of the street. 742 Evergreen Terrace would (probably) be located slightly less than halfway up the block, on the east side of the street. A point would be mapped at that location along the street, perhaps offset a distance to the east of the street centerline.
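The example above can be sketched in a few lines of code. The `Segment` record, its field names, and the flat (x, y) coordinates are assumptions made for illustration; a real reference dataset would store geographic coordinates and usually a multi-point line rather than a single straight segment.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A straight street segment with an address range (hypothetical schema)."""
    low: int        # lowest house number in the range
    high: int       # highest house number in the range
    start: tuple    # (x, y) where the range begins
    end: tuple      # (x, y) where the range ends
    even_side: str  # side of the street carrying even numbers
    odd_side: str   # side of the street carrying odd numbers


def interpolate(number, seg):
    """Return (fraction, (x, y), side) for a house number on a segment."""
    if not seg.low <= number <= seg.high:
        raise ValueError("house number outside the segment's range")
    span = seg.high - seg.low
    # A one-address range has no length to interpolate over; use the midpoint.
    fraction = (number - seg.low) / span if span else 0.5
    x = seg.start[0] + fraction * (seg.end[0] - seg.start[0])
    y = seg.start[1] + fraction * (seg.end[1] - seg.start[1])
    side = seg.even_side if number % 2 == 0 else seg.odd_side
    return fraction, (x, y), side


# The block of Evergreen Terrace from the example, drawn as a 100-unit
# line running north (the length is an arbitrary illustrative value).
block = Segment(low=700, high=799, start=(0.0, 0.0), end=(0.0, 100.0),
                even_side="east", odd_side="west")
fraction, point, side = interpolate(742, block)
print(f"{fraction:.1%} along the block, {side} side, at {point}")
```

Here the fraction is (742 − 700) / (799 − 700), or about 42%, which matches "slightly less than halfway up the block". The sketch places the point on the centerline; offsetting it toward the east side would be a separate step.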


Complicating factors

However, this process is not always as straightforward as in this example. Difficulties arise when
* distinguishing between ambiguous addresses such as 742 Evergreen Terrace and 742 W Evergreen Terrace.
* attempting to geocode new addresses for a street that is not yet added to the geographic information system database.

While there might be a 742 Evergreen Terrace in Springfield, there might also be a 742 Evergreen Terrace in Shelbyville. Asking for the city name (and state, province, country, etc. as needed) can solve this problem. Boston, Massachusetts, has multiple "100 Washington Street" locations because several cities have been annexed without changing street names, thus requiring use of unique postal codes or district names for disambiguation.

Geocoding accuracy can be greatly improved by first utilizing good address verification practices. Address verification will confirm the existence of the address and will eliminate ambiguities. Once the valid address is determined, it is very easy to geocode and determine the latitude/longitude coordinates.

Finally, several caveats on using interpolation:
* The typical attribution of a street segment assumes that all even-numbered parcels are on one side of the segment, and all odd-numbered parcels are on the other. This is often not true in real life.
* Interpolation assumes that the given parcels are evenly distributed along the length of the segment. This is almost never true in real life; it is not uncommon for a geocoded address to be off by several thousand feet.
* Interpolation also assumes that the street is straight. If a street is curved then the geocoded location will not necessarily fit the physical location of the address.
* Segment information (especially from sources such as TIGER) includes a maximum upper bound for addresses and is interpolated as though the full address range is used. For example, a segment (block) might have a listed range of 100–199, but the last address at the end of the block is 110. In this case, address 110 would be geocoded to 10% of the distance down the segment rather than near the end.
* Most interpolation implementations will produce a point as their resulting address location. In reality, the physical address is distributed along the length of the segment. Consider geocoding the address of a shopping mall: the physical lot may run a distance along the street segment (or could be thought of as a two-dimensional space-filling polygon which may front on several different streets or, worse, for cities with multi-level streets, a three-dimensional shape that meets different streets at several different levels), but the interpolation treats it as a single point.

A very common error is to believe the accuracy ratings of a given map's geocodable attributes. Such accuracy as quoted by vendors has no bearing on an address being attributed to the correct segment or to the correct side of the segment, nor on the result being an accurate position along that correct segment. With the geocoding process used for U.S. census TIGER datasets, 5–7.5% of the addresses may be allocated to a different census tract, while a study of Australia's TIGER-like system found that 50% of the geocoded points were mapped to the wrong property parcel. The accuracy of geocoded data can also have a bearing on the quality of research that uses this data. One study by a group of Iowa researchers found that the common method of geocoding using TIGER datasets, as described above, can cause a loss of as much as 40% of the power of a statistical analysis. An alternative is to use orthophoto or image-coded data such as the Address Point data from Ordnance Survey in the UK, but such datasets are generally expensive.

Because of this, it is quite important to avoid using interpolated results except for non-critical applications. Interpolated geocoding is usually not appropriate for making authoritative decisions, for example if life safety will be affected by that decision. Emergency services, for example, do not make an authoritative decision based on their interpolations; an ambulance or fire truck will always be dispatched regardless of what the map says.
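The address-range caveat above (a listed range of 100–199 where the last real address is 110) can be made concrete with a little arithmetic. The helper name `fraction_along` and the 500-foot block length are assumptions chosen only to put a number on the error.

```python
def fraction_along(number, low, high):
    """Linear position of a house number within an address range (0.0 to 1.0)."""
    return (number - low) / (high - low)


# Position of address 110 using the range listed in the reference data.
listed = fraction_along(110, 100, 199)
# Position of address 110 using the range actually in use on the block.
actual = fraction_along(110, 100, 110)

segment_length_ft = 500  # assumed block length, for illustration only
error_ft = abs(actual - listed) * segment_length_ft

print(f"listed range places it {listed:.0%} along the segment")
print(f"actual range places it {actual:.0%} along the segment")
print(f"positional error on a {segment_length_ft} ft block: {error_ft:.0f} ft")
```

With these assumed numbers the address lands about 10% of the way down the segment instead of at its far end, so the error is close to the full length of the block.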


Other techniques

In rural areas or other places lacking high quality street network data and addressing, GPS is useful for mapping a location. For traffic accidents, geocoding to a street intersection or midpoint along a street centerline is a suitable technique. Most highways in developed countries have mile markers to aid in emergency response, maintenance, and navigation. It is also possible to use a combination of these geocoding techniques, using a particular technique for certain cases and situations and other techniques for other cases. In contrast to geocoding of structured postal address records, toponym resolution maps place names in unstructured document collections to their corresponding spatial footprints.
* Place codes offer a way to create digitally generated addresses where no information exists using satellite imagery and machine learning, e.g. Robocodes.
* Natural Address Codes are a proprietary geocode system that can address an area anywhere on the Earth, or a volume of space anywhere around the Earth. The use of alphanumeric characters instead of only ten digits makes a NAC shorter than its numerical latitude/longitude equivalent.
* Military Grid Reference System is the geocoordinate standard used by NATO militaries for locating points on Earth.
* Universal Transverse Mercator coordinate system is a map projection system for assigning coordinates to locations on the surface of the Earth.
* The Maidenhead Locator System, popular with radio operators.
* The World Geographic Reference System (GEOREF), developed for global military operations, replaced by the current Global Area Reference System (GARS).
* Open Location Code or "Plus Codes," developed by Google and released into the public domain.
* Geohash, a public domain system based on the Morton Z-order curve.
* What3words, a proprietary system that encodes GCS coordinates as pseudorandom sets of words by dividing the coordinates into three numbers and looking up words in an indexed dictionary.
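As an illustration of a grid geocode from the list above, the following is a minimal sketch of Geohash encoding. It repeatedly halves the longitude and latitude ranges, interleaves the resulting bits (the Z-order pattern), and writes every five bits as one character of the Geohash base-32 alphabet. The function name `geohash_encode` is a choice made here; decoding and neighbor lookup are left out.

```python
# Geohash base-32 alphabet (digits plus lowercase letters, omitting a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=6):
    """Encode a latitude/longitude pair as a Geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1  # upper half: emit 1 and keep the upper half
            rng[0] = mid
        else:
            bits <<= 1              # lower half: emit 0 and keep the lower half
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:          # five bits make one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744, precision=6))
```

Because each additional character subdivides the previous cell, nearby points tend to share a common prefix, which is what makes the code usable as a short, hierarchical identifier.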


Research

Research has introduced a new approach to the control and knowledge aspects of geocoding, by using an agent-based paradigm. In addition to the new paradigm for geocoding, additional correction techniques and control algorithms have been developed. The approach represents the geographic elements commonly found in addresses as individual agents. This provides a commonality and duality to control and geographic representation. In addition to scientific publication, the new approach and subsequent prototype gained national media coverage in Australia. The research was conducted at Curtin University in Perth, Western Australia.

With recent advances in deep learning and computer vision, a new geocoding workflow has been proposed that leverages object detection techniques to directly extract the centroids of building rooftops as geocoding output.


Uses

Geocoded locations are useful in many GIS analysis, cartography, and decision-making workflows, as well as in transaction mash-ups, and can be injected into larger business processes. On the web, geocoding is used in services like routing and local search. Geocoding, along with GPS, provides location data for geotagging media, such as photographs or RSS items.


Privacy concerns

The proliferation of, and ease of access to, geocoding (and reverse geocoding) services raises privacy concerns. For example, in mapping crime incidents, law enforcement agencies aim to balance the privacy rights of victims and offenders with the public's right to know. Law enforcement agencies have experimented with alternative geocoding techniques that allow them to mask a portion of the locational detail (e.g., address specifics that would lead to identifying a victim or offender). In addition, when providing online crime mapping to the public, they place disclaimers regarding the locational accuracy of points on the map, acknowledging these location masking techniques, and impose terms of use for the information.
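One simple way to mask address specifics is to generalize a house number to its hundred block before the record is geocoded or published. The sketch below shows that idea only; it is not a technique named by the source, and the function name `mask_to_block` is hypothetical.

```python
import re


def mask_to_block(address: str) -> str:
    """Replace a leading house number with its hundred block (742 -> 700 block)."""
    match = re.match(r"\s*(\d+)\s+(.*)", address)
    if not match:
        return address  # no leading house number to mask
    number, street = int(match.group(1)), match.group(2)
    return f"{number // 100 * 100} block of {street}"


print(mask_to_block("742 Evergreen Terrace"))
print(mask_to_block("Evergreen Terrace"))
```

A masked record can still be placed on the correct street segment, but it no longer points to an individual property.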


See also

* Azure Maps, a commercial geocoding service
* Geocode
* Gazetteer
* Geocoded photo, which includes methods of geocoding images
* Geographic information system (GIS)
* Geolocation
* Geoparsing
* Georeference
* Geotagging
* Linear referencing
* Reverse geocoding
* Toponym resolution


External links


* Three Standard Geocoding Methods (in North America), article
* The Evolution of Geocoding: Moving Away from Conflation Confliction to Best Match, article
* A Flexible Addressing System for Approximate Geocoding, paper presented at Geoinfo 2003
* The UCDP and AidData codebook on geo-referencing aid, guide for geocoding development aid projects