Geohash is a
public domain
The public domain (PD) consists of all the creative work to which no exclusive intellectual property rights apply. Those rights may have expired, been forfeited, expressly waived, or may be inapplicable. Because those rights have expired ...
geocode system invented in 2008 by Gustavo Niemeyer
[Evidences at the ]Wayback Machine
The Wayback Machine is a digital archive of the World Wide Web founded by the Internet Archive, a nonprofit based in San Francisco, California. Created in 1996 and launched to the public in 2001, it allows the user to go "back in time" and see ...
:
labix.org in 2008, the G. Niemeyer's blog announcing Geohash
*
an article about Geohash witnessing and citing G. Niemeyer works, before 2012
G. Niemeyer's post at forums.geocaching.com announcing Geohash in Feb. 2008
which encodes a geographic location into a short string of letters and digits. Similar ideas were introduced by G.M. Morton in 1966.
[G. M. Morton (196]
''"A Computer Oriented Geodetic Data Base and a New Technique in File Sequencing"''
. Report in IBM Canada. It is a hierarchical spatial data structure which subdivides space into buckets of
grid shape, which is one of the many applications of what is known as a
Z-order curve
In mathematical analysis and computer science, functions which are Z-order, Lebesgue curve, Morton space-filling curve, Morton order or Morton code map multidimensional data to one dimension while preserving locality of the data points. It ...
, and generally
space-filling curves
In mathematical analysis, a space-filling curve is a curve whose range contains the entire 2-dimensional unit square (or more generally an ''n''-dimensional unit hypercube). Because Giuseppe Peano (1858–1932) was the first to discover one, spa ...
.
Geohashes offer properties like arbitrary precision and the possibility of gradually removing characters from the end of the code to reduce its size (and gradually lose precision). Geohashing guarantees that the longer a shared prefix between two geohashes is, the spatially closer they are together. The reverse of this is not guaranteed, as two points can be very close but have a short or no shared prefix.
History
The core part of the Geohash algorithm and the first initiative to similar solution was documented in a report of G.M. Morton in 1966, "A Computer Oriented Geodetic Data Base and a New Technique in File Sequencing".
The Morton work was used for efficient implementations of
Z-order curve
In mathematical analysis and computer science, functions which are Z-order, Lebesgue curve, Morton space-filling curve, Morton order or Morton code map multidimensional data to one dimension while preserving locality of the data points. It ...
, like i
this modern (2014) Geohash-integer version(based on directly interleaving
64-bit integers), but his
geocode
A geocode is a code that represents a geographic entity ( location or object). It is a unique identifier of the entity, to distinguish it from others in a finite set of geographic entities. In general the ''geocode'' is a human-readable an ...
proposal was not
human-readable
A human-readable medium or human-readable format is any encoding of data or information that can be naturally read by humans.
In computing, ''human-readable'' data is often encoded as ASCII or Unicode text, rather than as binary data. In m ...
and was not popular.
Apparently, in the late 2000s, G. Niemeyer still didn't know about Morton's work, and reinvented it, adding the use of
base32 representation. In February 2008, together with the announcement of the system,
he launched the website
http://geohash.org
, which allows users to convert geographic coordinates to short
URLs which uniquely identify positions on the
Earth
Earth is the third planet from the Sun and the only astronomical object known to harbor life. While large volumes of water can be found throughout the Solar System, only Earth sustains liquid surface water. About 71% of Earth's sur ...
, so that referencing them in
email
Electronic mail (email or e-mail) is a method of exchanging messages ("mail") between people using electronic devices. Email was thus conceived as the electronic ( digital) version of, or counterpart to, mail, at a time when "mail" mean ...
s,
forums, and
website
A website (also written as a web site) is a collection of web pages and related content that is identified by a common domain name and published on at least one web server. Examples of notable websites are Google, Facebook, Amazon, and W ...
s is more convenient.
Many variations have been developed, including
OpenStreetMap
OpenStreetMap (OSM) is a free, open geographic database updated and maintained by a community of volunteers via open collaboration. Contributors collect data from surveys, trace from aerial imagery and also import from other freely licensed g ...
's ''short link''
[
The ]OpenStreetMap
OpenStreetMap (OSM) is a free, open geographic database updated and maintained by a community of volunteers via open collaboration. Contributors collect data from surveys, trace from aerial imagery and also import from other freely licensed g ...
's ''short link''
documented in wiki.openstreetmap.org
was released
in 2009
is near the same source-code
10 years after
It is strongly based on Morton's interlace algorithm.
(using
base64 instead of base32) in 2009, the ''64-bit Geohash'' in 2014, the exotic ''Hilbert-Geohash'' in 2016, and others.
Typical and main usages
To obtain the Geohash, the user provides an address to be
geocode
A geocode is a code that represents a geographic entity ( location or object). It is a unique identifier of the entity, to distinguish it from others in a finite set of geographic entities. In general the ''geocode'' is a human-readable an ...
d, or
latitude and longitude coordinates, in a single input box (most commonly used formats for latitude and longitude pairs are accepted), and performs the request.
Besides showing the latitude and longitude corresponding to the given Geohash, users who navigate to a Geohash at geohash.org are also presented with an embedded map, and may download a
GPX file, or transfer the waypoint directly to certain
GPS receivers. Links are also provided to external sites that may provide further details around the specified
location.
For example, the coordinate pair
57.64911,10.40744
(near the tip of the
peninsula
A peninsula (; ) is a landform that extends from a mainland and is surrounded by water on most, but not all of its borders. A peninsula is also sometimes defined as a piece of land bordered by water on three of its sides. Peninsulas exist on a ...
of
Jutland, Denmark
Jutland ( da, Jylland ; german: Jütland ; ang, Ēota land ), known anciently as the Cimbric or Cimbrian Peninsula ( la, Cimbricus Chersonesus; da, den Kimbriske Halvø, links=no or ; german: Kimbrische Halbinsel, links=no), is a peninsula of ...
) produces a slightly shorter hash of
u4pruydqqvj
.
The main usages of Geohashes are:
* As a unique identifier.
* To represent point data, e.g. in databases.
Geohashes have also been proposed to be used for
geotagging.
When used in a database, the structure of geohashed data has two advantages. First, data indexed by geohash will have all points for a given rectangular area in contiguous slices (the number of slices depends on the precision required and the presence of geohash "fault lines"). This is especially useful in database systems where queries on a single index are much easier or faster than multiple-index queries. Second, this index structure can be used for a quick-and-dirty proximity search: the closest points are often among the closest geohashes.
Technical description
A formal description for Computational and Mathematical views.
Textual representation
For exact latitude and longitude translations Geohash is a ''spatial index'' of
base 4, because it transforms the continuous latitude and longitude space coordinates into a hierarchical discrete grid, using a recurrent four-partition of the space. To be a compact code it uses
base 32
Base32 is the base-32 numeral system. It uses a set of 32 digits, each of which can be represented by 5 bits (25). One way to represent Base32 numbers in a human-readable way is by using a standard 32-character set, such as the twenty-two upper ...
and represents its values by the following alphabet, that is the "standard textual representation".
The "Geohash alphabet" (32ghs) uses all digits 0-9 and almost all lower case letters except "a", "i", "l" and "o".
For example, using the table above and the constant
, the Geohash
ezs42
can be converted to a decimal representation by ordinary
positional notation
Positional notation (or place-value notation, or positional numeral system) usually denotes the extension to any base of the Hindu–Arabic numeral system (or decimal system). More generally, a positional system is a numeral system in which th ...
:
:
ezs42">code>ezs42sub>32ghs =
: =
: =
: =
=
Geometrical representation
The geometry of the Geohash has a mixed spatial representation:
* Geohashes with 2, 4, 6, ... ''e'' digits (
even
Even may refer to:
General
* Even (given name), a Norwegian male personal name
* Even (surname)
* Even (people), an ethnic group from Siberia and Russian Far East
**Even language, a language spoken by the Evens
* Odd and Even, a solitaire game wh ...
digits) are represented by
Z-order curve
In mathematical analysis and computer science, functions which are Z-order, Lebesgue curve, Morton space-filling curve, Morton order or Morton code map multidimensional data to one dimension while preserving locality of the data points. It ...
in a "regular grid" where decoded pair (latitude, longitude) has uniform uncertainty, valid as
Geo URI.
* Geohashes with 1, 3, 5, ... ''d'' digits (odd digits) are represented by "И-order curve". Latitude and longitude of the decoded pair has different uncertainty (longitude is truncated).
It is possible to build the "И-order curve" from the Z-order curve by merging neighboring cells and indexing the resulting rectangular grid by the function
. The illustration shows how to obtain the grid of 32 rectangular cells from the grid of 64 square cells.
The most important property of Geohash for humans is that it preserves ''spatial hierarchy'' in the ''code prefixes''.
For example, in the "1 Geohash digit grid" illustration of 32 rectangles, above, the spatial region of the code
e
(rectangle of greyish blue circle at position 4,3) is preserved with prefix
e
in the "2 digit grid" of 1024 rectangles (scale showing
em
and greyish green to blue circles at grid).
Algorithm and example
Using the hash
ezs42
as an example, here is how it is decoded into a decimal latitude and longitude. The first step is decoding it from textual "
base 32
Base32 is the base-32 numeral system. It uses a set of 32 digits, each of which can be represented by 5 bits (25). One way to represent Base32 numbers in a human-readable way is by using a standard 32-character set, such as the twenty-two upper ...
ghs", as showed above, to obtain the binary representation:
:
:
:
:
:
.
This operation results in the
bits
01101
11111
11000
00100
00010
. Starting to count from the left side with the digit 0 in the first position, the digits in the odd positions form the longitude code (
0111110000000
), while the digits in the even positions form the latitude code (
101111001001
).
Each binary code is then used in a series of divisions, considering one bit at a time, again from the left to the right side. For the latitude value, the interval -90 to +90 is divided by 2, producing two intervals: -90 to 0, and 0 to +90. Since the first bit is 1, the higher interval is chosen, and becomes the current interval. The procedure is repeated for all bits in the code. Finally, the latitude value is the center of the resulting interval. Longitudes are processed in an equivalent way, keeping in mind that the initial interval is -180 to +180.
For example, in the latitude code
101111001001
, the first bit is 1, so we know our latitude is somewhere between 0 and 90. Without any more bits, we'd guess the latitude was 45, giving us an error of ±45. Since more bits are available, we can continue with the next bit, and each subsequent bit halves this error. This table shows the effect of each bit. At each stage, the relevant half of the range is highlighted in green; a low bit selects the lower range, a high bit selects the upper range.
The column "mean value" shows the latitude, simply the mean value of the range. Each subsequent bit makes this value more precise.
(The numbers in the above table have been rounded to 3 decimal places for clarity)
Final rounding should be done carefully in a way that
:
So while rounding 42.605 to 42.61 or 42.6 is correct, rounding to 43 is not.
Digits and precision in km
Limitations when used for deciding proximity
Edge cases
Geohashes can be used to find points in proximity to each other based on a common prefix. However,
edge case
An edge case is a problem or situation that occurs only at an extreme (maximum or minimum) operating parameter. For example, a stereo speaker might noticeably distort audio when played at maximum volume, even in the absence of any other extreme ...
locations close to each other but on opposite sides of the 180 degree meridian will result in Geohash codes with no common prefix (different longitudes for near physical locations). Points close to the North and South poles will have very different geohashes (different longitudes for near physical locations).
Two close locations on either side of the Equator (or Greenwich meridian) will not have a long common prefix since they belong to different 'halves' of the world. Put simply, one location's binary latitude (or longitude) will be 011111... and the other 100000...., so they will not have a common prefix and most bits will be flipped. This can also be seen as a consequence of relying on the
Z-order curve
In mathematical analysis and computer science, functions which are Z-order, Lebesgue curve, Morton space-filling curve, Morton order or Morton code map multidimensional data to one dimension while preserving locality of the data points. It ...
(which could more appropriately be called an N-order visit in this case) for ordering the points, as two points close by might be visited at very different times. However, two points with a long common prefix will be close by.
In order to do a proximity search, one could compute the southwest corner (low geohash with low latitude and longitude) and northeast corner (high geohash with high latitude and longitude) of a bounding box and search for geohashes between those two. This search will retrieve all points in the z-order curve between the two corners, which can be far too many points. This method also breaks down at the 180 meridians and the poles. Solr uses a filter list of prefixes, by computing the prefixes of the nearest squares close to the geohas
Non-linearity
Since a geohash (in this implementation) is based on
Geographical coordinate system, coordinates of longitude and latitude the distance between two geohashes reflects the distance in latitude/longitude coordinates between two points, which does not translate to actual distance, see
Haversine formula.
Example of non-linearity for latitude-longitude system:
* At the Equator (0 Degrees) the length of a degree of longitude is 111.320 km, while a degree of latitude measures 110.574 km, an error of 0.67%.
* At 30 Degrees (Mid Latitudes) the error is 110.852/96.486 = 14.89%
* At 60 Degrees (High Arctic) the error is 111.412/55.800 = 99.67%, reaching infinity at the poles.
Note that these limitations are not due to geohashing, and not due to latitude-longitude coordinates, but due to the difficulty of mapping coordinates on a sphere (non linear and with wrapping of values, similar to modulo arithmetic) to two dimensional coordinates and the difficulty of exploring a two dimensional space uniformly. The first is related to
Geographical coordinate system
The geographic coordinate system (GCS) is a spherical or ellipsoidal coordinate system for measuring and communicating positions directly on the Earth as latitude and longitude. It is the simplest, oldest and most widely used of the vario ...
and
Map projection
In cartography, map projection is the term used to describe a broad set of transformations employed to represent the two-dimensional curved surface of a globe on a plane. In a map projection, coordinates, often expressed as latitude and l ...
, and the other to
Hilbert curve and
z-order curve
In mathematical analysis and computer science, functions which are Z-order, Lebesgue curve, Morton space-filling curve, Morton order or Morton code map multidimensional data to one dimension while preserving locality of the data points. It ...
. Once a coordinate system is found that represents points linearly in distance and wraps up at the edges, and can be explored uniformly, applying geohashing to those coordinates will not suffer from the limitations above.
While it is possible to apply geohashing to an area with a
Cartesian coordinate system
A Cartesian coordinate system (, ) in a plane is a coordinate system that specifies each point uniquely by a pair of numerical coordinates, which are the signed distances to the point from two fixed perpendicular oriented lines, measured in ...
, it would then only apply to the area where the coordinate system applies.
Despite those issues, there are possible workarounds, and the algorithm has been successfully used in Elasticsearch, MongoDB, HBase, Redis, and
Accumulo to implement proximity searches.
Similar indexing systems
An alternative to storing Geohashes as strings in a database ar
Locational codes which are also called spatial keys and similar to QuadTiles.
In some
geographical information systems and
Big Data
Though used sometimes loosely partly because of a lack of formal definition, the interpretation that seems to best describe Big data is the one associated with large body of information that we could not comprehend when used only in smaller am ...
spatial databases, a
Hilbert curve based indexation can be used as an alternative to
Z-order curve
In mathematical analysis and computer science, functions which are Z-order, Lebesgue curve, Morton space-filling curve, Morton order or Morton code map multidimensional data to one dimension while preserving locality of the data points. It ...
, like in the ''S2 Geometry library''.
In 2019 a front-end was designed by
QA Locate in what they called GeohashPhrase to use phrases to code Geohashes for easier communication via spoken English language. There were plans to make GeohashPhrase open source.
*
C-squares
C-squares (acronym for the ''Concise Spatial QUery And REpresentation System'') is a system of spatially unique, location-based identifiers (geocodes) for areas on the surface of the earth, represented as cells from a latitude- and longitude-base ...
(2002)
*
GeoKey (2018, proprietary)
*
Ghana Post GPS
GhanaPostGPS is a web and smartphone application, sponsored by the government of Ghana and developed by Vokacom, to provide a digital addresses and postal codes for every 5 squared meter location in Ghana. The digital address is a composite of th ...
(2017)
*
Maidenhead Locator System (1980)
*
Makaney Code (2011)
*
MapCode (2008)
*
*
Natural Area Code
*
Open Location Code (2014, aka. "plus codes",
Google Maps
Google Maps is a web mapping platform and consumer application offered by Google. It offers satellite imagery, aerial photography, street maps, 360° interactive panoramic views of streets (Street View), real-time traffic conditions, and rou ...
)
*
QRA locator (1959)
*
Universal Transverse Mercator coordinate system
*
what3words
what3words is a proprietary geocode system designed to identify any location with a resolution of about . It is owned by What3words Limited, based in London, England. The system encodes geographic coordinates into three permanently fixed dictio ...
(2013, proprietary)
*
WhatFreeWords
*
GEOREF
__NOTOC__
The GeoRef database is a bibliographic database that indexes scientific literature in the geosciences, including geology. Coverage ranges from 1666 to the present for North American literature, and 1933 to the present for the rest of th ...
(similar 2-digit hierarchy code)
*
Xaddress
*
3Geonames (2018, open source)
Licensing
The Geohash algorithm was put in the
public domain
The public domain (PD) consists of all the creative work to which no exclusive intellectual property rights apply. Those rights may have expired, been forfeited, expressly waived, or may be inapplicable. Because those rights have expired ...
by its inventor in a public announcement on February 26, 2008.
While comparable algorithms have been successfully patented and
had copyright claimed upon,
GeoHash is based on an entirely different algorithm and approach.
See also
*
List of geodesic-geocoding systems
A discrete global grid (DGG) is a mosaic that covers the entire Earth's surface.
Mathematically it is a space partitioning: it consists of a set of non-empty regions that form a partition of the Earth's surface. In a usual grid-modeling strate ...
*
Geohash-36 (is not a Geohash-variant)
*
Grid (spatial index)
*
Maidenhead Locator System
*
*
Morton number (number theory)
*
Natural Area Code
*
Numbering scheme
There are many different numbering schemes for assigning nominal numbers to entities. These generally require an agreed set of rules, or a central coordinator. The schemes can be considered to be examples of a primary key of a database management s ...
*
Open Location Code (plus code)
*
space-filling curves
In mathematical analysis, a space-filling curve is a curve whose range contains the entire 2-dimensional unit square (or more generally an ''n''-dimensional unit hypercube). Because Giuseppe Peano (1858–1932) was the first to discover one, spa ...
*
what3words
what3words is a proprietary geocode system designed to identify any location with a resolution of about . It is owned by What3words Limited, based in London, England. The system encodes geographic coordinates into three permanently fixed dictio ...
*
Z-order curve
In mathematical analysis and computer science, functions which are Z-order, Lebesgue curve, Morton space-filling curve, Morton order or Morton code map multidimensional data to one dimension while preserving locality of the data points. It ...
References
External links
*
Geohash approximations for JTS geometriesThe Geohash Playground
{{geocoding-systems
Geographic coordinate systems
Geocodes
2008 introductions