Spatial descriptive statistics is the intersection of spatial statistics and descriptive statistics; these methods are used for a variety of purposes in geography, particularly in quantitative data analyses involving Geographic Information Systems (GIS).

Types of spatial data

The simplest forms of spatial data are ''gridded data'', in which a scalar quantity is measured for each point in a regular grid of points, and ''point sets'', in which a set of coordinates (e.g. of points in the plane) is observed. An example of gridded data would be a satellite image of forest density that has been digitized on a grid. An example of a point set would be the latitude/longitude coordinates of all elm trees in a particular plot of land. More complicated forms of data include marked point sets and spatial time series.

Measures of spatial central tendency

The coordinate-wise mean of a point set is the centroid, which solves the same variational problem in the plane (or higher-dimensional Euclidean space) that the familiar average solves on the real line; that is, the centroid has the smallest possible average squared distance to all points in the set.
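The minimizing property of the centroid can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the sample coordinates are hypothetical and chosen purely for illustration.

```python
def centroid(points):
    """Coordinate-wise mean of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def mean_squared_distance(points, c):
    """Average squared Euclidean distance from the points to location c."""
    return sum((x - c[0]) ** 2 + (y - c[1]) ** 2 for x, y in points) / len(points)


# Hypothetical point set, for illustration only.
points = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (2.0, 6.0)]
c = centroid(points)

# Moving away from the centroid by a vector v raises the mean squared
# distance by exactly |v|^2, so the centroid is the unique minimizer.
shifted = (c[0] + 0.5, c[1] - 0.25)
increase = mean_squared_distance(points, shifted) - mean_squared_distance(points, c)
```

Here `increase` equals the squared length of the shift (0.5² + 0.25²) up to floating-point rounding, which is the planar analogue of the familiar result for the mean on the real line.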

Measures of spatial dispersion

Dispersion captures the degree to which points in a point set are separated from each other. For most applications, spatial dispersion should be quantified in a way that is invariant to rotations and reflections. Several simple measures of spatial dispersion for a point set can be defined using the covariance matrix of the coordinates of the points. The trace, the determinant, and the largest eigenvalue of the covariance matrix can be used as measures of spatial dispersion. A measure of spatial dispersion that is not based on the covariance matrix is the average distance between nearest neighbors.
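The measures above can be sketched for planar point sets as follows. This is an illustrative standard-library implementation, not a reference one: the function names are hypothetical, the sample covariance uses the ''n'' − 1 divisor by assumption, and the largest eigenvalue uses the closed form for a symmetric 2×2 matrix.

```python
import math


def covariance_matrix(points):
    """Return (sxx, sxy, syy), the sample covariance of the coordinates."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return sxx, sxy, syy


def dispersion_measures(points):
    """Trace, determinant and largest eigenvalue of the covariance matrix."""
    sxx, sxy, syy = covariance_matrix(points)
    trace = sxx + syy
    det = sxx * syy - sxy ** 2
    # Eigenvalues of a symmetric 2x2 matrix: trace/2 +/- sqrt((trace/2)^2 - det).
    largest = trace / 2 + math.sqrt(max((trace / 2) ** 2 - det, 0.0))
    return {"trace": trace, "determinant": det, "largest_eigenvalue": largest}


def mean_nearest_neighbor_distance(points):
    """Average distance from each point to its nearest other point."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


# Hypothetical example: the corners of a 2-by-1 rectangle.
rectangle = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
measures = dispersion_measures(rectangle)
nn_distance = mean_nearest_neighbor_distance(rectangle)
```

All four quantities depend only on the shape of the point set, so rotating or reflecting the rectangle leaves them unchanged up to rounding.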

Measures of spatial homogeneity

A homogeneous set of points in the plane is a set that is distributed such that approximately the same number of points occurs in any circular region of a given area. A set of points that lacks homogeneity may be ''spatially clustered'' at a certain spatial scale. A simple probability model for spatially homogeneous points is the Poisson process in the plane with constant intensity function.
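A homogeneous Poisson process on a rectangular window can be simulated as a rough illustration of this model. The sketch below is one possible approach under stated assumptions: the function names, the window, the intensity and the seed are all hypothetical, and counts are taken in equal-area square cells rather than circular regions purely for convenience.

```python
import random


def simulate_homogeneous_poisson(intensity, width, height, rng):
    """Simulate a constant-intensity Poisson process on [0, width] x [0, height]."""
    expected = intensity * width * height
    # The number of unit-rate exponential arrivals before time `expected`
    # follows a Poisson distribution with mean `expected`.
    n, t = 0, rng.expovariate(1.0)
    while t < expected:
        n += 1
        t += rng.expovariate(1.0)
    # Given the count, the points are independent and uniform on the window.
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]


def quadrat_counts(points, width, height, k):
    """Count points in each cell of a k-by-k grid of equal-area cells."""
    counts = [[0] * k for _ in range(k)]
    for x, y in points:
        col = min(int(x / width * k), k - 1)
        row = min(int(y / height * k), k - 1)
        counts[row][col] += 1
    return counts


rng = random.Random(0)  # fixed seed so the sketch is reproducible
points = simulate_homogeneous_poisson(100.0, 1.0, 1.0, rng)
counts = quadrat_counts(points, 1.0, 1.0, 2)
```

With an intensity of 100 points per unit area and four cells, each cell is expected to hold about 25 points; large differences between cells would suggest a departure from homogeneity.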

Ripley's ''K'' and ''L'' functions

Ripley's ''K'' and ''L'' functions, introduced by Brian D. Ripley, are closely related descriptive statistics for detecting deviations from spatial homogeneity. The ''K'' function (technically its sample-based estimate) is defined as

:<math>\widehat{K}(t) = \lambda^{-1} \sum_{i \neq j} \frac{I(d_{ij} < t)}{n},</math>

where <math>d_{ij}</math> is the Euclidean distance between the ''i''th and ''j''th points in a data set of ''n'' points, ''t'' is the search radius, λ is the average density of points (generally estimated as ''n''/''A'', where ''A'' is the area of the region containing all points) and ''I'' is the indicator function (1 if its operand is true, 0 otherwise). In 2 dimensions, if the points are approximately homogeneous, <math>\widehat{K}(t)</math> should be approximately equal to π''t''². For data analysis, the variance-stabilized Ripley ''K'' function, called the ''L'' function, is generally used. The sample version of the ''L'' function is defined as

:<math>\widehat{L}(t) = \left( \frac{\widehat{K}(t)}{\pi} \right)^{1/2}.</math>

For approximately homogeneous data, the ''L'' function has expected value ''t'' and its variance is approximately constant in ''t''. A common plot is a graph of <math>t - \widehat{L}(t)</math> against ''t'', which will approximately follow the horizontal zero-axis with constant dispersion if the data follow a homogeneous Poisson process. Ripley's ''K'' function can be used to determine whether points have a random, dispersed, or clustered distribution pattern at a certain scale.{{cite journal |last1=Wilschut |first1=L.I. |last2=Laudisoit |first2=A. |last3=Hughes |first3=N.K. |last4=Addink |first4=E.A. |last5=de Jong |first5=S.M. |last6=Heesterbeek |first6=J.A.P. |last7=Reijniers |first7=J. |last8=Eagle |first8=S. |last9=Dubyanskiy |first9=V.M. |last10=Begon |first10=M. |title=Spatial distribution patterns of plague hosts: point pattern analysis of the burrows of great gerbils in Kazakhstan |journal=Journal of Biogeography |date=2015 |volume=42 |issue=7 |pages=1281–1292 |doi=10.1111/jbi.12534 |pmid=26877580 |pmc=4737218}}
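The sample ''K'' and ''L'' functions defined above can be computed directly from pairwise distances. The following is a minimal sketch using the Python standard library; the function names and the example points are hypothetical, the study area ''A'' must be supplied by the caller, and no edge correction is applied, so estimates near the boundary of the region should be treated with caution.

```python
import math


def ripley_k(points, t, area):
    """Sample K function: (1 / lambda) * sum over i != j of I(d_ij < t) / n."""
    n = len(points)
    lam = n / area  # average density, estimated as n / A
    close_pairs = sum(
        1
        for i, p in enumerate(points)
        for j, q in enumerate(points)
        if i != j and math.dist(p, q) < t
    )
    return close_pairs / (lam * n)


def ripley_l(points, t, area):
    """Sample L function: square root of K(t) / pi."""
    return math.sqrt(ripley_k(points, t, area) / math.pi)


# Hypothetical example: the corners of a unit square, with assumed area A = 1.
corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
k_value = ripley_k(corners, 1.1, 1.0)
l_value = ripley_l(corners, 1.1, 1.0)
```

At ''t'' = 1.1 each corner has two neighbors within the search radius, giving 8 ordered pairs and a ''K'' estimate of 8 / (4 × 4) = 0.5. Evaluating `t - ripley_l(points, t, area)` over a range of ''t'' values gives the quantity plotted in the diagnostic graph described above.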

References