Directional statistics (also circular statistics or spherical statistics) is the subdiscipline of statistics that deals with directions (unit vectors in Euclidean space, R^''n''), axes (lines through the origin in R^''n'') or rotations in R^''n''. More generally, directional statistics deals with observations on compact Riemannian manifolds including the Stiefel manifold.

The fact that 0 degrees and 360 degrees are identical angles, so that for example 180 degrees is not a sensible mean of 2 degrees and 358 degrees, provides one illustration that special statistical methods are required for the analysis of some types of data (in this case, angular data). Other examples of data that may be regarded as directional include temporal periods (e.g. time of day, week, month or year), compass directions, dihedral angles in molecules, orientations and rotations.
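The 2 degree and 358 degree example can be checked numerically. The following is a minimal sketch, assuming the usual approach of averaging unit vectors rather than raw angle values; the function name `circular_mean_deg` is a hypothetical helper introduced only for illustration.

```python
import math

def circular_mean_deg(angles_deg):
    """Mean direction of angles given in degrees, returned in [-180, 180]."""
    # Sum the unit vectors (cos, sin) and take the direction of the resultant.
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c))

angles = [2.0, 358.0]
arithmetic = sum(angles) / len(angles)  # 180.0, pointing the wrong way
circular = circular_mean_deg(angles)    # approximately 0, up to rounding
```

The arithmetic mean lands on the opposite side of the circle, while the direction of the summed unit vectors is close to 0 degrees.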

Circular distributions

Any probability density function (pdf) p(x) on the line can be "wrapped" around the circumference of a circle of unit radius. That is, the pdf of the wrapped variable

\theta = x_w = x \bmod 2\pi\ \ \in (-\pi,\pi]

is

p_w(\theta) = \sum_{k=-\infty}^{\infty} p(\theta + 2\pi k).

This concept can be extended to the multivariate context by replacing the single sum with F sums, one for each dimension of the feature space:

p_w(\boldsymbol\theta) = \sum_{k_1=-\infty}^{\infty} \cdots \sum_{k_F=-\infty}^\infty p(\boldsymbol\theta + 2\pi k_1\mathbf{e}_1 + \dots + 2\pi k_F\mathbf{e}_F)

where \mathbf{e}_k = (0, \dots, 0, 1, 0, \dots, 0)^{\mathsf{T}} is the k-th Euclidean basis vector. The following sections show some relevant circular distributions.
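The univariate wrapping sum can be approximated by truncating it to finitely many terms. The sketch below assumes NumPy and wraps a normal pdf as an example; `wrapped_pdf`, `normal_pdf`, the truncation level `n_terms` and the chosen parameters are illustrative assumptions, and the truncation is only adequate for densities whose tails decay quickly.

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the linear normal distribution."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def wrapped_pdf(pdf, theta, n_terms=50):
    """Approximate p_w(theta) = sum_k pdf(theta + 2*pi*k) using |k| <= n_terms."""
    theta = np.asarray(theta, dtype=float)
    k = np.arange(-n_terms, n_terms + 1)
    # Broadcast theta against all shifts 2*pi*k and sum over the shifts.
    return pdf(theta[..., None] + 2.0 * np.pi * k).sum(axis=-1)

theta = np.linspace(-np.pi, np.pi, 2001)
p_w = wrapped_pdf(lambda x: normal_pdf(x, mu=0.5, sigma=2.0), theta)

# Trapezoidal rule over one full period; the wrapped density should integrate to 1.
dtheta = theta[1] - theta[0]
total = np.sum(0.5 * (p_w[1:] + p_w[:-1])) * dtheta
```

The wrapped density takes the same value at both ends of the interval, which reflects the identification of -π and π on the circle.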

von Mises circular distribution

The ''von Mises distribution'' is a circular distribution which, like any other circular distribution, may be thought of as a wrapping of a certain linear probability distribution around the circle. The underlying linear probability distribution for the von Mises distribution is mathematically intractable; however, for statistical purposes, there is no need to deal with the underlying linear distribution. The usefulness of the von Mises distribution is twofold: it is the most mathematically tractable of all circular distributions, allowing simpler statistical analysis, and it is a close approximation to the wrapped normal distribution, which, analogously to the linear normal distribution, is important because it is the limiting case for the sum of a large number of small angular deviations. In fact, the von Mises distribution is often known as the "circular normal" distribution because of its ease of use and its close relationship to the wrapped normal distribution. The pdf of the von Mises distribution is:

f(\theta;\mu,\kappa) = \frac{e^{\kappa\cos(\theta-\mu)}}{2\pi I_0(\kappa)}

where I_0 is the modified Bessel function of the first kind of order 0.
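The von Mises density is straightforward to evaluate numerically. This is a minimal sketch, assuming NumPy, whose `np.i0` function evaluates the modified Bessel function of the first kind of order 0; the function name `von_mises_pdf` and the parameter values are chosen here for illustration.

```python
import numpy as np

def von_mises_pdf(theta, mu, kappa):
    """von Mises density exp(kappa*cos(theta - mu)) / (2*pi*I_0(kappa))."""
    theta = np.asarray(theta, dtype=float)
    return np.exp(kappa * np.cos(theta - mu)) / (2.0 * np.pi * np.i0(kappa))

# Riemann sum over one period on an equally spaced grid without the endpoint.
grid = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
total = von_mises_pdf(grid, mu=0.3, kappa=2.0).sum() * (2.0 * np.pi / grid.size)

# With kappa = 0 the density is constant and equal to 1/(2*pi).
flat = von_mises_pdf(grid, mu=0.3, kappa=0.0)
```

For very large values of kappa both the numerator and `np.i0` can overflow, so a production implementation would likely work with exponentially scaled quantities instead.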

Circular uniform distribution

The probability density function (pdf) of the ''circular uniform distribution'' is given by

U(\theta) = \frac{1}{2\pi}.

It can also be thought of as the \kappa = 0 case of the von Mises distribution above.

Wrapped normal distribution

The pdf of the ''wrapped normal distribution'' (WN) is:

WN(\theta;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \sum_{k=-\infty}^{\infty} \exp\left[\frac{-(\theta-\mu-2\pi k)^2}{2\sigma^2}\right] = \frac{1}{2\pi}\vartheta\left(\frac{\theta-\mu}{2\pi},\frac{i\sigma^2}{2\pi}\right)

where μ and σ are the mean and standard deviation of the unwrapped distribution, respectively, and \vartheta(\theta,\tau) is the Jacobi theta function

\vartheta(\theta,\tau) = \sum_{n=-\infty}^\infty (w^2)^n q^{n^2}

where w \equiv e^{i\pi\theta} and q \equiv e^{i\pi\tau}.

Wrapped Cauchy distribution

The pdf of the ''wrapped Cauchy distribution'' (WC) is:

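WC(\theta;\theta_0,\gamma) = \sum_{n=-\infty}^\infty \frac{\gamma}{\pi(\gamma^2+(\theta+2\pi n-\theta_0)^2)} = \frac{1}{2\pi}\,\,\frac{\sinh\gamma}{\cosh\gamma-\cos(\theta-\theta_0)}

where \gamma is the scale factor and \theta_0 is the peak position.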
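The agreement between the infinite sum and the closed form of the wrapped Cauchy density can be checked numerically. The sketch below assumes NumPy; the function names and the truncation level `n_terms` are illustrative choices. Because Cauchy tails decay only quadratically, the truncated sum converges slowly and many terms are needed.

```python
import numpy as np

def wrapped_cauchy_sum(theta, theta0, gamma, n_terms=20000):
    """Truncated wrapping sum of the Cauchy density (|n| <= n_terms)."""
    n = np.arange(-n_terms, n_terms + 1)
    x = theta + 2.0 * np.pi * n - theta0
    return np.sum(gamma / (np.pi * (gamma ** 2 + x ** 2)))

def wrapped_cauchy_closed(theta, theta0, gamma):
    """Closed form sinh(gamma) / (2*pi*(cosh(gamma) - cos(theta - theta0)))."""
    return np.sinh(gamma) / (2.0 * np.pi * (np.cosh(gamma) - np.cos(theta - theta0)))

# Compare both expressions at one arbitrary point.
by_sum = wrapped_cauchy_sum(1.0, theta0=0.2, gamma=0.5)
by_formula = wrapped_cauchy_closed(1.0, theta0=0.2, gamma=0.5)
```

In practice the closed form is the natural choice for evaluation; the sum is shown only to illustrate the wrapping construction.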

Wrapped Lévy distribution

The pdf of the ''wrapped Lévy distribution'' (WL) is:

f_{WL}(\theta;\mu,c) = \sum_{n=-\infty}^\infty \sqrt{\frac{c}{2\pi}}\,\frac{e^{-c/(2(\theta+2\pi n-\mu))}}{(\theta+2\pi n-\mu)^{3/2}}

where the value of the summand is taken to be zero when \theta+2\pi n-\mu \le 0, c is the scale factor and \mu is the location parameter.

Projected normal distribution

The projected normal distribution is a circular distribution representing the direction of a random variable with a multivariate normal distribution, obtained by radial projection of the variable onto the unit (n-1)-sphere. Because of this, and unlike other commonly used circular distributions, it is in general neither symmetric nor unimodal.

Distributions on higher-dimensional manifolds

Figure: Point sets from Kent distributions mapped onto a sphere.

There also exist distributions on the two-dimensional sphere (such as the Kent distribution), the ''N''-dimensional sphere (the von Mises–Fisher distribution) or the torus (the bivariate von Mises distribution). The matrix von Mises–Fisher distribution is a distribution on the Stiefel manifold, and can be used to construct probability distributions over rotation matrices. The Bingham distribution is a distribution over axes in ''N'' dimensions, or equivalently, over points on the (''N'' − 1)-dimensional sphere with the antipodes identified. For example, if ''N'' = 2, the axes are undirected lines through the origin in the plane. In this case, each axis cuts the unit circle in the plane (which is the one-dimensional sphere) at two points that are each other's antipodes. For ''N'' = 4, the Bingham distribution is a distribution over the space of unit quaternions (versors). Since a versor corresponds to a rotation matrix, the Bingham distribution for ''N'' = 4 can be used to construct probability distributions over the space of rotations, just like the matrix von Mises–Fisher distribution. These distributions are used, for example, in geology, crystallography and bioinformatics.
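The link between unit quaternions, antipodal identification and rotations can be made concrete. The sketch below assumes NumPy and the common (w, x, y, z) component ordering; `quat_to_rotmat` is a hypothetical helper. It shows that a versor q and its antipode -q give the same rotation matrix, which is why a distribution on the sphere with antipodes identified is a natural fit for rotations.

```python
import numpy as np

def quat_to_rotmat(q):
    """3x3 rotation matrix of a quaternion q = (w, x, y, z), normalized first."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

q = np.array([0.5, -0.1, 0.7, 0.2])  # arbitrary example, normalized inside
R_pos = quat_to_rotmat(q)
R_neg = quat_to_rotmat(-q)           # antipodal point on the 3-sphere
```

Every entry of the matrix is quadratic in the quaternion components, so flipping the sign of q leaves the matrix unchanged.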

Moments

The raw vector (or trigonometric) moments of a circular distribution are defined as

m_n=\operatorname E(z^n)=\int_\Gamma P(\theta) z^n \, d\theta

where \Gamma is any interval of length 2\pi, P(\theta) is the pdf of the circular distribution, and z=e^{i\theta}. Since the integral of P(\theta) is unity, and the integration interval is finite, it follows that the moments of any circular distribution are always finite and well defined.

Sample moments are analogously defined:

\overline{m}_n=\frac{1}{N}\sum_{i=1}^N z_i^n.

The population resultant vector, length, and mean angle are defined in analogy with the corresponding sample parameters:

\rho=m_1

R=|m_1|

\theta_n=\operatorname{Arg}(m_n).

In addition, the lengths of the higher moments are defined as

R_n=|m_n|

while the angular parts of the higher moments are just (n \theta_n) \bmod 2\pi. The lengths of all moments will lie between 0 and 1.
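Sample trigonometric moments of any order follow directly from the complex representation of the angles. This is a minimal sketch, assuming NumPy; `sample_moment` and the example angles are illustrative.

```python
import numpy as np

def sample_moment(theta, n=1):
    """n-th sample trigonometric moment: the mean of exp(i*n*theta)."""
    z = np.exp(1j * np.asarray(theta, dtype=float))
    return np.mean(z ** n)

theta = np.deg2rad([10.0, 20.0, 30.0, 350.0])

m1 = sample_moment(theta, 1)
R_bar = abs(m1)                         # length of the first moment
theta_bar = np.angle(m1)                # its argument, in (-pi, pi]
R2_bar = abs(sample_moment(theta, 2))   # length of the second moment
```

Each moment is an average of points on the unit circle, so its length cannot exceed 1.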

Measures of location and spread

Various measures of central tendency and statistical dispersion may be defined for both the population and a sample drawn from that population.

Central tendency

The most common measure of location is the circular mean. The population circular mean is simply the first moment of the distribution while the sample mean is the first moment of the sample. The sample mean will serve as an unbiased estimator of the population mean. When the data are concentrated, the median and mode may be defined by analogy to the linear case, but for more dispersed or multi-modal data, these concepts are not useful.

Dispersion

The most common measures of circular spread are:

* The circular variance. For the sample the circular variance is defined as \overline{\operatorname{Var}(z)} = 1 - \overline{R} and for the population as \operatorname{Var}(z) = 1 - R. Both will have values between 0 and 1.
* The circular standard deviation, S(z) = \sqrt{\ln(1/R^2)} = \sqrt{-2\ln(R)} for the population and \overline{S}(z) = \sqrt{\ln(1/\overline{R}^2)} = \sqrt{-2\ln(\overline{R})} for the sample, with values between 0 and infinity. This definition of the standard deviation (rather than the square root of the variance) is useful because for a wrapped normal distribution, it is an estimator of the standard deviation of the underlying normal distribution. It will therefore allow the circular distribution to be standardized as in the linear case, for small values of the standard deviation. This also applies to the von Mises distribution, which closely approximates the wrapped normal distribution. Note that for small S(z), we have S(z)^2 \approx 2 \operatorname{Var}(z).
* The circular dispersion, \delta = \frac{1-R_2}{2R^2} for the population and \overline{\delta}=\frac{1-\overline{R_2}}{2\overline{R}^2} for the sample, with values between 0 and infinity. This measure of spread is found useful in the statistical analysis of variance.
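The three sample measures of spread can be computed from the first two sample moments. The sketch below assumes NumPy; `circular_spread` is a hypothetical helper, and it does not guard against a mean resultant length of zero, where the standard deviation and dispersion are infinite.

```python
import numpy as np

def circular_spread(theta):
    """Return (circular variance, circular std, circular dispersion) of a sample."""
    z = np.exp(1j * np.asarray(theta, dtype=float))
    R_bar = abs(np.mean(z))         # length of the first sample moment
    R2_bar = abs(np.mean(z ** 2))   # length of the second sample moment
    variance = 1.0 - R_bar
    std = np.sqrt(-2.0 * np.log(R_bar))
    dispersion = (1.0 - R2_bar) / (2.0 * R_bar ** 2)
    return variance, std, dispersion

# Tightly clustered angles (radians): std**2 is close to 2 * variance.
variance, std, dispersion = circular_spread([-0.05, 0.0, 0.05])
```

For this tight cluster the squared standard deviation and twice the variance agree to within about 1e-6, consistent with the small-spread relation stated above.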

Distribution of the mean

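Given a set of ''N'' measurements z_n=e^{i\theta_n}, the mean value of ''z'' is defined as

\overline{z}=\frac{1}{N}\sum_{n=1}^N z_n

which may be expressed as

\overline{z} = \overline{C}+i\overline{S}

where

\overline{C} = \frac{1}{N}\sum_{n=1}^N \cos(\theta_n) \text{ and } \overline{S} = \frac{1}{N}\sum_{n=1}^N \sin(\theta_n)

or, alternatively, as

\overline{z} = \overline{R}e^{i\overline{\theta}}

where

\overline{R} = \sqrt{\overline{C}^2+\overline{S}^2} \text{ and } \overline{\theta} = \arctan (\overline{S} / \overline{C}),

with the arctangent evaluated in the quadrant determined by the signs of \overline{C} and \overline{S}.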
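When the mean angle is computed from the mean cosine and mean sine, the quadrant matters. The following sketch, assuming NumPy, contrasts a plain arctangent of the ratio with the two-argument arctangent `np.arctan2`; the example angles are chosen so that the mean direction lies near 180 degrees.

```python
import numpy as np

theta = np.deg2rad([170.0, 190.0])

C_bar = np.mean(np.cos(theta))   # negative: the data point "west"
S_bar = np.mean(np.sin(theta))   # approximately zero

R_bar = np.hypot(C_bar, S_bar)

# A plain arctangent of the ratio cannot tell opposite directions apart.
theta_naive = np.arctan(S_bar / C_bar)   # approximately 0, the wrong direction

# The two-argument form uses the signs of both components.
theta_bar = np.arctan2(S_bar, C_bar)     # approximately +pi or -pi
```

Both +π and -π denote the same direction, so the sign of the result here depends only on rounding in the mean sine.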

The distribution of the mean angle (\overline{\theta}) for a circular pdf ''P''(''θ'') will be given by:

P(\overline{C},\overline{S}) \, d\overline{C} \, d\overline{S} = P(\overline{R},\overline{\theta}) \, d\overline{R} \, d\overline{\theta} = \int_\Gamma \cdots \int_\Gamma \prod_{n=1}^N \left[ P(\theta_n) \, d\theta_n \right]

where \Gamma is over any interval of length 2\pi and the integral is subject to the constraint that \overline{S} and \overline{C} are constant, or, alternatively, that \overline{R} and \overline{\theta} are constant.

The calculation of the distribution of the mean for most circular distributions is not analytically possible, and in order to carry out an analysis of variance, numerical or mathematical approximations are needed.

The central limit theorem may be applied to the distribution of the sample means (main article: Central limit theorem for directional statistics). It can be shown that the distribution of [\overline{C},\overline{S}] approaches a bivariate normal distribution in the limit of large sample size.
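The large-sample behaviour of the mean cosine and mean sine can be explored by simulation. The sketch below assumes NumPy and uses the circular uniform distribution, for which cos θ and sin θ each have mean 0 and variance 1/2 and are uncorrelated, so the two sample means should each have variance close to 1/(2N). The seed, the sample size `N` and the number of replicates `M` are arbitrary illustrative choices, and matching the first two moments is only a consistency check, not a proof of normality.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 50, 20000  # sample size and number of simulated samples

# M independent samples of N angles from the circular uniform distribution.
theta = rng.uniform(-np.pi, np.pi, size=(M, N))

C_bar = np.cos(theta).mean(axis=1)
S_bar = np.sin(theta).mean(axis=1)

cov = np.cov(C_bar, S_bar)       # 2x2 sample covariance of the two means
expected_var = 1.0 / (2.0 * N)   # variance of each mean under uniformity
```

A scatter plot of `C_bar` against `S_bar` would be expected to show a roughly circular cloud centred on the origin.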

Goodness of fit and significance testing

For cyclic data (e.g., testing whether it is uniformly distributed):

* Rayleigh test for a unimodal cluster
* Kuiper's test for possibly multimodal data
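As an illustration of the Rayleigh test, the statistic is commonly taken to be Z = N R̄², where R̄ is the mean resultant length of the sample. The sketch below assumes NumPy and uses the first-order large-sample approximation p ≈ exp(-Z) for the p-value; this approximation is an assumption made here for brevity, and more accurate corrections exist for small samples. The function name `rayleigh_test` is hypothetical.

```python
import numpy as np

def rayleigh_test(theta):
    """Return (Z, approximate p-value) for the null hypothesis of uniformity."""
    theta = np.asarray(theta, dtype=float)
    n = theta.size
    R_bar = abs(np.mean(np.exp(1j * theta)))   # mean resultant length
    Z = n * R_bar ** 2
    p_approx = np.exp(-Z)                      # first-order approximation only
    return Z, p_approx

# Equally spaced angles around the circle: no evidence against uniformity.
Z_flat, p_flat = rayleigh_test(np.linspace(0.0, 2.0 * np.pi, 36, endpoint=False))

# Angles clustered near 0: strong evidence of a preferred direction.
Z_peak, p_peak = rayleigh_test(np.linspace(-0.2, 0.2, 50))
```

The test has little power against multimodal alternatives whose resultant vectors cancel, which is consistent with its description as a test for a unimodal cluster.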
