Algebraic statistics is the use of algebra to advance statistics. Algebra has been useful for experimental design, parameter estimation, and hypothesis testing.
Traditionally, algebraic statistics has been associated with the design of experiments and multivariate analysis (especially time series). In recent years, the term "algebraic statistics" has sometimes been used more restrictively, to label the use of algebraic geometry and commutative algebra in statistics.
The tradition of algebraic statistics
In the past, statisticians have used algebra to advance research in statistics. Some of this work led to the development of new topics in algebra and combinatorics, such as association schemes.
Design of experiments
For example, Ronald A. Fisher, Henry B. Mann, and Rosemary A. Bailey applied Abelian groups to the design of experiments. Experimental designs were also studied with affine geometry over finite fields, and later with the association schemes introduced by R. C. Bose. C. R. Rao introduced orthogonal arrays, also for experimental designs.
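The link between affine geometry over finite fields and orthogonal arrays can be made concrete with a standard textbook construction, shown here as a minimal sketch rather than as the method of any author named above. It assumes `p` is prime, so that integers modulo `p` form the field GF(p); the function names are hypothetical. Each run is a point (x, y) of the affine plane, and each column records which line of one parallel class passes through that point. Any two columns are linearly independent linear forms, so every pair of levels occurs exactly once (strength 2).

```python
from collections import Counter
from itertools import combinations, product


def affine_plane_oa(p):
    """Rows of an orthogonal array OA(p^2, p + 1, p, 2); assumes p is prime."""
    rows = []
    for x, y in product(range(p), repeat=2):
        # Column 0 identifies the vertical line x = const through (x, y);
        # column k + 1 identifies the line y + k*x = const (mod p).
        # Together these are the p + 1 parallel classes of the affine plane.
        rows.append([x] + [(y + k * x) % p for k in range(p)])
    return rows


def is_strength_two(rows, p):
    """True if every pair of columns shows each of the p^2 level pairs exactly once."""
    n_cols = len(rows[0])
    for i, j in combinations(range(n_cols), 2):
        counts = Counter((row[i], row[j]) for row in rows)
        if len(counts) != p * p or any(c != 1 for c in counts.values()):
            return False
    return True
```

For instance, `affine_plane_oa(3)` gives 9 runs that can serve as a design for up to four three-level factors. With `p = 4` the check fails, because arithmetic modulo 4 is not a field: the columns for slopes 0 and 2 repeat level pairs.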
Algebraic analysis and abstract statistical inference
Invariant measures on locally compact groups have long been used in statistical theory, particularly in multivariate analysis.
Beurling's factorization theorem and much of the work on (abstract) harmonic analysis sought better understanding of the Wold decomposition of stationary stochastic processes, which is important in time series statistics.
Encompassing previous results on probability theory on algebraic structures, Ulf Grenander developed a theory of "abstract inference". Grenander's abstract inference and his theory of patterns are useful for spatial statistics and image analysis; these theories rely on lattice theory.
Partially ordered sets and lattices
Partially ordered vector spaces and vector lattices are used throughout statistical theory. Garrett Birkhoff metrized the positive cone using Hilbert's projective metric and proved Jentzsch's theorem using the contraction mapping theorem. Birkhoff's results have been used for maximum entropy estimation (which can be viewed as linear programming in infinite dimensions) by Jonathan Borwein and colleagues.
Vector lattices and conical measures were introduced into statistical decision theory by Lucien Le Cam.
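Birkhoff's contraction argument can be sketched numerically in finite dimensions, with a matrix with strictly positive entries standing in for Jentzsch's integral operator (an assumption made here for illustration). On strictly positive vectors, Hilbert's projective metric is d(x, y) = log(max_i(x_i/y_i) / min_i(x_i/y_i)). By the Birkhoff–Hopf theorem, such a matrix shrinks d by at least the factor tanh(Δ/4), where Δ is the projective diameter of the image of the cone. The contraction mapping theorem then gives a unique positive eigenvector up to scale, which is the Perron–Frobenius counterpart of Jentzsch's theorem. All function names below are hypothetical.

```python
import math


def hilbert_metric(x, y):
    """Hilbert's projective metric between two strictly positive vectors."""
    ratios = [xi / yi for xi, yi in zip(x, y)]
    return math.log(max(ratios) / min(ratios))


def mat_vec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def birkhoff_contraction_coefficient(A):
    """tanh(Delta / 4), with Delta = max log(a_ik * a_jl / (a_jk * a_il)); assumes all a_ij > 0."""
    m, n = len(A), len(A[0])
    delta = max(
        math.log(A[i][k] * A[j][l] / (A[j][k] * A[i][l]))
        for i in range(m) for j in range(m)
        for k in range(n) for l in range(n)
    )
    return math.tanh(delta / 4)


def perron_vector(A, iters=200):
    """Iterate the contraction x -> Ax / sum(Ax); assumes A is square with positive entries."""
    x = [1.0] * len(A)
    for _ in range(iters):
        y = mat_vec(A, x)
        total = sum(y)
        x = [v / total for v in y]
    return x
```

For `A = [[2.0, 1.0], [1.0, 3.0]]` the coefficient is tanh(ln 6 / 4) ≈ 0.42, so each multiplication by `A` shrinks Hilbert distances by at least that factor. The iterates of `perron_vector(A)` converge to the eigenvector whose coordinate ratio is the golden ratio.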
Recent work using commutative algebra and algebraic geometry
In recent years, the term "algebraic statistics" has been used more restrictively, to label the use of algebraic geometry and commutative algebra to study problems related to discrete random variables with finite state spaces. Commutative algebra and algebraic geometry have applications in statistics because many commonly used classes of discrete random variables can be viewed as algebraic varieties.
Introductory example
Consider a random variable ''X'' which can take on the values 0, 1, 2. Such a variable is completely characterized by the three probabilities
: $p_0 = \Pr(X = 0),\quad p_1 = \Pr(X = 1),\quad p_2 = \Pr(X = 2),$
and these numbers satisfy
: $p_0 \ge 0,\quad p_1 \ge 0,\quad p_2 \ge 0,\quad p_0 + p_1 + p_2 = 1.$
Conversely, any three such numbers unambiguously specify a random variable, so we can identify the random variable ''X'' with the tuple $(p_0, p_1, p_2) \in \mathbb{R}^3$.
Now suppose ''X'' is a binomial random variable with parameter ''q'' and ''n = 2'', i.e. ''X'' represents the number of successes when repeating a certain experiment two times, where each experiment has an individual success probability of ''q''. Then
: $p_0 = (1 - q)^2,\quad p_1 = 2q(1 - q),\quad p_2 = q^2,$
and it is not hard to show that the tuples $(p_0, p_1, p_2)$ which arise in this way are precisely the ones satisfying
: $4 p_0 p_2 - p_1^2 = 0.$
The latter is a polynomial equation defining an algebraic variety (or surface) in $\mathbb{R}^3$, and this variety, when intersected with the simplex given by
: $p_0 \ge 0,\quad p_1 \ge 0,\quad p_2 \ge 0,\quad p_0 + p_1 + p_2 = 1,$
yields a piece of an algebraic curve which may be identified with the set of all binomial random variables with ''n = 2''. Determining the parameter ''q'' amounts to locating one point on this curve; testing the hypothesis that a given variable ''X'' is binomial amounts to testing whether a certain point lies on that curve or not.
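The example can be checked numerically. A point of the simplex lies on the curve when 4p₀p₂ = p₁², and on the curve the parameter is recovered as q = p₂ + p₁/2, since q² + q(1 − q) = q. For the testing step, this minimal sketch uses a Pearson chi-square goodness-of-fit test with the maximum-likelihood estimate q̂ = (n₁ + 2n₂)/(2N). That test is a standard choice assumed here, not a procedure the text prescribes, and the function names are hypothetical.

```python
import math


def binomial2_point(q):
    """The point (p0, p1, p2) of a binomial random variable with n = 2."""
    return ((1 - q) ** 2, 2 * q * (1 - q), q ** 2)


def on_binomial_curve(p, tol=1e-9):
    """Check the simplex constraints and the equation 4*p0*p2 - p1**2 = 0."""
    p0, p1, p2 = p
    in_simplex = min(p) >= -tol and abs(p0 + p1 + p2 - 1) <= tol
    return in_simplex and abs(4 * p0 * p2 - p1 ** 2) <= tol


def q_from_point(p):
    """Recover q from a point on the curve: p2 + p1/2 = q^2 + q(1 - q) = q."""
    _, p1, p2 = p
    return p2 + p1 / 2


def pearson_test(counts):
    """Chi-square test that counts (n0, n1, n2) come from some binomial(2, q).

    Assumes 0 < q_hat < 1, so that every expected count is positive.
    """
    n0, n1, n2 = counts
    total = n0 + n1 + n2
    q_hat = (n1 + 2 * n2) / (2 * total)  # maximum-likelihood estimate of q
    expected = [total * e for e in binomial2_point(q_hat)]
    stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    # 3 cells - 1 - 1 estimated parameter = 1 degree of freedom;
    # the chi-square(1) upper tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return q_hat, stat, p_value
```

For example, `pearson_test((25, 50, 25))` gives q̂ = 0.5 and a statistic of 0, because the observed proportions lie exactly on the curve. By contrast, `(50, 0, 50)` has the same q̂ but a statistic of 100, which is far off the curve.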
Application of algebraic geometry to statistical learning theory
Algebraic geometry has also recently found applications to statistical learning theory, including a generalization of the Akaike information criterion to singular statistical models.
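One criterion of this kind is Watanabe's widely applicable information criterion (WAIC). The text above does not name it, so identifying it as the generalization meant here is an assumption. The minimal sketch below assumes the posterior has already been sampled, with pointwise log-likelihoods stored as `log_lik[s][i] = log p(x_i | w_s)` for posterior draw `s` and data point `i`. The function name and the deviance-scale convention −2(lppd − p_WAIC) are choices made here; Watanabe's original definition equals this value divided by 2n.

```python
import math


def waic(log_lik):
    """WAIC on the deviance scale from an S x n matrix of pointwise log-likelihoods (S >= 2)."""
    S = len(log_lik)
    n = len(log_lik[0])
    lppd = 0.0    # log pointwise predictive density
    p_waic = 0.0  # effective number of parameters (variance penalty)
    for i in range(n):
        col = [log_lik[s][i] for s in range(S)]
        # log of the posterior mean of p(x_i | w), computed stably via log-sum-exp
        m = max(col)
        lppd += m + math.log(sum(math.exp(v - m) for v in col) / S)
        # posterior sample variance of log p(x_i | w)
        mean = sum(col) / S
        p_waic += sum((v - mean) ** 2 for v in col) / (S - 1)
    return -2.0 * (lppd - p_waic)
```

Unlike AIC, the penalty here is computed from the posterior rather than from a parameter count, which is what lets it apply when the parameter-to-distribution map is singular. Lower values indicate better estimated predictive performance.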