
Algebraic statistics is the use of algebra to advance statistics. Algebra has been useful for experimental design, parameter estimation, and hypothesis testing. Traditionally, algebraic statistics has been associated with the design of experiments and multivariate analysis (especially time series). In recent years, the term "algebraic statistics" has sometimes been restricted to the use of algebraic geometry and commutative algebra in statistics.


The tradition of algebraic statistics

Statisticians have long used algebra to advance research in statistics. Some of this work led to the development of new topics in algebra and combinatorics, such as association schemes.


Design of experiments

For example, Ronald A. Fisher, Henry B. Mann, and Rosemary A. Bailey applied Abelian groups to the design of experiments. Experimental designs were also studied with affine geometry over finite fields and, later, with association schemes, which were introduced by R. C. Bose. Orthogonal arrays, introduced by C. R. Rao, were likewise developed for experimental designs.
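As a minimal sketch of how arithmetic in a finite field can produce a design, the following Python code builds the array whose rows are (a, b, a+b, a+2b) with entries taken modulo p, and checks that it has strength two: every pair of columns contains each ordered pair of levels equally often. The construction is a standard textbook one chosen for illustration, not one attributed to the authors named above. The function names `orthogonal_array` and `has_strength_two` are hypothetical, and p is assumed to be an odd prime.

```python
from collections import Counter
from itertools import combinations, product


def orthogonal_array(p):
    """Rows (a, b, a+b, a+2b) mod p for all a, b in Z_p.

    Assumption: p is an odd prime, so that 1 and 2 are distinct
    and invertible modulo p.
    """
    return [
        (a, b, (a + b) % p, (a + 2 * b) % p)
        for a, b in product(range(p), repeat=2)
    ]


def has_strength_two(rows, levels):
    """Check that each pair of columns shows every ordered pair of levels
    the same number of times."""
    n_cols = len(rows[0])
    expected = len(rows) // levels**2
    for i, j in combinations(range(n_cols), 2):
        counts = Counter((row[i], row[j]) for row in rows)
        if len(counts) != levels**2:
            return False
        if any(c != expected for c in counts.values()):
            return False
    return True


design = orthogonal_array(3)  # 9 runs, 4 factors, 3 levels each
print(len(design), has_strength_two(design, 3))
```

Each row is one experimental run and each column one factor, so nine runs suffice to balance four three-level factors pairwise, where a full factorial design would need 81 runs.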


Algebraic analysis and abstract statistical inference

Invariant measures on locally compact groups have long been used in statistical theory, particularly in multivariate analysis. Beurling's factorization theorem and much of the work on (abstract) harmonic analysis sought better understanding of the Wold decomposition of stationary stochastic processes, which is important in time series statistics. Encompassing previous results on probability theory on algebraic structures, Ulf Grenander developed a theory of "abstract inference". Grenander's abstract inference and his theory of patterns are useful for spatial statistics and image analysis; these theories rely on lattice theory.


Partially ordered sets and lattices

Partially ordered vector spaces and vector lattices are used throughout statistical theory. Garrett Birkhoff metrized the positive cone using Hilbert's projective metric and proved Jentzsch's theorem using the contraction mapping theorem. Birkhoff's results have been used for maximum entropy estimation (which can be viewed as linear programming in infinite dimensions) by Jonathan Borwein and colleagues. Vector lattices and conical measures were introduced into statistical decision theory by Lucien Le Cam.


Recent work using commutative algebra and algebraic geometry

In recent years, the term "algebraic statistics" has been used more restrictively, to label the use of algebraic geometry and commutative algebra to study problems related to discrete random variables with finite state spaces. Commutative algebra and algebraic geometry have applications in statistics because many commonly used classes of discrete random variables can be viewed as algebraic varieties.


Introductory example

Consider a random variable ''X'' which can take on the values 0, 1, 2. Such a variable is completely characterized by the three probabilities
:p_i=\mathrm{Pr}(X=i),\quad i=0,1,2
and these numbers satisfy
:\sum_{i=0}^2 p_i = 1 \quad \mbox{and}\quad 0\leq p_i \leq 1.
Conversely, any three such numbers unambiguously specify the distribution of a random variable, so we can identify the random variable ''X'' with the tuple (''p''_0, ''p''_1, ''p''_2) ∈ R^3.

Now suppose ''X'' is a binomial random variable with parameter ''q'' and ''n'' = 2, i.e. ''X'' represents the number of successes when repeating a certain experiment two times, where each experiment has an individual success probability of ''q''. Then
:p_i=\mathrm{Pr}(X=i)=\binom{2}{i} q^i (1-q)^{2-i}
so that ''p''_0 = (1 − ''q'')^2, ''p''_1 = 2''q''(1 − ''q'') and ''p''_2 = ''q''^2. It is not hard to show that the probability tuples (''p''_0, ''p''_1, ''p''_2) which arise in this way are precisely the ones satisfying
:4 p_0 p_2-p_1^2=0.
The latter is a polynomial equation defining an algebraic variety (here a surface) in R^3, and this variety, when intersected with the simplex given by
:\sum_{i=0}^2 p_i = 1 \quad \mbox{and}\quad 0\leq p_i \leq 1,
yields a piece of an algebraic curve which may be identified with the set of all binomial variables with ''n'' = 2. Determining the parameter ''q'' amounts to locating one point on this curve; testing the hypothesis that a given variable ''X'' is binomial amounts to testing whether a certain point lies on that curve or not.
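The correspondence between binomial variables and points on the curve can be checked numerically. The following Python code is a minimal sketch, not a statistical test: it evaluates the polynomial 4 p_0 p_2 − p_1^2 at a binomial point and at the uniform distribution, and recovers ''q'' from a point on the curve using E[X] = p_1 + 2 p_2 = 2q. The function names `binomial_point`, `variety_residual` and `recover_q` are hypothetical names chosen for this illustration.

```python
from math import comb


def binomial_point(q, n=2):
    """Return (p_0, ..., p_n) for a binomial variable with success probability q."""
    return tuple(comb(n, i) * q**i * (1 - q) ** (n - i) for i in range(n + 1))


def variety_residual(p):
    """Evaluate 4*p0*p2 - p1**2 at p = (p0, p1, p2); zero means p lies on the variety."""
    p0, p1, p2 = p
    return 4 * p0 * p2 - p1**2


def recover_q(p):
    """Recover q from a point on the curve via the mean: E[X] = p1 + 2*p2 = 2*q."""
    _, p1, p2 = p
    return (p1 + 2 * p2) / 2


point = binomial_point(0.3)
uniform = (1 / 3, 1 / 3, 1 / 3)

print(variety_residual(point))    # zero up to floating-point rounding
print(variety_residual(uniform))  # 4/9 - 1/9 = 1/3, so not binomial
print(recover_q(point))           # approximately 0.3
```

With observed data the probabilities are only estimated, so the residual will rarely be exactly zero; deciding whether it is close enough to zero is the hypothesis-testing problem described above.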


Application of algebraic geometry to statistical learning theory

Algebraic geometry has also recently found applications to statistical learning theory, including a generalization of the Akaike information criterion to singular statistical models.


References

* R. A. Bailey. ''Association Schemes: Designed Experiments, Algebra and Combinatorics''. Cambridge University Press, Cambridge, 2004. 387 pp.
* H. B. Mann. ''Analysis and Design of Experiments: Analysis of Variance and Analysis-of-Variance Designs''. Dover, 1949.
* L. Pachter and B. Sturmfels. ''Algebraic Statistics for Computational Biology''. Cambridge University Press, 2005.
* G. Pistone, E. Riccomagno, H. P. Wynn. ''Algebraic Statistics''. CRC Press, 2001.
* M. Drton, B. Sturmfels, S. Sullivant. ''Lectures on Algebraic Statistics''. Springer, 2009.
* S. Watanabe. ''Algebraic Geometry and Statistical Learning Theory''. Cambridge University Press, 2009.
* P. Gibilisco, E. Riccomagno, M.-P. Rogantin, H. P. Wynn. ''Algebraic and Geometric Methods in Statistics''. Cambridge University Press, 2009.

