Computational statistics

Computational statistics, or statistical computing, is the bond between statistics and computer science. It comprises statistical methods that are enabled by computational methods. It is the area of computational science (or scientific computing) specific to the mathematical science of statistics. The area is developing rapidly, which has led to calls for a broader concept of computing to be taught as part of general statistical education. As in traditional statistics, the goal is to transform raw data into knowledge (Wegman, Edward J. "Computational Statistics: A New Agenda for Statistical Theory and Practice." Journal of the Washington Academy of Sciences, vol. 78, no. 4, 1988, pp. 310–322. JSTOR), but the focus lies on computer-intensive statistical methods, such as those for cases with very large sample sizes and non-homogeneous data sets.

The terms 'computational statistics' and 'statistical computing' are often used interchangeably, although Carlo Lauro (a former president of the International Association for Statistical Computing) proposed making a distinction, defining 'statistical computing' as "the application of computer science to statistics", and 'computational statistics' as "aiming at the design of algorithm for implementing statistical methods on computers, including the ones unthinkable before the computer age (e.g. bootstrap, simulation), as well as to cope with analytically intractable problems" [sic]. The term 'computational statistics' may also be used to refer to computationally intensive statistical methods, including resampling methods, Markov chain Monte Carlo methods, local regression, kernel density estimation, artificial neural networks and generalized additive models.
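The bootstrap, cited above as a method "unthinkable before the computer age", is a simple example of a resampling method. The sketch below computes a percentile bootstrap confidence interval for a sample mean using only the Python standard library. The function name `bootstrap_ci`, the number of resamples, the seed and the sample data are illustrative assumptions, not taken from any particular package.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(data)
    # Resample the data with replacement and recompute the statistic each time.
    replicates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    # The empirical alpha/2 and 1 - alpha/2 quantiles of the replicates bound the interval.
    lo = replicates[int((alpha / 2) * n_resamples)]
    hi = replicates[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]
    print(bootstrap_ci(sample))
```

No distributional formula for the sampling variability of the mean is used; the computer simply repeats the estimate on thousands of resampled data sets, which is what makes the method computationally intensive.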


History

Though computational statistics is widely used today, it has a relatively short history of acceptance in the statistics community. For the most part, the founders of the field of statistics relied on mathematics and asymptotic approximations in the development of computational statistical methodology. In the statistical literature, the first use of the term “computer” appears in an article by Robert P. Porter in the Journal of the American Statistical Association in 1891. The article discusses the use of Herman Hollerith’s machine in the 11th Census of the United States. Hollerith’s machine, also called the tabulating machine, was an electromechanical machine designed to assist in summarizing information stored on punched cards. It was invented by Herman Hollerith (February 29, 1860 – November 17, 1929), an American businessman, inventor, and statistician. His punched card tabulating machine was patented in 1884 and later used in the 1890 Census of the United States. The advantages of the technology were immediately apparent: the 1880 Census, covering about 50 million people, took over 7 years to tabulate, whereas the 1890 Census, covering over 62 million people, took less than a year. This marks the beginning of the era of mechanized computational statistics and semiautomatic data processing systems.

In 1908, William Sealy Gosset performed his now well-known Monte Carlo simulation, which led to the discovery of Student’s t-distribution. With the help of computational methods, he also plotted the empirical distributions overlaid on the corresponding theoretical distributions. The computer has since revolutionized simulation and made the replication of Gosset’s experiment little more than an exercise. Later, researchers put forward computational ways of generating pseudo-random deviates, methods to convert uniform deviates into other distributional forms using the inverse cumulative distribution function or acceptance-rejection methods, and state-space methodology for Markov chain Monte Carlo. By the mid-1950s, much work was being done on testing these generators for randomness. By then, most computers could refer to random number tables.

In 1958, John Tukey’s jackknife was developed as a method to reduce the bias of parameter estimates in samples under nonstandard conditions. It requires computers for practical implementation. By this point, computers had made many tedious statistical studies feasible.
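The jackknife's bias reduction can be sketched in a few lines. The example below applies the standard jackknife bias estimate, (n - 1) times the difference between the mean of the leave-one-out estimates and the full-sample estimate, to the "plug-in" variance (which divides by n and is therefore biased). The function names and the data are hypothetical; the example is chosen because the bias-corrected result is known to equal the usual unbiased sample variance, which makes the output easy to check.

```python
import statistics


def plugin_variance(xs):
    """Biased plug-in variance: divides by n instead of n - 1."""
    return statistics.pvariance(xs)


def jackknife_bias_corrected(data, estimator):
    """Return (estimate, jackknife bias estimate, bias-corrected estimate)."""
    n = len(data)
    theta_hat = estimator(data)
    # Leave-one-out replicates: recompute the estimator with observation i removed.
    loo = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    theta_dot = sum(loo) / n
    # Jackknife estimate of the estimator's bias.
    bias = (n - 1) * (theta_dot - theta_hat)
    return theta_hat, bias, theta_hat - bias


if __name__ == "__main__":
    data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample
    est, bias, corrected = jackknife_bias_corrected(data, plugin_variance)
    print(est, bias, corrected, statistics.variance(data))
```

Each of the n replicates requires a full re-evaluation of the estimator, which is why the method was impractical without computers for anything but tiny samples.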


Methods


Maximum likelihood estimation

Maximum likelihood estimation is used to estimate the parameters of an assumed probability distribution, given some observed data. It is achieved by maximizing a likelihood function so that the observed data is most probable under the assumed statistical model.
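When the likelihood cannot be maximized in closed form, it is maximized numerically. As a minimal sketch, the code below fits the rate of an exponential distribution by Newton-Raphson iteration on the log-likelihood. The exponential model is an assumption chosen for illustration because its maximum likelihood estimate has the known closed form n / sum(x), which lets the numerical answer be checked; the function names and data are hypothetical.

```python
import math


def exp_log_likelihood(rate, xs):
    """Log-likelihood of i.i.d. exponential data with the given rate."""
    return len(xs) * math.log(rate) - rate * sum(xs)


def exp_rate_mle_newton(xs, tol=1e-12, max_iter=100):
    """Maximize the exponential log-likelihood with Newton-Raphson (sketch)."""
    n, s = len(xs), sum(xs)
    # Start below the optimum (1 / max <= 1 / mean), a region where Newton converges here.
    rate = 1.0 / max(xs)
    for _ in range(max_iter):
        score = n / rate - s        # first derivative of the log-likelihood
        hessian = -n / rate ** 2    # second derivative (always negative)
        step = score / hessian
        rate -= step                # Newton update: rate - score / hessian
        if abs(step) < tol:
            break
    return rate


if __name__ == "__main__":
    xs = [0.5, 1.2, 0.3, 2.0, 0.8]  # hypothetical positive observations
    print(exp_rate_mle_newton(xs), len(xs) / sum(xs))
```

The same pattern (score, curvature, iterate until the step is negligible) carries over to models without a closed-form estimate, where it is the only practical route to the maximum.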


Monte Carlo method

Monte Carlo is a statistical method that relies on repeated random sampling to obtain numerical results. The concept is to use randomness to solve problems that might be deterministic in principle. Monte Carlo methods are often used in physical and mathematical problems and are most useful when other approaches are difficult to apply. They are mainly used in three problem classes: optimization, numerical integration, and generating draws from a probability distribution.
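Two of these problem classes can be illustrated with the standard library alone. The sketch below estimates the integral of e^x over [0, 1] (exact value e - 1) by averaging the integrand at uniformly random points, and generates draws with the two techniques mentioned in the History section: inverse-CDF sampling for an exponential distribution and acceptance-rejection sampling for a Beta(2, 2) distribution. The target distributions, function names, sample sizes and seeds are assumptions chosen for illustration.

```python
import math
import random


def mc_integrate(f, a, b, n=100_000, seed=0):
    """Plain Monte Carlo estimate of the integral of f over [a, b]."""
    rng = random.Random(seed)
    total = sum(f(rng.uniform(a, b)) for _ in range(n))
    # Average of f at uniform points, scaled by the interval length.
    return (b - a) * total / n


def exponential_draws(rate, n, seed=0):
    """Inverse-CDF sampling: if U ~ Uniform(0, 1), then -ln(1 - U) / rate ~ Exp(rate)."""
    rng = random.Random(seed)
    return [-math.log(1.0 - rng.random()) / rate for _ in range(n)]


def beta22_draws(n, seed=0):
    """Acceptance-rejection sampling from Beta(2, 2), density 6x(1 - x) on [0, 1].

    Proposal: Uniform(0, 1); envelope constant M = 1.5 (the density's maximum).
    """
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        x = rng.random()
        # Accept x with probability f(x) / (M * g(x)) = 6x(1 - x) / 1.5.
        if rng.random() * 1.5 <= 6.0 * x * (1.0 - x):
            out.append(x)
    return out


if __name__ == "__main__":
    print(mc_integrate(math.exp, 0.0, 1.0), math.e - 1.0)
    draws = exponential_draws(2.0, 100_000)
    print(sum(draws) / len(draws))  # should be close to 1 / rate = 0.5
```

The error of a plain Monte Carlo estimate shrinks roughly in proportion to 1 / sqrt(n), independently of the dimension of the integral, which is why the approach is attractive when deterministic quadrature becomes impractical.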


Markov chain Monte Carlo

The Markov chain Monte Carlo method creates samples from a continuous random variable with probability density proportional to a known function. These samples can be used to evaluate an integral over that variable, such as its expected value or variance. The more steps are included, the more closely the distribution of the sample matches the actual desired distribution.
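As one concrete instance of the idea, the sketch below implements random-walk Metropolis, a specific MCMC algorithm chosen here for illustration. It needs the target density only up to a constant: here exp(-x^2 / 2), which is proportional to the standard normal density. The step size, chain length, burn-in length and function names are illustrative assumptions; the sample mean and variance of the chain then approximate the integrals E[X] = 0 and Var[X] = 1.

```python
import math
import random


def metropolis(log_target, x0, n_steps, step_size=1.0, seed=0):
    """Random-walk Metropolis sampler for a density known up to a constant."""
    rng = random.Random(seed)
    x, log_p = x0, log_target(x0)
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)  # symmetric Gaussian proposal
        log_p_new = log_target(proposal)
        # Accept with probability min(1, p(proposal) / p(x)), compared on the log scale.
        if math.log(1.0 - rng.random()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
        samples.append(x)  # on rejection the chain repeats the current state
    return samples


def log_unnormalized_normal(x):
    # log of exp(-x^2 / 2), proportional to the standard normal density.
    return -0.5 * x * x


if __name__ == "__main__":
    chain = metropolis(log_unnormalized_normal, 0.0, 50_000, step_size=2.4)
    kept = chain[1000:]  # discard an initial burn-in segment
    mean = sum(kept) / len(kept)
    var = sum((x - mean) ** 2 for x in kept) / len(kept)
    print(mean, var)  # should be close to 0 and 1
```

Because only ratios of the target density appear in the acceptance step, the normalizing constant is never needed; successive samples are correlated, so a longer chain is required than with independent draws of the same accuracy.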


Applications

* Computational biology
* Computational linguistics
* Computational physics
* Computational mathematics
* Computational materials science


Computational statistics journals

* Communications in Statistics - Simulation and Computation
* Computational Statistics
* Computational Statistics & Data Analysis
* Journal of Computational and Graphical Statistics
* Journal of Statistical Computation and Simulation
* Journal of Statistical Software
* The R Journal
* The Stata Journal
* Statistics and Computing
* Wiley Interdisciplinary Reviews: Computational Statistics


Associations

* International Association for Statistical Computing


See also

* Algorithms for statistical classification
* Data science
* Statistical methods in artificial intelligence
* Free statistical software
* List of statistical algorithms
* List of statistical packages
* Machine learning


Further reading


Books

* Gharieb, Reda R. (2017). Data Science: Scientific and Statistical Computing. Noor Publishing. ISBN 978-3-330-97256-8.


External links


Associations


* International Association for Statistical Computing
* Statistical Computing section of the American Statistical Association


Journals


* Computational Statistics & Data Analysis
* Journal of Computational & Graphical Statistics
* Statistics and Computing