A computer experiment or simulation experiment is an experiment used to study a computer simulation, also referred to as an in silico system. This area includes computational physics, computational chemistry, computational biology and other similar disciplines.


Background

Computer simulations are constructed to emulate a physical system. Because they are meant to replicate some aspect of a system in detail, they often do not yield an analytic solution. Therefore, methods such as discrete event simulation or finite element solvers are used. A computer model is used to make inferences about the system it replicates. For example, climate models are often used because experimentation on an earth-sized object is impossible.
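
For illustration, the following is a minimal sketch (not part of the original text; the queue model, function name and parameters are illustrative) of a discrete-event simulation: a single-server queue whose state changes only at discrete event instants, evaluated numerically rather than analytically.

```python
# A toy discrete-event simulation of a single-server (M/M/1-style) queue.
# The system state (number waiting) only changes at arrival/departure events.
import heapq, random

def mm1_queue(arrival_rate, service_rate, horizon=1000.0, seed=0):
    """Returns the time-averaged number of customers waiting over the run."""
    rng = random.Random(seed)
    events = [(rng.expovariate(arrival_rate), "arrival")]
    queue, busy, t_prev, area = 0, False, 0.0, 0.0
    while events:
        t, kind = heapq.heappop(events)
        if t > horizon:
            break
        area += queue * (t - t_prev)      # integrate queue length over time
        t_prev = t
        if kind == "arrival":
            heapq.heappush(events, (t + rng.expovariate(arrival_rate), "arrival"))
            if busy:
                queue += 1
            else:
                busy = True
                heapq.heappush(events, (t + rng.expovariate(service_rate), "departure"))
        else:  # departure
            if queue > 0:
                queue -= 1
                heapq.heappush(events, (t + rng.expovariate(service_rate), "departure"))
            else:
                busy = False
    return area / t_prev

print(mm1_queue(arrival_rate=0.8, service_rate=1.0))
```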


Objectives

Computer experiments have been employed with many purposes in mind. Some of those include:

* Uncertainty quantification: Characterize the uncertainty present in a computer simulation arising from unknowns during the computer simulation's construction (a forward-propagation sketch follows this list).
* Inverse problems: Discover the underlying properties of the system from the physical data.
* Bias correction: Use physical data to correct for bias in the simulation.
* Data assimilation: Combine multiple simulations and physical data sources into a complete predictive model.
* Systems design: Find inputs that result in optimal system performance measures.
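
As a minimal sketch of the first item (assumed, not from the original text; the toy simulator and input distributions are illustrative), uncertainty quantification can be approached by sampling uncertain inputs from probability distributions and propagating them through the simulator.

```python
# Forward uncertainty propagation through a toy deterministic simulator.
import numpy as np

rng = np.random.default_rng(0)

def simulator(x):
    """Toy deterministic simulator: x = (decay rate, initial value)."""
    rate, y0 = x
    t = np.linspace(0.0, 5.0, 100)
    return y0 * np.exp(-rate * t)          # output indexed by time

# Uncertainty about the inputs, expressed as probability distributions.
rates = rng.lognormal(mean=np.log(0.8), sigma=0.2, size=1000)
y0s = rng.normal(loc=10.0, scale=0.5, size=1000)

outputs = np.array([simulator(x) for x in zip(rates, y0s)])

# Summaries of the induced output uncertainty at each time point.
mean_curve = outputs.mean(axis=0)
lo, hi = np.percentile(outputs, [2.5, 97.5], axis=0)
print(mean_curve[-1], lo[-1], hi[-1])      # output uncertainty at t = 5
```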


Computer simulation modeling

Modeling of computer experiments typically uses a Bayesian framework. Bayesian statistics is an interpretation of the field of statistics where all evidence about the true state of the world is explicitly expressed in the form of probabilities. In the realm of computer experiments, the Bayesian interpretation implies that we must form a prior distribution representing our prior belief about the structure of the computer model. The use of this philosophy for computer experiments started in the 1980s and is nicely summarized by Sacks et al. (1989).

While the Bayesian approach is widely used, frequentist approaches have also been discussed recently.

The basic idea of this framework is to model the computer simulation as an unknown function of a set of inputs. The computer simulation is implemented as a piece of computer code that can be evaluated to produce a collection of outputs. Examples of inputs to these simulations are coefficients in the underlying model, initial conditions and forcing functions. It is natural to see the simulation as a deterministic function that maps these ''inputs'' into a collection of ''outputs''. On the basis of seeing our simulator this way, it is common to refer to the collection of inputs as x, the computer simulation itself as f, and the resulting output as f(x). Both x and f(x) are vector quantities, and they can be very large collections of values, often indexed by space, by time, or by both space and time. Although f(\cdot) is known in principle, in practice this is not the case: many simulators comprise tens of thousands of lines of high-level computer code, which is not accessible to intuition. For some simulations, such as climate models, evaluation of the output for a single set of inputs can require millions of computer hours.
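
The view of a simulator as a deterministic function can be made concrete with a small sketch (illustrative only, not from the original text; the solver, input vector and grid sizes are assumptions): the input vector x collects a model coefficient, an initial-condition parameter and a forcing amplitude, and f(x) is an output field indexed by both space and time.

```python
# A toy simulator f(x): same x always yields the same space-time output field.
import numpy as np

def f(x):
    """x = (diffusivity, initial peak value, forcing amplitude)."""
    diffusivity, peak, forcing = x
    n_x, n_t = 30, 200
    dx, dt = 1.0 / (n_x - 1), 0.05 / n_t
    grid = np.linspace(0.0, 1.0, n_x)
    u = peak * np.exp(-100.0 * (grid - 0.5) ** 2)   # initial condition
    field = np.empty((n_t, n_x))                    # output: time x space
    for k in range(n_t):
        # explicit finite-difference update of a forced diffusion equation
        u[1:-1] += diffusivity * dt / dx**2 * (u[2:] - 2 * u[1:-1] + u[:-2])
        u += dt * forcing * np.sin(np.pi * grid)    # simple forcing term
        u[0] = u[-1] = 0.0                          # fixed boundary values
        field[k] = u
    return field

x = np.array([1.0, 2.0, 0.5])
print(f(x).shape)   # (200, 30): output indexed by time and space
```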


Gaussian process prior

The typical model for a computer code output is a Gaussian process. For notational simplicity, assume f(x) is a scalar. Owing to the Bayesian framework, we fix our belief that the function f follows a Gaussian process, f \sim \operatorname{GP}(m(\cdot),C(\cdot,\cdot)), where m is the mean function and C is the covariance function. Popular mean functions are low-order polynomials, and a popular covariance function is the Matérn covariance, which includes both the exponential covariance ( \nu = 1/2 ) and the Gaussian covariance (as \nu \rightarrow \infty ).
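
A minimal sketch of this setup (an assumption-laden illustration, not the article's method: zero mean function, exponential covariance, hand-picked hyperparameters, and a cheap stand-in simulator) conditions a Gaussian process prior on a few simulator runs to obtain a posterior mean and variance at new inputs.

```python
# Gaussian process emulation of a scalar simulator output with the
# exponential (Matern nu = 1/2) covariance and a zero prior mean.
import numpy as np

def exp_cov(x1, x2, variance=1.0, length_scale=0.3):
    """Matern covariance with nu = 1/2, i.e. the exponential covariance."""
    d = np.abs(x1[:, None] - x2[None, :])
    return variance * np.exp(-d / length_scale)

def simulator(x):
    """Cheap stand-in for an expensive deterministic computer code."""
    return np.sin(5.0 * x) + 0.5 * x

# A handful of expensive simulator evaluations (the "design points").
x_train = np.linspace(0.0, 1.0, 6)
y_train = simulator(x_train)

# Posterior (conditional) mean and variance of the GP at new inputs.
x_new = np.linspace(0.0, 1.0, 200)
K = exp_cov(x_train, x_train) + 1e-10 * np.eye(len(x_train))  # jitter
K_s = exp_cov(x_new, x_train)
alpha = np.linalg.solve(K, y_train)
post_mean = K_s @ alpha
post_var = 1.0 - np.einsum("ij,ji->i", K_s, np.linalg.solve(K, K_s.T))

print(post_mean[:3], post_var[:3])
```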


Design of computer experiments

The design of computer experiments has considerable differences from design of experiments for parametric models. Since a Gaussian process prior has an infinite-dimensional representation, the concepts of A- and D-optimality criteria (see optimal design), which focus on reducing the error in the parameters, cannot be used. Replications would also be wasteful in cases when the computer simulation has no error. Criteria used to determine a good experimental design include integrated mean squared prediction error and distance-based criteria. Popular strategies for design include Latin hypercube sampling and low-discrepancy sequences.
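
The sketch below (illustrative, not from the original text; the functions and design sizes are assumptions) generates Latin hypercube designs and compares them with a distance-based (maximin) criterion.

```python
# Latin hypercube sampling plus a maximin distance criterion for comparing
# candidate designs in the unit hypercube.
import numpy as np

def latin_hypercube(n, d, rng):
    """n points in [0, 1]^d with exactly one point per axis-aligned stratum."""
    strata = np.array([rng.permutation(n) for _ in range(d)]).T   # (n, d)
    return (strata + rng.random((n, d))) / n

def min_pairwise_distance(design):
    """Maximin criterion: larger is better (points are more spread out)."""
    diff = design[:, None, :] - design[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    return dist[np.triu_indices(len(design), k=1)].min()

rng = np.random.default_rng(0)
# Pick the best of several random Latin hypercube designs under maximin.
designs = [latin_hypercube(20, 3, rng) for _ in range(50)]
best = max(designs, key=min_pairwise_distance)
print(best.shape, min_pairwise_distance(best))
```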


Problems with massive sample sizes

Unlike physical experiments, it is common for computer experiments to have thousands of different input combinations. Because the standard inference requires matrix inversion of a square matrix of the size of the number of samples n, the cost grows as \mathcal{O}(n^3). Matrix inversion of large, dense matrices can also cause numerical inaccuracies. Currently, this problem is addressed by greedy decision tree techniques, which allow effective computations for unlimited dimensionality and sample size (patent WO2013055257A1), or avoided by using approximation methods.
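
Two common tactics are sketched below (assumptions for illustration only, not the article's cited methods): avoiding an explicit matrix inverse by working with a Cholesky factor plus a small diagonal "nugget" for numerical stability, and a simple subset-of-data approximation so that only an m x m matrix with m << n is ever factorized.

```python
# Cholesky-based solves with jitter, applied to a random subset of a large
# design, as a crude way to keep the O(m^3) cost and conditioning manageable.
import numpy as np

def exp_cov(x1, x2, length_scale=0.2):
    return np.exp(-np.abs(x1[:, None] - x2[None, :]) / length_scale)

rng = np.random.default_rng(0)
x = np.sort(rng.random(2000))                 # large design (n = 2000)
y = np.sin(8.0 * x)                           # stand-in simulator outputs
x_new = np.linspace(0.0, 1.0, 5)

# Subset-of-data approximation: condition on m = 200 randomly chosen points.
idx = rng.choice(len(x), size=200, replace=False)
K = exp_cov(x[idx], x[idx]) + 1e-8 * np.eye(len(idx))   # jitter on diagonal

# Factorize once and solve against the factor instead of forming inv(K).
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y[idx]))
post_mean = exp_cov(x_new, x[idx]) @ alpha
print(post_mean)
```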


See also

* Simulation
* Uncertainty quantification
* Bayesian statistics
* Gaussian process emulator
* Design of experiments
* Molecular dynamics
* Monte Carlo method
* Surrogate model
* Grey box completion and validation
* Artificial financial market


Further reading

* Fehr, Jörg; Heiland, Jan; Himpe, Christian; Saak, Jens (2016). "Best practices for replicability, reproducibility and reusability of computer-based experiments exemplified by model reduction software". AIMS Mathematics. 1 (3): 261–281. doi:10.3934/Math.2016.3.261. arXiv:1607.01191.