
Causal analysis is the field of experimental design and statistical analysis pertaining to establishing cause and effect. Exploratory causal analysis (ECA), also known as data causality or causal discovery, is the use of statistical algorithms to infer associations in observed data sets that are potentially causal under strict assumptions. ECA is a type of causal inference distinct from causal modeling and treatment effects in randomized controlled trials. It is exploratory research usually preceding more formal causal research, in the same way exploratory data analysis often precedes statistical hypothesis testing in data analysis.


Motivation

Data analysis is primarily concerned with causal questions. For example, did the fertilizer cause the crops to grow? Or, can a given sickness be prevented? Or, why is my friend depressed? The potential outcomes and regression analysis techniques handle such queries when data is collected using designed experiments. Data collected in observational studies require different techniques for causal inference (because, for example, of issues such as confounding). Causal inference techniques used with experimental data require additional assumptions to produce reasonable inferences with observational data. The difficulty of causal inference under such circumstances is often summed up as "correlation does not imply causation".


Overview

ECA postulates that there exist data analysis procedures performed on specific subsets of variables within a larger set whose outputs might be indicative of causality between those variables. For example, if we assume every relevant covariate in the data is observed, then propensity score matching can be used to find the causal effect between two observational variables. Granger causality can also be used to find the causality between two observational variables under different, but similarly strict, assumptions. The two broad approaches to developing such procedures are using operational definitions of causality or verification by "truth" (i.e., explicitly ignoring the problem of defining causality and showing that a given algorithm implies a causal relationship in scenarios when causal relationships are known to exist, e.g., using synthetic data).


Operational definitions of causality

Clive Granger created the first operational definition of causality in 1969. Granger made the definition of probabilistic causality proposed by Norbert Wiener operational as a comparison of variances. Some authors prefer using ECA techniques developed using operational definitions of causality because they believe it may help in the search for causal mechanisms.
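
The "comparison of variances" can be illustrated with a minimal sketch. It assumes a single lag, a linear autoregressive model, and synthetic data whose coefficients (0.5, 0.8) are chosen only for illustration. The helper names `solve`, `residual_variance`, and `granger_variance_ratio` are hypothetical and do not come from any library. The sketch compares the residual variance when predicting a series from its own past with the residual variance when the past of the candidate cause is added. It does not carry out the significance test that a full Granger analysis would include.

```python
import random


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def residual_variance(features, targets):
    """Fit targets ~ features by ordinary least squares; return the mean squared residual."""
    k = len(features[0])
    xtx = [[sum(f[i] * f[j] for f in features) for j in range(k)] for i in range(k)]
    xty = [sum(f[i] * t for f, t in zip(features, targets)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [t - sum(b * v for b, v in zip(beta, f)) for f, t in zip(features, targets)]
    return sum(r * r for r in resid) / len(resid)


def granger_variance_ratio(cause, effect):
    """Residual variance with the cause's past included, divided by the variance without it."""
    targets = effect[1:]
    restricted = [[1.0, effect[t - 1]] for t in range(1, len(effect))]
    full = [[1.0, effect[t - 1], cause[t - 1]] for t in range(1, len(effect))]
    return residual_variance(full, targets) / residual_variance(restricted, targets)


# Synthetic data (an assumption of this sketch): x drives y with a one-step delay, not the reverse.
rng = random.Random(0)
x, y = [0.0], [0.0]
for _ in range(2000):
    x_prev, y_prev = x[-1], y[-1]
    x.append(0.5 * x_prev + rng.gauss(0, 1))
    y.append(0.5 * y_prev + 0.8 * x_prev + rng.gauss(0, 1))

ratio_x_to_y = granger_variance_ratio(x, y)  # expected well below 1: x's past helps predict y
ratio_y_to_x = granger_variance_ratio(y, x)  # expected close to 1: y's past adds little for x
print(ratio_x_to_y, ratio_y_to_x)
```

A ratio well below 1 in one direction and close to 1 in the other is the asymmetry that the operational definition interprets as potentially causal, subject to the strict assumptions noted in the overview.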


Verification by "truth"

Peter Spirtes, Clark Glymour, and Richard Scheines introduced the idea of explicitly not providing a definition of causality. Spirtes and Glymour introduced the PC algorithm for causal discovery in 1990. Many recent causal discovery algorithms follow the Spirtes-Glymour approach to verification.
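
Verification against a known "truth" can be sketched with synthetic data. The example below assumes a linear-Gaussian chain X → Y → Z with illustrative coefficients. It checks the kind of conditional-independence relation that constraint-based algorithms such as PC rely on when removing edges: X and Z are correlated, but approximately uncorrelated once Y is accounted for. Partial correlation is used here as one common choice for linear-Gaussian data. This is not the PC algorithm itself, and the function names are hypothetical.

```python
import math
import random


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def partial_correlation(a, b, given):
    """First-order partial correlation of a and b, controlling for one variable."""
    r_ab = correlation(a, b)
    r_ag = correlation(a, given)
    r_bg = correlation(b, given)
    return (r_ab - r_ag * r_bg) / math.sqrt((1 - r_ag ** 2) * (1 - r_bg ** 2))


# Known ground truth (an assumption of this sketch): X -> Y -> Z, with no direct X -> Z link.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(5000)]
y = [0.9 * xi + rng.gauss(0, 1) for xi in x]
z = [0.9 * yi + rng.gauss(0, 1) for yi in y]

marginal = correlation(x, z)               # clearly nonzero: X and Z are associated
conditional = partial_correlation(x, z, y)  # near zero: the association vanishes given Y
print(marginal, conditional)
```

Because the generating structure is known, a procedure that keeps an X–Z edge after conditioning on Y would be judged to have failed on this data set.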


Techniques

There are many surveys of causal discovery techniques. This section lists the well-known techniques.


Bivariate (or "pairwise")

* Granger causality (there is also the Scholarpedia entry)
* transfer entropy
* convergent cross mapping
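
As an illustration of one of the pairwise techniques listed above, the following is a minimal sketch of a plug-in (frequency-count) estimate of transfer entropy for discrete series. It assumes a history length of one step, and synthetic binary data in which `y` copies the previous value of `x` with probability 0.9. The function name `transfer_entropy` is hypothetical and is not the API of any particular package.

```python
import math
import random
from collections import Counter


def transfer_entropy(source, target):
    """Plug-in estimate, in bits, of transfer entropy source -> target with history length 1."""
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))
    pairs_cur_src = Counter(zip(target[:-1], source[:-1]))
    pairs_next_cur = Counter(zip(target[1:], target[:-1]))
    singles_cur = Counter(target[:-1])
    n = len(target) - 1
    te = 0.0
    for (nxt, cur, src), count in triples.items():
        p_full = count / pairs_cur_src[(cur, src)]            # p(next | current, source)
        p_self = pairs_next_cur[(nxt, cur)] / singles_cur[cur]  # p(next | current)
        te += (count / n) * math.log2(p_full / p_self)
    return te


# Synthetic binary data (an assumption of this sketch): y follows x with a one-step delay.
rng = random.Random(2)
x = [rng.randint(0, 1) for _ in range(5000)]
y = [0]
for t in range(1, len(x)):
    copied = x[t - 1]
    y.append(copied if rng.random() < 0.9 else 1 - copied)

te_x_to_y = transfer_entropy(x, y)  # substantial: x's past reduces uncertainty about y
te_y_to_x = transfer_entropy(y, x)  # near zero: y's past says little about x
print(te_x_to_y, te_y_to_x)
```

Plug-in estimates are biased upward on short series, so a small positive value in the non-causal direction is expected and should not by itself be read as evidence of influence.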


Multivariate

* causation entropy
* PC algorithm
* FCI algorithm
* LiNGAM

Many of these techniques are discussed in the tutorials provided by the Center for Causal Discovery (CCD).


Use-case examples


Social science

The PC algorithm has been applied to several different social science data sets.


Medicine

The PC algorithm has been applied to medical data. Granger causality has been applied to functional magnetic resonance imaging (fMRI) data. CCD tested their tools using biomedical data.


Physics

ECA is used in physics to understand the physical causal mechanisms of the system, e.g., in geophysics using the PC-stable algorithm (a variant of the original PC algorithm) and in dynamical systems using pairwise asymmetric inference (a variant of convergent cross mapping).


Criticism

There is debate over whether or not the relationships between data found using causal discovery are actually causal. Judea Pearl has emphasized that causal inference requires a causal model developed by "intelligence" through an iterative process of testing assumptions and fitting data. Responses to the criticism point out that the assumptions used for developing ECA techniques may not hold for a given data set, and that any causal relationships discovered during ECA are contingent on these assumptions holding true.


Software Packages


Comprehensive toolkits


Tetrad is an open source GUI-based Java program that provides a collection of causal discovery algorithms. The algorithm library used by Tetrad is also available as a command-line tool, Python API, and R wrapper.

Java Information Dynamics Toolkit (JIDT) is an open source Java library for performing information-theoretic causal discovery (e.g., transfer entropy and conditional transfer entropy). Examples of using the library in MATLAB, GNU Octave, Python, R, Julia and Clojure are provided in the documentation.


pcalg is an R package that provides some of the same causal discovery algorithms provided in Tetrad.


Specific Techniques


Granger causality

* R package
* Python package

convergent cross mapping

* R package

LiNGAM

* MATLAB/GNU Octave package

There is also a collection of tools and data maintained by the Causality Workbench team and the CCD team.

