Path analysis (statistics)

In statistics, path analysis is used to describe the directed dependencies among a set of variables. This includes models equivalent to any form of multiple regression analysis, factor analysis, canonical correlation analysis, and discriminant analysis, as well as more general families of models in the multivariate analysis of variance and covariance (MANOVA, ANOVA, ANCOVA). In addition to being thought of as a form of multiple regression focusing on causality, path analysis can be viewed as a special case of structural equation modeling (SEM): one in which only single indicators are employed for each of the variables in the causal model. That is, path analysis is SEM with a structural model but no measurement model. Other terms used to refer to path analysis include causal modeling and analysis of covariance structures. Judea Pearl considers path analysis to be a direct ancestor of the techniques of causal inference.


History

Path analysis was developed around 1918 by the geneticist Sewall Wright, who wrote about it more extensively in the 1920s. It has since been applied to a vast array of complex modeling areas, including biology, psychology, sociology, and econometrics.


Path modeling

Typically, path models consist of independent and dependent variables depicted graphically by boxes or rectangles. Variables that are independent variables only, and never dependent variables, are called 'exogenous'. Graphically, exogenous variable boxes lie at the outside edges of the model and have only single-headed arrows exiting from them; no single-headed arrows point at exogenous variables. Variables that are solely dependent variables, or are both independent and dependent variables, are termed 'endogenous'. Graphically, endogenous variables have at least one single-headed arrow pointing at them.

In the example model described here, the two exogenous variables (Ex1 and Ex2) are modeled as being correlated, as depicted by a double-headed arrow. Both have direct effects on En2 and indirect effects on En2 through En1 (En1 and En2 being the two dependent, or 'endogenous', variables). In most real-world models, the endogenous variables may also be affected by variables and factors stemming from outside the model (external effects, including measurement error). These effects are depicted by the "e" or error terms in the model.

Using the same variables, alternative models are conceivable. For example, it may be hypothesized that Ex1 has only an indirect effect on En2, which corresponds to deleting the arrow from Ex1 to En2; the likelihood or 'fit' of these two models can then be compared statistically.
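
As a rough illustration of the example model, the sketch below simulates data with the structure described above and recovers the path coefficients by ordinary least squares, one regression per endogenous variable. The coefficient names (`a`, `b`, `c`, `d`, `f`), their values, the correlation of 0.3 between Ex1 and Ex2, and the helper `path_coefficients` are all assumptions made for this example, not part of the original model. Fitting each equation separately by least squares is one simple approach that works for a model without feedback loops and with independent error terms; dedicated SEM software would normally be used instead.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000  # large sample so estimates sit close to the true values

# Hypothetical path coefficients, chosen only for illustration.
a, b = 0.4, 0.2          # Ex1 -> En1, Ex2 -> En1
c, d, f = 0.1, 0.3, 0.5  # Ex1 -> En2, Ex2 -> En2, En1 -> En2

# Correlated exogenous variables (the double-headed arrow).
cov_ex = np.array([[1.0, 0.3],
                   [0.3, 1.0]])
ex1, ex2 = rng.multivariate_normal([0.0, 0.0], cov_ex, size=n).T

# Endogenous variables, each with its own independent error term "e".
en1 = a * ex1 + b * ex2 + rng.normal(size=n)
en2 = c * ex1 + d * ex2 + f * en1 + rng.normal(size=n)


def path_coefficients(outcome, *causes):
    """Least-squares slopes of `outcome` on `causes` (intercept dropped)."""
    X = np.column_stack([np.ones_like(outcome), *causes])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coef[1:]


est_en1 = path_coefficients(en1, ex1, ex2)       # estimates of (a, b)
est_en2 = path_coefficients(en2, ex1, ex2, en1)  # estimates of (c, d, f)
print(est_en1, est_en2)
```

The alternative model in which Ex1 has only an indirect effect on En2 would correspond to calling `path_coefficients(en2, ex2, en1)`, that is, leaving `ex1` out of the second regression.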


Path tracing rules

To calculate the relationship between any two boxes in the diagram validly, Wright (1934) proposed a simple set of path tracing rules for calculating the correlation between two variables. The correlation is equal to the sum of the contributions of all the pathways through which the two variables are connected. The strength of each contributing pathway is calculated as the product of the path coefficients along that pathway.

The rules for path tracing are:
# You can trace backward up an arrow and then forward along the next, or forward from one variable to the other, but never forward and then back. Another way to think of this rule is that you can never pass out of one arrowhead and into another: heads-tails or tails-heads, but not heads-heads.
# You can pass through each variable only once in a given chain of paths.
# No more than one bi-directional arrow can be included in each path chain.

Again, the expected correlation due to each chain traced between two variables is the product of the standardized path coefficients, and the total expected correlation between two variables is the sum of these contributing path chains.

NB: Wright's rules assume a model without feedback loops: the directed graph of the model must contain no cycles, i.e. it is a directed acyclic graph, a structure that has been extensively studied in the causal analysis framework of Judea Pearl.
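
The tracing rules can be checked on the four-variable example (Ex1, Ex2, En1, En2). The sketch below lists every chain the rules allow, by hand, and compares the summed products with the correlation matrix implied by the model equations. The coefficient names and values are assumptions made for illustration; the matrix cross-check uses the standard identity Sigma = (I - B)^-1 Psi (I - B)^-T for a model without feedback loops, with residual variances chosen so that every variable has unit variance.

```python
import numpy as np

# Hypothetical standardized coefficients, chosen only for illustration.
phi = 0.3                # Ex1 <-> Ex2 (double-headed arrow)
a, b = 0.4, 0.2          # Ex1 -> En1, Ex2 -> En1
c, d, f = 0.1, 0.3, 0.5  # Ex1 -> En2, Ex2 -> En2, En1 -> En2

# Every chain permitted by the tracing rules, as a product of coefficients.
traced = {
    ("Ex1", "Ex2"): [phi],
    ("Ex1", "En1"): [a, phi * b],
    ("Ex2", "En1"): [b, phi * a],
    ("Ex1", "En2"): [c, a * f, phi * d, phi * b * f],
    ("Ex2", "En2"): [d, b * f, phi * c, phi * a * f],
    ("En1", "En2"): [f, a * c, b * d, a * phi * d, b * phi * c],
}
r_traced = {pair: sum(chains) for pair, chains in traced.items()}

# Cross-check against the model-implied correlation matrix.
names = ["Ex1", "Ex2", "En1", "En2"]
B = np.zeros((4, 4))     # B[i, j] is the path from variable j to variable i
B[2, :2] = [a, b]
B[3, :3] = [c, d, f]
total = np.linalg.inv(np.eye(4) - B)

psi = np.zeros((4, 4))   # exogenous (co)variances and residual variances
psi[:2, :2] = [[1.0, phi], [phi, 1.0]]
explained = total @ psi @ total.T
# Residual variances that bring each endogenous variance up to 1.
var_e1 = 1.0 - explained[2, 2]
var_e2 = 1.0 - explained[3, 3] - total[3, 2] ** 2 * var_e1
psi[2, 2], psi[3, 3] = var_e1, var_e2
sigma = total @ psi @ total.T

for (u, v), r in r_traced.items():
    i, j = names.index(u), names.index(v)
    print(f"r({u}, {v}): traced = {r:.3f}, implied = {sigma[i, j]:.3f}")
```

Note that the chain En1, Ex1, Ex2, En2 (`a * phi * d`) uses exactly one double-headed arrow, and that no chain passes through En1 twice, in line with the second and third rules.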


Path tracing in unstandardized models

If the modeled variables have not been standardized, an additional rule allows the expected covariances to be calculated, as long as no paths exist connecting dependent variables to other dependent variables.

The simplest case is the one in which all residual variances are modeled explicitly. In this case, in addition to the three rules above, calculate expected covariances as follows:
# Compute the product of the coefficients in each route between the variables of interest, tracing backwards, changing direction at a two-headed arrow, then tracing forwards.
# Sum over all distinct routes, where pathways are considered distinct if they contain different coefficients or encounter those coefficients in a different order.

Where residual variances are not explicitly included, or as a more general solution, include the variance of the variable at the point of change at any change of direction encountered in a route (except at two-way arrows). That is, in tracing a path from a dependent variable to an independent variable, include the variance of the independent variable, except where doing so would violate rule 1 above (passing through adjacent arrowheads, i.e. when the independent variable also connects to a double-headed arrow linking it to another independent variable). In deriving variances (which is necessary where they are not modeled explicitly), the path from a dependent variable into an independent variable and back is counted once only.
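
A small numerical sketch of the unstandardized case follows. It assumes a hypothetical model with two independent variables (X1, X2) and two dependent variables (Y1, Y2), with no paths between the dependent variables; all names and values are invented for illustration. Each expected covariance is written out route by route, with variances and covariances treated as two-headed arrows, and then compared with the matrix expressions Phi Gamma^T and Gamma Phi Gamma^T + Theta.

```python
import numpy as np

# Hypothetical unstandardized model (no paths between dependent variables):
#   Y1 = a*X1 + b*X2 + e1
#   Y2 = c*X1 + d*X2 + e2
a, b, c, d = 0.5, 1.5, 2.0, -1.0
v1, v2, c12 = 4.0, 9.0, 3.0   # Var(X1), Var(X2), Cov(X1, X2)
ve1, ve2 = 1.0, 2.0           # residual variances of e1 and e2

# Cov(X1, Y1): via Var(X1) then a, or via Cov(X1, X2) then b.
cov_x1_y1 = v1 * a + c12 * b

# Cov(Y1, Y2): trace back from Y1, turn at a two-headed arrow, go forward to Y2.
cov_y1_y2 = a * v1 * c + b * v2 * d + a * c12 * d + b * c12 * c

# Var(Y1): a*c12*b and b*c12*a meet the same coefficients in a different
# order, so they count as two distinct routes; the residual adds ve1.
var_y1 = a * v1 * a + b * v2 * b + a * c12 * b + b * c12 * a + ve1

# Cross-check with matrix algebra.
gamma = np.array([[a, b],
                  [c, d]])               # rows: Y1, Y2; columns: X1, X2
phi = np.array([[v1, c12],
                [c12, v2]])              # covariance matrix of X1, X2
theta = np.diag([ve1, ve2])              # residual covariance matrix
sigma_xy = phi @ gamma.T                 # Cov(X, Y)
sigma_yy = gamma @ phi @ gamma.T + theta # Cov(Y, Y)

print(cov_x1_y1, sigma_xy[0, 0])
print(cov_y1_y2, sigma_yy[0, 1])
print(var_y1, sigma_yy[0, 0])
```

With these illustrative values the traced results agree with the matrix results: 6.5 for Cov(X1, Y1), -2.0 for Cov(Y1, Y2), and 26.75 for Var(Y1).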


See also

* Bayesian network
* Causality
* Causal loop diagram
* Hidden Markov model
* Latent variable model
* Path coefficient
* Structural equation model (SEM)


External links


* Ωnyx, a free software environment for Structural Equation Modeling
* OpenMx - Advanced Structural Equation Modeling
* LISREL: model, methods and software for Structural Equation Modeling