General linear model

The general linear model or general multivariate regression model is a compact way of simultaneously writing several multiple linear regression models. In that sense it is not a separate statistical linear model. The various multiple linear regression models may be compactly written as
: \mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{U},
where Y is a matrix with series of multivariate measurements (each column being a set of measurements on one of the dependent variables), X is a matrix of observations on independent variables that might be a design matrix (each column being a set of observations on one of the independent variables), B is a matrix containing parameters that are usually to be estimated and U is a matrix containing errors (noise). The errors are usually assumed to be uncorrelated across measurements and to follow a multivariate normal distribution. If the errors do not follow a multivariate normal distribution, generalized linear models may be used to relax assumptions about Y and U.

The general linear model incorporates a number of different statistical models: ANOVA, ANCOVA, MANOVA, MANCOVA, ordinary linear regression, ''t''-test and ''F''-test. The general linear model is a generalization of multiple linear regression to the case of more than one dependent variable. If Y, B, and U were column vectors, the matrix equation above would represent multiple linear regression.

Hypothesis tests with the general linear model can be made in two ways: multivariate or as several independent univariate tests. In multivariate tests the columns of Y are tested together, whereas in univariate tests the columns of Y are tested independently, i.e., as multiple univariate tests with the same design matrix.
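As a minimal sketch of the matrix form Y = XB + U, the following Python snippet simulates data from the model and estimates B by ordinary least squares with NumPy. The function name `fit_general_linear_model`, the array sizes, and the parameter values are illustrative assumptions; least squares is one common estimator under the error assumptions stated above, not the only possible one.

```python
import numpy as np


def fit_general_linear_model(X, Y):
    """Least-squares estimate of B in Y = X B + U.

    X has shape (n, k) and Y has shape (n, m); the result has shape (k, m),
    one column of parameters per dependent variable.
    """
    B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return B_hat


# Illustrative simulation (all sizes and values are made up for the example).
rng = np.random.default_rng(0)
n, p, m = 200, 3, 2  # observations, predictors, dependent variables

# Design matrix: an intercept column followed by p predictors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])

# Parameter matrix B: (p + 1) rows, m columns.
B_true = np.array([[1.0, -2.0],
                   [0.5, 0.0],
                   [0.0, 3.0],
                   [2.0, 1.0]])

# Error matrix U: independent normal noise for every measurement.
U = rng.normal(scale=0.1, size=(n, m))

Y = X @ B_true + U
B_hat = fit_general_linear_model(X, Y)
print(np.round(B_hat, 2))
```

With low noise and a few hundred observations, the printed estimate should lie close to `B_true`.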


Comparison to multiple linear regression

Multiple linear regression is a generalization of simple linear regression to the case of more than one independent variable, and a special case of general linear models, restricted to one dependent variable. The basic model for multiple linear regression is
: Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \ldots + \beta_p X_{ip} + \epsilon_i
or more compactly
: Y_i = \beta_0 + \sum \limits_{k=1}^{p} \beta_k X_{ik} + \epsilon_i
for each observation ''i'' = 1, ... , ''n''.

In the formula above we consider ''n'' observations of one dependent variable and ''p'' independent variables. Thus, ''Y_i'' is the ''i''th observation of the dependent variable, and ''X_{ik}'' is the ''i''th observation of the ''k''th independent variable, ''k'' = 1, 2, ..., ''p''. The values ''β_k'' represent parameters to be estimated, and ''ε_i'' is the ''i''th independent identically distributed normal error.

In the more general multivariate linear regression, there is one equation of the above form for each of ''m'' > 1 dependent variables that share the same set of explanatory variables and hence are estimated simultaneously with each other:
: Y_{ij} = \beta_{0j} + \beta_{1j} X_{i1} + \beta_{2j} X_{i2} + \ldots + \beta_{pj} X_{ip} + \epsilon_{ij}
or more compactly
: Y_{ij} = \beta_{0j} + \sum \limits_{k=1}^{p} \beta_{kj} X_{ik} + \epsilon_{ij}
for all observations indexed as ''i'' = 1, ... , ''n'' and for all dependent variables indexed as ''j'' = 1, ... , ''m''.

Note that, since each dependent variable has its own set of regression parameters to be fitted, from a computational point of view the general multivariate regression is simply a sequence of standard multiple linear regressions using the same explanatory variables.
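The computational remark above can be checked numerically. The sketch below, assuming ordinary least squares as the estimator, fits all dependent variables jointly and then one at a time, and compares the two parameter matrices. The helper names `fit_jointly` and `fit_column_by_column` and the simulated data are hypothetical, chosen only for illustration.

```python
import numpy as np


def fit_jointly(X, Y):
    """Estimate the whole parameter matrix in one least-squares call."""
    B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return B_hat


def fit_column_by_column(X, Y):
    """Run one multiple linear regression per dependent variable."""
    columns = [
        np.linalg.lstsq(X, Y[:, j], rcond=None)[0]
        for j in range(Y.shape[1])
    ]
    return np.column_stack(columns)


# Made-up data: n = 50 observations, p = 2 predictors, m = 3 dependent variables.
rng = np.random.default_rng(42)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
Y = rng.normal(size=(50, 3))

B_joint = fit_jointly(X, Y)
B_separate = fit_column_by_column(X, Y)
print(np.allclose(B_joint, B_separate))
```

The equivalence holds for the point estimates; multivariate hypothesis tests additionally use the covariance between the columns of the error matrix, which separate fits do not provide.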


Comparison to generalized linear model

The general linear model and the generalized linear model (GLM) are two commonly used families of statistical methods to relate some number of continuous and/or categorical predictors to a single outcome variable. The main difference between the two approaches is that the general linear model strictly assumes that the residuals follow a conditionally normal distribution (Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). ''Applied multiple regression/correlation analysis for the behavioral sciences''), while the GLM loosens this assumption and allows for a variety of other distributions from the exponential family for the residuals. Of note, the general linear model is a special case of the GLM in which the distribution of the residuals follows a conditionally normal distribution.

The distribution of the residuals largely depends on the type and distribution of the outcome variable; different types of outcome variables lead to the variety of models within the GLM family. Commonly used models in the GLM family include binary logistic regression for binary or dichotomous outcomes, Poisson regression for count outcomes, and linear regression for continuous, normally distributed outcomes. This means that GLM may be spoken of as a general family of statistical models or as specific models for specific outcome types.
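To illustrate the special-case relationship described above, here is a hedged sketch of iteratively reweighted least squares (IRLS), a standard way to fit GLMs, restricted to two canonical-link families. The helper `fit_glm_irls` is hypothetical and written only for this example; it is not the API of any particular library. With the Gaussian family and identity link, every weight equals one and the working response equals the outcome, so the update is exactly ordinary least squares.

```python
import numpy as np


def fit_glm_irls(X, y, family="gaussian", n_iter=25):
    """Fit a canonical-link GLM by iteratively reweighted least squares.

    Supported families (an assumption of this sketch):
      "gaussian" with identity link, "poisson" with log link.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                      # linear predictor
        if family == "gaussian":
            mu = eta                        # identity link
            w = np.ones_like(eta)           # variance function V(mu) = 1
        elif family == "poisson":
            mu = np.exp(eta)                # inverse of the log link
            w = mu                          # variance function V(mu) = mu
        else:
            raise ValueError(f"unsupported family: {family}")
        # For canonical links the working response is eta + (y - mu) / V(mu).
        z = eta + (y - mu) / w
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, X.T @ (w * z))
    return beta


# Made-up data for a continuous, normally distributed outcome.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(300), rng.normal(size=300)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=300)

beta_glm = fit_glm_irls(X, y, family="gaussian")
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_glm, beta_ols))
```

Switching `family` to `"poisson"` with a count outcome uses the same loop with different weights, which is the sense in which linear regression sits inside the wider GLM family.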


Applications

An application of the general linear model appears in the analysis of multiple brain scans in scientific experiments, where Y contains data from brain scanners and X contains experimental design variables and confounds. It is usually tested in a univariate way (usually referred to as ''mass-univariate'' in this setting) and is often referred to as statistical parametric mapping.
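A mass-univariate analysis of this kind can be sketched as follows: the same design matrix is fitted to every column of Y (for brain scans, one column per voxel), and a ''t''-statistic for a chosen contrast is computed per column. The function `mass_univariate_t`, the boxcar task regressor, and the simulated "voxels" are assumptions made for this example; it is not the implementation used by any statistical parametric mapping software, and it assumes X has full column rank.

```python
import numpy as np


def mass_univariate_t(X, Y, contrast):
    """t-statistic of contrast @ beta for every column of Y.

    X: (n, k) design matrix shared by all columns.
    Y: (n, v) data matrix, one column per voxel.
    contrast: length-k vector selecting the effect of interest.
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    B_hat = XtX_inv @ X.T @ Y                     # (k, v) parameter estimates
    residuals = Y - X @ B_hat
    dof = n - k                                   # residual degrees of freedom
    sigma2 = np.sum(residuals ** 2, axis=0) / dof # per-voxel error variance
    c = np.asarray(contrast, dtype=float)
    standard_error = np.sqrt(sigma2 * (c @ XtX_inv @ c))
    return (c @ B_hat) / standard_error


# Made-up experiment: 100 scans, 50 voxels, task alternating off/on in blocks of 10.
rng = np.random.default_rng(1)
task = np.tile([0.0] * 10 + [1.0] * 10, 5)
X = np.column_stack([np.ones(100), task])

Y = rng.normal(size=(100, 50))
Y[:, :5] += 2.0 * task[:, None]   # only the first five voxels respond to the task

t_values = mass_univariate_t(X, Y, contrast=[0, 1])
print(t_values.shape)
```

Because thousands of such tests are typically run at once, real analyses also apply a correction for multiple comparisons, which this sketch leaves out.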


See also

* Bayesian multivariate linear regression
* F-test
* t-test

