A mixed model, mixed-effects model or mixed error-component model is a statistical model containing both fixed effects and random effects.
These models are useful in a wide variety of disciplines in the physical, biological and social sciences.
They are particularly useful in settings where repeated measurements are made on the same statistical units (longitudinal study), or where measurements are made on clusters of related statistical units. Because of their advantage in dealing with missing values, mixed effects models are often preferred over more traditional approaches such as repeated measures analysis of variance.
This article mainly discusses linear mixed-effects models (LMEMs) rather than generalized linear mixed models or nonlinear mixed-effects models.
History and current status
Ronald Fisher introduced random effects models to study the correlations of trait values between relatives. In the 1950s, Charles Roy Henderson provided best linear unbiased estimates of fixed effects and best linear unbiased predictions of random effects.
Subsequently, mixed modeling has become a major area of statistical research, including work on computation of maximum likelihood estimates, non-linear mixed effects models, missing data in mixed effects models, and Bayesian estimation of mixed effects models. Mixed models are applied in many disciplines where multiple correlated measurements are made on each unit of interest. They are prominently used in research involving human and animal subjects in fields ranging from genetics to marketing, and have also been used in baseball and industrial statistics.
Definition
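In matrix notation a linear mixed model can be represented as
: $\boldsymbol{y} = X\boldsymbol{\beta} + Z\boldsymbol{u} + \boldsymbol{\epsilon}$
where
* $\boldsymbol{y}$ is a known vector of observations, with mean $E(\boldsymbol{y}) = X\boldsymbol{\beta}$;
* $\boldsymbol{\beta}$ is an unknown vector of fixed effects;
* $\boldsymbol{u}$ is an unknown vector of random effects, with mean $E(\boldsymbol{u}) = \boldsymbol{0}$ and variance–covariance matrix $\operatorname{var}(\boldsymbol{u}) = G$;
* $\boldsymbol{\epsilon}$ is an unknown vector of random errors, with mean $E(\boldsymbol{\epsilon}) = \boldsymbol{0}$ and variance $\operatorname{var}(\boldsymbol{\epsilon}) = R$;
* $X$ and $Z$ are known design matrices relating the observations $\boldsymbol{y}$ to $\boldsymbol{\beta}$ and $\boldsymbol{u}$, respectively.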
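As an illustration of the definition above, the following is a minimal NumPy sketch that simulates one data set from a random-intercept model. The group sizes, the values of $\boldsymbol{\beta}$, $G$ and $R$, and all variable names are assumptions chosen for the example and do not come from any particular application.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layout: 5 groups with 4 observations each.
n_groups, n_per_group = 5, 4
n = n_groups * n_per_group

# Fixed-effects design X: an intercept and one covariate.
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta = np.array([1.0, 2.0])  # illustrative fixed effects

# Random-effects design Z: one indicator column per group,
# so u holds one random intercept per group.
group = np.repeat(np.arange(n_groups), n_per_group)
Z = np.eye(n_groups)[group]

# Assumed covariance matrices: var(u) = G and var(eps) = R.
G = 0.5 * np.eye(n_groups)
R = 0.25 * np.eye(n)

u = rng.multivariate_normal(np.zeros(n_groups), G)
eps = rng.multivariate_normal(np.zeros(n), R)

# The linear mixed model itself.
y = X @ beta + Z @ u + eps

# Marginal covariance of y implied by the model: Z G Z' + R.
V = Z @ G @ Z.T + R
```

Observations in the same group share a component of `u`, so the off-diagonal entries of `V` are nonzero within a group and zero between groups.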
Estimation
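The joint density of $\boldsymbol{y}$ and $\boldsymbol{u}$ can be written as:
: $f(\boldsymbol{y}, \boldsymbol{u}) = f(\boldsymbol{y} \mid \boldsymbol{u})\, f(\boldsymbol{u})$.
Assuming normality, $\boldsymbol{u} \sim \mathcal{N}(\boldsymbol{0}, G)$, $\boldsymbol{\epsilon} \sim \mathcal{N}(\boldsymbol{0}, R)$ and $\operatorname{Cov}(\boldsymbol{u}, \boldsymbol{\epsilon}) = \boldsymbol{0}$, and maximizing the joint density over $\boldsymbol{\beta}$ and $\boldsymbol{u}$, gives Henderson's "mixed model equations" (MME) for linear mixed models:
: $\begin{pmatrix} X^{\top}R^{-1}X & X^{\top}R^{-1}Z \\ Z^{\top}R^{-1}X & Z^{\top}R^{-1}Z + G^{-1} \end{pmatrix} \begin{pmatrix} \hat{\boldsymbol{\beta}} \\ \hat{\boldsymbol{u}} \end{pmatrix} = \begin{pmatrix} X^{\top}R^{-1}\boldsymbol{y} \\ Z^{\top}R^{-1}\boldsymbol{y} \end{pmatrix}$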
The solutions to the MME, $\hat{\boldsymbol{\beta}}$ and $\hat{\boldsymbol{u}}$, are best linear unbiased estimates and predictors for $\boldsymbol{\beta}$ and $\boldsymbol{u}$, respectively. This is a consequence of the Gauss–Markov theorem in the case where the conditional variance of the outcome is not a scalar multiple of the identity matrix. When the conditional variance is known, the inverse-variance weighted least squares estimate is the best linear unbiased estimate. However, the conditional variance is rarely, if ever, known, so it is desirable to estimate the variance and the weighted parameter estimates jointly when solving the MME.
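For known covariance matrices, the mixed model equations can be assembled and solved directly. The sketch below does this with dense NumPy arrays on toy data and cross-checks the result against the generalized least squares estimate and the usual BLUP formula based on the marginal covariance $ZGZ^{\top} + R$. The function name `solve_mme`, the toy data and the chosen variances are assumptions made for illustration; as noted elsewhere in this article, practical software typically uses other formulations.

```python
import numpy as np

def solve_mme(X, Z, G, R, y):
    """Solve Henderson's mixed model equations for known G and R."""
    Rinv = np.linalg.inv(R)
    Ginv = np.linalg.inv(G)
    p = X.shape[1]
    # Left-hand side, assembled block by block.
    lhs = np.block([
        [X.T @ Rinv @ X, X.T @ Rinv @ Z],
        [Z.T @ Rinv @ X, Z.T @ Rinv @ Z + Ginv],
    ])
    rhs = np.concatenate([X.T @ Rinv @ y, Z.T @ Rinv @ y])
    sol = np.linalg.solve(lhs, rhs)
    return sol[:p], sol[p:]  # (beta_hat, u_hat)

# Toy random-intercept data (all values are illustrative).
rng = np.random.default_rng(1)
n_groups = 4
group = np.repeat(np.arange(n_groups), 5)
n = group.size
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.eye(n_groups)[group]
G = 0.5 * np.eye(n_groups)
R = 0.25 * np.eye(n)
y = (X @ np.array([1.0, 2.0])
     + Z @ rng.normal(scale=np.sqrt(0.5), size=n_groups)
     + rng.normal(scale=0.5, size=n))

beta_hat, u_hat = solve_mme(X, Z, G, R, y)

# Cross-check: GLS estimate and BLUP from the marginal covariance V.
V = Z @ G @ Z.T + R
Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
u_blup = G @ Z.T @ Vinv @ (y - X @ beta_gls)
```

Both routes give the same numbers up to rounding. The MME route only requires the inverses of $R$ and $G$, which are often diagonal or block-diagonal, rather than the inverse of the full $n \times n$ matrix $V$.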
One method used to fit such mixed models is the expectation–maximization (EM) algorithm, in which the variance components are treated as unobserved nuisance parameters in the joint likelihood. Currently, this is the method implemented in statistical software such as Python (statsmodels package) and SAS (proc mixed), and as an initial step only in R's nlme package lme(). The solution to the mixed model equations is a maximum likelihood estimate when the distribution of the errors is normal.
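To make the EM idea concrete, here is a minimal sketch for the simplest case, a random-intercept model with $G = \sigma_u^2 I$ and $R = \sigma_e^2 I$. It uses the common formulation in which the random effects play the role of missing data. It is not the implementation used by any of the packages named above, and the function name, starting values, iteration count and simulated data are all assumptions made for the example.

```python
import numpy as np

def em_random_intercept(X, Z, y, n_iter=200):
    """EM for y = X b + Z u + e with var(u) = s2_u I, var(e) = s2_e I."""
    n, q = Z.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS starting value
    s2_u, s2_e = 1.0, 1.0                        # arbitrary starting values
    loglik = []
    for _ in range(n_iter):
        # E-step: posterior mean and covariance of u given y.
        C = np.linalg.inv(Z.T @ Z / s2_e + np.eye(q) / s2_u)
        u_hat = C @ Z.T @ (y - X @ beta) / s2_e
        # M-step: update beta, then the two variance components.
        beta = np.linalg.lstsq(X, y - Z @ u_hat, rcond=None)[0]
        resid = y - X @ beta - Z @ u_hat
        s2_e = (resid @ resid + np.trace(Z @ C @ Z.T)) / n
        s2_u = (u_hat @ u_hat + np.trace(C)) / q
        # Marginal log-likelihood at the updated parameters (for monitoring).
        V = s2_u * (Z @ Z.T) + s2_e * np.eye(n)
        r = y - X @ beta
        _, logdet = np.linalg.slogdet(V)
        loglik.append(-0.5 * (n * np.log(2 * np.pi) + logdet
                              + r @ np.linalg.solve(V, r)))
    return beta, s2_u, s2_e, np.array(loglik)

# Simulated data: 30 groups of 6, true s2_u = 1.0 and s2_e = 0.25.
rng = np.random.default_rng(2)
n_groups = 30
group = np.repeat(np.arange(n_groups), 6)
n = group.size
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.eye(n_groups)[group]
y = (X @ np.array([1.0, 2.0])
     + Z @ rng.normal(scale=1.0, size=n_groups)
     + rng.normal(scale=0.5, size=n))

beta_em, s2_u_em, s2_e_em, loglik = em_random_intercept(X, Z, y)
```

The recorded log-likelihood should never decrease from one iteration to the next, which is a useful sanity check for any EM implementation. Convergence can be slow, which is one reason EM is sometimes used only for the first few iterations.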
There are several other methods to fit mixed models. One uses EM initially and then Newton–Raphson (used by R package nlme's lme()). Another uses penalized least squares to obtain a profiled log-likelihood that depends only on the (low-dimensional) variance-covariance parameters of $\boldsymbol{u}$, i.e., its covariance matrix $G$, and then applies modern direct optimization to that reduced objective function (used by R's lme4 package lmer() and the Julia package MixedModels.jl). A third is direct optimization of the likelihood (used by e.g. R's glmmTMB). Notably, while the canonical form proposed by Henderson is useful for theory, many popular software packages use a different formulation for numerical computation in order to take advantage of sparse matrix methods (e.g. lme4 and MixedModels.jl).
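The idea of profiling can be sketched for a random-intercept model with $G = \sigma_u^2 I$ and $R = \sigma_e^2 I$. For a fixed ratio $\theta = \sigma_u^2 / \sigma_e^2$, the maximizing $\boldsymbol{\beta}$ and $\sigma_e^2$ have closed forms, so the objective depends on $\theta$ alone. The code below is only an illustration under these assumptions: it uses dense matrices and a crude grid search in place of a real optimizer, and it is not the penalized least squares algorithm of lme4 or MixedModels.jl. The function name and the simulated data are hypothetical.

```python
import numpy as np

def profiled_deviance(theta, X, Z, y):
    """-2 log-likelihood with beta and s2_e profiled out.

    theta is the variance ratio s2_u / s2_e, so var(y) = s2_e * W(theta).
    """
    n = y.size
    W = theta * (Z @ Z.T) + np.eye(n)
    Winv_X = np.linalg.solve(W, X)
    Winv_y = np.linalg.solve(W, y)
    # Generalized least squares estimate of beta for this theta.
    beta = np.linalg.solve(X.T @ Winv_X, X.T @ Winv_y)
    r = y - X @ beta
    # Closed-form ML estimate of s2_e for this theta.
    s2_e = r @ np.linalg.solve(W, r) / n
    _, logdet_W = np.linalg.slogdet(W)
    dev = n * np.log(2 * np.pi * s2_e) + logdet_W + n
    return dev, beta, s2_e

# Simulated data: 20 groups of 6, true s2_u = 1.0 and s2_e = 0.25.
rng = np.random.default_rng(3)
n_groups = 20
group = np.repeat(np.arange(n_groups), 6)
n = group.size
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.eye(n_groups)[group]
y = (X @ np.array([1.0, 2.0])
     + Z @ rng.normal(scale=1.0, size=n_groups)
     + rng.normal(scale=0.5, size=n))

# One-dimensional search over theta (a grid stands in for an optimizer).
thetas = np.logspace(-2, 2, 200)
devs = np.array([profiled_deviance(t, X, Z, y)[0] for t in thetas])
theta_hat = thetas[np.argmin(devs)]
dev_hat, beta_hat, s2_e_hat = profiled_deviance(theta_hat, X, Z, y)
s2_u_hat = theta_hat * s2_e_hat
```

However many fixed effects the model has, the search here is over a single number. That reduction in dimension is what makes direct optimization of the profiled objective attractive.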
See also
* Nonlinear mixed-effects model
* Fixed effects model
* Generalized linear mixed model
* Linear regression
* Mixed-design analysis of variance
* Multilevel model
* Random effects model
* Repeated measures design
* Empirical Bayes method