In statistics, a generalized estimating equation (GEE) is used to estimate the parameters of a generalized linear model with a possible unmeasured correlation between observations from different timepoints.
Regression beta coefficient estimates from the Liang-Zeger GEE are consistent, unbiased, and asymptotically normal even when the working correlation is misspecified, under mild regularity conditions. In the presence of high autocorrelation, GEE is more efficient than generalized linear models (GLMs). When the true working correlation is known, consistency does not require the assumption that missing data are missing completely at random.
Huber-White standard errors improve the efficiency of Liang-Zeger GEE in the absence of serial autocorrelation but may remove the marginal interpretation. GEE estimates the average response over the population ("population-averaged" effects) with Liang-Zeger standard errors, and in individuals using Huber-White standard errors, also known as "robust standard error" or "sandwich variance" estimates. Based on a limited literature review, Huber-White GEE has been in use since 1997, and Liang-Zeger GEE dates to the 1980s. Several independent formulations of these standard error estimators contribute to GEE theory. Placing the independent standard error estimators under the umbrella term "GEE" may exemplify abuse of terminology.
GEEs belong to a class of regression techniques referred to as semiparametric because they rely on specification of only the first two moments. They are a popular alternative to the likelihood-based generalized linear mixed model, which is more at risk of losing consistency when the variance structure is misspecified. With GEE, a misspecified variance structure leaves the regression coefficient estimates consistent but costs efficiency: the estimates have higher variance than under the optimal specification, which inflates Wald test p-values. GEEs are commonly used in large epidemiological studies, especially multi-site cohort studies, because they can handle many types of unmeasured dependence between outcomes.
Formulation
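Given a mean model $\mu_{ij}$ for subject $i$ and time $j$ that depends upon regression parameters $\beta_k$, and variance structure, $V_i$, the estimating equation is formed via:

$$U(\beta) = \sum_{i=1}^{N} \left(\frac{\partial \mu_i}{\partial \beta}\right)^{\mathsf{T}} V_i^{-1} \{ Y_i - \mu_i(\beta) \}$$

The parameters $\beta_k$ are estimated by solving $U(\beta) = 0$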
and are typically obtained via the Newton–Raphson algorithm. The variance structure is chosen to improve the efficiency of the parameter estimates. The Hessian of the solution to the GEEs in the parameter space can be used to calculate robust standard error estimates. The term "variance structure" refers to the algebraic form of the covariance matrix between outcomes, Y, in the sample. Examples of variance structure specifications include independence, exchangeable, autoregressive, stationary m-dependent, and unstructured. The most popular form of inference on GEE regression parameters is the Wald test using naive or robust standard errors, though the score test is also valid and preferable when it is difficult to obtain estimates of information under the alternative hypothesis. The likelihood ratio test is not valid in this setting because the estimating equations are not necessarily likelihood equations. Model selection can be performed with the GEE equivalent of the Akaike information criterion (AIC), the quasi-likelihood under the independence model criterion (QIC).
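The solution step and the robust standard errors described in this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: it assumes a binary outcome with a logit link, an exchangeable working correlation whose parameter `rho` is fixed in advance rather than estimated, a dispersion of 1, and a Fisher-scoring variant of the Newton–Raphson update (the expected rather than the exact derivative of $U$ is used). The function names and the simulated data are hypothetical.

```python
import numpy as np

def exchangeable(n, rho):
    """Exchangeable working correlation: 1 on the diagonal, rho elsewhere."""
    return (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))

def gee_pieces(beta, X_list, y_list, rho):
    """Return U(beta), B = sum D'V^-1 D ('bread') and M = sum u_i u_i' ('meat')."""
    p = beta.size
    U, B, M = np.zeros(p), np.zeros((p, p)), np.zeros((p, p))
    for X, y in zip(X_list, y_list):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))   # logit-link mean model
        a = mu * (1.0 - mu)                      # Bernoulli variance function
        D = a[:, None] * X                       # d mu_i / d beta, shape (n_i, p)
        s = np.sqrt(a)
        V = s[:, None] * exchangeable(y.size, rho) * s[None, :]  # A^1/2 R A^1/2
        Vinv_D = np.linalg.solve(V, D)           # V^-1 D (V is symmetric)
        u_i = Vinv_D.T @ (y - mu)                # D' V^-1 (y - mu)
        U += u_i
        B += D.T @ Vinv_D
        M += np.outer(u_i, u_i)
    return U, B, M

def fit_gee_logit(X_list, y_list, rho=0.0, max_iter=50, tol=1e-10):
    """Solve U(beta) = 0 by Fisher scoring; return estimates and covariances."""
    beta = np.zeros(X_list[0].shape[1])
    for _ in range(max_iter):
        U, B, _ = gee_pieces(beta, X_list, y_list, rho)
        step = np.linalg.solve(B, U)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    U, B, M = gee_pieces(beta, X_list, y_list, rho)
    B_inv = np.linalg.inv(B)
    naive_cov = B_inv                  # model-based covariance
    robust_cov = B_inv @ M @ B_inv     # sandwich covariance
    return beta, naive_cov, robust_cov, U

# Simulated longitudinal data (hypothetical): a shared subject effect
# induces correlation between repeated binary outcomes.
rng = np.random.default_rng(0)
n_subjects, n_times = 200, 4
X_list, y_list = [], []
for _ in range(n_subjects):
    X = np.column_stack([np.ones(n_times), rng.normal(size=n_times)])
    eta = X @ np.array([-0.5, 1.0]) + rng.normal()
    y_list.append(rng.binomial(1, 1.0 / (1.0 + np.exp(-eta))).astype(float))
    X_list.append(X)

beta_hat, naive_cov, robust_cov, U_final = fit_gee_logit(X_list, y_list, rho=0.3)
robust_se = np.sqrt(np.diag(robust_cov))
wald_z = beta_hat / robust_se          # Wald statistics with robust standard errors
print(beta_hat, robust_se, wald_z)
```

Because the data are simulated with a subject-level random effect, the fitted coefficients are population-averaged and need not match the subject-specific values used in the simulation.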
Relationship with Generalized Method of Moments
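The generalized estimating equation is a special case of the generalized method of moments (GMM). This relationship is immediately obvious from the requirement that the score function satisfy the moment condition

$$\mathbb{E}\left[\left(\frac{\partial \mu_i}{\partial \beta}\right)^{\mathsf{T}} V_i^{-1} \{ Y_i - \mu_i(\beta) \}\right] = 0,$$

whose sample analogue is the estimating equation set to zero:

$$\frac{1}{N} \sum_{i=1}^{N} \left(\frac{\partial \mu_i}{\partial \beta}\right)^{\mathsf{T}} V_i^{-1} \{ Y_i - \mu_i(\beta) \} = 0$$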
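As a small, hedged illustration of the moment-condition view, the sketch below uses the simplest possible case: an identity link with an independence working correlation, so that the derivative of the mean is the design matrix and $V_i$ is the identity. Under those assumptions the sample moment condition reduces to the ordinary least squares normal equations, and the least squares solution sets the averaged per-subject moments to zero. The simulated data and the helper name `sample_moment` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_subjects, n_times = 50, 3

# Design array of shape (subjects, timepoints, parameters); first column is the intercept.
X = rng.normal(size=(n_subjects, n_times, 2))
X[..., 0] = 1.0
beta_true = np.array([2.0, -1.0])

# A shared subject effect makes repeated outcomes correlated.
y = (X @ beta_true
     + rng.normal(size=(n_subjects, 1))
     + rng.normal(size=(n_subjects, n_times)))

def sample_moment(beta):
    """Average over subjects of X_i' (y_i - X_i beta), i.e. D_i = X_i and V_i = I."""
    resid = y - X @ beta
    per_subject = np.einsum("itp,it->ip", X, resid)
    return per_subject.mean(axis=0)

# With independence and an identity link, solving the moment condition is least squares.
beta_hat, *_ = np.linalg.lstsq(X.reshape(-1, 2), y.reshape(-1), rcond=None)
g = sample_moment(beta_hat)
print(beta_hat, g)
```

The printed moment vector is zero up to floating-point error, which is the just-identified GMM condition; with other links or working correlations the same condition has to be solved iteratively.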
Computation
Software for solving generalized estimating equations is available in MATLAB, SAS (proc genmod), SPSS (the gee procedure), Stata (the xtgee command), R (packages glmtoolbox, gee, geepack and multgee), Julia (package GEE.jl) and Python (package statsmodels). Comparisons among software packages for the analysis of binary correlated data and ordinal correlated data via GEE are available.
See also
* Generalized method of moments
* Repeated measures design
External links
Advanced Topics I - Generalized Estimating Equations (GEE)