In statistics, a generalized estimating equation (GEE) is used to estimate the parameters of a generalized linear model with a possible unmeasured correlation between observations from different timepoints.
Although GEEs are sometimes believed to be robust in every respect to a wrong choice of working-correlation matrix, they are robust only against loss of consistency under a wrong choice.
Regression beta coefficient estimates from the Liang-Zeger GEE are consistent, unbiased, and asymptotically normal even when the working correlation is misspecified, under mild regularity conditions. GEE is more efficient than the generalized linear iterative model (GLIM) in the presence of high autocorrelation.
When the true correlation structure is known, consistency does not require the data to be missing completely at random (MCAR).
Huber-White standard errors improve the efficiency of the Liang-Zeger GEE in the absence of serial autocorrelation but may remove the marginal interpretation. GEE estimates the average response over the population ("population-averaged" effects) with Liang-Zeger standard errors, and the response in individuals with Huber-White standard errors, also known as "robust standard error" or "sandwich variance" estimates. Based on a limited literature review, the Huber-White GEE has been in use since 1997 and the Liang-Zeger GEE dates to the 1980s. Several independent formulations of these standard error estimators contribute to GEE theory. Placing the independent standard error estimators under the umbrella term "GEE" may exemplify abuse of language.
GEEs belong to a class of regression techniques that are referred to as semiparametric because they rely on specification of only the first two moments. They are a popular alternative to the likelihood-based generalized linear mixed model, which is more at risk of losing consistency when the variance structure is misspecified. The price of keeping the regression coefficient estimates consistent under variance-structure misspecification is a loss of efficiency: the standard errors have higher variance than under the optimal specification, which inflates Wald test p-values. GEEs are commonly used in large epidemiological studies, especially multi-site cohort studies, because they can handle many types of unmeasured dependence between outcomes.
Formulation
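Given a mean model <math>\mu_{ij}</math> for subject <math>i</math> and time <math>j</math> that depends upon regression parameters <math>\beta_k</math>, and variance structure, <math>V_i</math>, the estimating equation is formed via:
:<math>U(\beta) = \sum_{i=1}^N \left(\frac{\partial \mu_i}{\partial \beta}\right)^{\mathsf{T}} V_i^{-1} \{ Y_i - \mu_i(\beta) \}</math>
The parameters <math>\beta_k</math> are estimated by solving <math>U(\beta) = 0</math> and are typically obtained via the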
Newton–Raphson algorithm. The variance structure is chosen to improve the efficiency of the parameter estimates. The Hessian of the solution to the GEEs in the parameter space can be used to calculate robust standard error estimates. The term "variance structure" refers to the algebraic form of the covariance matrix between outcomes, Y, in the sample. Examples of variance structure specifications include independence, exchangeable, autoregressive, stationary m-dependent, and unstructured. The most popular form of inference on GEE regression parameters is the Wald test using naive or robust standard errors, though the score test is also valid and preferable when it is difficult to obtain estimates of information under the alternative hypothesis. The likelihood ratio test is not valid in this setting because the estimating equations are not necessarily likelihood equations. Model selection can be performed with the GEE equivalent of the Akaike information criterion (AIC), the quasi-likelihood under the independence model criterion (QIC).
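As a rough illustration of the formulation above, the following Python sketch solves the estimating equation by Fisher scoring (a Newton–Raphson variant that uses the expected derivative of U) and then forms naive and robust ("sandwich") standard errors and Wald statistics. Everything specific in it is an assumption made for the example rather than part of the article: a log-link mean model with variance function v(mu) = mu, an exchangeable working correlation whose parameter alpha is held fixed instead of being estimated, a dispersion of 1, and the hypothetical helper names exchangeable, gee_pieces and fit_gee. The simulated outcomes share a gamma frailty within each subject so that they are correlated.

```python
import numpy as np


def exchangeable(n, alpha):
    """Exchangeable working correlation: 1 on the diagonal, alpha elsewhere."""
    return (1.0 - alpha) * np.eye(n) + alpha * np.ones((n, n))


def gee_pieces(beta, y, X, R):
    """Return U(beta), the 'bread' B and the 'meat' M.

    Assumes (for illustration) mu_ij = exp(x_ij' beta) and v(mu) = mu.
    y has shape (N, n); X has shape (N, n, p); R is the working correlation.
    """
    p = X.shape[2]
    U, B, M = np.zeros(p), np.zeros((p, p)), np.zeros((p, p))
    for y_i, X_i in zip(y, X):
        mu = np.exp(X_i @ beta)
        D = mu[:, None] * X_i               # d mu_i / d beta, shape (n, p)
        A = np.diag(np.sqrt(mu))
        V = A @ R @ A                       # working covariance V_i
        Vinv_D = np.linalg.solve(V, D)      # V_i^{-1} D_i (V_i is symmetric)
        u_i = Vinv_D.T @ (y_i - mu)         # D_i' V_i^{-1} (Y_i - mu_i)
        U += u_i
        B += D.T @ Vinv_D
        M += np.outer(u_i, u_i)
    return U, B, M


def fit_gee(y, X, alpha=0.0, max_iter=50, tol=1e-10):
    """Solve U(beta) = 0 by Fisher scoring with a fixed working correlation."""
    R = exchangeable(X.shape[1], alpha)
    beta = np.zeros(X.shape[2])
    for _ in range(max_iter):
        U, B, _ = gee_pieces(beta, y, X, R)
        step = np.linalg.solve(B, U)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    U, B, M = gee_pieces(beta, y, X, R)
    B_inv = np.linalg.inv(B)
    se_naive = np.sqrt(np.diag(B_inv))               # model-based
    se_robust = np.sqrt(np.diag(B_inv @ M @ B_inv))  # sandwich
    return beta, se_naive, se_robust, U


# Simulated example: N subjects, n timepoints, intercept plus one covariate.
rng = np.random.default_rng(0)
N, n = 400, 4
x = rng.normal(size=(N, n))
X = np.stack([np.ones((N, n)), x], axis=2)
frailty = rng.gamma(shape=4.0, scale=0.25, size=(N, 1))  # mean 1 per subject
y = rng.poisson(frailty * np.exp(X @ np.array([0.5, 0.3])))

beta_hat, se_naive, se_robust, U_final = fit_gee(y, X, alpha=0.3)
print("beta:     ", np.round(beta_hat, 3))
print("naive SE: ", np.round(se_naive, 3))
print("robust SE:", np.round(se_robust, 3))
print("Wald z:   ", np.round(beta_hat / se_robust, 2))
```

In practice the working-correlation parameter is usually re-estimated from residuals between updates of the regression coefficients; holding it fixed keeps the sketch short and, as the article notes, affects efficiency rather than consistency.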
Relationship with Generalized Method of Moments
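The generalized estimating equation is a special case of the generalized method of moments (GMM). This relationship follows from the requirement that the score function satisfy the equation:
:<math>\mathbb{E}[U(\beta)] = \mathbb{E}\left[ \sum_{i=1}^N \left(\frac{\partial \mu_i}{\partial \beta}\right)^{\mathsf{T}} V_i^{-1} \{ Y_i - \mu_i(\beta) \} \right] = 0</math>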
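One way to make the connection concrete is sketched below. It assumes the working covariance of each subject is treated as fixed, and the names g_i, \bar g_N, W and p (the number of regression parameters) are introduced here only for illustration. Each subject contributes one moment function, and there are exactly as many moment conditions as parameters:

```latex
\begin{align}
g_i(\beta) &= \left(\frac{\partial \mu_i}{\partial \beta}\right)^{\mathsf{T}} V_i^{-1}\{Y_i - \mu_i(\beta)\} \in \mathbb{R}^{p},
  \qquad \mathbb{E}[g_i(\beta)] = 0, \\
\bar g_N(\beta) &= \frac{1}{N}\sum_{i=1}^{N} g_i(\beta) = \frac{1}{N}\,U(\beta), \\
\hat\beta_{\mathrm{GMM}} &= \arg\min_{\beta}\; \bar g_N(\beta)^{\mathsf{T}}\, W\, \bar g_N(\beta).
\end{align}
```

Because the system is just-identified (p moment conditions for p parameters), for any positive-definite weight matrix W the objective reaches its minimum of zero wherever \bar g_N(\beta) = 0, provided such a solution exists. That is the same condition as U(\beta) = 0, so the choice of W does not affect the estimate.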
Computation
Software for solving generalized estimating equations is available in MATLAB, SAS (proc genmod), SPSS (the gee procedure), Stata (the xtgee command), R (packages gee, geepack and multgee), Julia (package GEE.jl) and Python (package statsmodels).
Comparisons among software packages for the analysis of binary correlated data and ordinal correlated data via GEE are available.
See also
* Generalized method of moments
* Repeated measures design
External links
* Generalized Estimating Equations (GEE) - Part 1
* Advanced Topics I - Generalized Estimating Equations (GEE)