In statistics, a generalized estimating equation (GEE) is used to estimate the parameters of a generalized linear model with a possible unmeasured correlation between observations from different timepoints. GEEs are sometimes believed to be robust in every respect even when the working-correlation matrix is chosen wrongly. In fact, a wrong choice leaves them robust only in the sense that the estimates remain consistent. Regression (beta) coefficient estimates from the Liang–Zeger GEE are consistent, asymptotically unbiased, and asymptotically normal even when the working correlation is misspecified, under mild regularity conditions. In the presence of high autocorrelation, GEE is more efficient than a generalized linear model (GLM) fitted as if the observations were independent. When the true working correlation is known, consistency does not require the assumption that missing data are missing completely at random (MCAR).

Huber–White standard errors improve the efficiency of the Liang–Zeger GEE in the absence of serial autocorrelation but may remove the marginal interpretation. GEE estimates the average response over the population ("population-averaged" effects) with Liang–Zeger standard errors. It estimates effects in individuals using Huber–White standard errors, also known as "robust standard error" or "sandwich variance" estimates. Based on a limited literature review, Huber–White GEE has been in use since 1997, and Liang–Zeger GEE dates to the 1980s. Several independent formulations of these standard error estimators contribute to GEE theory, so placing all of them under the umbrella term "GEE" may be an abuse of terminology.

GEEs belong to a class of regression techniques referred to as semiparametric, because they rely on specification of only the first two moments (the mean and the variance) of the outcome. They are a popular alternative to the likelihood-based generalized linear mixed model, which is more at risk of losing consistency when the variance structure is misspecified. Keeping consistent regression coefficient estimates under a misspecified variance structure costs efficiency. The estimates have a larger variance than under the optimal structure, which inflates Wald test p-values. GEEs are commonly used in large epidemiological studies, especially multi-site cohort studies, because they can handle many types of unmeasured dependence between outcomes.
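The contrast between model-based ("naive") and sandwich standard errors is easiest to see in the simplest possible case. Take a common mean, estimated with an identity link and an independence working correlation. Solving U(\beta) = 0 then just gives the pooled mean.

The sketch below assumes that outcomes within each subject are positively correlated, which is the situation in which the naive standard error is too small. The function name `intercept_only_gee` and the toy data are hypothetical illustrations, not part of any library.

```python
import math


def intercept_only_gee(clusters):
    """Common-mean GEE with identity link and independence working correlation.

    clusters: one list of outcomes per subject (hypothetical input layout).
    Returns (beta_hat, naive_se, sandwich_se).
    """
    ys = [y for cluster in clusters for y in cluster]
    n = len(ys)
    # Solving U(beta) = sum_i sum_j (y_ij - beta) = 0 gives the pooled mean.
    beta = sum(ys) / n
    # Naive (model-based) variance: sigma^2 / n, as if all n outcomes were independent.
    sigma2 = sum((y - beta) ** 2 for y in ys) / n
    naive_var = sigma2 / n
    # Sandwich variance: bread = n, meat = sum over subjects of the squared summed residual.
    meat = sum(sum(y - beta for y in cluster) ** 2 for cluster in clusters)
    sandwich_var = meat / n ** 2
    return beta, math.sqrt(naive_var), math.sqrt(sandwich_var)


# Hypothetical data: four subjects whose three repeated outcomes share a subject-level offset.
clusters = [[1.0, 1.2, 0.9], [3.0, 3.1, 2.8], [5.0, 5.2, 4.9], [7.0, 6.8, 7.1]]
beta, naive_se, sandwich_se = intercept_only_gee(clusters)
print(f"beta={beta:.3f} naive SE={naive_se:.3f} sandwich SE={sandwich_se:.3f}")
```

With these numbers the naive standard error is about 0.64 and the sandwich standard error about 1.11. The naive formula treats the twelve outcomes as twelve independent observations. The sandwich formula only treats the four subjects as independent.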


Formulation

Given a mean model \mu_{ij} for subject i and time j that depends upon regression parameters \beta_k, and variance structure V_i, the estimating equation is formed via:

U(\beta) = \sum_{i=1}^{N} \frac{\partial \mu_i^{\top}}{\partial \beta} V_i^{-1} \{ Y_i - \mu_i(\beta) \}

Here \partial \mu_i^{\top} / \partial \beta is the matrix of derivatives \partial \mu_{ij} / \partial \beta_k. The parameters \beta_k are estimated by solving U(\beta) = 0 and are typically obtained via the Newton–Raphson algorithm. The variance structure is chosen to improve the efficiency of the parameter estimates. Robust (sandwich) standard error estimates can be calculated from two ingredients: the Hessian (the derivative of the estimating function at the solution) and the empirical covariance of the subject-level contributions to U(\beta). The term "variance structure" refers to the algebraic form of the covariance matrix between outcomes, Y, in the sample. Examples of variance structure specifications include independence, exchangeable, autoregressive, stationary m-dependent, and unstructured.

The most popular form of inference on GEE regression parameters is the Wald test using naive or robust standard errors. The score test is also valid, and is preferable when it is difficult to obtain estimates of information under the alternative hypothesis. The likelihood ratio test is not valid in this setting because the estimating equations are not necessarily likelihood equations. Model selection can be performed with the GEE equivalent of the Akaike information criterion (AIC), the quasi-likelihood under the independence model criterion (QIC).
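The sketch below shows the fitting loop, the sandwich covariance, and a Wald p-value under stated assumptions. The mean model is Poisson with a log link, \mu_{ij} = \exp(x_{ij}^{\top} \beta). The working covariance is V_i = A_i^{1/2} R(\alpha) A_i^{1/2}, with A_i = diag(\mu_i) and scale fixed at 1. The solver uses Fisher scoring, the modified Newton–Raphson step common in GEE software.

Some simplifications are assumptions rather than part of the text above:

- \alpha is held fixed instead of being re-estimated from residuals at each iteration, as the Liang–Zeger procedure does.
- The first design column is an intercept.
- Every function name and the toy data are hypothetical.

```python
import math


def solve(a, b):
    """Solve a @ x = b for a small dense system (Gauss-Jordan, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def working_correlation(kind, n, alpha):
    """R(alpha) for one subject observed at n timepoints."""
    if kind == "independence":
        return [[1.0 if j == k else 0.0 for k in range(n)] for j in range(n)]
    if kind == "exchangeable":
        return [[1.0 if j == k else alpha for k in range(n)] for j in range(n)]
    if kind == "ar1":
        return [[alpha ** abs(j - k) for k in range(n)] for j in range(n)]
    raise ValueError(f"unsupported working correlation: {kind}")


def accumulate(clusters, beta, kind, alpha):
    """Return B = sum D'V^-1 D, U = sum D'V^-1 (y - mu), and M = sum of outer products of each subject's term."""
    p = len(beta)
    B = [[0.0] * p for _ in range(p)]
    U = [0.0] * p
    M = [[0.0] * p for _ in range(p)]
    for X, y in clusters:
        n = len(y)
        mu = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
        D = [[m * x for x in row] for m, row in zip(mu, X)]  # d mu_ij / d beta_k = mu_ij * x_ijk
        R = working_correlation(kind, n, alpha)
        V = [[math.sqrt(mu[j] * mu[k]) * R[j][k] for k in range(n)] for j in range(n)]
        W = list(zip(*[solve(V, [D[j][k] for j in range(n)]) for k in range(p)]))  # V^-1 D
        r = [yi - mi for yi, mi in zip(y, mu)]
        u = [sum(W[j][k] * r[j] for j in range(n)) for k in range(p)]  # D'V^-1 (y - mu)
        for k in range(p):
            U[k] += u[k]
            for l in range(p):
                B[k][l] += sum(W[j][k] * D[j][l] for j in range(n))
                M[k][l] += u[k] * u[l]
    return B, U, M


def gee_poisson(clusters, kind="exchangeable", alpha=0.3, max_iter=50, tol=1e-10):
    """Fisher-scoring GEE for a log-link Poisson mean; returns (beta, sandwich standard errors)."""
    p = len(clusters[0][0][0])
    all_y = [v for _, y in clusters for v in y]
    beta = [math.log(sum(all_y) / len(all_y))] + [0.0] * (p - 1)  # assumes column 0 is the intercept
    for _ in range(max_iter):
        B, U, _ = accumulate(clusters, beta, kind, alpha)
        step = solve(B, U)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    B, _, M = accumulate(clusters, beta, kind, alpha)
    # Sandwich covariance B^-1 M B^-1; B is symmetric, so its inverse's columns equal its rows.
    B_inv = [solve(B, [1.0 if i == j else 0.0 for i in range(p)]) for j in range(p)]
    cov = matmul(matmul(B_inv, M), B_inv)
    return beta, [math.sqrt(cov[k][k]) for k in range(p)]


def wald_p_value(estimate, se):
    """Two-sided Wald p-value against a standard normal reference."""
    return math.erfc(abs(estimate / se) / math.sqrt(2.0))


# Hypothetical toy data: 4 subjects, counts at times 0, 1, 2; design rows are [intercept, time].
times = [0.0, 1.0, 2.0]
counts = [[2, 3, 5], [1, 2, 2], [3, 4, 7], [0, 1, 3]]
data = [([[1.0, t] for t in times], y) for y in counts]
beta, se = gee_poisson(data, kind="exchangeable", alpha=0.3)
print("beta:", beta, "robust SE:", se, "p(time):", wald_p_value(beta[1], se[1]))
```

Passing `kind="independence"` or `kind="ar1"` instead of `"exchangeable"` changes only R(\alpha). The coefficient estimates remain consistent under any of these choices as long as the mean model is right. What changes is their efficiency and the sandwich standard errors.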


Relationship with Generalized Method of Moments

The generalized estimating equation is a special case of the generalized method of moments (GMM). This relationship is immediately obvious from the requirement that the score function satisfy the equation:

\mathbb{E}[U(\beta)] = \sum_{i=1}^{N} \frac{\partial \mu_i^{\top}}{\partial \beta} V_i^{-1} \, \mathbb{E}\{ Y_i - \mu_i(\beta) \} = 0
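The GMM connection can be made explicit by writing U(\beta) as a sum of subject-level moment functions. This derivation rests on two assumptions: covariates are conditioned on, and V_i is treated as fixed given them. The symbols g_i, \bar g, \beta_0 (the true parameter) and the weight matrix W are introduced here for illustration only.

```latex
g_i(\beta) = \frac{\partial \mu_i^{\top}}{\partial \beta} V_i^{-1} \{ Y_i - \mu_i(\beta) \},
\qquad U(\beta) = \sum_{i=1}^{N} g_i(\beta)

\mathbb{E}[g_i(\beta_0)]
  = \frac{\partial \mu_i^{\top}}{\partial \beta}\Big|_{\beta = \beta_0} V_i^{-1}
    \{ \mathbb{E}[Y_i] - \mu_i(\beta_0) \} = 0
  \quad \text{whenever } \mathbb{E}[Y_i] = \mu_i(\beta_0)

\hat\beta_{\mathrm{GMM}} = \arg\min_{\beta} \, \bar g(\beta)^{\top} W \, \bar g(\beta),
\qquad \bar g(\beta) = \frac{1}{N} \sum_{i=1}^{N} g_i(\beta)
```

There are as many moment conditions as parameters, so this GMM problem is exactly identified. For any positive-definite W, the minimum of zero is attained where \bar g(\hat\beta) = 0 (provided such a root exists), which is the same as U(\hat\beta) = 0. The middle line uses only the mean model, not V_i. That is why a misspecified working correlation costs efficiency rather than consistency.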


Computation

Software for solving generalized estimating equations is available in MATLAB, SAS (proc genmod), SPSS (the gee procedure), Stata (the xtgee command), R (packages gee, geepack and multgee), Julia (package GEE.jl) and Python (package statsmodels). Comparisons among software packages for the analysis of binary correlated data and ordinal correlated data via GEE are available.


See also

* Generalized method of moments
* Repeated measures design


External links


Generalized Estimating Equations (GEE) - Part 1

Advanced Topics I - Generalized Estimating Equations (GEE)