Clustered standard errors (or Liang-Zeger standard errors) are measurements that estimate the standard error of a regression parameter in settings where observations may be subdivided into smaller-sized groups ("clusters") and where the sampling and/or treatment assignment is correlated within each group. Clustered standard errors are widely used in a variety of applied econometric settings, including difference-in-differences or experiments. Analogous to how Huber-White standard errors are consistent in the presence of heteroscedasticity and Newey–West standard errors are consistent in the presence of accurately-modeled autocorrelation, clustered standard errors are consistent in the presence of cluster-based sampling or treatment assignment. Clustered standard errors are often justified by possible correlation in modeling residuals within each cluster; while recent work suggests that this is not the precise justification behind clustering, it may be pedagogically useful.


Intuitive motivation

Clustered standard errors are often useful when treatment is assigned at the level of a ''cluster'' instead of at the individual level. For example, suppose that an educational researcher wants to discover whether a new teaching technique improves student test scores. She therefore assigns teachers in "treated" classrooms to try this new technique, while leaving "control" classrooms unaffected. When analyzing her results, she may want to keep the data at the student level (for example, to control for student-level observable characteristics). However, when estimating the standard error or confidence interval of her statistical model, she realizes that classical or even heteroscedasticity-robust standard errors are inappropriate because student test scores within each class are ''not'' independently distributed. Instead, students in classes with better teachers have especially high test scores (regardless of whether they receive the experimental treatment) while students in classes with worse teachers have especially low test scores. The researcher can cluster her standard errors at the level of a classroom to account for this aspect of her experiment.

While this example is very specific, similar issues arise in a wide variety of settings. For example, in many panel data settings (such as difference-in-differences) clustering often offers a simple and effective way to account for non-independence between periods within each unit (sometimes referred to as "autocorrelation in residuals"). Another common and logically distinct justification for clustering arises when a full population cannot be randomly sampled, and so instead clusters are sampled and then units are randomized within cluster. In this case, clustered standard errors account for the uncertainty driven by the fact that the researcher does not observe large parts of the population of interest.
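The classroom example can be illustrated with a small simulation. The sketch below is only an illustration under assumed values: the number of classrooms, the class size, and the teacher and student noise levels are all hypothetical, and the true treatment effect is set to zero. It repeats the experiment many times and compares the actual spread of the estimated treatment effect with the standard error a researcher would report if she treated every student as an independent observation.

```python
import numpy as np

def simulate_once(rng, n_classes=40, class_size=25, teacher_sd=1.0, noise_sd=1.0):
    """Run one hypothetical classroom experiment with a true effect of zero.

    Returns the difference-in-means estimate and the naive standard error
    that treats all students as independent draws.
    """
    # Exactly half of the classrooms are treated; treatment is constant within a class.
    treated_class = rng.permutation(n_classes) < n_classes // 2
    # Teacher quality shifts every student in the same class by the same amount.
    teacher = rng.normal(0.0, teacher_sd, size=n_classes)
    scores = teacher[:, None] + rng.normal(0.0, noise_sd, size=(n_classes, class_size))
    y1 = scores[treated_class].ravel()
    y0 = scores[~treated_class].ravel()
    estimate = y1.mean() - y0.mean()
    naive_se = np.sqrt(y1.var(ddof=1) / y1.size + y0.var(ddof=1) / y0.size)
    return estimate, naive_se

rng = np.random.default_rng(0)
draws = np.array([simulate_once(rng) for _ in range(500)])
empirical_sd = draws[:, 0].std(ddof=1)   # actual sampling variability of the estimate
mean_naive_se = draws[:, 1].mean()       # what the naive formula reports on average
print(f"empirical SD of estimate: {empirical_sd:.3f}")
print(f"average naive SE:         {mean_naive_se:.3f}")
```

Under these assumed settings the naive standard error is several times smaller than the true sampling variability, because the shared teacher effect means each classroom contributes far less independent information than its student count suggests.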


Mathematical motivation

A useful mathematical illustration comes from the case of one-way clustering in an ordinary least squares (OLS) model. Consider a simple model with ''N'' observations that are subdivided in ''C'' clusters. Let Y be an N \times 1 vector of outcomes, X an N \times m matrix of covariates, \beta an m \times 1 vector of unknown parameters, and e an N \times 1 vector of unexplained residuals:

:Y = X\beta + e

As is standard with OLS models, we minimize the sum of squared residuals to get an estimate \hat{\beta}:

:\min_\beta (Y - X\beta)'(Y - X\beta)
:\Rightarrow X'(Y - X\hat{\beta}) = 0
:\Rightarrow \hat{\beta} = (X'X)^{-1}X'Y

From there, treating X as fixed and e as mean zero, we can derive the classic "sandwich" estimator:

:V(\hat{\beta}) = V((X'X)^{-1}X'Y) = V(\beta + (X'X)^{-1}X'e) = V((X'X)^{-1}X'e) = (X'X)^{-1}X' E[ee'] X(X'X)^{-1}

Denoting \Omega \equiv E[ee'] yields a potentially more familiar form:

:V(\hat{\beta}) = (X'X)^{-1}X'\Omega X(X'X)^{-1}

While one can develop a plug-in estimator by defining \hat{e} \equiv Y - X\hat{\beta} and letting \hat{\Omega} \equiv \hat{e}\hat{e}', this completely flexible estimator will ''not'' converge to V(\hat{\beta}) as N \rightarrow \infty. Indeed, the first-order condition above implies X'\hat{e} = 0, so this plug-in sandwich is identically zero. Given the assumptions that a practitioner deems as reasonable, different types of standard errors solve this problem in different ways. For example, classic homoskedastic standard errors assume that \Omega is diagonal with identical elements \sigma^2, which simplifies the expression to V(\hat{\beta}) = \sigma^2 (X'X)^{-1}. Huber-White standard errors assume \Omega is diagonal but that the diagonal value varies, while other types of standard errors (e.g. Newey–West, Moulton SEs, Conley spatial SEs) make other restrictions on the form of this matrix to reduce the number of parameters that the practitioner needs to estimate.

Clustered standard errors assume that \Omega is block-diagonal according to the clusters in the sample, with unrestricted values in each block but zeros elsewhere. In this case, one can define X_c and \Omega_c as the within-block analogues of X and \Omega and derive the following mathematical fact:

:X'\Omega X = \sum_c X'_c \Omega_c X_c

By constructing plug-in matrices \hat{\Omega}_c, one can form an estimator for V(\hat{\beta}) that is consistent as the number of clusters ''C'' becomes large. While no specific number of clusters is statistically proven to be sufficient, practitioners often cite a number in the range of 30-50 and are comfortable using clustered standard errors when the number of clusters exceeds that threshold.
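The block-diagonal sum above translates fairly directly into code. The following is a minimal sketch rather than a reference implementation: the function name `ols_cluster_se` and the simulated classroom data are hypothetical, the plug-in block is assumed to be the outer product of within-cluster residuals, \hat{\Omega}_c = \hat{e}_c \hat{e}_c', and no finite-sample correction is applied (statistical packages commonly rescale the result by a factor that depends on the number of clusters and observations).

```python
import numpy as np

def ols_cluster_se(X, y, cluster_ids):
    """OLS coefficients with classical and one-way cluster-robust standard errors.

    Illustrative only: no small-sample correction is applied.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    n, m = X.shape

    bread = np.linalg.inv(X.T @ X)           # (X'X)^{-1}
    beta = bread @ X.T @ y                   # OLS estimate
    resid = y - X @ beta                     # fitted residuals

    # Classical (homoskedastic) variance: sigma^2 (X'X)^{-1}
    sigma2 = resid @ resid / (n - m)
    se_classical = np.sqrt(np.diag(sigma2 * bread))

    # Cluster-robust "meat": sum over clusters of X_c' e_c e_c' X_c
    meat = np.zeros((m, m))
    for c in np.unique(cluster_ids):
        rows = cluster_ids == c
        score = X[rows].T @ resid[rows]      # X_c' e_c, a length-m vector
        meat += np.outer(score, score)

    se_cluster = np.sqrt(np.diag(bread @ meat @ bread))
    return beta, se_classical, se_cluster

# Hypothetical data: 40 classrooms of 25 students, treatment assigned by classroom.
rng = np.random.default_rng(0)
n_classes, class_size = 40, 25
cluster_ids = np.repeat(np.arange(n_classes), class_size)
treated = (cluster_ids % 2 == 0).astype(float)
teacher = rng.normal(size=n_classes)[cluster_ids]   # shared within each classroom
y = 0.5 * treated + teacher + rng.normal(size=cluster_ids.size)
X = np.column_stack([np.ones_like(treated), treated])

beta, se_classical, se_cluster = ols_cluster_se(X, y, cluster_ids)
print("coefficients:  ", beta)
print("classical SE:  ", se_classical)
print("clustered SE:  ", se_cluster)
```

Summing the per-cluster score vectors before taking outer products is what distinguishes this from the Huber-White case, where each observation would form its own block. In this simulated design the clustered standard error on the treatment coefficient should come out noticeably larger than the classical one, reflecting the shared teacher effect.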


Further reading

* Alberto Abadie, Susan Athey, Guido W. Imbens, and Jeffrey M. Wooldridge. 2022. "When Should You Adjust Standard Errors for Clustering?" ''Quarterly Journal of Economics''.

