In
statistical theory, the field of high-dimensional statistics studies data whose
dimension is larger than typically considered in classical
multivariate analysis. The area arose owing to the emergence of many modern data sets in which the dimension of the data vectors may be comparable to, or even larger than, the
sample size, so that justification for the use of traditional techniques, often based on asymptotic arguments with the dimension held fixed as the sample size increased, was lacking.
Examples
Parameter estimation in linear models

The most basic statistical model for the relationship between a covariate vector $x \in \mathbb{R}^p$ and a response variable $y \in \mathbb{R}$ is the linear model

: $y = x^\top \beta + \varepsilon,$

where $\beta \in \mathbb{R}^p$ is an unknown parameter vector, and $\varepsilon$ is random noise with mean zero and variance $\sigma^2$. Given independent responses $Y_1, \ldots, Y_n$, with corresponding covariates $x_1, \ldots, x_n$, from this model, we can form the response vector $Y := (Y_1, \ldots, Y_n)^\top$ and design matrix $X := (x_1, \ldots, x_n)^\top \in \mathbb{R}^{n \times p}$. When $n \geq p$ and the design matrix has full column rank (i.e. its columns are linearly independent), the ordinary least squares estimator of $\beta$ is

: $\hat{\beta} := (X^\top X)^{-1} X^\top Y.$

When $\varepsilon \sim N(0, \sigma^2)$, it is known that $\hat{\beta} \sim N_p\bigl(\beta, \sigma^2 (X^\top X)^{-1}\bigr)$. Thus $\hat{\beta}$ is an unbiased estimator of $\beta$, and the Gauss–Markov theorem tells us that it is the best linear unbiased estimator.
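As a concrete illustration of the estimator above, the following minimal sketch simulates data from a well-conditioned low-dimensional linear model with $n \gg p$ and computes the ordinary least squares estimate. The dimensions, coefficients and noise level are illustrative choices, and NumPy is assumed to be available.

```python
import numpy as np

# Minimal sketch (illustrative values, not from the article): simulate a
# linear model with n >> p and compute the ordinary least squares estimator.
rng = np.random.default_rng(0)
n, p, sigma = 200, 5, 0.5
X = rng.normal(size=(n, p))                 # design matrix, full column rank a.s.
beta = np.arange(1.0, p + 1.0)              # true parameter vector
y = X @ beta + sigma * rng.normal(size=n)   # responses from the linear model

# beta_hat = (X^T X)^{-1} X^T Y, computed with a linear solve rather than an
# explicit matrix inverse for numerical stability.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # close to beta; unbiased, with covariance sigma^2 (X^T X)^{-1}
```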
However, overfitting is a concern when $p$ is of comparable magnitude to $n$: the matrix $X^\top X$ in the definition of $\hat{\beta}$ may become ill-conditioned, with a small minimum eigenvalue. In such circumstances $\mathbb{E}\bigl(\|\hat{\beta} - \beta\|_2^2\bigr) = \sigma^2 \operatorname{tr}\bigl((X^\top X)^{-1}\bigr)$ will be large, since the trace of a matrix is the sum of its eigenvalues, and the eigenvalues of $(X^\top X)^{-1}$ are the reciprocals of those of $X^\top X$. Even worse, when $p > n$, the matrix $X^\top X$ is singular. (See Section 1.2 and Exercise 1.2 in .)
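The following sketch illustrates this deterioration numerically: as $p$ approaches $n$ for a Gaussian design, the minimum eigenvalue of $X^\top X$ shrinks and the quantity $\sigma^2 \operatorname{tr}\bigl((X^\top X)^{-1}\bigr)$ blows up. The sample sizes and design are illustrative assumptions, and NumPy is assumed.

```python
import numpy as np

# Sketch (illustrative values): as p approaches n, X^T X becomes ill-conditioned
# and the expected squared error of OLS, sigma^2 * tr((X^T X)^{-1}), blows up.
rng = np.random.default_rng(1)
n, sigma2 = 100, 1.0
for p in (5, 50, 90, 99):
    X = rng.normal(size=(n, p))
    eigvals = np.linalg.eigvalsh(X.T @ X)     # eigenvalues of X^T X
    mse = sigma2 * np.sum(1.0 / eigvals)      # sigma^2 tr((X^T X)^{-1})
    print(f"p={p:3d}  min eigenvalue={eigvals.min():9.3f}  expected error={mse:10.2f}")
# For p > n, X^T X is a p x p matrix of rank at most n, hence singular, and the
# OLS estimator is no longer well defined.
```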
It is important to note that the deterioration in estimation performance in high dimensions observed in the previous paragraph is not limited to the ordinary least squares estimator. In fact, statistical inference in high dimensions is intrinsically hard, a phenomenon known as the curse of dimensionality, and it can be shown that no estimator can do better in a worst-case sense without additional information (see Example 15.10 in ). Nevertheless, the situation in high-dimensional statistics may not be hopeless when the data possess some low-dimensional structure. One common assumption for high-dimensional linear regression is that the vector of regression coefficients is sparse, in the sense that most coordinates of $\beta$ are zero. Many statistical procedures, including the Lasso, have been proposed to fit high-dimensional linear models under such sparsity assumptions.
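As a minimal sketch of this approach (an illustrative simulation, not a prescribed method), the following fits a sparse high-dimensional linear model with $p > n$ using the Lasso implementation in scikit-learn; the dimensions, sparsity level and penalty parameter are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Sketch (illustrative values): p > n, but only k coefficients are non-zero,
# so the Lasso can still recover the signal under a sparsity assumption.
rng = np.random.default_rng(2)
n, p, k = 100, 500, 5
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:k] = 3.0                               # sparse true coefficient vector
y = X @ beta + 0.5 * rng.normal(size=n)

lasso = Lasso(alpha=0.3).fit(X, y)           # l1-penalised least squares
print("estimated support:", np.flatnonzero(lasso.coef_))  # ideally {0, ..., k-1}
```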
Covariance matrix estimation
Another example of a high-dimensional statistical phenomenon can be found in the problem of covariance matrix estimation. Suppose that we observe $X_1, \ldots, X_n \in \mathbb{R}^p$, which are i.i.d. draws from some zero mean distribution with an unknown covariance matrix $\Sigma \in \mathbb{R}^{p \times p}$. A natural unbiased estimator of $\Sigma$ is the sample covariance matrix

: $\widehat{\Sigma} := \frac{1}{n} \sum_{i=1}^n X_i X_i^\top .$

In the low-dimensional setting where $n$ increases and $p$ is held fixed, $\widehat{\Sigma}$ is a consistent estimator of $\Sigma$ in any matrix norm. When $p$ grows with $n$, on the other hand, this consistency result may fail to hold. As an illustration, suppose that each $X_i \sim N_p(0, I)$ and that $p/n \to \alpha \in (0, 1)$. If $\widehat{\Sigma}$ were to consistently estimate $\Sigma = I$, then the eigenvalues of $\widehat{\Sigma}$ should approach one as $n$ increases. It turns out that this is not the case in this high-dimensional setting. Indeed, the largest and smallest eigenvalues of $\widehat{\Sigma}$ concentrate around $(1 + \sqrt{\alpha})^2$ and $(1 - \sqrt{\alpha})^2$, respectively, according to the limiting distribution derived by Tracy and Widom, and these clearly deviate from the unit eigenvalues of $\Sigma$. Further information on the asymptotic behaviour of the eigenvalues of $\widehat{\Sigma}$ can be obtained from the Marchenko–Pastur law. From a non-asymptotic point of view, the maximum eigenvalue $\lambda_{\max}(\widehat{\Sigma})$ of $\widehat{\Sigma}$ satisfies

: $\mathbb{P}\Bigl(\lambda_{\max}(\widehat{\Sigma}) \geq \bigl(1 + \sqrt{p/n} + \delta\bigr)^2\Bigr) \leq e^{-n\delta^2/2},$

for any $\delta \geq 0$ and all choices of pairs of $n, p$.
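A small simulation makes this concrete: with Gaussian data and $p/n = 1/4$, the extreme eigenvalues of the sample covariance matrix settle near $(1 \pm \sqrt{1/4})^2$ rather than near one. The sketch below uses illustrative sample sizes and assumes NumPy.

```python
import numpy as np

# Sketch (illustrative values): for X_i ~ N_p(0, I) with alpha = p/n = 0.25,
# the extreme eigenvalues of the sample covariance matrix concentrate near
# (1 - sqrt(alpha))^2 = 0.25 and (1 + sqrt(alpha))^2 = 2.25, not near 1.
rng = np.random.default_rng(3)
n, p = 4000, 1000
alpha = p / n
X = rng.normal(size=(n, p))                 # rows are the observations X_i
Sigma_hat = X.T @ X / n                     # sample covariance matrix
eigvals = np.linalg.eigvalsh(Sigma_hat)     # sorted in increasing order
print("smallest eigenvalue:", eigvals[0],  "vs", (1 - np.sqrt(alpha)) ** 2)
print("largest eigenvalue: ", eigvals[-1], "vs", (1 + np.sqrt(alpha)) ** 2)
```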
Again, additional low-dimensional structure is needed for successful covariance matrix estimation in high dimensions. Examples of such structures include
sparsity,
low rankness and
bandedness. Similar remarks apply when estimating an inverse covariance matrix
(precision matrix).
History
From an applied perspective, research in high-dimensional statistics was motivated by the realisation that advances in computing technology had dramatically increased the ability to collect and store
data, and that traditional statistical techniques such as those described in the examples above were often ill-equipped to handle the resulting challenges. Theoretical advances in the area can be traced back to the remarkable result of
Charles Stein in 1956,
in which he proved that the usual estimator of a multivariate normal mean was
inadmissible with respect to squared error loss in three or more dimensions. Indeed, the
James-Stein estimator provided the insight that in high-dimensional settings, one may obtain improved estimation performance through shrinkage, which reduces variance at the expense of introducing a small amount of bias. This
bias-variance tradeoff was further exploited in the context of high-dimensional
linear models by Hoerl and Kennard in 1970 with the introduction of
ridge regression. Another major impetus for the field was provided by
Robert Tibshirani's work on the
Lasso in 1996, which used $\ell_1$-regularisation to achieve simultaneous model selection and parameter estimation in high-dimensional sparse linear regression.
Since then, a large number of other
shrinkage estimators have been proposed to exploit different low-dimensional structures in a wide range of high-dimensional statistical problems.
Topics in high-dimensional statistics
The following are examples of topics that have received considerable attention in the high-dimensional statistics literature in recent years:
* Linear models in high dimensions. Linear models are one of the most widely used tools in statistics and its applications. As such, sparse linear regression is one of the most well-studied topics in high-dimensional statistical research. Building upon earlier work on
ridge regression and the
Lasso, several other
shrinkage estimators have been proposed and studied in this and related problems. They include
** The Dantzig selector, which minimises the maximum covariate-residual correlation, instead of the residual sum of squares as in the Lasso, subject to an $\ell_1$ constraint on the coefficients.
** Elastic net, which combines the $\ell_1$ regularisation of the Lasso with the $\ell_2$ regularisation of ridge regression, to allow highly correlated covariates to be simultaneously selected with similar regression coefficients.
** The Group Lasso, which allows predefined groups of covariates to be selected jointly.
** The Fused lasso, which regularises the difference between nearby coefficients when the regression coefficients reflect spatial or temporal relationships, so as to enforce a piecewise constant structure.
[Tibshirani, Robert, Michael Saunders, Saharon Rosset, Ji Zhu, and Keith Knight. 2005. "Sparsity and Smoothness via the Fused Lasso". Journal of the Royal Statistical Society, Series B (Statistical Methodology) 67 (1): 91–108. https://www.jstor.org/stable/3647602.]
* High-dimensional variable selection. In addition to estimating the underlying parameter in regression models, another important task is to identify the non-zero coefficients, as these correspond to variables that are needed in a final model. Each of the techniques listed under the previous heading can be used for this purpose, and is sometimes combined with ideas such as subsampling through stability selection.
* High-dimensional covariance and precision matrix estimation. These problems were introduced above; see also shrinkage estimation. Methods include tapering estimators and the constrained $\ell_1$-minimisation estimator.
* Sparse principal component analysis. Principal component analysis is another technique that breaks down in high dimensions; more precisely, under appropriate conditions, the leading eigenvector of the sample covariance matrix is an inconsistent estimator of its population counterpart when the ratio of the number of variables $p$ to the number of observations $n$ is bounded away from zero. Under the assumption that this leading eigenvector is sparse (which can aid interpretability), consistency can be restored.
* Matrix completion. This topic, which concerns the task of filling in the missing entries of a partially observed matrix, became popular owing in large part to the Netflix prize for predicting user ratings for films.
* High-dimensional classification. Linear discriminant analysis cannot be used when $p > n$, because the sample covariance matrix is singular. Alternative approaches have been proposed based on naive Bayes, feature selection and random projections.
* Graphical models for high-dimensional data. Graphical models are used to encode the conditional dependence structure between different variables. Under a Gaussianity assumption, the problem reduces to that of estimating a sparse precision matrix, as discussed above; a minimal sketch of this reduction is given after the list.
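The following sketch ties together the precision-matrix and graphical-model topics above: it generates Gaussian data with a sparse (tridiagonal) precision matrix and recovers a sparse estimate with the graphical lasso in scikit-learn. The true matrix, sample size and penalty parameter are illustrative assumptions.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Sketch (illustrative values): Gaussian data whose true precision matrix is
# sparse (tridiagonal), estimated with an l1-penalised (graphical lasso) fit.
rng = np.random.default_rng(4)
p, n = 20, 500
Theta = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))  # sparse precision matrix
Sigma = np.linalg.inv(Theta)                                   # implied covariance
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

model = GraphicalLasso(alpha=0.05).fit(X)
Theta_hat = model.precision_                                   # sparse precision estimate
print("non-zero entries in estimate:", int(np.sum(np.abs(Theta_hat) > 1e-3)))
```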