
L1-norm principal component analysis (L1-PCA) is a general method for multivariate data analysis.
L1-PCA is often preferred over standard L2-norm
principal component analysis
(PCA) when the analyzed data may contain
outliers
(faulty values or corruptions).
Both L1-PCA and standard PCA seek a collection of
orthogonal
directions (principal components) that define a
subspace wherein data representation is maximized according to the selected criterion.
Standard PCA quantifies data representation as the aggregate of the
L2-norm
of the data point
projections into the subspace, or equivalently the aggregate
Euclidean distance
of the original points from their subspace-projected representations.
L1-PCA uses instead the aggregate of the L1-norm of the data point projections into the subspace.
In PCA and L1-PCA, the number of principal components (PCs) is lower than the
rank
of the analyzed matrix, which coincides with the dimensionality of the space defined by the original data points.
Therefore, PCA or L1-PCA are commonly employed for
dimensionality reduction
for the purpose of data denoising or compression.
Among the advantages of standard PCA that contributed to its high popularity are
low-cost computational implementation by means of
singular-value decomposition
(SVD) and statistical optimality when the data set is generated by a true
multivariate normal
data source.
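For instance, the principal components of a centered data matrix can be read directly off its SVD. A minimal NumPy sketch (the random data and sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((5, 100))         # D x N data matrix (illustrative)
Xc = X - X.mean(axis=1, keepdims=True)    # center the data

# Standard PCA via SVD: the principal components are the
# dominant left singular vectors of the centered data matrix.
U, s, _ = np.linalg.svd(Xc, full_matrices=False)
K = 2
Q_pca = U[:, :K]                          # D x K matrix with orthonormal columns
```

Because the SVD delivers the components at low cost, this is the standard computational route for PCA.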
However, in modern big data sets, data often include corrupted, faulty points, commonly referred to as outliers.
Standard PCA is known to be sensitive to outliers, even when they appear as a small fraction of the processed data.
The reason is that the L2-norm formulation of L2-PCA places squared emphasis on the magnitude of each coordinate of each data point, ultimately overemphasizing peripheral points, such as outliers.
On the other hand, following an L1-norm formulation, L1-PCA places linear emphasis on the coordinates of each data point, effectively restraining outliers.
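This difference in emphasis can be illustrated numerically: under the squared (L2) criterion, a single large outlier can dominate the aggregate value along a direction, whereas under the absolute (L1) criterion its influence grows only linearly. A small NumPy sketch (the dataset and candidate direction are illustrative assumptions):

```python
import numpy as np

# Nine well-behaved points near the x-axis, plus one gross outlier.
points = np.array([[x, 0.1 * x] for x in range(1, 10)] + [[0.0, 30.0]])

direction = np.array([0.0, 1.0])        # candidate direction aligned with the outlier
projections = points @ direction        # scalar projections onto the direction

l2_score = np.sum(projections ** 2)     # squared emphasis (standard PCA metric)
l1_score = np.sum(np.abs(projections))  # linear emphasis (L1-PCA metric)

# Share of each aggregate contributed by the single outlier:
l2_share = projections[-1] ** 2 / l2_score
l1_share = abs(projections[-1]) / l1_score
```

Here the outlier accounts for nearly all of the squared aggregate but a markedly smaller share of the absolute aggregate, which is why the L2 criterion is pulled toward peripheral points.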
Formulation
Consider any matrix \mathbf{X} \in \mathbb{R}^{D \times N} consisting of N D-dimensional data points. Define r = \mathrm{rank}(\mathbf{X}). For integer K such that 1 \leq K \leq r, L1-PCA is formulated as:

\mathbf{Q}_{L1} = \underset{\mathbf{Q} \in \mathbb{R}^{D \times K},\; \mathbf{Q}^\top\mathbf{Q} = \mathbf{I}_K}{\arg\max}\; \|\mathbf{Q}^\top\mathbf{X}\|_1.

For K = 1, the above simplifies to finding the L1-norm principal component (L1-PC) of \mathbf{X} by

\mathbf{q}_{L1} = \underset{\mathbf{q} \in \mathbb{R}^{D},\; \|\mathbf{q}\|_2 = 1}{\arg\max}\; \|\mathbf{X}^\top\mathbf{q}\|_1.

In the above, the L1-norm \|\cdot\|_1 returns the sum of the absolute entries of its argument, and the L2-norm \|\cdot\|_2 returns the square root of the sum of the squared entries of its argument. If one substitutes the L1-norm \|\mathbf{Q}^\top\mathbf{X}\|_1 by the Frobenius/L2-norm \|\mathbf{Q}^\top\mathbf{X}\|_F, then the problem becomes standard PCA and it is solved by the matrix \mathbf{Q}_{L2} that contains the K dominant left singular vectors of \mathbf{X} (i.e., the singular vectors that correspond to the K highest singular values).

The maximization metric can be expanded as

\|\mathbf{Q}^\top\mathbf{X}\|_1 = \underset{\mathbf{B} \in \{\pm 1\}^{N \times K}}{\max}\; \mathrm{Tr}(\mathbf{Q}^\top\mathbf{X}\mathbf{B}).
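This expansion can be checked numerically: for any matrix Q with orthonormal columns, ||Q^T X||_1 equals the maximum of Tr(Q^T X B) over binary B, attained at B = sgn(X^T Q). A minimal NumPy sketch (random data; the sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
D, N, K = 4, 6, 2
X = rng.standard_normal((D, N))

# Random matrix with orthonormal columns (QR of a random matrix).
Q, _ = np.linalg.qr(rng.standard_normal((D, K)))

l1_metric = np.abs(Q.T @ X).sum()    # ||Q^T X||_1 (entrywise L1-norm)

B = np.sign(X.T @ Q)                 # maximizing binary matrix B = sgn(X^T Q)
B[B == 0] = 1                        # convention: sgn(0) = 1
trace_value = np.trace(Q.T @ X @ B)  # Tr(Q^T X B)
```

The two quantities coincide because each entry of Q^T X is paired with its own sign in the trace.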
Solution
For any matrix \mathbf{A} \in \mathbb{R}^{m \times n} with m \geq n, define \Phi(\mathbf{A}) as the nearest (in the L2-norm sense) matrix to \mathbf{A} that has orthonormal columns. That is, define

\Phi(\mathbf{A}) = \underset{\mathbf{U} \in \mathbb{R}^{m \times n},\; \mathbf{U}^\top\mathbf{U} = \mathbf{I}_n}{\arg\min}\; \|\mathbf{A} - \mathbf{U}\|_F.

The Procrustes Theorem states that if \mathbf{A} has SVD \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top, then \Phi(\mathbf{A}) = \mathbf{U}\mathbf{V}^\top.

Markopoulos, Karystinos, and Pados showed that, if \mathbf{B}_{\mathrm{opt}} is the exact solution to the binary nuclear-norm maximization (BNM) problem

\mathbf{B}_{\mathrm{opt}} = \underset{\mathbf{B} \in \{\pm 1\}^{N \times K}}{\arg\max}\; \|\mathbf{X}\mathbf{B}\|_*,

then \mathbf{Q}_{L1} = \Phi(\mathbf{X}\mathbf{B}_{\mathrm{opt}}) is the exact solution to L1-PCA. The nuclear norm \|\cdot\|_* returns the summation of the singular values of its matrix argument and can be calculated by means of standard SVD. Moreover, given the solution to L1-PCA, \mathbf{Q}_{L1}, the solution to BNM can be obtained as

\mathbf{B}_{\mathrm{opt}} = \mathrm{sgn}(\mathbf{X}^\top\mathbf{Q}_{L1}),

where \mathrm{sgn}(\cdot) returns the \pm 1 sign matrix of its matrix argument (with no loss of generality, we can consider \mathrm{sgn}(0) = 1). In addition, it follows that \|\mathbf{Q}_{L1}^\top\mathbf{X}\|_1 = \|\mathbf{X}\mathbf{B}_{\mathrm{opt}}\|_*. BNM is a combinatorial problem over antipodal binary variables. Therefore, its exact solution can be found through exhaustive evaluation of all 2^{NK} elements of its feasibility set, with asymptotic cost \mathcal{O}(2^{NK}). Therefore, L1-PCA can also be solved, through BNM, with cost \mathcal{O}(2^{NK}) (exponential in the product of the number of data points with the number of the sought-after components). It turns out that L1-PCA can also be solved optimally (exactly) with complexity polynomial in the number of data points N, for fixed data dimension D.

For the special case of K = 1 (single L1-PC of \mathbf{X}), BNM takes the binary-quadratic-maximization (BQM) form

\mathbf{b}_{\mathrm{opt}} = \underset{\mathbf{b} \in \{\pm 1\}^{N}}{\arg\max}\; \mathbf{b}^\top\mathbf{X}^\top\mathbf{X}\mathbf{b}.

The transition from BNM to BQM for K = 1 holds true, since the unique non-zero singular value of \mathbf{X}\mathbf{b} is equal to \|\mathbf{X}\mathbf{b}\|_2 = \sqrt{\mathbf{b}^\top\mathbf{X}^\top\mathbf{X}\mathbf{b}}, for every \mathbf{b}. Then, if \mathbf{b}_{\mathrm{opt}} is the solution to BQM, it holds that

\mathbf{q}_{L1} = \frac{\mathbf{X}\mathbf{b}_{\mathrm{opt}}}{\|\mathbf{X}\mathbf{b}_{\mathrm{opt}}\|_2}

is the exact L1-PC of \mathbf{X}, as defined above. In addition, it holds that \mathbf{b}_{\mathrm{opt}} = \mathrm{sgn}(\mathbf{X}^\top\mathbf{q}_{L1}) and \|\mathbf{X}^\top\mathbf{q}_{L1}\|_1 = \|\mathbf{X}\mathbf{b}_{\mathrm{opt}}\|_2.
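On small instances, the BQM problem for K = 1 can be solved by brute force and the resulting L1-PC checked against the identities above. A minimal NumPy sketch (random data; the sizes are illustrative assumptions, and exhaustive search is only tractable for tiny N):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
D, N = 3, 8
X = rng.standard_normal((D, N))

# Exhaustive binary-quadratic maximization: b_opt = argmax b^T X^T X b.
G = X.T @ X
best_val, b_opt = -np.inf, None
for bits in product([-1.0, 1.0], repeat=N):
    b = np.array(bits)
    val = b @ G @ b
    if val > best_val:
        best_val, b_opt = val, b

# The exact L1-PC follows by normalizing X b_opt.
q = X @ b_opt / np.linalg.norm(X @ b_opt)

l1_objective = np.abs(X.T @ q).sum()   # ||X^T q||_1
```

At the optimum, the L1 objective ||X^T q||_1 coincides with ||X b_opt||_2, as stated above.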
Algorithms
Exact solution of exponential complexity
As shown above, the exact solution to L1-PCA can be obtained by the following two-step process:
1. Solve the BNM problem to obtain \mathbf{B}_{\mathrm{opt}}.
2. Apply SVD on \mathbf{X}\mathbf{B}_{\mathrm{opt}} to obtain \mathbf{Q}_{L1} = \Phi(\mathbf{X}\mathbf{B}_{\mathrm{opt}}).
BNM can be solved by exhaustive search over the domain of \mathbf{B} with cost \mathcal{O}(2^{NK}).
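For tiny N and K, the two-step process can be sketched directly in NumPy (exhaustive search over the 2^{NK} binary matrices is only tractable at this scale; the random data and sizes are illustrative assumptions):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
D, N, K = 3, 5, 2
X = rng.standard_normal((D, N))

# Step 1: exhaustive binary nuclear-norm maximization over B in {+-1}^(N x K).
best_val, B_opt = -np.inf, None
for bits in product([-1.0, 1.0], repeat=N * K):
    B = np.array(bits).reshape(N, K)
    val = np.linalg.norm(X @ B, ord='nuc')   # nuclear norm, computed via SVD
    if val > best_val:
        best_val, B_opt = val, B

# Step 2: Procrustes step -- Q = U V^T from the SVD of X B_opt.
U, _, Vt = np.linalg.svd(X @ B_opt, full_matrices=False)
Q = U @ Vt                                   # D x K, orthonormal columns

l1_objective = np.abs(Q.T @ X).sum()         # ||Q^T X||_1
```

By the result above, the attained L1 objective equals the maximized nuclear norm ||X B_opt||_*.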
Exact solution of polynomial complexity
Also, L1-PCA can be solved optimally with cost polynomial in the number of data points N, when the rank of \mathbf{X} is constant with respect to N (always true for finite data dimension D).
Approximate efficient solvers
In 2008, Kwak proposed an iterative algorithm for the approximate solution of L1-PCA for K = 1. This iterative method was later generalized for K > 1 components.
Another approximate efficient solver was proposed by McCoy and Tropp by means of semi-definite programming (SDP). Most recently, L1-PCA (and BNM in ()) were solved efficiently by means of bit-flipping iterations (L1-BF algorithm).
L1-BF algorithm
function L1BF(X, K):
    Initialize B ∈ {±1}^(N×K) and the candidate bit-index set L ← {1, …, NK}
    Set t ← 0 and m ← ‖XB‖_*
    Until termination (or T iterations):
        t ← t + 1,  m_prev ← m
        For each bit index (n, k) in L:
            [B]_(n,k) ← −[B]_(n,k)           ''// flip bit''
            a ← ‖XB‖_*                       ''// calculated by SVD or faster''
            if a > m:
                m ← a                        ''// keep the flip''
            else:
                [B]_(n,k) ← −[B]_(n,k)       ''// undo the flip''
        end
        if m = m_prev                        ''// no bit was flipped''
            if L = {1, …, NK}
                terminate
            else
                L ← {1, …, NK}               ''// expand the search to all bits''
    Return Q ← Φ(XB)
The computational cost of L1-BF is polynomial in the number of data points, markedly lower than the exponential cost of the exact solvers.
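The bit-flipping idea can be sketched in NumPy as follows. This is a simplified greedy variant for illustration, not the published L1-BF implementation; the helper name l1_bitflip, the random data, and the sizes are assumptions:

```python
import numpy as np

def l1_bitflip(X, K, max_iter=100, seed=None):
    """Greedy bit-flipping for L1-PCA: ascend the nuclear norm of X @ B
    by single-bit flips, then extract Q via the Procrustes step."""
    rng = np.random.default_rng(seed)
    N = X.shape[1]
    B = rng.choice([-1.0, 1.0], size=(N, K))            # random antipodal start
    best = np.linalg.norm(X @ B, ord='nuc')
    for _ in range(max_iter):
        improved = False
        for n in range(N):
            for k in range(K):
                B[n, k] = -B[n, k]                      # flip bit
                val = np.linalg.norm(X @ B, ord='nuc')  # nuclear norm via SVD
                if val > best:
                    best, improved = val, True          # keep the flip
                else:
                    B[n, k] = -B[n, k]                  # undo the flip
        if not improved:                                # no bit was flipped
            break
    U, _, Vt = np.linalg.svd(X @ B, full_matrices=False)
    return U @ Vt, B, best                              # Q = Phi(X B)

rng = np.random.default_rng(3)
X = rng.standard_normal((4, 10))
Q, B, val = l1_bitflip(X, K=2, seed=3)
```

The returned Q has orthonormal columns, and its L1 objective is at least the nuclear-norm value reached by the flips, by the Procrustes relation above.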
Complex data
L1-PCA has also been generalized to process
complex
data. For complex L1-PCA, two efficient algorithms were proposed in 2018.
Tensor data
L1-PCA has also been extended for the analysis of
tensor
In mathematics, a tensor is an algebraic object that describes a multilinear relationship between sets of algebraic objects related to a vector space. Tensors may map between different objects such as vectors, scalars, and even other tens ...
data, in the form of L1-Tucker, the L1-norm robust analogue of standard Tucker decomposition.
Two algorithms for the solution of L1-Tucker are L1-HOSVD and L1-HOOI.
Code
MATLAB
code for L1-PCA is available at MathWorks
and other repositories.