
Biplots are a type of exploratory graph used in
statistics, a generalization of the simple two-variable
scatterplot
.
A biplot overlays a ''score plot'' with a ''loading plot''.
A biplot allows information on both
samples and variables of a
data matrix
to be displayed graphically. Samples are displayed as points while variables are displayed either as vectors, linear
axes
or nonlinear trajectories. In the case of categorical variables, ''category level points'' may be used to represent the levels of a categorical variable. A ''generalised'' biplot displays information on both continuous and categorical variables.
Introduction and history
The biplot was introduced by
K. Ruben Gabriel (1971). Gower and Hand (1996) wrote a monograph on biplots. Yan and Kang (2003) described various methods for visualizing and interpreting a biplot. The book by Greenacre (2010) is a practical user-oriented guide to biplots, along with scripts in the open-source
R programming language
, to generate biplots associated with
principal component analysis
(PCA),
multidimensional scaling
(MDS), log-ratio analysis (LRA)—also known as spectral mapping—
discriminant analysis
(DA) and various forms of
correspondence analysis
: simple correspondence analysis (CA), multiple correspondence analysis (MCA) and canonical correspondence analysis (CCA) (Greenacre, M. (2016), ''Correspondence Analysis in Practice'', third edition, Chapman and Hall / CRC Press). The book by Gower, Lubbe and le Roux (2011) aims to popularize biplots as a useful and reliable method for the visualization of multivariate data when researchers want to consider, for example, principal component analysis (PCA), canonical variates analysis (CVA) or various types of correspondence analysis.
Construction
A biplot is constructed by using the
singular value decomposition
(SVD) to obtain a
low-rank approximation to a transformed version of the data matrix X, whose ''n'' rows are the samples (also called the cases, or objects), and whose ''p'' columns are the variables. The transformed data matrix Y is obtained from the original matrix X by centering and optionally standardizing the columns (the variables). Using the SVD, we can write <math>\mathbf{Y} = \sum_{k=1}^{p} d_k \mathbf{u}_k \mathbf{v}_k^{\mathsf{T}}</math>, where the <math>\mathbf{u}_k</math> are ''n''-dimensional column vectors, the <math>\mathbf{v}_k</math> are ''p''-dimensional column vectors, and the <math>d_k</math> are a non-increasing sequence of non-negative
scalars
. The biplot is formed from two scatterplots that share a common set of axes and have a between-set
scalar product
interpretation. Writing <math>u_{ki}</math> for the ''i''th element of <math>\mathbf{u}_k</math> and <math>v_{kj}</math> for the ''j''th element of <math>\mathbf{v}_k</math>, the first scatterplot is formed from the points <math>(d_1^{\alpha} u_{1i},\ d_2^{\alpha} u_{2i})</math>, for ''i'' = 1,...,''n''. The second plot is formed from the points <math>(d_1^{1-\alpha} v_{1j},\ d_2^{1-\alpha} v_{2j})</math>, for ''j'' = 1,...,''p''. This is the biplot formed by the dominant two terms of the SVD, which can then be represented in a two-dimensional display. Typical choices of α are 1 (to give a distance interpretation to the row display) and 0 (to give a distance interpretation to the column display), and in some rare cases α = 1/2 to obtain a symmetrically scaled biplot (which gives no distance interpretation to the rows or the columns, but only the scalar product interpretation). The set of points depicting the variables can be drawn as arrows from the origin to reinforce the idea that they represent biplot axes onto which the samples can be projected to approximate the original data.
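The construction above can be illustrated with a short numerical sketch. The Python code below is only one possible illustration, assuming NumPy is available; the function name <code>biplot_coordinates</code>, its parameters and the random example data are hypothetical and do not come from any of the cited books. It centers (and optionally standardizes) the columns, takes the SVD, and scales the first two left and right singular vectors by ''d''<sup>α</sup> and ''d''<sup>1−α</sup>. Whatever the value of α, the scalar product of row point ''i'' and column point ''j'' is <math>d_1 u_{1i} v_{1j} + d_2 u_{2i} v_{2j}</math>, the rank-2 approximation of the corresponding element of Y.

```python
import numpy as np


def biplot_coordinates(X, alpha=1.0, standardize=False):
    """Two-dimensional biplot coordinates (hypothetical helper, for illustration).

    Returns (rows, cols, Y): sample points, variable points and the
    transformed data matrix Y that the biplot approximates.
    """
    X = np.asarray(X, dtype=float)
    Y = X - X.mean(axis=0)                 # center each variable (column)
    if standardize:
        Y = Y / Y.std(axis=0, ddof=1)      # optional scaling to unit variance

    # Thin SVD: Y = U @ diag(d) @ Vt, with d sorted in non-increasing order.
    U, d, Vt = np.linalg.svd(Y, full_matrices=False)

    # Keep the two dominant terms and split the singular values by alpha.
    rows = U[:, :2] * d[:2] ** alpha            # (d_k^alpha * u_ki), k = 1, 2
    cols = Vt[:2, :].T * d[:2] ** (1.0 - alpha)  # (d_k^(1-alpha) * v_kj), k = 1, 2
    return rows, cols, Y


# Illustrative data only: 10 samples, 4 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))

rows, cols, Y = biplot_coordinates(X, alpha=1.0)

# Scalar products between row and column points give the rank-2
# approximation of Y that the biplot displays.
approx = rows @ cols.T
print("approximation error (Frobenius norm):", np.linalg.norm(Y - approx))
```

With α = 1 the <code>rows</code> array holds the principal component scores of the samples, so distances between sample points approximate distances between rows of Y; with α = 0 the singular values are absorbed into <code>cols</code> instead. Plotting <code>rows</code> as points and <code>cols</code> as arrows from the origin would give the display described above.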
References
Sources
* Gower, J.C., Lubbe, S. and le Roux, N. (2010). ''Understanding Biplots''.
Wiley
.
* Gower, J.C. and Hand, D.J. (1996). ''Biplots''.
Chapman & Hall
, London, UK.
* Yan, W. and Kang, M.S. (2003). ''GGE Biplot Analysis''.
CRC Press
, Boca Raton, Florida.
* Demey, J.R., Vicente-Villardón, J.L., Galindo-Villardón, M.P. and Zambrano, A.Y. (2008). ''Identifying molecular markers associated with classification of genotypes by External Logistic Biplots''.
Bioinformatics
. 24(24):2832–2838