Biplot

Biplots are a type of exploratory graph used in statistics, a generalization of the simple two-variable scatterplot. A biplot overlays a ''score plot'' with a ''loading plot'', so that information on both the samples and the variables of a data matrix is displayed in a single graph. Samples are displayed as points, while variables are displayed as vectors, linear axes or nonlinear trajectories. For categorical variables, ''category level points'' may be used to represent the levels of the variable. A ''generalised'' biplot displays information on both continuous and categorical variables.


Introduction and history

The biplot was introduced by K. Ruben Gabriel (1971). Gower and Hand (1996) wrote a monograph on biplots. Yan and Kang (2003) described various methods for visualizing and interpreting a biplot. The book by Greenacre (2010) is a practical, user-oriented guide to biplots, with scripts in the open-source R programming language for generating biplots associated with principal component analysis (PCA), multidimensional scaling (MDS), log-ratio analysis (LRA, also known as spectral mapping), discriminant analysis (DA) and various forms of correspondence analysis: simple correspondence analysis (CA), multiple correspondence analysis (MCA) and canonical correspondence analysis (CCA) (Greenacre 2016: ''Correspondence Analysis in Practice'', 3rd ed., Chapman and Hall/CRC Press). The book by Gower, Lubbe and le Roux (2011) aims to popularize biplots as a useful and reliable method for visualizing multivariate data when researchers want to consider, for example, principal component analysis (PCA), canonical variates analysis (CVA) or various types of correspondence analysis.


Construction

A biplot is constructed by using the singular value decomposition (SVD) to obtain a low-rank approximation to a transformed version of the data matrix $X$, whose $n$ rows are the samples (also called the cases, or objects), and whose $p$ columns are the variables. The transformed data matrix $Y$ is obtained from the original matrix $X$ by centering and optionally standardizing the columns (the variables). Using the SVD, we can write $Y = \sum_{k=1}^{p} d_k u_k v_k^{T}$, where the $u_k$ are $n$-dimensional column vectors, the $v_k$ are $p$-dimensional column vectors, and the $d_k$ are a non-increasing sequence of non-negative scalars. The biplot is formed from two scatterplots that share a common set of axes and have a between-set scalar product interpretation. The first scatterplot is formed from the points $(d_1^{\alpha} u_{1i},\ d_2^{\alpha} u_{2i})$, for $i = 1, \ldots, n$. The second plot is formed from the points $(d_1^{1-\alpha} v_{1j},\ d_2^{1-\alpha} v_{2j})$, for $j = 1, \ldots, p$. This is the biplot formed by the dominant two terms of the SVD, which can then be represented in a two-dimensional display. Typical choices of $\alpha$ are 1 (to give a distance interpretation to the row display) and 0 (to give a distance interpretation to the column display), and in some rare cases $\alpha = 1/2$ to obtain a symmetrically scaled biplot (which gives no distance interpretation to the rows or the columns, but only the scalar product interpretation). The set of points depicting the variables can be drawn as arrows from the origin to reinforce the idea that they represent biplot axes onto which the samples can be projected to approximate the original data.
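
The construction described above can be sketched in a few lines of Python. This is a minimal illustration assuming NumPy is available. The function name `biplot_coordinates`, its parameters and the random example data are hypothetical and are not taken from any of the cited works. It only computes the two sets of coordinates and leaves the plotting itself to the reader.

```python
import numpy as np

def biplot_coordinates(X, alpha=1.0, standardize=False):
    """Return (Y, rows, cols, d) for a two-dimensional biplot of X.

    Hypothetical helper for illustration only.
    rows[i] = (d_1^alpha u_1i, d_2^alpha u_2i)           for samples
    cols[j] = (d_1^(1-alpha) v_1j, d_2^(1-alpha) v_2j)   for variables
    """
    X = np.asarray(X, dtype=float)
    # Center the columns; optionally scale each to unit standard deviation.
    # (A constant column would give a zero divisor; not handled here.)
    Y = X - X.mean(axis=0)
    if standardize:
        Y = Y / Y.std(axis=0, ddof=1)
    # Thin SVD: Y = U @ diag(d) @ Vt, with d in non-increasing order.
    U, d, Vt = np.linalg.svd(Y, full_matrices=False)
    # Keep the two dominant terms and split each singular value by alpha.
    rows = U[:, :2] * d[:2] ** alpha
    cols = Vt[:2].T * d[:2] ** (1.0 - alpha)
    return Y, rows, cols, d

# Example with made-up data: 20 samples, 4 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
Y, rows, cols, d = biplot_coordinates(X, alpha=1.0)

# Scalar product interpretation: rows @ cols.T is the rank-2 approximation of Y.
Y_rank2 = rows @ cols.T
```

Whatever value of α is chosen, the product `rows @ cols.T` is the same rank-2 approximation of $Y$. Changing α only moves the singular values between the row and column coordinates, which is why α affects the distance interpretation but not the scalar product interpretation.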


Sources

* Gower, J.C., Lubbe, S. and le Roux, N. (2010). ''Understanding Biplots''. Wiley.
* Gower, J.C. and Hand, D.J. (1996). ''Biplots''. Chapman & Hall, London, UK.
* Yan, W. and Kang, M.S. (2003). ''GGE Biplot Analysis''. CRC Press, Boca Raton, Florida.
* Demey, J.R., Vicente-Villardón, J.L., Galindo-Villardón, M.P. and Zambrano, A.Y. (2008). ''Identifying molecular markers associated with classification of genotypes by External Logistic Biplots''. Bioinformatics. 24(24):2832–2838.