Within computational biology, an MA plot is an application of a Bland–Altman plot for visual representation of genomic data. The plot visualizes the differences between measurements taken in two samples by transforming the data onto M (log ratio) and A (mean average) scales, then plotting these values. Though originally applied in the context of two-channel DNA microarray gene expression data, MA plots are also used to visualise high-throughput sequencing analysis.
Explanation
Microarray data is often normalized within arrays to control for systematic biases in dye coupling and hybridization efficiencies, as well as other technical biases in the DNA probes and the print tip used to spot the array. Minimizing these systematic variations makes true biological differences easier to find. To determine whether normalization is needed, one can plot Cy5 (R) intensities against Cy3 (G) intensities and see whether the slope of the line is around 1. An improved method, which is essentially a scaled, 45-degree rotation of the R vs. G plot, is the MA plot.
[Dudoit, S., Yang, Y. H., Callow, M. J., Speed, T. P. (2002). Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. ''Stat. Sin.'' 12(1): 111–139] The MA plot shows the distribution of the red/green intensity ratio (''M'') plotted against the average intensity (''A''). M and A are defined by the following equations.
:M = \log_2(R/G) = \log_2(R) - \log_2(G)
:A = \frac{1}{2}\log_2(RG) = \frac{1}{2}\left(\log_2(R) + \log_2(G)\right)
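As a minimal sketch of these definitions, the following base R code computes M and A from a pair of intensity vectors and checks that the result matches a 45-degree clockwise rotation of the (log2 G, log2 R) plane followed by a rescaling of the two axes. The helper name `ma_transform` and the intensity values are made up for illustration and do not come from any package.

```r
# Hypothetical helper (not from any package): convert two-channel
# intensities into M (log ratio) and A (mean log intensity).
ma_transform <- function(R, G) {
  stopifnot(length(R) == length(G), all(R > 0), all(G > 0))
  data.frame(
    M = log2(R) - log2(G),        # equals log2(R / G)
    A = (log2(R) + log2(G)) / 2   # equals 0.5 * log2(R * G)
  )
}

# Made-up intensities for three spots
R <- c(1024, 256, 64)
G <- c(256, 256, 256)
ma <- ma_transform(R, G)
print(ma)

# Rotate the (log2 G, log2 R) plane by 45 degrees clockwise, then
# rescale the new x-axis by 1/sqrt(2) and the new y-axis by sqrt(2).
x <- log2(G)
y <- log2(R)
A_rot <- ((x + y) / sqrt(2)) / sqrt(2)
M_rot <- ((y - x) / sqrt(2)) * sqrt(2)

# Both routes give the same coordinates
stopifnot(isTRUE(all.equal(ma$A, A_rot)), isTRUE(all.equal(ma$M, M_rot)))
```

For these inputs the log2 intensities are 10, 8 and 6 in the red channel and 8 in the green channel, so M is 2, 0 and -2 and A is 9, 8 and 7. Spots with equal intensity in both channels fall on the line M = 0.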
M is, therefore, the binary logarithm of the intensity ratio (or, equivalently, the difference between log intensities) and A is the average log intensity for a dot in the plot. MA plots are then used to visualize the intensity-dependent ratio of raw microarray data (microarrays typically show a bias here, with higher ''A'' resulting in higher |''M''|, i.e. the brighter the spot, the more likely an observed difference between sample and control). The MA plot puts the variable ''M'' on the ''y''-axis and ''A'' on the ''x''-axis and gives a quick overview of the distribution of the data.
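The axis layout described above can be sketched with base R graphics alone. The data here are simulated, and the size and shape of the intensity-dependent bias are assumptions chosen only to make the trend visible; they are not taken from a real array.

```r
set.seed(1)                      # reproducible made-up data
n <- 2000
A_true <- runif(n, 6, 14)        # simulated mean log2 intensities
# Simulated log ratios: random noise plus an assumed intensity-dependent bias
M_sim <- rnorm(n, sd = 0.3) + 0.15 * (A_true - 10)

# Back-calculate the two channel intensities implied by A and M
R <- 2^(A_true + M_sim / 2)
G <- 2^(A_true - M_sim / 2)

# Transform the intensities onto the M and A scales
M <- log2(R) - log2(G)
A <- (log2(R) + log2(G)) / 2

# M on the y-axis, A on the x-axis
plot(A, M, pch = 16, cex = 0.4, col = "grey40",
     xlab = "A (mean log2 intensity)", ylab = "M (log2 ratio)",
     main = "MA plot of simulated two-channel data")
abline(h = 0, lty = 2)                      # where unchanged genes should lie
lines(lowess(A, M), col = "red", lwd = 2)   # smoothed trend of M against A
```

In this simulation the red trend line tilts away from the dashed zero line as A increases, which is the kind of intensity-dependent pattern an MA plot is meant to expose.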
In many microarray gene expression experiments, an underlying assumption is that most genes do not change in expression; therefore, the majority of the points should lie at 0 on the ''y''-axis (''M''), since log(1) is 0. If this is not the case, then a normalization method such as LOESS should be applied to the data before statistical analysis. (In the accompanying diagram, the red line runs below the zero mark before normalization and is not straight, so the data should be normalized. After normalization, the red line lies straight along the zero line and shows as pink/black.)
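One way to picture LOESS normalization is to fit a smooth curve of M against A and subtract it, so that the trend is pulled onto the zero line. The sketch below uses only the `loess` function from base R on simulated data; the bias, the `span` and the `degree` are illustrative assumptions, and dedicated microarray packages provide their own normalization routines that may differ in detail.

```r
set.seed(2)                     # reproducible made-up data
n <- 2000
A <- runif(n, 6, 14)            # simulated mean log2 intensities
# Assumed bias: a constant offset below zero plus a trend in A
M <- rnorm(n, sd = 0.3) + 0.15 * (A - 10) - 0.2

# Fit a local regression of M on A (span and degree are illustrative choices)
fit <- loess(M ~ A, span = 0.4, degree = 1)

# Subtract the fitted trend to obtain normalized log ratios
M_norm <- M - predict(fit, newdata = data.frame(A = A))

# Compare the centre of the log ratios before and after normalization
print(c(before = median(M), after = median(M_norm)))
```

After subtraction the normalized values are centred close to zero and no longer trend with A, which corresponds to the straightened line described above.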
Packages
Several Bioconductor packages for the R software provide facilities for creating MA plots. These include affy (ma.plot, mva.pairs), limma (plotMA), marray (maPlot), and edgeR (maPlot).
Similar "RA" plots can be generated using the raPlot function in the caroline CRAN R package.
An interactive MA plot that can filter genes by M, A and p-values, search by names or with a lasso, and save selected genes is available as R-Shiny code, Enhanced-MA-Plot.
Example in the R programming language
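library(affy)
# Load the Dilution example dataset if the affydata package is available
if (require(affydata)) {
  data(Dilution)
}
# Intensities for the two arrays being compared
y <- exprs(Dilution)[, c("20B", "10A")]
x11()
# x-axis: A (mean log2 intensity); y-axis: M (difference of log2 intensities)
ma.plot(rowMeans(log2(y)), log2(y[, 1]) - log2(y[, 2]), cex = 1)
title("Dilutions Dataset (array 20B v 10A)")
library(preprocessCore)
# Do a quantile normalization
x <- normalize.quantiles(y)
x11()
ma.plot(rowMeans(log2(x)), log2(x[, 1]) - log2(x[, 2]), cex = 1)
title("Post Norm: Dilutions Dataset (array 20B v 10A)")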
See also
* RA plot
* Bland–Altman plot