GO Term Enrichment

Gene Ontology (GO) term enrichment is a technique for interpreting sets of genes making use of the Gene Ontology system of classification, in which genes are assigned to a set of predefined bins depending on their functional characteristics. For example, the gene FasR is categorized as being a receptor, involved in apoptosis and located on the plasma membrane. Researchers performing high-throughput experiments that yield sets of genes (for example, genes that are differentially expressed under different conditions) often want to retrieve a functional profile of that gene set, in order to better understand the underlying biological processes. This can be done by comparing the input gene set with each of the bins (terms) in the GO: a statistical test can be performed for each bin to see if it is enriched for the input genes. The output of the analysis is typically a ranked list of GO terms, each associated with a p-value.


Background


The Gene Ontology

The Gene Ontology (GO) provides a system for hierarchically classifying genes or gene products into terms organized in a graph structure (or an ontology). The terms are grouped into three categories: molecular function (describing the molecular activity of a gene), biological process (describing the larger cellular or physiological role carried out by the gene, coordinated with other genes), and cellular component (describing the location in the cell where the gene product executes its function). Each gene can be described (annotated) with multiple terms. The GO is actively used to classify genes from humans, model organisms and a variety of other species. Using the GO, it is possible to retrieve the set of terms used to describe any gene, or conversely, given a term, return the set of genes annotated to that term. For the latter query, the hierarchical system of the GO is employed to give complete results. For example, a query for the GO term for nucleus should also return genes annotated to the more specific term "nuclear membrane".
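The hierarchical query described above can be sketched in Python. This is a minimal illustration, not the GO's own implementation: the term labels, the gene names and the `PARENTS` and `ANNOTATIONS` tables are hypothetical toy data, and all parent links are treated alike, whereas the real ontology distinguishes several relation types.

```python
from collections import defaultdict

# Toy ontology fragment: child term -> parent terms.
# Labels are illustrative only; real GO terms carry identifiers and typed relations.
PARENTS = {
    "nuclear membrane": ["nuclear envelope"],
    "nuclear envelope": ["nucleus"],
    "nucleus": ["cellular component"],
    "plasma membrane": ["cellular component"],
    "cellular component": [],
}

# Direct annotations: gene -> terms (hypothetical gene names).
ANNOTATIONS = {
    "geneA": ["nuclear membrane"],
    "geneB": ["nucleus"],
    "geneC": ["plasma membrane"],
}


def ancestors(term, parents):
    """Return the term itself plus every term reachable through parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def genes_by_term(annotations, parents):
    """Map each term to the genes annotated to it directly or via a descendant."""
    index = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            for reached in ancestors(term, parents):
                index[reached].add(gene)
    return index


index = genes_by_term(ANNOTATIONS, PARENTS)
print(sorted(index["nucleus"]))  # ['geneA', 'geneB']
```

Propagating each annotation up to all ancestor terms once, ahead of time, means that a later query for a general term is a single lookup.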


Interpreting high-throughput data

Certain types of high-throughput experiments (e.g., RNA-seq) return sets of genes that are over- or under-expressed. GO can be used to functionally profile such a gene set by determining which GO terms appear among the annotations of the input genes more frequently than would be expected by chance. For example, an experiment may compare gene expression in healthy cells versus cancerous cells. Functional profiling can be used to elucidate the underlying cellular mechanisms associated with the cancerous condition. This is also called term enrichment or term overrepresentation, as it tests whether a GO term is statistically enriched for the given set of genes.
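To make "more frequently than would be expected by chance" concrete, the following Python sketch compares, for each term, the number of study genes carrying the term with the number expected if the study set were a random draw from the background. The gene identifiers, the two term names and the function `term_counts` are hypothetical and chosen only for illustration.

```python
def term_counts(study, background, term_to_genes):
    """Return (term, observed, expected) for each term.

    observed: study genes annotated to the term.
    expected: count expected by chance, i.e. the study size times the
              fraction of background genes annotated to the term.
    """
    rows = []
    for term, genes in term_to_genes.items():
        in_background = len(genes & background)
        observed = len(genes & study)
        expected = len(study) * in_background / len(background)
        rows.append((term, observed, expected))
    return rows


# Toy data: 20 background genes, 5 of which form the study set.
background = {f"g{i}" for i in range(20)}
study = {"g0", "g1", "g2", "g3", "g4"}
term_to_genes = {
    "apoptosis": {"g0", "g1", "g2", "g3"},
    "transport": {"g4", "g10", "g11", "g12", "g13", "g14", "g15", "g16"},
}

results = term_counts(study, background, term_to_genes)
for term, observed, expected in results:
    print(term, observed, expected)
# apoptosis 4 1.0
# transport 1 2.0
```

In this toy case "apoptosis" is observed four times against an expectation of one, which makes it a candidate for enrichment; whether such a difference is statistically significant is a separate question that requires a test.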


Methods

There are a variety of methods for performing a term enrichment using GO. Methods may vary according to the type of statistical test applied, the most common being Fisher's exact test (hypergeometric test). Some methods make use of Bayesian statistics. There is also variability in the type of correction applied for multiple comparisons, the most common being the Bonferroni correction. Methods also vary in their input: some take unranked gene sets, others take ranked gene sets, with more sophisticated methods allowing each gene to be associated with a magnitude (e.g., expression level), avoiding arbitrary cutoffs.
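As a rough sketch of the most common combination named above, the Python code below computes a one-sided hypergeometric p-value for a single term and applies a Bonferroni correction across several terms. It uses only the standard library; the function names and the example numbers are illustrative assumptions and are not taken from any particular enrichment tool.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """Probability of seeing at least k annotated genes in the study set.

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the study set,  k: study genes annotated to the term.
    """
    upper = min(n, K)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Illustrative numbers: 20 background genes, 4 annotated to the term,
# and a study set of 5 genes that contains all 4 of them.
p = hypergeom_pvalue(k=4, n=5, K=4, N=20)
print(round(p, 6))  # 0.001032

# Correcting this p-value together with a second, hypothetical one.
print(bonferroni([p, 0.5]))
```

The Bonferroni correction is simple but conservative, and because GO terms are related through the hierarchy the tests are not independent; this sketch does not attempt to account for that.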


Tools


* MOET: a web-based gene set enrichment tool at the Rat Genome Database for multiontology and multispecies analyses.
* PlantRegMap: GO annotation for 165 species and GO term enrichment analysis.
* PLAZA Workbench: GO, InterPro and MapMan enrichment analysis for different plant species.
* The Gene Ontology Consortium (GOC) provides a Term Enrichment tool.
* FunRich (PMID 25921073) is a Windows-based, free, standalone functional enrichment analysis tool.
* Blast2GO is a platform-independent desktop application to perform functional enrichment analysis as well as functional annotation of novel sequence data.

