HOME

TheInfoList



OR:

The Gene Ontology (GO) is a major
bioinformatics Bioinformatics () is an interdisciplinary field of science that develops methods and Bioinformatics software, software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, ...
initiative to unify the representation of
gene In biology, the word gene has two meanings. The Mendelian gene is a basic unit of heredity. The molecular gene is a sequence of nucleotides in DNA that is transcribed to produce a functional RNA. There are two types of molecular genes: protei ...
and
gene product A gene product is the biochemical material, either RNA or protein, resulting from the expression of a gene. A measurement of the amount of gene product is sometimes used to infer how active a gene is. Abnormal amounts of gene product can be corre ...
attributes across all
species A species () is often defined as the largest group of organisms in which any two individuals of the appropriate sexes or mating types can produce fertile offspring, typically by sexual reproduction. It is the basic unit of Taxonomy (biology), ...
. More specifically, the project aims to: 1) maintain and develop its
controlled vocabulary A controlled vocabulary provides a way to organize knowledge for subsequent retrieval. Controlled vocabularies are used in subject indexing schemes, subject headings, thesauri, taxonomies and other knowledge organization systems. Controlled v ...
of gene and gene product attributes; 2) annotate genes and gene products, and assimilate and disseminate annotation data; and 3) provide tools for easy access to all aspects of the data provided by the project, and to enable functional interpretation of experimental data using the GO, for example via enrichment analysis. GO is part of a larger classification effort, the Open Biomedical Ontologies, being one of the Initial Candidate Members of the OBO Foundry. Whereas
gene nomenclature Gene nomenclature is the scientific naming of genes, the units of heredity in living organisms. It is also closely associated with protein nomenclature, as genes and the proteins they code for usually have similar nomenclature. An international co ...
focuses on gene and gene products, the Gene Ontology focuses on the function of the genes and gene products. The GO also extends the effort by using a
markup language A markup language is a Encoding, text-encoding system which specifies the structure and formatting of a document and potentially the relationships among its parts. Markup can control the display of a document or enrich its content to facilitate au ...
to make the data (not only of the genes and their products but also of curated attributes) machine readable, and to do so in a way that is unified across all species (whereas gene nomenclature conventions vary by biological
taxon In biology, a taxon (back-formation from ''taxonomy''; : taxa) is a group of one or more populations of an organism or organisms seen by taxonomists to form a unit. Although neither is required, a taxon is usually known by a particular name and ...
).


History

The Gene Ontology was originally constructed in 1998 by a consortium of researchers studying the
genomes A genome is all the genetic information of an organism. It consists of nucleotide sequences of DNA (or RNA in RNA viruses). The nuclear genome includes protein-coding genes and non-coding genes, other functional regions of the genome such as ...
of three
model organism A model organism is a non-human species that is extensively studied to understand particular biological phenomena, with the expectation that discoveries made in the model organism will provide insight into the workings of other organisms. Mo ...
s: ''
Drosophila melanogaster ''Drosophila melanogaster'' is a species of fly (an insect of the Order (biology), order Diptera) in the family Drosophilidae. The species is often referred to as the fruit fly or lesser fruit fly, or less commonly the "vinegar fly", "pomace fly" ...
'' (fruit fly), ''
Mus musculus The house mouse (''Mus musculus'') is a small mammal of the rodent family Muridae, characteristically having a pointed snout, large rounded ears, and a long and almost hairless tail. It is one of the most abundant species of the genus ''Mus (genu ...
'' (mouse), and ''
Saccharomyces cerevisiae ''Saccharomyces cerevisiae'' () (brewer's yeast or baker's yeast) is a species of yeast (single-celled fungal microorganisms). The species has been instrumental in winemaking, baking, and brewing since ancient times. It is believed to have be ...
'' (brewer's or baker's yeast). Many other
Model Organism Databases Model organism databases (MODs) are biological databases, or knowledgebases, dedicated to the provision of in-depth biological data for intensively studied model organisms. MODs allow researchers to easily find background information on large set ...
have joined the Gene Ontology Consortium, contributing not only to annotation data, but also to the development of ontologies and tools to view and apply the data. Many major plant, animal, and microorganism databases make a contribution towards this project. As of July 2019, the GO contains 44,945 terms; there are 6,408,283 annotations to 4,467 different biological organisms. There is a significant body of literature on the development and use of the GO, and it has become a standard tool in the
bioinformatics Bioinformatics () is an interdisciplinary field of science that develops methods and Bioinformatics software, software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, ...
arsenal. Their objectives have three aspects: building gene ontology, assigning ontology to gene/gene products, and developing software and databases for the first two objects. Several analyses of the Gene Ontology using formal, domain-independent properties of classes (the metaproperties) are also starting to appear. For instance, there is now an ontological analysis of biological ontologies.


Terms and ontology

From a practical view, an ontology is a representation of something we know about. "Ontologies" consist of representations of things that are detectable or directly observable and the relationships between those things. There is no universal standard terminology in biology and related domains, and term usage may be specific to a species, research area, or even a particular research group. This makes communication and sharing of data more difficult. The Gene Ontology project provides an
ontology Ontology is the philosophical study of existence, being. It is traditionally understood as the subdiscipline of metaphysics focused on the most general features of reality. As one of the most fundamental concepts, being encompasses all of realit ...
of defined terms representing
gene product A gene product is the biochemical material, either RNA or protein, resulting from the expression of a gene. A measurement of the amount of gene product is sometimes used to infer how active a gene is. Abnormal amounts of gene product can be corre ...
properties. The ontology covers three domains: * cellular component, the parts of a cell or its
extracellular This glossary of biology terms is a list of definitions of fundamental terms and concepts used in biology, the study of life and of living organisms. It is intended as introductory material for novices; for more specific and technical definitions ...
environment; * molecular function, the elemental activities of a gene product at the molecular level, such as binding or
catalysis Catalysis () is the increase in rate of a chemical reaction due to an added substance known as a catalyst (). Catalysts are not consumed by the reaction and remain unchanged after it. If the reaction is rapid and the catalyst recycles quick ...
; *
biological process Biological processes are those processes that are necessary for an organism to live and that shape its capacities for interacting with its environment. Biological processes are made of many chemical reactions or other events that are involved in ...
, operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units: cells, tissues, organs, and
organism An organism is any life, living thing that functions as an individual. Such a definition raises more problems than it solves, not least because the concept of an individual is also difficult. Many criteria, few of them widely accepted, have be ...
s. Each GO term within the ontology has a term name, which may be a word or string of words; a unique alphanumeric identifier; a definition with cited sources; and an ontology indicating the domain to which it belongs. Terms may also have synonyms, which are classed as being exactly equivalent to the term name, broader, narrower, or related; references to equivalent concepts in other databases; and comments on term meaning or usage. The GO is structured as a
directed acyclic graph In mathematics, particularly graph theory, and computer science, a directed acyclic graph (DAG) is a directed graph with no directed cycles. That is, it consists of vertices and edges (also called ''arcs''), with each edge directed from one ...
, and each term has defined relationships to one or more other terms in the same domain, and sometimes to other domains. The GO vocabulary is designed to be species-neutral and includes terms applicable to
prokaryote A prokaryote (; less commonly spelled procaryote) is a unicellular organism, single-celled organism whose cell (biology), cell lacks a cell nucleus, nucleus and other membrane-bound organelles. The word ''prokaryote'' comes from the Ancient Gree ...
s and
eukaryote The eukaryotes ( ) constitute the Domain (biology), domain of Eukaryota or Eukarya, organisms whose Cell (biology), cells have a membrane-bound cell nucleus, nucleus. All animals, plants, Fungus, fungi, seaweeds, and many unicellular organisms ...
s, single and
multicellular organism A multicellular organism is an organism that consists of more than one cell (biology), cell, unlike unicellular organisms. All species of animals, Embryophyte, land plants and most fungi are multicellular, as are many algae, whereas a few organism ...
s. GO is not static, and additions, corrections, and alterations are suggested by and solicited from members of the research and annotation communities, as well as by those directly involved in the GO project. For example, an annotator may request a specific term to represent a metabolic pathway, or a section of the ontology may be revised with the help of community experts (e.g.). Suggested edits are reviewed by the ontology editors, and implemented where appropriate. The GO and annotation files are freely available from the GO website in a number of formats or can be accessed online using the GO browser AmiGO. The Gene Ontology project also provides downloadable mappings of its terms to other classification systems.


Example term

:id: GO:0000016 :name: lactase activity :ontology: molecular_function :def: "Catalysis of the reaction: lactose + H2O=D-glucose + D-galactose." C:3.2.1.108:synonym: "lactase-phlorizin hydrolase activity" BROAD C:3.2.1.108:synonym: "lactose galactohydrolase activity" EXACT C:3.2.1.108:xref: EC:3.2.1.108 :xref: MetaCyc:LACTASE-RXN :xref: Reactome:20536 :is_a: GO:0004553 ! hydrolase activity, hydrolyzing O-glycosyl compounds Data source:


Annotation

Genome annotation encompasses the practice of capturing data about a gene product, and GO annotations use terms from the GO to do so. Annotations from GO curators are integrated and disseminated on the GO website, where they can be downloaded directly or viewed online using AmiGO. In addition to the gene product identifier and the relevant GO term, GO annotations have at least the following data: The ''reference'' used to make the annotation (e.g. a journal article); An ''evidence code'' denoting the type of evidence upon which the annotation is based; The date and the creator of the annotation Supporting information, depending on the GO term and evidence used, and supplementary information, such as the conditions the function is observed under, may also be included in a GO annotation. The evidence code comes from a
controlled vocabulary A controlled vocabulary provides a way to organize knowledge for subsequent retrieval. Controlled vocabularies are used in subject indexing schemes, subject headings, thesauri, taxonomies and other knowledge organization systems. Controlled v ...
of codes, the Evidence Code Ontology, covering both manual and automated annotation methods. For example, ''Traceable Author Statement'' (TAS) means a curator has read a published scientific paper and the metadata for that annotation bears a citation to that paper; ''Inferred from Sequence Similarity'' (ISS) means a human curator has reviewed the output from a sequence similarity search and verified that it is biologically meaningful. Annotations from automated processes (for example, remapping annotations created using another annotation vocabulary) are given the code ''Inferred from Electronic Annotation'' (IEA). In 2010, over 98% of all GO annotations were inferred computationally, not by curators, but as of July 2, 2019, only about 30% of all GO annotations were inferred computationally. As these annotations are not checked by a human, the GO Consortium considers them to be marginally less reliable and they are commonly to a higher level, less detailed terms. Full annotation data sets can be downloaded from the GO website. To support the development of annotation, the GO Consortium provides workshops and mentors new groups of curators and developers. Many
machine learning Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...
algorithms have been designed and implemented to predict Gene Ontology annotations.


Example annotation

:Gene product: Actin, alpha cardiac muscle 1
UniProtKB:P68032
:GO term:
heart contraction; GO:0060047
(biological process) :Evidence code: Inferred from Mutant Phenotype (IMP) :Reference: :Assigned by: UniProtKB, June 6, 2008 Data source:


Tools

There are a large number of tools available, both online and for download, that use the data provided by the GO project. The vast majority of these come from third parties; the GO Consortium develops and supports two tools, AmiGO and OBO-Edit. AmiGOAmiGO
-the current official web-based set of tools for searching and browsing the Gene Ontology database
is a web-based application that allows users to query, browse, and visualize ontologies and gene product annotation data. It also has a BLAST tool, tools allowing analysis of larger data sets, and an interface to query the GO database directly. AmiGO can be used online at the GO website to access the data provided by the GO Consortium or downloaded and installed for local use on any database employing the GO database schema (e.g.). It is free
open source software Open-source software (OSS) is Software, computer software that is released under a Open-source license, license in which the copyright holder grants users the rights to use, study, change, and Software distribution, distribute the software an ...
and is available as part of the go-dev software distribution. OBO-Edit is an open source, platform-independent ontology editor developed and maintained by the Gene Ontology Consortium. It is implemented in
Java Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...
and uses a graph-oriented approach to display and edit ontologies. OBO-Edit includes a comprehensive search and filter interface, with the option to render subsets of terms to make them visually distinct; the user interface can also be customized according to user preferences. OBO-Edit also has a
reasoner A semantic reasoner, reasoning engine, rules engine, or simply a reasoner, is a piece of software able to infer logical consequences from a set of asserted facts or axioms. The notion of a semantic reasoner generalizes that of an inference engine ...
that can infer links that have not been explicitly stated based on existing relationships and their properties. Although it was developed for biomedical ontologies, OBO-Edit can be used to view, search, and edit any ontology. It is freely available to download.


Consortium

The Gene Ontology Consortium is the set of
biological database Biological databases are libraries of biological sciences, collected from scientific experiments, published literature, high-throughput experiment technology, and computational analysis. They contain information from research areas including geno ...
s and research groups actively involved in the gene ontology project. This includes a number of
model organism A model organism is a non-human species that is extensively studied to understand particular biological phenomena, with the expectation that discoveries made in the model organism will provide insight into the workings of other organisms. Mo ...
databases and multi-species protein databases, software development groups, and a dedicated editorial office.


See also

* Blast2GO * Comparative Toxicogenomics Database * DAVID bioinformatics * Interferome * National Center for Biomedical Ontology * Critical Assessment of Function Annotation


References


External links


AmiGO
- the current official web-based set of tools for searching and browsing the Gene Ontology database
Gene Ontology Consortium
- official site
PlantRegMap - GO annotation for 165 plant species and GO enrichment Analysis
{{Bioinformatics Biological databases Ontology (information science)