List Of Biological Databases

Biological databases are stores of biological information. The journal ''Nucleic Acids Research'' regularly publishes special issues on biological databases and maintains a list of such databases. The 2018 issue lists about 180 such databases, along with updates to previously described databases. The Omics Discovery Index can be used to browse and search several biological databases. Furthermore, the NIAID Data Ecosystem Discovery Portal, developed by the National Institute of Allergy and Infectious Diseases (NIAID), enables searching across databases.


Meta databases

Meta databases are databases of databases that collect data about data to generate new data. They are capable of merging information from different sources and making it available in a new and more convenient form, or with an emphasis on a particular disease or organism. Originally, metadata was only a common term referring simply to ''data about data'', such as tags, keywords, and markup headers.
* ConsensusPathDB: a molecular functional interaction database, integrating information from 12 others
* Entrez (National Center for Biotechnology Information)
* Neuroscience Information Framework (University of California, San Diego): integrates hundreds of neuroscience-relevant resources; many are listed below
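
Entrez, listed above, can also be queried programmatically through NCBI's E-utilities web service. The following is a minimal sketch that only builds an ESearch request URL and does not contact the network. The function name `build_esearch_url`, the default `retmax` value, and the example query term are illustrative choices rather than anything prescribed by NCBI or by this list.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db, term, retmax=20):
    """Return an ESearch URL for `term` in the Entrez database `db`.

    The helper name and defaults are assumptions made for this sketch.
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Example: search the nucleotide database for fission yeast records.
url = build_esearch_url("nucleotide", "Schizosaccharomyces pombe[Organism]")
print(url)
```

Fetching the URL with any HTTP client should return a JSON document listing matching record identifiers. Anyone sending more than a handful of requests should consult NCBI's usage guidelines first.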


Model organism databases

Model organism databases provide in-depth biological data for intensively studied organisms.
* PomBase: the knowledgebase for the fission yeast ''Schizosaccharomyces pombe''
* ''Subti''Wiki: integrated database for the model bacterium ''Bacillus subtilis''
* TAIR: the knowledgebase for the plant ''Arabidopsis thaliana''


Nucleic acid databases


DNA databases

The primary databases make up the International Nucleotide Sequence Database (INSD). They include:
* DNA Data Bank of Japan (National Institute of Genetics)
* EMBL (European Bioinformatics Institute)
* GenBank (National Center for Biotechnology Information)

DDBJ (Japan), GenBank (USA) and European Nucleotide Archive (Europe) are repositories for nucleotide sequence data from all organisms. All three accept nucleotide sequence submissions, and then exchange new and updated data on a daily basis to achieve optimal synchronisation between them. These three databases are primary databases, as they house original sequence data. They collaborate with Sequence Read Archive (SRA), which archives raw reads from high-throughput sequencing instruments.

Secondary databases are:
* HapMap
* OMIM (Online Mendelian Inheritance in Man): inherited diseases
* RefSeq
* 1000 Genomes Project: launched in January 2008. The genomes of more than a thousand anonymous participants from a number of different ethnic groups were analyzed and made publicly available.
* EggNOG Database: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. It provides multiple sequence alignments and maximum-likelihood trees, as well as broad functional annotation.

Other databases:
* Nucleosome positioning region database


Gene expression databases

Generic gene expression databases

Microarray gene expression databases


Genome databases

These databases collect genome sequences, annotate and analyze them, and provide public access. Some add curation of experimental literature to improve computed annotations. These databases may hold the genomes of many species, or a single model organism genome.


Phenotype databases

* PHI-base: pathogen-host interaction database. It links gene information to phenotypic information from microbial pathogens on their hosts. Information is manually curated from peer-reviewed literature.
* RGD (Rat Genome Database): genomic and phenotype data for ''Rattus norvegicus''
* PomBase database: manually curated phenotypic data for the yeast ''Schizosaccharomyces pombe''


RNA databases

* miRBase: the microRNA database
* PolymiRTS: a database of DNA variations in putative microRNA target sites
* PolyQ: database of polyglutamine repeats in disease and non-disease associated proteins
* Rfam: a database of RNA families
* IRESbase: a comprehensive database of experimentally validated internal ribosome entry sites


Amino acid / protein databases

''(See also: List of proteins in the human body)''

Several publicly available data repositories and resources have been developed to support and manage protein-related information, biological knowledge discovery and data-driven hypothesis generation. The databases in the table below are selected from the databases listed in the Nucleic Acids Research (NAR) databases issues and database collection and the databases cross-referenced in the UniProtKB. Most of these databases are cross-referenced with UniProt/UniProtKB so that identifiers can be mapped to each other.

Proteins in human: There are about 20,000 protein-coding genes in the standard human genome (roughly 1,200 already have Wikipedia articles about them, in the Gene Wiki). Including splice variants, there could be as many as 500,000 unique human proteins.
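
The identifier mapping that cross-references make possible can be pictured as a lookup over (source, target) record pairs. The sketch below is only an in-memory illustration and not how UniProt itself implements mapping. The record layout, the database labels, and the helper names `build_index` and `map_identifier` are assumptions made for the example, and the three records (for human TP53) are included purely as sample data.

```python
from collections import defaultdict

# Illustrative cross-reference records:
# (source_db, source_id, target_db, target_id).
XREFS = [
    ("UniProtKB", "P04637", "Ensembl", "ENSG00000141510"),
    ("UniProtKB", "P04637", "HGNC", "HGNC:11998"),
    ("UniProtKB", "P04637", "GeneID", "7157"),
]


def build_index(xrefs):
    """Index every record in both directions so lookups work either way."""
    index = defaultdict(set)
    for src_db, src_id, dst_db, dst_id in xrefs:
        index[(src_db, src_id, dst_db)].add(dst_id)
        index[(dst_db, dst_id, src_db)].add(src_id)
    return index


def map_identifier(index, from_db, identifier, to_db):
    """Return the sorted identifiers in `to_db` linked to `identifier`."""
    return sorted(index.get((from_db, identifier, to_db), set()))


index = build_index(XREFS)
print(map_identifier(index, "UniProtKB", "P04637", "GeneID"))
print(map_identifier(index, "Ensembl", "ENSG00000141510", "UniProtKB"))
```

Storing each record in both directions keeps lookups symmetric, at the cost of doubling the index size. Mapping between two non-UniProt databases would need an extra hop through the shared UniProtKB accession, which this sketch leaves out.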


Different types of Protein databases


Signal transduction pathway databases

* NCI-Nature Pathway Interaction Database
* Netpath: curated resource of signal transduction pathways in humans
* Reactome: navigable map of human biological pathways, ranging from metabolic processes to hormonal signalling (Ontario Institute for Cancer Research, European Bioinformatics Institute, NYU Langone Medical Center, Cold Spring Harbor Laboratory)
* WikiPathways


Metabolic pathway and protein function databases


Taxonomic databases

Numerous databases collect information about species and other taxonomic categories. The Catalogue of Life is a special case as it is a meta-database of about 150 specialized "global species databases" (GSDs) that have collected the names and other information on (almost) all described and thus "known" species.
* BacDive: bacterial metadatabase that provides strain-linked information about bacterial and archaeal biodiversity, including taxonomy information
* Catalogue of Life: a meta-database of all species on earth
* EzTaxon-e: database for the identification of prokaryotes based on 16S ribosomal RNA gene sequences
* NCBI Taxonomy: a taxonomic database operated by NCBI and concentrating on all taxa for which DNA sequences are available (those sequences are stored by GenBank, another database operated by NCBI)


Image databases

Images play a critical role in biomedicine, ranging from images of anthropological specimens to zoology. However, there are relatively few databases dedicated to image collection, although some projects such as iNaturalist collect photos as a main part of their data. A special case of "images" are 3-dimensional images such as protein structures or 3D-reconstructions of anatomical structures. Image databases include, among others:
* Allen Brain Atlas
* Digital Brain Bank
* Electron Microscopy Public Image Archive (EMPIAR)
* Image Data Resource
* Morphobank
* Morphosource


Radiologic databases

* The Cancer Imaging Archive (TCIA)
* Neuroimaging Informatics Tools and Resources Clearinghouse


Additional databases


Exosomal databases

* ExoCarta
* Extracellular RNA Atlas: a repository of small RNA-seq and qPCR-derived exRNA profiles from human and mouse biofluids


Mathematical model databases

* Biomodels Database: published mathematical models describing biological processes
* MorpheusML Model Repository: published, community-contributed, and educational multi-scale and multicellular models for systems biology


Databases on antimicrobial resistance rates and antibiotic consumption

* CIPARS
* EARS-Net
* ESAC-Net


Databases on antimicrobial resistance mechanisms


Wiki-style databases

* Gene Wiki
* WikiSpecies
* WikiProfessional


Specialized databases


References


External links


* Nucleic Acids Research Molecular Biology Database Collection: over 1,600 databases
* Nucleic Acids Research (NAR) Database Summary Paper Category List