The International Nucleotide Sequence Database Collaboration (INSDC) is a joint effort to collect and disseminate databases containing DNA and RNA sequences. It involves the following computerized databases: NIG's DNA Data Bank of Japan (Japan), NCBI's GenBank (USA) and the EMBL-EBI's European Nucleotide Archive (EMBL). New and updated data on nucleotide sequences contributed by research teams to each of the three databases are synchronized on a daily basis through continuous interaction between the staff at each of the collaborating organizations.
All of the data in INSDC is available for free and unrestricted access, for any purpose, with no restrictions on analysis, redistribution, or re-publication of the data. This policy has been a foundational principle of the INSDC since its inception. Since the 1990s, most of the world's major scientific journals have required that sequence data be deposited in an INSDC database as a pre-condition for publication.
The DDBJ/EMBL-EBI/GenBank synchronization is maintained according to a number of guidelines which are produced and published by an International Advisory Board. The guidelines consist of a common definition of the feature tables for the databases, which regulate the content and syntax of the database entries, in the form of a common DTD (Document Type Definition).
The syntax is called INSDSeq and its core consists of the letter sequence of the gene expression (amino acid sequence) and the letter sequence for nucleotide bases in the gene or decoded segment. A DBFetch operation shows a typical INSD entry at the EMBL-EBI database; the same entry is also available at NCBI.
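To make the shape of an INSDSeq record more concrete, the following minimal Python sketch parses a small, hand-written XML record and pulls out the two core items described above: the nucleotide sequence and the amino acid translation attached to a coding (CDS) feature. The sample record, its accession `XX000000`, and the helper name `summarize_records` are illustrative assumptions, not a real database entry or an official API. The element names follow the INSDSeq XML layout as commonly served by the member databases and should be checked against the current DTD before being relied on.

```python
import xml.etree.ElementTree as ET

# Hand-written, hypothetical record for illustration only; not a real entry.
SAMPLE_INSDSEQ = """\
<INSDSet>
  <INSDSeq>
    <INSDSeq_locus>EXAMPLE1</INSDSeq_locus>
    <INSDSeq_length>12</INSDSeq_length>
    <INSDSeq_moltype>DNA</INSDSeq_moltype>
    <INSDSeq_primary-accession>XX000000</INSDSeq_primary-accession>
    <INSDSeq_feature-table>
      <INSDFeature>
        <INSDFeature_key>CDS</INSDFeature_key>
        <INSDFeature_location>1..12</INSDFeature_location>
        <INSDFeature_quals>
          <INSDQualifier>
            <INSDQualifier_name>translation</INSDQualifier_name>
            <INSDQualifier_value>MAK</INSDQualifier_value>
          </INSDQualifier>
        </INSDFeature_quals>
      </INSDFeature>
    </INSDSeq_feature-table>
    <INSDSeq_sequence>atggccaaataa</INSDSeq_sequence>
  </INSDSeq>
</INSDSet>
"""


def summarize_records(xml_text):
    """Return one summary dict per INSDSeq record found in xml_text."""
    root = ET.fromstring(xml_text)
    summaries = []
    for seq in root.iter("INSDSeq"):
        # Collect the amino acid translations stored on CDS features.
        translations = []
        for feature in seq.iter("INSDFeature"):
            if feature.findtext("INSDFeature_key") != "CDS":
                continue
            for qual in feature.iter("INSDQualifier"):
                if qual.findtext("INSDQualifier_name") == "translation":
                    translations.append(qual.findtext("INSDQualifier_value"))
        summaries.append({
            "accession": seq.findtext("INSDSeq_primary-accession"),
            "moltype": seq.findtext("INSDSeq_moltype"),
            "nucleotides": seq.findtext("INSDSeq_sequence"),
            "translations": translations,
        })
    return summaries


if __name__ == "__main__":
    for record in summarize_records(SAMPLE_INSDSEQ):
        print(record)
```

Because the three databases share one feature table definition, a parser written this way should in principle handle the same record regardless of which member database served it.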
See also
* Bioinformatics
* Biological database
* List of biological databases
* National Center for Biotechnology Information
* Sequence database
External links
* Official site
* DNA Data Bank of Japan
* GenBank Nucleotide Search