The Info List - GenBank


The GenBank sequence database is an open access, annotated collection of all publicly available nucleotide sequences and their protein translations. The database is produced and maintained by the National Center for Biotechnology Information (NCBI) as part of the International Nucleotide Sequence Database Collaboration (INSDC). NCBI is part of the National Institutes of Health in the United States. GenBank and its collaborators receive sequences produced in laboratories throughout the world from more than 100,000 distinct organisms. The database was started in 1982 by Walter Goad and Los Alamos National Laboratory. GenBank has become an important database for research in biological fields and has grown in recent years at an exponential rate, doubling roughly every 18 months.[2][3] Release 194, produced in February 2013, contained over 150 billion nucleotide bases in more than 162 million sequences.[4] GenBank is built from direct submissions by individual laboratories and from bulk submissions by large-scale sequencing centers.

Submissions

Only original sequences can be submitted to GenBank. Direct submissions are made to GenBank using BankIt, a Web-based form, or the stand-alone submission program Sequin. Upon receipt of a sequence submission, the GenBank staff examines the originality of the data, assigns an accession number to the sequence, and performs quality assurance checks. The submissions are then released to the public database, where the entries are retrievable by Entrez or downloadable by FTP. Bulk submissions of Expressed Sequence Tag (EST), Sequence-tagged site (STS), Genome Survey Sequence (GSS), and High-Throughput Genome Sequence (HTGS) data are most often submitted by large-scale sequencing centers. The GenBank direct submissions group also processes complete microbial genome sequences.

History

Walter Goad of the Theoretical Biology and Biophysics Group at Los Alamos National Laboratory (LANL) and others established the Los Alamos Sequence Database in 1979, which culminated in 1982 with the creation of the public GenBank.[5] Funding was provided by the National Institutes of Health, the National Science Foundation, the Department of Energy, and the Department of Defense. LANL collaborated on GenBank with the firm Bolt, Beranek, and Newman, and by the end of 1983 more than 2,000 sequences were stored in it. In the mid-1980s, the Intelligenetics bioinformatics company at Stanford University managed the GenBank project in collaboration with LANL.[6] As one of the earliest bioinformatics community projects on the Internet, the GenBank project started the BIOSCI/Bionet news groups to promote open access communication among bioscientists. From 1989 to 1992, the GenBank project transitioned to the newly created National Center for Biotechnology Information.[7]

GenBank and EMBL: Nucleotide Sequences 1986/1987, Volumes I to VII.

CD-ROM of GenBank v100

Growth

Growth in GenBank base pairs, 1982 to 2007, on a semi-log scale

The GenBank release notes for release 162.0 (October 2007) state that "from 1982 to the present, the number of bases in GenBank has doubled approximately every 18 months".[4][8] As of 15 August 2017, GenBank release 221.0 has 203,180,606 loci and 240,343,378,258 bases, from 203,180,606 reported sequences.[4] The GenBank database also includes additional data sets that are constructed mechanically from the main sequence data collection and are therefore excluded from this count.
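
The doubling claim quoted from the release notes corresponds to simple exponential growth, N(t) = N0 * 2^(t / T), where T is the doubling time. The following is a minimal sketch of that arithmetic in Python, not anything published by NCBI: the function names are hypothetical, the starting figure of 150 billion bases is the rounded "over 150 billion" quoted for release 194, and the 4.5-year span between February 2013 and August 2017 is an approximation.

```python
import math


def projected_bases(initial_bases: float, years_elapsed: float,
                    doubling_time_years: float = 1.5) -> float:
    """Size after `years_elapsed` if the total doubles every `doubling_time_years`."""
    return initial_bases * 2 ** (years_elapsed / doubling_time_years)


def implied_doubling_time(bases_start: float, bases_end: float,
                          years_elapsed: float) -> float:
    """Doubling time (in years) implied by two size measurements."""
    if bases_end <= bases_start or years_elapsed <= 0:
        raise ValueError("need growth over a positive time span")
    return years_elapsed * math.log(2) / math.log(bases_end / bases_start)


if __name__ == "__main__":
    # An 18-month doubling time means a fourfold increase every 3 years.
    print(projected_bases(1.0, 3.0))

    # Figures quoted in this article. 150e9 is a rounded lower bound
    # ("over 150 billion") and 4.5 years approximates Feb 2013 to Aug 2017,
    # so the result is illustrative only.
    print(implied_doubling_time(150e9, 240_343_378_258, 4.5))
```

The second call inverts the same formula to estimate a doubling time from two release sizes. Its result is only as good as the rounded inputs, so it should be read as an illustration of the calculation, not as a measurement of GenBank's growth.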

Top organisms in GenBank (Release 191)[9]

Organism                        Base pairs
Homo sapiens                    16,310,774,187
Mus musculus                    9,974,977,889
Rattus norvegicus               6,521,253,272
Bos taurus                      5,386,258,455
Zea mays                        5,062,731,057
Sus scrofa                      4,887,861,860
Danio rerio                     3,120,857,462
Strongylocentrotus purpuratus   1,435,236,534
Macaca mulatta                  1,256,203,101
Oryza sativa Japonica Group     1,255,686,573
Nicotiana tabacum               1,197,357,811
Xenopus (Silurana) tropicalis   1,249,938,611
Drosophila melanogaster         1,119,965,220
Pan troglodytes                 1,008,323,292
Arabidopsis thaliana            1,144,226,616
Canis lupus familiaris          951,238,343
Vitis vinifera                  999,010,073
Gallus gallus                   899,631,338
Glycine max                     906,638,854
Triticum aestivum               898,689,329

Incomplete identifications

Public databases that can be searched using the National Center for Biotechnology Information Basic Local Alignment Search Tool (NCBI BLAST) lack peer-reviewed sequences of type strains and sequences of non-type strains. Commercial databases, on the other hand, potentially contain high-quality filtered sequence data but offer only a limited number of reference sequences. A paper published in the Journal of Clinical Microbiology[10] evaluated 16S rRNA gene sequencing results analyzed with GenBank in conjunction with other freely available, quality-controlled, web-based public databases, such as the EzTaxon-e (http://eztaxon-e.ezbiocloud.net/) and BIBI (http://pbil.univ-lyon1.fr/bibi/) databases. The results showed that analyses performed using GenBank combined with EzTaxon-e (kappa = 0.79) were more discriminative than those using GenBank (kappa = 0.66) or other databases alone.

See also

Ensembl
Human Protein Reference Database (HPRD)
Sequence analysis
UniProt
List of sequenced eukaryotic genomes
List of sequenced archaeal genomes
RefSeq, the Reference Sequence Database
Geneious, which includes a GenBank Submission Tool

References

1. The download page at UCSC says "NCBI places no restrictions on the use or distribution of the GenBank data. However, some submitters may claim patent, copyright, or other intellectual property rights in all or a portion of the data they have submitted. NCBI is not in a position to assess the validity of such claims, and therefore cannot provide comment or unrestricted permission concerning the use, copying, or distribution of the information contained in GenBank."
2. Benson, D.; Karsch-Mizrachi, I.; Lipman, D. J.; Ostell, J.; Wheeler, D. L.; et al. (2008). "GenBank". Nucleic Acids Research. 36 (Database): D25–D30. doi:10.1093/nar/gkm929. PMC 2238942. PMID 18073190.
3. Benson, D.; Karsch-Mizrachi, I.; Lipman, D. J.; Ostell, J.; Sayers, E. W.; et al. (2009). "GenBank". Nucleic Acids Research. 37 (Database): D26–D31. doi:10.1093/nar/gkn723. PMC 2686462. PMID 18940867.
4. "GenBank release notes". NCBI.
5. Hanson, Todd (2000-11-21). "Walter Goad, GenBank founder, dies". Newsbulletin: obituary. Los Alamos National Laboratory.
6. LANL GenBank History.
7. Benton, D. (1990). "Recent changes in the GenBank On-line Service". Nucleic Acids Research. 18 (6): 1517–1520. doi:10.1093/nar/18.6.1517. PMC 330520. PMID 2326192.
8. Benson, D. A.; Cavanaugh, M.; Clark, K.; Karsch-Mizrachi, I.; Lipman, D. J.; Ostell, J.; Sayers, E. W. (2012). "GenBank". Nucleic Acids Research. 41 (Database issue): D36–D42. doi:10.1093/nar/gks1195. PMC 3531190. PMID 23193287.
9. Benson, D. A.; Karsch-Mizrachi, I.; Lipman, D. J.; Ostell, J.; Sayers, E. W. (January 2011). "GenBank". Nucleic Acids Research. 39 (Database issue): D32–D37. doi:10.1093/nar/gkq1079. PMC 3013681. PMID 21071399.
10. Kyung Sun Park, Chang-Seok Ki, Cheol-In Kang, Yae-Jean Kim, Doo Ryeon Chung, Kyong Ran Peck, Jae-Hoon Song and Nam Yong Lee (May 2012). "Evaluation of the GenBank, EzTaxon, and BIBI Services for Molecular Identification of Clinical Blood Culture Isolates That Were Unidentifiable or Misidentified by Conventional Methods". Journal of Clinical Microbiology. 50 (5): 1792–1795. doi:10.1128/JCM.00081-12. PMC 3347139. PMID 22403421.

This article incorporates public domain material from the National Center for Biotechnology Information document "NCBI Handbook".

External links

GenBank
Example sequence record, for hemoglobin beta
BankIt
Sequin: a stand-alone software tool developed by the NCBI for submitting and updating entries to the GenBank sequence database
EMBOSS: free, open source software for molecular biology
GenBank, RefSeq, TPA and UniProt: What's in a Name?
