The GenBank sequence database is an open access, annotated collection
of all publicly available nucleotide sequences and their protein
translations. It is produced and maintained by the National
Center for Biotechnology Information (NCBI) as part of the
International Nucleotide Sequence Database Collaboration (INSDC).
NCBI is a part of the National Institutes of Health in the United
States.
GenBank and its collaborators receive sequences produced in
laboratories throughout the world from more than 100,000 distinct
organisms. The database was started in 1982 by
Walter Goad and Los Alamos National Laboratory.
GenBank has become an important database for
research in biological fields and has grown in recent years at an
exponential rate, doubling roughly every 18 months.
Release 194, produced in February 2013, contained over 150 billion
nucleotide bases in more than 162 million sequences.
GenBank is built by direct submissions from individual laboratories,
as well as from bulk submissions from large-scale sequencing centers.
Only original sequences can be submitted to GenBank. Direct
submissions are made to GenBank using BankIt, which is a Web-based
form, or the stand-alone submission program, Sequin. Upon receipt of a
sequence submission, the GenBank staff examines the originality of the
data, assigns an accession number to the sequence, and performs
quality assurance checks. The submissions are then released to the
public database, where the entries are retrievable by Entrez or
downloadable by FTP. Bulk submissions of Expressed Sequence Tag (EST),
Sequence-tagged site (STS), Genome Survey Sequence (GSS), and
High-Throughput Genome Sequence (HTGS) data are most often submitted
by large-scale sequencing centers. The GenBank direct submissions
group also processes complete microbial genome sequences.
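
Each released entry is identified by its accession number. As a rough
illustration only, the sketch below checks whether a string has the
shape of a nucleotide accession and splits off an optional version
suffix. The accepted shapes (one letter plus five digits, or two
letters plus six or eight digits) are an assumption based on commonly
documented patterns and are not stated in this article; the names
ACCESSION_RE and split_accession are hypothetical.

```python
import re

# Assumed nucleotide accession shapes (not taken from this article):
#   1 letter  + 5 digits
#   2 letters + 6 digits
#   2 letters + 8 digits
# optionally followed by ".<version>".
ACCESSION_RE = re.compile(
    r"^(?:[A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{2}\d{8})(?:\.(\d+))?$"
)


def split_accession(identifier):
    """Return (accession, version) or None if the shape is not recognised.

    version is None when the identifier carries no ".<version>" suffix.
    """
    cleaned = identifier.strip().upper()
    match = ACCESSION_RE.match(cleaned)
    if match is None:
        return None
    accession = cleaned.split(".")[0]
    version = int(match.group(1)) if match.group(1) else None
    return accession, version
```

A check like this only tests the shape of an identifier. It cannot
tell whether a record with that accession actually exists in the
database.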
Walter Goad of the Theoretical Biology and Biophysics Group at Los
Alamos National Laboratory and others established the Los Alamos
Sequence Database in 1979, which culminated in 1982 with the creation
of the public GenBank. Funding was provided by the National
Institutes of Health, the National Science Foundation, the Department
of Energy, and the Department of Defense.
LANL collaborated on GenBank
with the firm Bolt, Beranek, and Newman, and by the end of 1983 more
than 2,000 sequences were stored in it.
In the mid-1980s, the Intelligenetics bioinformatics company at
Stanford University managed the GenBank project in collaboration with
LANL. As one of the earliest bioinformatics community projects on
the Internet, the GenBank project started the BIOSCI/Bionet newsgroups
to promote open access communication among bioscientists. From
1989 to 1992, the GenBank project transitioned to the newly created
National Center for Biotechnology Information.
Figure: GenBank and EMBL: Nucleotide Sequences 1986/1987, Volumes I to VII.
Figure: CD-ROM of GenBank v100.
Figure: GenBank base pairs, 1982 to 2007, on a semi-log scale.
The GenBank release notes for release 162.0 (October 2007) state that
"from 1982 to the present, the number of bases in GenBank has doubled
approximately every 18 months". As of 15 August 2017, GenBank release
221.0 has 203,180,606 loci and 240,343,378,258 bases, from 203,180,606
reported sequences. The GenBank database includes additional data sets
that are constructed mechanically from the main sequence data
collection, and therefore are excluded from this count.
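
The doubling claim describes exponential growth of the form
N(t) = N0 * 2^(t / T), with T of roughly 18 months. The following is a
minimal sketch of that arithmetic, not a model endorsed by the release
notes. The function names are hypothetical, and the example figures in
the usage note are round numbers chosen for illustration.

```python
import math


def project_bases(current_bases, months_ahead, doubling_months=18.0):
    """Project a base count forward, assuming a constant doubling time."""
    return current_bases * 2 ** (months_ahead / doubling_months)


def implied_doubling_months(bases_start, bases_end, months_elapsed):
    """Doubling time implied by two counts taken months_elapsed apart.

    Assumes exponential growth between the two measurements.
    """
    if bases_start <= 0 or bases_end <= bases_start:
        raise ValueError("counts must be positive and increasing")
    return months_elapsed * math.log(2) / math.log(bases_end / bases_start)
```

For example, a collection of 1,000 bases that doubles every 18 months
would be projected to hold 4,000 bases after 36 months, and two counts
of 100 and 400 taken 36 months apart imply a doubling time of about 18
months.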
Top organisms in GenBank (Release 191) include Oryza sativa Japonica
Group, Xenopus (Silurana) tropicalis, and Canis lupus familiaris.
Public databases that may be searched using the National Center for
Biotechnology Information Basic Local Alignment Search Tool (NCBI
BLAST) lack peer-reviewed sequences of type strains and sequences of
non-type strains. On the other hand, while commercial databases
potentially contain high-quality filtered sequence data, they hold
only a limited number of reference sequences.
A paper published in the Journal of Clinical Microbiology evaluated
16S rRNA gene sequencing results analyzed with GenBank in
conjunction with other freely available, quality-controlled, web-based
public databases, such as the EzTaxon-e
(http://eztaxon-e.ezbiocloud.net/) and BIBI
(http://pbil.univ-lyon1.fr/bibi/) databases. The results showed that
analyses performed using GenBank combined with EzTaxon-e (kappa =
0.79) were more discriminative than those using GenBank (kappa = 0.66)
or other databases alone.
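
A kappa value expresses agreement between two sets of labels beyond
what chance alone would produce. The sketch below assumes the
statistic is Cohen's kappa; the article does not say which variant the
paper used, so this is an illustration of the general idea rather than
a reproduction of the paper's method. The function name and the labels
in the usage note are hypothetical, and the data are made up.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of category labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of positions where the labels match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each rater's marginal label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is
        # undefined here, and this sketch reports full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

With made-up reference labels ["species_A", "species_A", "species_B",
"species_B"] and database labels ["species_A", "species_A",
"species_B", "species_A"], three of four identifications agree and the
chance agreement is 0.5, so the function returns 0.5.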
Human Protein Reference Database (HPRD)
List of sequenced eukaryotic genomes
List of sequenced archaeal genomes
RefSeq: the Reference Sequence Database
Geneious: includes a GenBank Submission Tool
The download page at UCSC says "NCBI places no restrictions on the
use or distribution of the GenBank data. However, some submitters may
claim patent, copyright, or other intellectual property rights in all
or a portion of the data they have submitted. NCBI is not in a
position to assess the validity of such claims, and therefore cannot
provide comment or unrestricted permission concerning the use,
copying, or distribution of the information contained in GenBank."
Benson, D.; Karsch-Mizrachi, I.; Lipman, D. J.; Ostell, J.; Wheeler,
D. L.; et al. (2008). "GenBank". Nucleic Acids Research. 36
(Database): D25–D30. doi:10.1093/nar/gkm929. PMC 2238942.
Benson, D.; Karsch-Mizrachi, I.; Lipman, D. J.; Ostell, J.; Sayers, E.
W.; et al. (2009). "GenBank". Nucleic Acids Research. 37 (Database):
D26–D31. doi:10.1093/nar/gkn723. PMC 2686462.
"GenBank release notes". NCBI.
Hanson, Todd (2000-11-21). "Walter Goad, GenBank founder, dies".
Newsbulletin: obituary. Los Alamos National Laboratory.
Benton, D. (1990). "Recent changes in the GenBank On-line Service".
Nucleic Acids Research. 18 (6): 1517–1520.
doi:10.1093/nar/18.6.1517. PMC 330520.
Benson, D. A.; Cavanaugh, M.; Clark, K.; Karsch-Mizrachi, I.;
Lipman, D. J.; Ostell, J.; Sayers, E. W. (2012). "GenBank". Nucleic
Acids Research. 41 (Database issue): D36–D42.
doi:10.1093/nar/gks1195. PMC 3531190.
Benson, D. A.; Karsch-Mizrachi, I.; Lipman, D. J.; Ostell, J.; Sayers,
E. W. (January 2011). "GenBank". Nucleic Acids Research. 39 (Database
issue): D32–D37. doi:10.1093/nar/gkq1079. PMC 3013681.
Kyung Sun Park; Chang-Seok Ki; Cheol-In Kang; Yae-Jean Kim; Doo
Ryeon Chung; Kyong Ran Peck; Jae-Hoon Song; Nam Yong Lee (May
2012). "Evaluation of the GenBank, EzTaxon, and BIBI Services for
Molecular Identification of Clinical Blood Culture Isolates That Were
Unidentifiable or Misidentified by Conventional Methods". J. Clin.
Microbiol. 50 (5): 1792–1795. doi:10.1128/JCM.00081-12.
PMC 3347139. PMID 22403421.
This article incorporates public domain material from a
National Center for Biotechnology Information document.
Example sequence record, for hemoglobin beta
Sequin — a stand-alone software tool developed by the NCBI for
submitting and updating entries to the
GenBank sequence database.
EMBOSS — free, open source software for molecular biology
GenBank, RefSeq, TPA and UniProt: What's in a Name?