The Bioinformatic Harvester was a bioinformatic meta search engine created by the European Molecular Biology Laboratory (EMBL) and subsequently hosted and further developed by the Karlsruhe Institute of Technology (KIT) for gene- and protein-associated information. Harvester covered human, mouse, rat, zebrafish, Drosophila and Arabidopsis thaliana. It cross-linked more than 50 popular bioinformatic resources and allowed cross searches, serving tens of thousands of pages every day to scientists and physicians. The service has been offline since 2014.
How Harvester works
Harvester collected information from protein and gene databases, along with information from so-called "prediction servers", which provide, for example, online sequence analysis for a single protein. Harvester's search index was based on the IPI and UniProt protein information collections. The collection consisted of:
* ~72,000 human, ~57,000 mouse, ~41,000 rat, ~51,000 zebrafish and ~35,000 Arabidopsis protein pages, cross-linking ~50 major bioinformatic resources.
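The collection step described above can be pictured as grouping records from several resources under one protein accession. The following is a minimal sketch under stated assumptions, not Harvester's actual implementation: the `SourceRecord` type, the `build_pages` function, the accession `P00000` and all record texts are hypothetical placeholders. Only the resource names and the accession Q9NPD3 appear in this article.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecord:
    """One piece of text collected from a single resource (hypothetical type)."""
    accession: str  # protein identifier used as the page key
    source: str     # name of the database or prediction server
    text: str       # collected text; placeholder content in this sketch


def build_pages(records):
    """Group collected records into one page per protein accession."""
    pages = defaultdict(dict)
    for rec in records:
        # A later record from the same source overwrites the earlier one.
        pages[rec.accession][rec.source] = rec.text
    return dict(pages)


# Placeholder records; the texts are invented and "P00000" is a made-up key.
records = [
    SourceRecord("Q9NPD3", "UniProt", "placeholder protein description"),
    SourceRecord("Q9NPD3", "PSORT", "placeholder localisation prediction"),
    SourceRecord("P00000", "UniProt", "placeholder protein description"),
]
protein_pages = build_pages(records)
print(sorted(protein_pages["Q9NPD3"]))  # ['PSORT', 'UniProt']
```

Keying every record by accession is what lets a single page gather text from many resources; how Harvester actually reconciled identifiers across databases is not described here.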
Harvester crosslinks several types of information
Text based information
From the following databases:
* UniProt, one of the largest protein databases
* SOURCE, convenient gene information overview
* Simple Modular Architecture Research Tool (SMART)
* SOSUI, predicts transmembrane domains
* PSORT, predicts protein localisation
* HomoloGene, compares proteins from different species
* gfp-cdna, protein localisation with fluorescence microscopy
* International Protein Index (IPI)
Databases rich in graphical elements
These databases were not collected but cross-linked and displayed via iframes. An iframe is a window within an HTML page that provides an embedded view of, and interactive access to, the linked database. Several such iframes were combined on a single Harvester protein page, allowing convenient side-by-side comparison of information from several databases.
* NCBI-BLAST, an algorithm for comparing biological sequences from the NCBI
* Ensembl, automatic gene annotation by the EMBL-EBI and Sanger Institute
* FlyBase, a database of the model organism Drosophila melanogaster
* GoPubMed, a knowledge-based search engine for biomedical texts
* iHOP, information hyperlinked over proteins via gene/protein synonyms
* Mendelian Inheritance in Man, a project cataloguing known genetic diseases
* RZPD, German Resource Center for Genome Research in Berlin/Heidelberg
* STRING, Search Tool for the Retrieval of Interacting Genes/Proteins, developed by EMBL, SIB and UZH
* Zebrafish Information Network
* LOCATE, subcellular localisation database (mouse)
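Combining several iframes on one protein page can be sketched as follows. This is an illustrative example only: the `render_protein_page` function and the `example.org` URL templates are hypothetical, and the real resources each use their own URL schemes, which are not reproduced here.

```python
from html import escape
from urllib.parse import quote

# Hypothetical URL templates keyed by resource name; not the real endpoints.
IFRAME_SOURCES = {
    "STRING": "https://example.org/string?protein={acc}",
    "Ensembl": "https://example.org/ensembl?protein={acc}",
}


def render_protein_page(accession, sources=IFRAME_SOURCES, height=400):
    """Return an HTML page embedding one iframe per cross-linked resource."""
    parts = [f"<html><body><h1>{escape(accession)}</h1>"]
    for name, template in sources.items():
        # URL-encode the accession, then HTML-escape the finished URL.
        url = template.format(acc=quote(accession))
        parts.append(f"<h2>{escape(name)}</h2>")
        parts.append(
            f'<iframe src="{escape(url)}" '
            f'width="100%" height="{height}"></iframe>'
        )
    parts.append("</body></html>")
    return "\n".join(parts)


page = render_protein_page("Q9NPD3")
print(page.count("<iframe"))  # 2
```

Because the remote pages are embedded rather than copied, the content shown is always whatever the linked database currently serves, which matches the "not collected, but cross-linked" approach described above.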
Access from external application
* UCSC Genome Browser, working draft assemblies for genomes
* Google Scholar
* Mitocheck
* PolyMeta, meta search engine for Google, Yahoo, MSN, Ask, Exalead, AllTheWeb, GigaBlast
What one can find
Harvester accepted single words as well as combinations of different search terms.
Search Examples:
* Gene-name: "golga3"
* Gene-alias: "ADAP-S ADAS ADHAPS ADPS" (one gene name is sufficient)
* Gene-Ontologies: "Enzyme linked receptor protein signaling pathway"
* Unigene-Cluster: "Hs.449360"
* Go-annotation: "intra-Golgi transport"
* Molecular function: "protein kinase binding"
* Protein: "Q9NPD3"
* Protein domain: "SH2 sar"
* Protein Localisation: "endoplasmic reticulum"
* Chromosome: "2q31"
* Disease relevant: use the word "diseaselink"
* Combinations: "golgi diseaselink" (finds all golgi proteins associated with a disease)
* mRNA: "AL136897"
* Word: "Cancer"
* Comment: "highly expressed in heart"
* Author: "Merkel, Schmidt"
* Publication or project: "cDNA sequencing project"
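The combination example above ("golgi diseaselink") suggests that every term in a query must match. A minimal sketch of that behaviour uses an inverted index with AND semantics. The AND semantics are an assumption inferred from that one example, and the function names, page ids and page texts below are invented for illustration; this is not Harvester's actual search code.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each token to the set of page ids whose text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for token in tokenize(text):
            index[token].add(page_id)
    return index


def search(index, query):
    """Return page ids containing every token of the query (assumed AND)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Placeholder pages; ids and texts are invented for illustration.
sample_pages = {
    "page-1": "golgi membrane protein diseaselink",
    "page-2": "golgi transport protein",
    "page-3": "nuclear protein diseaselink",
}
index = build_index(sample_pages)
print(sorted(search(index, "golgi diseaselink")))  # ['page-1']
```

In this sketch "diseaselink" is simply a marker token stored in the page text, so it can be combined with any other term like an ordinary word.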
See also
* List of academic databases and search engines
* Biological databases
* Entrez
* European Bioinformatics Institute
* Human Protein Reference Database
* Metadata
* Sequence profiling tool
External links
* Bioinformatic Harvester V at KIT (Karlsruhe Institute of Technology)
* Harvester42 at KIT - integrating 50 general search engines, http://harvester42.fzk.de/ (dead link; archived 2013-01-06 at https://archive.today/20130106060017/http://harvester42.fzk.de/)