
Genome-based peptide fingerprint scanning (GFS) is a system in bioinformatics analysis that attempts to identify the genomic origin (that is, the location in the genome that encodes them) of sample proteins by scanning their peptide-mass fingerprint against the theoretical translation and proteolytic digest of an entire genome. This method is an improvement over previous methods because it compares the peptide fingerprints with an entire genome instead of only with an already annotated genome. As a result, it has the potential to improve genome annotation and to identify proteins with incorrect or missing annotations.
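The core idea described above (translate a genome, digest the translation in silico, and match the resulting peptide masses against observed ones) can be sketched in Python. This is a minimal illustration under simplifying assumptions, not the GFS implementation: it uses the standard genetic code, a trypsin rule (cleave after K or R unless followed by P) with no missed cleavages, neutral monoisotopic peptide masses, and a fixed mass tolerance `tol` in daltons. It also ignores splicing, so a peptide spanning an intron is never found. The names `scan_genome`, `Hit`, and the other helpers are hypothetical.

```python
from typing import Iterator, NamedTuple

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini

# Standard genetic code; codons enumerated in TCAG order, '*' marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


class Hit(NamedTuple):
    observed: float  # observed peptide mass that was matched
    peptide: str     # theoretical peptide explaining it
    strand: str      # '+' (given sequence) or '-' (reverse complement)
    frame: int       # reading frame 0, 1 or 2 on that strand
    nt_start: int    # 0-based nucleotide start on that strand


def reverse_complement(dna: str) -> str:
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    # Codons with ambiguous bases (e.g. N) become 'X'.
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def tryptic_peptides(protein: str) -> Iterator[tuple[int, str]]:
    """Yield (offset, peptide), cleaving after K/R unless followed by P."""
    start = 0
    for i, aa in enumerate(protein):
        if aa in "KR" and protein[i + 1:i + 2] != "P":
            yield start, protein[start:i + 1]
            start = i + 1
    if start < len(protein):
        yield start, protein[start:]


def peptide_mass(peptide: str) -> float:
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def scan_genome(dna: str, observed: list[float], tol: float = 0.02) -> list[Hit]:
    """Match observed masses against the six-frame tryptic digest of dna."""
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            protein = translate(seq[frame:])
            aa_pos = 0  # amino-acid position where the current segment starts
            for segment in protein.split("*"):
                for offset, pep in tryptic_peptides(segment):
                    if "X" in pep:
                        continue  # skip peptides with untranslatable codons
                    mass = peptide_mass(pep)
                    for obs in observed:
                        if abs(mass - obs) <= tol:
                            nt = frame + 3 * (aa_pos + offset)
                            hits.append(Hit(obs, pep, strand, frame, nt))
                aa_pos += len(segment) + 1  # +1 skips the stop codon
    return hits


if __name__ == "__main__":
    # Toy sequence encoding M-A-G-I-C-K followed by a TGA stop codon.
    genome = "ATGGCTGGTATTTGTAAATGA"
    for hit in scan_genome(genome, [peptide_mass("MAGICK")]):
        print(hit)
```

Because every reading frame of both strands is digested, the list of candidate peptides grows with the size of the genome rather than with the number of annotated proteins.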


History and background

GFS was designed by Michael C. Giddings (University of North Carolina, Chapel Hill) et al., and released in 2003. Giddings developed the algorithms for GFS from earlier ideas. Two papers published in 1993 explained techniques for identifying proteins in sequence databases. These methods determined the masses of peptides using mass spectrometry and then used those masses to search protein databases to identify the proteins. In 1999 a more complex program called Mascot was released that integrated three types of protein/database searches: peptide molecular weights, tandem mass spectrometry data from one or more peptides, and combined mass data with amino acid sequence. The drawback of this widely used program is that it cannot detect alternative splice sites that are not already annotated, and it usually cannot find proteins that have not been annotated. Giddings built on these sources to create GFS, which compares peptide mass data with entire genomes to identify the proteins. Giddings's system can find features that have not previously been annotated, such as undocumented genes and undocumented alternative splice sites.
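The database-search approach described above (digest each known protein in silico and count how many observed masses its peptides explain) can be sketched as follows. This is a simplified illustration, not Mascot's method: it scores each protein by a plain count of matched masses rather than Mascot's probability-based scoring, and it assumes trypsin with no missed cleavages, neutral monoisotopic masses, and a fixed tolerance `tol` in daltons. `rank_proteins` is a hypothetical name, and the protein names and sequences in the usage example are made up.

```python
# Standard monoisotopic residue masses in daltons; WATER covers the termini.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(protein: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in "KR" and protein[i + 1:i + 2] != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide: str) -> float:
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def rank_proteins(database: dict[str, str], observed: list[float],
                  tol: float = 0.02) -> list[tuple[str, int]]:
    """Rank annotated proteins by how many observed masses their digest explains."""
    scores = []
    for name, sequence in database.items():
        theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
        matched = sum(
            1 for obs in observed
            if any(abs(obs - m) <= tol for m in theoretical)
        )
        scores.append((name, matched))
    # Highest count first; ties keep database order (sorting is stable).
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical two-entry "annotated" database.
    db = {"protA": "MAGICKLLEWR", "protB": "GGGGGK"}
    spectrum = [peptide_mass("MAGICK"), peptide_mass("LLEWR")]
    print(rank_proteins(db, spectrum))  # [('protA', 2), ('protB', 0)]
```

A protein that is absent from `database`, such as one encoded by an unannotated gene or splice variant, can never be reported by this kind of search, which is the limitation that GFS addresses.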


Research examples

In 2012, research was published in which genes and proteins were found in a model organism that could not have been found without GFS, because they had not been previously annotated. The planarian ''Schmidtea mediterranea'' has been used in research for over 100 years. This planarian can regenerate missing body parts and is therefore emerging as a potential model organism for stem cell research. Planarians are covered in mucus, which aids locomotion, protects them from predation, and supports their immune system. The genome of ''Schmidtea mediterranea'' has been sequenced but is mostly unannotated, making it a prime candidate for genome-based peptide fingerprint scanning. When its proteins were analyzed with GFS, 1,604 proteins were identified, most of which had not been annotated before. The researchers also identified the mucous subproteome (all the genes associated with mucus production) and found that it is conserved in the related parasitic flatworm ''Schistosoma mansoni''. The mucous subproteome is so conserved that 119 of its planarian proteins have orthologs in humans. Because of this similarity, the planarian could be used as a model to study the function of mucous proteins in humans. This is relevant to infections and diseases related to mucous aberrancies, such as cystic fibrosis, asthma, and other lung diseases.

In February 2013, proteogenomic mapping research was carried out with ENCODE data to identify translated regions in the human genome. The researchers applied peptide fingerprint scanning and Mascot to the protein data to find regions that may not have been previously annotated as translated in the human genome. The search against the whole genome revealed that approximately 4% of the unique peptides they found lay outside previously annotated regions. The whole-genome search also produced 15% more hits than a protein database search (such as Mascot) alone. GFS can therefore serve as a complementary annotation method, because it can find new genes or splice sites that have not been annotated before. However, the whole-genome approach used by GFS can be less sensitive than programs that search only annotated regions, because its much larger search space produces more chance matches.
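Figures such as the share of unique peptides lying outside annotated regions come from comparing the genomic coordinates of peptide hits with annotation intervals. The following is a minimal sketch of that bookkeeping, not the method used in the ENCODE study. It assumes that each hit and each annotated region is a half-open (start, end) interval on a single sequence and strand, and it counts a hit as novel unless it lies entirely inside one annotated region. The function names are hypothetical.

```python
from bisect import bisect_right


def merge_intervals(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Sort half-open (start, end) intervals and merge overlapping or touching ones."""
    merged: list[list[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def novel_fraction(hits: list[tuple[int, int]],
                   annotated: list[tuple[int, int]]) -> float:
    """Fraction of peptide hits not contained in any annotated region."""
    if not hits:
        return 0.0
    regions = merge_intervals(annotated)
    starts = [s for s, _ in regions]
    novel = 0
    for start, end in hits:
        # Last region starting at or before the hit; merged regions are
        # disjoint, so it is the only one that could contain the hit.
        i = bisect_right(starts, start) - 1
        if i < 0 or end > regions[i][1]:
            novel += 1
    return novel / len(hits)


if __name__ == "__main__":
    annotated = [(100, 400), (350, 600), (1000, 1200)]
    hits = [(120, 150), (590, 620), (1010, 1040), (2000, 2030)]
    print(novel_fraction(hits, annotated))  # 0.5
```

Merging the annotation first keeps each lookup to a single binary search, which matters when millions of peptide hits are checked against a genome-wide annotation.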




External links


Genome-based Peptide Fingerprint Scanning (GFS) Documentation
Facebook link to "Genome-based Peptide Fingerprint Scanning"