SEA-PHAGES

SEA-PHAGES (Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science), formerly called the National Genomics Research Initiative, was the first initiative of the Howard Hughes Medical Institute (HHMI) Science Education Alliance (SEA). SEA director Tuajuanda C. Jordan launched it in 2008 to improve the retention of students in science, technology, engineering, and mathematics (STEM). SEA-PHAGES is a two-semester undergraduate research program administered by Graham Hatfull's group at the University of Pittsburgh and by HHMI's Science Education Division. Students at more than 100 universities nationwide carry out authentic individual research that includes both a wet-bench laboratory component and a bioinformatics component.


Curriculum

During the first semester of the program, classes of around 18–24 undergraduate students work under the supervision of one or two university faculty members and a graduate student assistant, all of whom have completed two week-long training workshops. Each student isolates a personal bacteriophage from local soil samples, one that infects a specific bacterial host. Once a phage has been isolated, students can classify it by visualizing it in electron microscope (EM) images. Students also extract and purify the phage DNA, and one sample is sent for sequencing so that it is ready for the second semester's curriculum.

The second semester consists of annotating the genome that the class sent for sequencing. Students work together to evaluate each gene's start and stop coordinates, its ribosome-binding site, and the possible function of the protein it encodes. Once the annotation is complete, it is submitted to GenBank, the DNA sequence database of the National Center for Biotechnology Information (NCBI). If time remains in the semester, or if the submitted DNA could not be sequenced, the class can request from the University of Pittsburgh the genome file of another phage that has not yet been annotated. In addition to gaining laboratory and bioinformatics skills, students have the opportunity to publish their work in academic journals and to attend the national SEA-PHAGES conference in Washington, D.C., or a regional symposium.


Online Databases and Bioinformatic Programs Used


PhagesDB

All details about each student's phage are made public by entering them into the online database PhagesDB, which expands the knowledge available to the SEA-PHAGES community as a whole.


Starterator

Starterator generates a report that compares the start sites called for genes in the same Pham (phamily, a group of related genes) across annotated phage genomes and other drafts, so that students can suggest an appropriate start for the auto-annotated genes in their actinobacteriophage genome. It is usually not the primary evidence for calling a gene start, because its suggestion is not always supported by the other programs, and its start-stop coordinates do not always match those of the gene called by DNA Master.
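As a rough illustration of the comparison Starterator summarizes, the sketch below tallies how often each start position is called across members of one Pham and reports the most common call with the fraction of calls that agree with it. It assumes, hypothetically, that every call has already been mapped to a shared coordinate such as an alignment column; `consensus_start` is an illustrative name and is not part of Starterator.

```python
from collections import Counter


def consensus_start(start_calls):
    """Return (most common start, fraction of calls that agree with it).

    start_calls: start positions already mapped to a coordinate shared by
    all Pham members (a hypothetical simplification of Starterator's input).
    """
    if not start_calls:
        raise ValueError("no start calls to compare")
    counts = Counter(start_calls)
    start, votes = counts.most_common(1)[0]
    return start, votes / len(start_calls)


# Four Pham members, three of which call the start at column 12.
print(consensus_start([12, 12, 30, 12]))  # (12, 0.75)
```

A high agreement fraction only shows that annotators were consistent, which is one reason such a report is weighed alongside the other programs rather than used alone.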


Local BLAST and BLASTp

These tools compare the amino acid sequence of a gene's product with sequences from other sequenced or annotated phage genomes in the database, helping students in the SEA-PHAGES community predict the starts and functions of their proteins.
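BLAST is a heuristic search for high-scoring local alignments. The sketch below computes the exact Smith-Waterman local alignment score that such searches approximate, then ranks a tiny protein set by it. It is not BLAST: the scoring scheme (+2 match, −1 mismatch, −2 per gap position) is a made-up simplification standing in for a substitution matrix such as BLOSUM62, and `local_alignment_score`, `rank_hits` and the protein names are hypothetical.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # scores for the previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a negative running score is reset to 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_hits(query, database):
    """Return database names ordered from highest to lowest score."""
    return sorted(
        database,
        key=lambda name: local_alignment_score(query, database[name]),
        reverse=True,
    )


proteins = {"hypothetical_1": "GGSTWNP", "hypothetical_2": "AMKVLIT"}
print(rank_hits("MKVLI", proteins))  # ['hypothetical_2', 'hypothetical_1']
```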


GeneMark

This software uses its algorithm to generate a report showing the coding potential in each of the six possible reading frames (three on each DNA strand) of a genome, so that the likelihood that a gene exists in a given region can be assessed during annotation.
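The six reading frames can be made concrete with the short sketch below. It extracts the three forward and three reverse-complement frames and lists simple ATG-to-stop open reading frames within one frame. It only illustrates the frame layout, assuming a plain A/C/G/T sequence. GeneMark itself scores coding potential with Markov models, which this code does not attempt.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def six_frames(seq):
    """Yield (strand, offset, frame_sequence) for all six reading frames."""
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            yield strand, offset, s[offset:]


def orfs_in_frame(frame_seq):
    """(start, end) of each ATG...stop ORF in one frame.

    Coordinates are 0-based, end-exclusive, and relative to frame_seq
    (for "-" frames, relative to the reverse complement).
    """
    orfs, start = [], None
    for i in range(0, len(frame_seq) - 2, 3):
        codon = frame_seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOPS:
            orfs.append((start, i + 3))
            start = None
    return orfs


for strand, offset, frame in six_frames("CATGAAATAGG"):
    print(strand, offset, orfs_in_frame(frame))
```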


DNA Master

DNA Master is a free software tool for Windows computers that uses the programs GLIMMER, GeneMark, Aragorn, and tRNAscan-SE to auto-annotate a genome uploaded as a FASTA file. Because the auto-annotation is generated by computer algorithms drawn from only these programs, whose bundled versions may be older than their online counterparts, each suggested gene has to be confirmed through student annotation. The annotations go through several rounds of peer review before they are accepted for review by experts from PhagesDB, after which the genome can be submitted to GenBank.
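Because DNA Master takes its input genome as a FASTA file, a minimal reader for that format is sketched below. It assumes each record starts with a `>` header whose first word is the record ID, followed by one or more sequence lines. `read_fasta` is an illustrative helper, not part of DNA Master.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping each record ID to its sequence."""
    records = {}
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:  # close the previous record
                records[name] = "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ""  # ID = first word of header
            chunks = []
        elif name is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


example = """>phage_draft hypothetical record
ACGTAC
gtacgt
>second
TTGA
"""
print(read_fasta(example))  # {'phage_draft': 'ACGTACGTACGT', 'second': 'TTGA'}
```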


GLIMMER

GLIMMER and GeneMark are used by DNA Master to predict gene starts by assessing the coding potential of the six possible reading frames and the ribosome-binding site (RBS) signals. The two programs often agree during auto-annotation, but when they call different starts, the discrepancy has to be resolved during manual annotation; GLIMMER is currently the more up-to-date program and is usually used for the final start coordinate.
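The idea of weighing RBS signals when choosing among possible starts can be sketched as follows. For a given stop codon, the code walks upstream in frame to collect ATG, GTG and TTG start candidates. It then scores the region 5 to 15 nt upstream of each candidate for similarity to an assumed Shine-Dalgarno consensus `AGGAGG` and prefers the best-scoring start, taking the longer ORF on ties. The consensus, the window and the tie-breaking rule are all illustrative assumptions. GLIMMER and GeneMark use trained statistical models rather than this simple count.

```python
SD_CONSENSUS = "AGGAGG"  # assumed Shine-Dalgarno consensus (illustrative)
START_CODONS = ("ATG", "GTG", "TTG")
STOP_CODONS = ("TAA", "TAG", "TGA")


def rbs_score(seq, start, near=4, far=15):
    """Best count of positions matching SD_CONSENSUS 5-15 nt upstream of start."""
    region = seq[max(0, start - far):max(0, start - near)]
    k = len(SD_CONSENSUS)
    best = 0
    for i in range(len(region) - k + 1):
        window = region[i:i + k]
        best = max(best, sum(x == y for x, y in zip(window, SD_CONSENSUS)))
    return best


def candidate_starts(seq, stop):
    """In-frame start codons upstream of the stop codon at index `stop`."""
    starts = []
    i = stop - 3
    while i >= 0:
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:  # an upstream in-frame stop ends the ORF
            break
        if codon in START_CODONS:
            starts.append(i)
        i -= 3
    return starts


def best_start(seq, stop):
    """Pick the start with the strongest RBS match; ties go to the longer ORF."""
    return max(candidate_starts(seq, stop), key=lambda s: (rbs_score(seq, s), -s))


genome = "AGGAGGTTTTTATGAAAGTGAAATAA"
print(candidate_starts(genome, 23), best_start(genome, 23))  # [17, 11] 11
```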


Aragorn

This algorithm is used by DNA Master, and an online version is available for cross-checking the calls made by the software. It identifies tRNA and tmRNA genes within a genome by searching for specific sequences that can fold into the characteristic cloverleaf secondary structure. Although the algorithm is fast and considered highly accurate, it can miss tRNAs that fall outside its search parameters.
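One cloverleaf feature that a tRNA finder can check is the seven-base-pair acceptor stem formed between the 5' end and the bases just before the 3' end. The toy sketch below counts Watson-Crick and G-U pairs in that stem. It assumes the candidate ends with a discriminator base plus CCA (four nucleotides). It is not Aragorn's algorithm, which also searches for the other stems and loops, and the function names and the pair threshold are hypothetical.

```python
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def acceptor_stem_pairs(trna):
    """Count pairs in the 7-bp acceptor stem of a candidate tRNA.

    Assumes the sequence ends with a discriminator base plus CCA, so base i
    of the 5' end pairs with the base 5 + i positions from the 3' end.
    """
    s = trna.upper().replace("T", "U")  # accept DNA or RNA letters
    if len(s) < 18:  # 7 + 7 stem bases + 4-nt 3' tail
        raise ValueError("sequence too short for an acceptor stem")
    return sum((s[i], s[-5 - i]) in PAIRS for i in range(7))


def has_acceptor_stem(trna, min_pairs=6):
    """Hypothetical threshold: accept stems with at least min_pairs pairs."""
    return acceptor_stem_pairs(trna) >= min_pairs


candidate = "GCGGAUU" + "A" * 10 + "AAUCCGC" + "ACCA"
print(acceptor_stem_pairs(candidate), has_acceptor_stem(candidate))  # 7 True
```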


tRNAscan-SE

This program lets students identify possible tRNA genes that Aragorn may have missed, because it can also detect unusual tRNA homologues; both programs, however, have sensitivities of 99–100%. Rather than relying on a single detection method, tRNAscan-SE combines the results of three independent tRNA prediction programs: tRNAscan, EufindtRNA, and a tRNA covariance model search.


Phamerator

Phamerator displays the genes of a genome and their similarity to other selected phage genomes, drawing each gene as a rectangle colored according to the phamily (Pham) it is grouped into. Students can view, compare, save, and print these color-coded genome maps during annotation. Possible insertions or deletions can be seen in the connecting lines drawn between the selected genomes. The nucleotide and protein sequences of genes can also be accessed through this program; however, their starts and stops do not always match those called in DNA Master, so these sequences may be incorrect.
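Grouping genes into phams can be pictured as single-linkage clustering: any two genes joined by a chain of sufficiently similar pairs end up in the same pham. The sketch below implements that grouping with a union-find structure. It assumes the similar pairs, for example from protein comparisons that pass some threshold, have already been computed. The gene names and `group_into_phams` are hypothetical, and Phamerator's actual similarity criteria are not modeled.

```python
def group_into_phams(genes, similar_pairs):
    """Single-linkage grouping of genes into phams via union-find."""
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the group's representative, halving paths.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    phams = {}
    for g in genes:
        phams.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in phams.values())


genes = ["gp1", "gp2", "gp3", "gp4"]
hits = [("gp1", "gp2"), ("gp2", "gp3")]  # hypothetical similar pairs
print(group_into_phams(genes, hits))  # [['gp1', 'gp2', 'gp3'], ['gp4']]
```

Single linkage means gp1 and gp3 share a pham even though they were never compared directly, which is why a pham can contain fairly divergent members.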


NCBI BLAST and HHpred

These online programs are used to predict protein functions by comparing amino acid or nucleotide sequences against those from all sequenced genomes, not just phage genomes. HHpred detects homology between a query sequence and proteins whose functions have been determined in any organism. If a homologous protein is found, a computer-generated structure may also be provided to visualize how the amino acid chain might fold. (Bioinformatics Toolkit, toolkit.tuebingen.mpg.de, https://toolkit.tuebingen.mpg.de/#/tools/hhpred, accessed 2018-04-10)

