Pfam

Pfam is a database of protein families that includes their annotations and multiple sequence alignments generated using hidden Markov models. The latest version of Pfam, 37.0, was released in June 2024 and contains 21,979 families. It is currently provided through the InterPro website.

Uses
The general purpose of the Pfam database is to provide a complete and accurate classification of protein families and domains. Originally, the rationale behind creating the database was to have a semi-automated method of curating information on known protein families to improve the efficiency of annotating genomes. The Pfam classification of protein families has been widely adopted by biologists because of its wide coverage of proteins and sensible naming conventions. It is used by experimental biologists researching specific proteins, by structural biologists to identify new targets for structure determination, by computational biologists to organise sequences and by evolutionary biologis ...

Stockholm Format
Stockholm format is a multiple sequence alignment format used by Pfam, Rfam and Dfam to disseminate protein, RNA and DNA sequence alignments. It is supported by alignment editors such as Ralee. Its per-residue markup features are:

SS   Secondary Structure: for RNA, bracket and letter symbols (supports pseudoknot and further structure markup; see WUSS documentation); for protein, [HGIEBTSCX]
SA   Surface Accessibility [0-9X] (0=0%-10%; ...; 9=90%-100%)
TM   TransMembrane [Mio]
PP   Posterior Probability [0-9*] (0=0.00-0.05; 1=0.05-0.15; *=0.95-1.00)
LI   LIgand binding
AS   Active Site
pAS  AS - Pfam predicted
sAS  AS - from SwissProt
IN   INtron (in or after) [0-2]

For RNA tertiary interactions, basepairs are marked with the letters AaBb...Zz, and the interaction types include:

tWW  WC/WC in trans
cWH  WC/Hoogsteen in cis
cWS  WC/SugarEdge in ...
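
To illustrate how the per-residue markup listed for Stockholm format sits alongside the aligned sequences, here is a minimal Python sketch of a reader. It assumes the usual file layout (a `# STOCKHOLM 1.0` header, `#=GR <sequence> <feature> <markup>` lines, and a closing `//`); the function name `parse_stockholm` and the two example sequences are hypothetical and invented for this illustration, and the sketch ignores other annotation line types.

```python
from collections import defaultdict


def parse_stockholm(text):
    """Read one Stockholm alignment.

    Returns (sequences, residue_markup), where residue_markup maps
    sequence name -> feature code (e.g. "SS") -> markup string.
    """
    sequences = defaultdict(str)
    residue_markup = defaultdict(lambda: defaultdict(str))
    for line in text.splitlines():
        line = line.rstrip()
        if not line or line.startswith("# STOCKHOLM"):
            continue  # skip blank lines and the format header
        if line == "//":
            break  # end of this alignment
        if line.startswith("#=GR"):
            # Per-residue markup: "#=GR <seqname> <feature> <markup>"
            _, name, feature, markup = line.split(None, 3)
            residue_markup[name][feature] += markup
        elif line.startswith("#"):
            continue  # other annotation lines are ignored in this sketch
        else:
            # Sequence line: "<seqname> <aligned residues>"
            name, aligned = line.split(None, 1)
            sequences[name] += aligned.strip()
    return dict(sequences), {k: dict(v) for k, v in residue_markup.items()}


# Hypothetical two-sequence alignment with protein secondary-structure markup.
EXAMPLE = """# STOCKHOLM 1.0
seq1/1-8  ACDEFGHI
#=GR seq1/1-8 SS  HHHHCCEE
seq2/1-8  ACDE-GHI
//
"""

seqs, markup = parse_stockholm(EXAMPLE)
print(seqs)
print(markup)
```

Because sequence and markup strings are concatenated per name, the same sketch also handles interleaved files in which an alignment is split across several blocks.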

Zinc Finger
A zinc finger is a small protein structural motif that is characterized by the coordination of one or more zinc ions (Zn²⁺) which stabilizes the fold. The term "zinc finger" was originally coined to describe the finger-like appearance of a hypothesized structure from the African clawed frog (Xenopus laevis) transcription factor IIIA. However, it has been found to encompass a wide variety of differing protein structures in eukaryotic cells. Xenopus laevis TFIIIA was originally demonstrated to contain zinc and require the metal for function in 1983, the first such reported zinc requirement for a gene regulatory protein, followed soon thereafter by the Krüppel factor in Drosophila. It often appears as a metal-binding domain in multi-domain proteins. Proteins that contain zinc fingers (zinc finger proteins) are classified into several different structural families. Unlike many other clearly defined supersecondary structures such as Greek keys or β hairpins, ...

Domains Of Unknown Function
A domain of unknown function (DUF) is a protein domain that has no characterised function. These families have been collected together in the Pfam database using the prefix DUF followed by a number, with examples being DUF2992 and DUF1220. As of 2019, there are almost 4,000 DUF families within the Pfam database representing over 22% of known families. Some DUFs are not named using the nomenclature due to popular usage but are nevertheless DUFs. The DUF designation is tentative, and such families tend to be renamed to a more specific name (or merged to an existing domain) after a function is identified.

History
The DUF naming scheme was introduced by Chris Ponting, through the addition of DUF1 and DUF2 to the SMART database. These two domains were found to be widely distributed in bacterial signaling proteins. Subsequently, the functions of these domains were identified and they have since been renamed as the GGDEF domain and EAL domain respectively.

Characterisation
Structura ...

InterPro
InterPro is a database of protein families, protein domains and functional sites in which identifiable features found in known proteins can be applied to new protein sequences in order to functionally characterise them. The contents of InterPro consist of diagnostic signatures and the proteins that they significantly match. The signatures consist of models (simple types, such as regular expressions, or more complex ones, such as hidden Markov models) which describe protein families, domains or sites. Unknown sequences are searched to create homology models. Each of the member databases of InterPro contributes towards a different niche, from very high-level, structure-based classifications (SUPERFAMILY and CATH-Gene3D) through to quite specific sub-family classifications (PRINTS and PANTHER). InterPro's intention is to provide a one-stop-shop for protein classification, where all the signatures produced by the different member databases are placed into entries within the Inte ...

List Of Biological Databases
Biological databases are stores of biological information. The journal Nucleic Acids Research regularly publishes special issues on biological databases and has a list of such databases. The 2018 issue has a list of about 180 such databases and updates to previously described databases. Omics Discovery Index can be used to browse and search several biological databases. Furthermore, the NIAID Data Ecosystem Discovery Portal, developed by the National Institute of Allergy and Infectious Diseases (NIAID), enables searching across databases.

Meta databases
Meta databases are databases of databases that collect data about data to generate new data. They are capable of merging information from different sources and making it available in a new and more convenient form, or with an emphasis on a particular disease or organism. Originally, metadata was only a common term referring simply to "data about data" such as tags, keywords, and markup headers.
* ConsensusPathDB: a molecular funct ...

Sean Eddy
Sean Roberts Eddy is a professor of molecular and cellular biology and of applied mathematics at Harvard University. Previously he was based at the Janelia Research Campus in Virginia from 2006 to 2015. His research interests are in bioinformatics, computational biology and biological sequence analysis. His projects include the use of hidden Markov models in HMMER, Infernal, Pfam and Rfam.

Education
Eddy graduated in June 1982 from Marion Center Area High School, Marion Center, Pennsylvania. He then completed a Bachelor of Science in Biology at California Institute of Technology in 1986, followed by a Doctor of Philosophy in molecular biology at the University of Colorado under the supervision of Larry Gold in 1991, studying the T4 phage.

Career
From 1992 to 1995 he was a postdoctoral research fellow at the Medical Research Council (MRC) Laboratory of Molecular Biology (LMB) in Cambridge, UK, working with John Sulston and Richard Durbin. From 1995 to 2007 he worked at Washington Univ ...

HMMER
HMMER is a free and commonly used software package for sequence analysis written by Sean Eddy. Its general usage is to identify homologous protein or nucleotide sequences, and to perform sequence alignments. It detects homology by comparing a profile-HMM (a hidden Markov model constructed explicitly for a particular search) to either a single sequence or a database of sequences. Sequences that score significantly better against the profile-HMM than against a null model are considered to be homologous to the sequences that were used to construct the profile-HMM. Profile-HMMs are constructed from a multiple sequence alignment in the HMMER package using the hmmbuild program. The profile-HMM implementation used in the HMMER software was based on the work of Krogh and colleagues. HMMER is a console utility ported to every major operating system, including different versions of Linux, Windows, and macOS. HMMER is the core utility that protein family databases such as Pfam and ...

Protein Family
A protein family is a group of evolutionarily related proteins. In many cases, a protein family has a corresponding gene family, in which each gene encodes a corresponding protein with a 1:1 relationship. The term "protein family" should not be confused with family as it is used in taxonomy. Proteins in a family descend from a common ancestor and typically have similar three-dimensional structures, functions, and significant sequence similarity. Sequence similarity (usually amino-acid sequence) is one of the most common indicators of homology, or common evolutionary ancestry. Some frameworks for evaluating the significance of similarity between sequences use sequence alignment methods. Proteins that do not share a common ancestor are unlikely to show statistically significant sequence similarity, making sequence alignment a powerful tool for identifying the members of protein families. Families are sometimes grouped together into larger clades called superfamilies based on st ...

Rfam
Rfam is a database containing information about non-coding RNA (ncRNA) families and other structured RNA elements. It is an annotated, open access database originally developed at the Wellcome Trust Sanger Institute in collaboration with Janelia Farm, and currently hosted at the European Bioinformatics Institute. Rfam is designed to be similar to the Pfam database for annotating protein families. Unlike proteins, ncRNAs often have similar secondary structure without sharing much similarity in the primary sequence. Rfam divides ncRNAs into families based on evolution from a common ancestor. Producing multiple sequence alignments (MSA) of these families can provide insight into their structure and function, similar to the case of protein families. These MSAs become more useful with the addition of secondary structure information. Rfam researchers also contribute to Wikipedia's RNA WikiProject.

Uses
The Rfam database can be used for a variety of functions. For each ncRNA fa ...

BLAST (biotechnology)
In bioinformatics, BLAST (basic local alignment search tool) is an algorithm and program for comparing primary biological sequence information, such as the amino-acid sequences of proteins or the nucleotides of DNA and/or RNA sequences. A BLAST search enables a researcher to compare a subject protein or nucleotide sequence (called a query) with a library or database of sequences, and identify database sequences that resemble the query sequence above a certain threshold. For example, following the discovery of a previously unknown gene in the mouse, a scientist will typically perform a BLAST search of the human genome to see if humans carry a similar gene; BLAST will identify sequences in the human genome that resemble the mouse gene based on similarity of sequence.

Background
BLAST is one of the most widely used bioinformatics programs for sequence searching. It addresses a fundamental problem in bioinformatics research ...

Hidden Markov Models
A hidden Markov model (HMM) is a Markov model in which the observations are dependent on a latent (or "hidden") Markov process (referred to as X). An HMM requires that there be an observable process Y whose outcomes depend on the outcomes of X in a known way. Since X cannot be observed directly, the goal is to learn about the state of X by observing Y. By definition of being a Markov model, an HMM has an additional requirement that the outcome of Y at time t = t_0 must be "influenced" exclusively by the outcome of X at t = t_0, and that the outcomes of X and Y at t < t_0 must be conditionally independent of Y at t = t_0 given X at time t = t_0. Estimation of the parameters in an HMM can be performed using maximum likelihood estimation. For linear chain HMMs, the ...
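
The conditional-independence requirement described for hidden Markov models is what makes the likelihood of an observed sequence cheap to compute, and that likelihood is the quantity maximum likelihood estimation would seek to maximize. The following is a minimal Python sketch of the standard forward recursion for a discrete HMM. The function name `forward_likelihood` and the two-state parameters are hypothetical values chosen only for illustration; they are not taken from the entry above.

```python
def forward_likelihood(initial, transition, emission, observations):
    """Return P(observations) under a discrete HMM via the forward recursion.

    initial[s]        : probability that the hidden process X starts in state s
    transition[p][s]  : probability that X moves from state p to state s
    emission[s][o]    : probability that Y shows symbol o while X is in state s
    """
    n_states = len(initial)
    # alpha[s] = P(observations so far, X currently in state s)
    alpha = [initial[s] * emission[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[p] * transition[p][s] for p in range(n_states))
            * emission[s][obs]
            for s in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state model with two observable symbols (0 and 1).
initial = [0.6, 0.4]
transition = [[0.7, 0.3], [0.4, 0.6]]
emission = [[0.5, 0.5], [0.1, 0.9]]

likelihood = forward_likelihood(initial, transition, emission, [0, 1])
print(likelihood)
```

Each step only needs the previous `alpha` vector because, given the current hidden state, the current observation is independent of everything earlier; the cost is therefore proportional to the sequence length times the square of the number of states.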