Protein I-sites

I-sites are short sequence-structure motifs, mined from the Protein Data Bank (PDB), that correlate strongly with three-dimensional structural elements. These motifs are used for local structure prediction of proteins. Local structure can be expressed as fragments or as backbone angles. Locations in a protein sequence that have high-confidence I-sites predictions may be the initiation sites of folding. I-sites have also been identified as discrete models for folding pathways. The I-sites library consists of about 250 motifs. Each motif has an amino acid profile, a fragment structure (represented by a "paradigm" fragment chosen from a protein in the PDB) and, optionally, a 4-dimensional tensor of pairwise sequence covariance.

Construction of I-site Library

The sequence and structure database

The database initially consisted of 471 protein sequence families from the HSSP database, with an average of 47 aligned sequences per family. Each family contained a singl ...

Sequence Motif
In biology, a sequence motif is a nucleotide or amino-acid sequence pattern that is widespread and usually assumed to be related to the biological function of the macromolecule. For example, an N-glycosylation site motif can be defined as Asn, followed by anything but Pro, followed by either Ser or Thr, followed by anything but Pro.

Overview
When a sequence motif appears in the exon of a gene, it may encode the "structural motif" of a protein; that is, a stereotypical element of the overall structure of the protein. Nevertheless, motifs need not be associated with a distinctive secondary structure. "Noncoding" sequences are not translated into proteins, and nucleic acids with such motifs need not deviate from the typical shape (e.g. the "B-form" DNA double helix). Outside of gene exons, there exist regulatory sequence motifs and motifs within the "junk", such as satellite DNA. Some of these are believed to affect the shape of nucleic acids (see for example ...
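
The N-glycosylation definition quoted above (Asn, then anything but Pro, then Ser or Thr, then anything but Pro) maps directly onto a regular expression. The following is a minimal sketch in Python; the function name `find_n_glycosylation_sites` and the example sequence are illustrative assumptions, not part of any library. Note that `[^P]` accepts any character other than P, so the input is assumed to contain only one-letter amino-acid codes.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr, then any residue except Pro.
# The pattern is wrapped in a lookahead so that overlapping sites are all reported.
N_GLYCOSYLATION = re.compile(r"(?=N[^P][ST][^P])")


def find_n_glycosylation_sites(sequence):
    """Return 0-based start positions of candidate N-glycosylation sites.

    Hypothetical helper for illustration; assumes `sequence` is a string of
    one-letter amino-acid codes.
    """
    return [match.start() for match in N_GLYCOSYLATION.finditer(sequence.upper())]


if __name__ == "__main__":
    # Made-up example sequence, not taken from a real protein.
    print(find_n_glycosylation_sites("MKNGSAANPTLNNTSV"))
```

A match only marks a candidate site by sequence; as the text notes, a motif need not correspond to a distinctive structure or to actual modification.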

Protein Data Bank
The Protein Data Bank (PDB) is a database for the three-dimensional structural data of large biological molecules such as proteins and nucleic acids, which is overseen by the Worldwide Protein Data Bank (wwPDB). These structural data are obtained and deposited by biologists and biochemists worldwide using experimental methodologies such as X-ray crystallography, NMR spectroscopy, and, increasingly, cryo-electron microscopy. All submitted data are reviewed by expert biocurators and, once approved, are made freely available on the Internet under the CC0 Public Domain Dedication. Global access to the data is provided by the websites of the wwPDB member organizations (PDBe, PDBj, RCSB PDB, and BMRB). The PDB is a key resource in areas of structural biology, such as structural genomics. Most major scientific journals and some funding agencies now require scientists to submit their structure data to the PDB. Many other ...

Protein Folding
Protein folding is the physical process by which a protein, after synthesis by a ribosome as a linear chain of amino acids, changes from an unstable random coil into a more ordered three-dimensional structure. This structure permits the protein to become biologically functional or active. The folding of many proteins begins even during the translation of the polypeptide chain. The amino acids interact with each other to produce a well-defined three-dimensional structure, known as the protein's native state. This structure is determined by the amino-acid sequence or primary structure. The correct three-dimensional structure is essential to function, although some parts of functional proteins may remain unfolded, indicating that protein dynamics are important. Failure to fold into a native structure generally produces inactive proteins, but in some instances, misfolded proteins have ...

Similarity Measure
In statistics and related fields, a similarity measure (also called a similarity function or similarity metric) is a real-valued function that quantifies the similarity between two objects. Although no single definition of similarity exists, such measures are usually in some sense the inverse of distance metrics: they take on large values for similar objects and either zero or a negative value for very dissimilar objects. In broader terms, however, a similarity function may also satisfy metric axioms. Cosine similarity is a commonly used similarity measure for real-valued vectors, used in (among other fields) information retrieval to score the similarity of documents in the vector space model. In machine learning, common kernel functions such as the RBF kernel can be viewed as similarity functions.

Use of different similarity measure formulas
Different types of similarity measures exist for various types of objects, depending on the objects being compared. For each type of object there ...
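
As an illustration of the cosine similarity mentioned above, here is a minimal sketch in plain Python. The function name `cosine_similarity` and the choice to raise an error for zero-length vectors are assumptions made for this example; the measure is simply the dot product of two vectors divided by the product of their Euclidean norms.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two real-valued vectors of equal length.

    Illustrative helper: returns a value in [-1, 1], larger for vectors that
    point in more similar directions.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # The angle is undefined for a zero vector; raising is one possible choice.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    print(cosine_similarity([1, 0], [0, 1]))   # orthogonal vectors
    print(cosine_similarity([1, 0], [-1, 0]))  # opposite directions
```

Consistent with the description above, parallel vectors score close to 1, orthogonal vectors score 0, and opposed vectors score negative values.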

K-means Algorithm
k-means clustering is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster center or cluster centroid), serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. k-means clustering minimizes within-cluster variances (squared Euclidean distances), but not regular Euclidean distances, which would be the more difficult Weber problem: the mean optimizes squared errors, whereas only the geometric median minimizes Euclidean distances. For instance, better Euclidean solutions can be found using k-medians and k-medoids. The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms converge quickly to a local optimum. These are usually similar to the expectation–maximization algorithm for mixtures of Gaussian distributions via an i ...
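
The heuristic described above can be sketched as alternating assignment and update steps (often called Lloyd's algorithm). The code below is a minimal pure-Python illustration under stated assumptions: the function names, the caller-supplied initial centroids, and the rule of leaving an empty cluster's centroid unchanged are choices made for this example, not a reference implementation.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
        for p in points
    ]


def k_means(points, initial_centroids, max_iter=100):
    """Alternate assignment and mean-update steps until the centroids stop moving.

    Converges to a local optimum that depends on `initial_centroids`.
    """
    centroids = [tuple(float(x) for x in c) for c in initial_centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(mean(members) if members else old)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)


if __name__ == "__main__":
    data = [(0, 0), (0, 2), (10, 0), (10, 2)]
    print(k_means(data, initial_centroids=[(0, 0), (10, 0)]))
```

Because the result is only a local optimum, practical use typically repeats the procedure from several different starting centroids and keeps the partition with the smallest within-cluster variance.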