UCLUST
   HOME





UCLUST
UCLUST is an algorithm designed to cluster nucleotide or amino-acid sequences into clusters based on sequence similarity. The algorithm was published in 2010 and implemented in a program also named UCLUST. The algorithm is described by the author as following two simple clustering criteria, in regard to the requested similarity threshold T. The first criterion states that any given cluster's centroid sequence will have a similarity smaller than T to any other clusters' centroid sequence. The second criterion states that each member sequence in a given cluster will have similarity to the cluster's centroid sequence that is equal or greater than T. UCLUST algorithm is a greedy one. As a result, the order of the sequences in the input file will affect the resulting clusters and their quality. For this reason, it is advised that the sequences will be sorted before entering clustering stage. The program UCLUST is equipped with some options to sort the input sequences prior to clustering ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Sequence Clustering
In bioinformatics, sequence clustering algorithms attempt to group biological sequences that are somehow related. The sequences can be either of genomic, " transcriptomic" ( ESTs) or protein origin. For proteins, homologous sequences are typically grouped into families. For EST data, clustering is important to group sequences originating from the same gene before the ESTs are assembled to reconstruct the original mRNA. Some clustering algorithms use single-linkage clustering, constructing a transitive closure of sequences with a similarity over a particular threshold. UCLUST and CD-HIT use a greedy algorithm that identifies a representative sequence for each cluster and assigns a new sequence to that cluster if it is sufficiently similar to the representative; if a sequence is not matched then it becomes the representative sequence for a new cluster. The similarity score is often based on sequence alignment. Sequence clustering is often used to make a non-redundant set of rep ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Algorithm
In mathematics and computer science, an algorithm () is a finite sequence of Rigour#Mathematics, mathematically rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specifications for performing calculations and data processing. More advanced algorithms can use Conditional (computer programming), conditionals to divert the code execution through various routes (referred to as automated decision-making) and deduce valid inferences (referred to as automated reasoning). In contrast, a Heuristic (computer science), heuristic is an approach to solving problems without well-defined correct or optimal results.David A. Grossman, Ophir Frieder, ''Information Retrieval: Algorithms and Heuristics'', 2nd edition, 2004, For example, although social media recommender systems are commonly called "algorithms", they actually rely on heuristics as there is no truly "correct" recommendation. As an e ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Nucleotide
Nucleotides are Organic compound, organic molecules composed of a nitrogenous base, a pentose sugar and a phosphate. They serve as monomeric units of the nucleic acid polymers – deoxyribonucleic acid (DNA) and ribonucleic acid (RNA), both of which are essential biomolecules within all Life, life-forms on Earth. Nucleotides are obtained in the diet and are also synthesized from common Nutrient, nutrients by the liver. Nucleotides are composed of three subunit molecules: a nucleobase, a pentose, five-carbon sugar (ribose or deoxyribose), and a phosphate group consisting of one to three phosphates. The four nucleobases in DNA are guanine, adenine, cytosine, and thymine; in RNA, uracil is used in place of thymine. Nucleotides also play a central role in metabolism at a fundamental, cellular level. They provide chemical energy—in the form of the nucleoside triphosphates, adenosine triphosphate (ATP), guanosine triphosphate (GTP), cytidine triphosphate (CTP), and uridine triph ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Protein Primary Structure
Protein primary structure is the linear sequence of amino acids in a peptide or protein. By convention, the primary structure of a protein is reported starting from the amino-terminal (N) end to the carboxyl-terminal (C) end. Protein biosynthesis is most commonly performed by ribosomes in cells. Peptides can also be synthesized in the laboratory. Protein primary structures can be directly sequenced, or inferred from DNA sequences. Formation Biological Amino acids are polymerised via peptide bonds to form a long backbone, with the different amino acid side chains protruding along it. In biological systems, proteins are produced during translation by a cell's ribosomes. Some organisms can also make short peptides by non-ribosomal peptide synthesis, which often use amino acids other than the encoded 22, and may be cyclised, modified and cross-linked. Chemical Peptides can be synthesised chemically via a range of laboratory methods. Chemical methods typically synthe ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Bioinformatics
Bioinformatics () is an interdisciplinary field of science that develops methods and Bioinformatics software, software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, data science, computer programming, information engineering, mathematics and statistics to analyze and interpret biological data. The process of analyzing and interpreting data can sometimes be referred to as computational biology, however this distinction between the two terms is often disputed. To some, the term ''computational biology'' refers to building and using models of biological systems. Computational, statistical, and computer programming techniques have been used for In silico, computer simulation analyses of biological queries. They include reused specific analysis "pipelines", particularly in the field of genomics, such as by the identification of genes and single nucleotide polymorphis ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Gene
In biology, the word gene has two meanings. The Mendelian gene is a basic unit of heredity. The molecular gene is a sequence of nucleotides in DNA that is transcribed to produce a functional RNA. There are two types of molecular genes: protein-coding genes and non-coding genes. During gene expression (the synthesis of Gene product, RNA or protein from a gene), DNA is first transcription (biology), copied into RNA. RNA can be non-coding RNA, directly functional or be the intermediate protein biosynthesis, template for the synthesis of a protein. The transmission of genes to an organism's offspring, is the basis of the inheritance of phenotypic traits from one generation to the next. These genes make up different DNA sequences, together called a genotype, that is specific to every given individual, within the gene pool of the population (biology), population of a given species. The genotype, along with environmental and developmental factors, ultimately determines the phenotype ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Taxonomy (biology)
In biology, taxonomy () is the science, scientific study of naming, defining (Circumscription (taxonomy), circumscribing) and classifying groups of biological organisms based on shared characteristics. Organisms are grouped into taxon, taxa (singular: taxon), and these groups are given a taxonomic rank; groups of a given rank can be aggregated to form a more inclusive group of higher rank, thus creating a taxonomic hierarchy. The principal ranks in modern use are domain (biology), domain, kingdom (biology), kingdom, phylum (''division'' is sometimes used in botany in place of ''phylum''), class (biology), class, order (biology), order, family (biology), family, genus, and species. The Swedish botanist Carl Linnaeus is regarded as the founder of the current system of taxonomy, having developed a ranked system known as Linnaean taxonomy for categorizing organisms. With advances in the theory, data and analytical technology of biological systematics, the Linnaean system has transfo ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Phylogenetic Analysis
In biology, phylogenetics () is the study of the evolutionary history of life using observable characteristics of organisms (or genes), which is known as phylogenetic inference. It infers the relationship among organisms based on empirical data and observed heritable traits of DNA sequences, protein amino acid sequences, and morphology. The results are a phylogenetic tree—a diagram depicting the hypothetical relationships among the organisms, reflecting their inferred evolutionary history. The tips of a phylogenetic tree represent the observed entities, which can be living taxa or fossils. A phylogenetic diagram can be rooted or unrooted. A rooted tree diagram indicates the hypothetical common ancestor of the taxa represented on the tree. An unrooted tree diagram (a network) makes no assumption about directionality of character state transformation, and does not show the origin or "root" of the taxa in question. In addition to their use for inferring phylogenetic patterns a ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

2010 Software
1 (one, unit, unity) is a number, Numeral (linguistics), numeral, and glyph. It is the first and smallest Positive number, positive integer of the infinite sequence of natural numbers. This fundamental property has led to its unique uses in other fields, ranging from science to sports, where it commonly denotes the first, leading, or top thing in a group. 1 is the unit (measurement), unit of counting or measurement, a determiner for singular nouns, and a gender-neutral pronoun. Historically, the representation of 1 evolved from ancient Sumerian and Babylonian symbols to the modern Arabic numeral. In mathematics, 1 is the multiplicative identity, meaning that any number multiplied by 1 equals the same number. 1 is by convention not considered a prime number. In Digital electronics, digital technology, 1 represents the "on" state in binary code, the foundation of computing. Philosophically, 1 symbolizes the ultimate reality or source of existence in various traditions. In math ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Bioinformatics Algorithms
Bioinformatics () is an interdisciplinary field of science that develops methods and Bioinformatics software, software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, data science, computer programming, information engineering, mathematics and statistics to analyze and interpret biological data. The process of analyzing and interpreting data can sometimes be referred to as computational biology, however this distinction between the two terms is often disputed. To some, the term ''computational biology'' refers to building and using models of biological systems. Computational, statistical, and computer programming techniques have been used for In silico, computer simulation analyses of biological queries. They include reused specific analysis "pipelines", particularly in the field of genomics, such as by the identification of genes and single nucleotide polymorphis ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]