The Critical Assessment of Functional Annotation (CAFA) is an experiment designed to provide a large-scale assessment of computational methods for predicting protein function. Algorithms are evaluated by their ability to predict Gene Ontology (GO) terms in the categories of Molecular Function, Biological Process, and Cellular Component. The experiment consists of two tracks: (i) the eukaryotic track and (ii) the prokaryotic track. In each track, the organizers provide a set of targets. Participants submit their predictions by the submission deadline, after which the predictions are assessed according to a set of specific metrics.
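The text above does not name the metrics. As an illustration only, the sketch below computes a protein-centric maximum F-measure (often written F-max), a measure reported in published CAFA assessments. The function name `f_max`, the threshold grid, and the term labels are assumptions made for this example. The sketch also omits steps a real assessment would need, such as propagating terms through the ontology graph.

```python
from typing import Dict, Set


def f_max(predictions: Dict[str, Dict[str, float]],
          truth: Dict[str, Set[str]],
          steps: int = 100) -> float:
    """Return the best F-measure over score thresholds (simplified sketch).

    predictions: protein -> {term: confidence score in (0, 1]}
    truth:       protein -> non-empty set of experimentally confirmed terms
    """
    best = 0.0
    for k in range(1, steps + 1):
        threshold = k / steps
        precisions = []   # only proteins with at least one prediction kept
        recall_sum = 0.0  # summed over every benchmark protein
        for protein, true_terms in truth.items():
            scored = predictions.get(protein, {})
            predicted = {t for t, s in scored.items() if s >= threshold}
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue  # nothing predicted at this threshold
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(truth)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy data with placeholder term labels (not real GO identifiers).
toy_predictions = {"P1": {"GO:A": 0.9, "GO:B": 0.4}, "P2": {"GO:C": 0.8}}
toy_truth = {"P1": {"GO:A"}, "P2": {"GO:C", "GO:D"}}
print(round(f_max(toy_predictions, toy_truth), 3))  # prints 0.857
```

Precision is averaged only over proteins that receive at least one prediction at a given threshold, while recall is averaged over all benchmark proteins, so a method that skips difficult targets loses recall without gaining precision.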


Motivation

The genome of an organism may consist of hundreds to tens of thousands of genes, which encode hundreds of thousands of different protein sequences. Because genome sequencing has become relatively cheap, determining gene and protein sequences is fast and inexpensive. Thousands of species have been sequenced so far, yet many of their proteins are not well characterized. Experimentally determining the role of a protein in the cell is expensive and time-consuming. Further, even when functional assays are performed, they are unlikely to provide complete insight into protein function. It has therefore become important to use computational tools to annotate proteins functionally. Several computational methods can infer protein function from a variety of biological and evolutionary data, but there is significant room for improvement. Accurate prediction of protein function can have long-lasting implications for biomedical and pharmaceutical research. The CAFA experiment is designed to provide an unbiased assessment of computational methods, to stimulate research in computational function prediction, and to provide insight into the overall state of the art in function prediction.


Organization

The experiment consists of three phases:
# ''Prediction phase'' (~4 months): Organizers provide protein sequences with unknown or incomplete function to the community and set the deadline for the submission of predictions.
# ''Target accumulation'' (6–12 months): After all predictions are stored, the experiment enters a waiting period in which protein functions are expected to accumulate in public databases.
# ''Analysis phase'' (1 month): Predictors are ranked according to their performance. The results are shared publicly at scientific meetings and published after peer review.
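The target accumulation phase can be thought of as a comparison between two database snapshots: one taken at the submission deadline and one taken after the waiting period. The sketch below is a simplification under that assumption. The function name `benchmark_targets` and the snapshot dictionaries are hypothetical, and the actual rules for which annotations count are not described in this article.

```python
from typing import Dict, Iterable, Set


def benchmark_targets(targets: Iterable[str],
                      at_deadline: Dict[str, Set[str]],
                      after_wait: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return targets that gained annotations during the waiting period.

    at_deadline / after_wait map a protein ID to the set of function terms
    recorded in a (hypothetical) database snapshot at each point in time.
    """
    gained: Dict[str, Set[str]] = {}
    for protein in targets:
        before = at_deadline.get(protein, set())
        after = after_wait.get(protein, set())
        new_terms = after - before  # terms that appeared while waiting
        if new_terms:
            gained[protein] = new_terms
    return gained


# Toy snapshots with placeholder identifiers.
released = ["P1", "P2", "P3"]
snapshot_t0 = {"P2": {"GO:A"}}                         # P2 partly annotated
snapshot_t1 = {"P1": {"GO:B"}, "P2": {"GO:A", "GO:C"}}
print(benchmark_targets(released, snapshot_t0, snapshot_t1))
```

Only proteins that gained at least one new term end up in the result, so a method's predictions can be scored only on the subset of released targets for which new functional knowledge appeared during the waiting period.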


History

The CAFA experiment is conducted by the Automated Function Prediction (AFP) Special Interest Group (AFP/SIG). CAFA was conceived by Dr. Inbal (Halperin) Landsberg and was organized by her along with Prof. Russ Altman and Dr. Iddo Friedberg. AFP/SIG meetings were held alongside the Intelligent Systems for Molecular Biology (ISMB) conference in 2005, 2006, 2008, 2011, and 2012.


CAFA 2010-2012

The first CAFA experiment was organized between fall 2010 and spring 2012. The organizers provided 48,000 sequences to the community, with the task of predicting Gene Ontology annotations for each sequence. Of those 48,000 proteins, 866 were experimentally annotated during the target accumulation phase. The results showed that the function prediction algorithms of the time performed significantly better than a simple domain assignment or a straightforward use of the BLAST package. However, they also revealed that accurate prediction of a protein's biological function is still an open and challenging problem.


CAFA 2013-2014

The second CAFA experiment began in fall 2013. Starting in August, interested parties could download more than 100,000 target sequences from 27 species. Registered teams were challenged to annotate the sequences with Gene Ontology terms, with an additional challenge to annotate human sequences with Human Phenotype Ontology terms. The submission deadline was January 15, 2014, and the assessment of predictions was scheduled for June 2014.


See also

CASP: Critical Assessment of protein Structure Prediction

CAPRI: Critical Assessment of Prediction of Interactions



External links


Automated Function Prediction Special Interest Group - CAFA Challenge participation information