Rosetta@home is a
volunteer computing
project researching
protein structure prediction
on the
Berkeley Open Infrastructure for Network Computing
(BOINC) platform, run by the
Baker laboratory at the
University of Washington
. Rosetta@home aims to predict
protein–protein docking
and
design new proteins with the help of about fifty-five thousand active volunteered computers processing at over 487,946 gigaFLOPS
on average as of September 19, 2020.
Foldit
, a Rosetta@home videogame, aims to reach these goals with a
crowdsourcing
approach. Though much of the project is oriented toward
basic research
to improve the accuracy and robustness of
proteomics
methods, Rosetta@home also does
applied research
on
malaria
,
Alzheimer's disease, and other pathologies.
Like all BOINC projects, Rosetta@home uses idle computer processing resources from volunteers' computers to perform calculations on individual
workunits. Completed results are sent to a central project
server where they are validated and assimilated into project
database
s. The project is
cross-platform
, and runs on a wide variety of hardware configurations. Users can view the progress of their individual
protein
structure prediction on the Rosetta@home screensaver.
In addition to disease-related research, the Rosetta@home network serves as a testing framework for new methods in
structural bioinformatics. Such methods are then used in other Rosetta-based applications, like
RosettaDock or the
Human Proteome Folding Project
and the
Microbiome Immunity Project, after being sufficiently developed and proven stable on Rosetta@home's large and diverse set of volunteer computers. Two especially important tests for the new methods developed in Rosetta@home are the
Critical Assessment of Techniques for Protein Structure Prediction (CASP) and
Critical Assessment of Prediction of Interactions (CAPRI) experiments, biennial experiments which evaluate the state of the art in protein structure prediction and protein–protein docking prediction, respectively. Rosetta@home consistently ranks among the foremost docking predictors, and is one of the best
tertiary structure
predictors available.
With an influx of new users looking to participate in the fight against the
COVID-19 pandemic
, caused by
SARS-CoV-2
, Rosetta@home increased its computing power to as much as 1.7 petaFLOPS as of March 28, 2020. On September 9, 2020, Rosetta@home researchers published a paper describing 10 potent antiviral candidates against SARS-CoV-2. Rosetta@home contributed to this research, and these antiviral candidates are heading towards Phase 1 clinical trials, which may begin in early 2022.
According to the Rosetta@home team, Rosetta volunteers contributed to the development of a nanoparticle vaccine.
This vaccine has been licensed and is known as
IVX-411 by Icosavax, which began a Phase I/II clinical trial in June 2021,
and as
GBP510, which is being developed by SK Bioscience and has been approved for a Phase III clinical trial in South Korea.
NL-201, a cancer drug candidate that was first created at the Institute of Protein Design (IPD) and published in a January 2019 paper,
began a Phase 1 human clinical trial in May 2021 with the support of Neoleukin Therapeutics, itself a spin-off from the IPD.
Rosetta@home played a role in the development of NL-201 by contributing "forward folding" experiments that helped validate protein designs.
Computing platform
The Rosetta@home application and the
BOINC
volunteer computing platform are available for the operating systems
Windows
,
Linux
, and
macOS
; BOINC also runs on several others, e.g., FreeBSD.
Participation in Rosetta@home requires a
central processing unit
(CPU) with a
clock speed
of at least 500
MHz, 200
megabyte
s of free
disk space, 512 megabytes of
physical memory
, and Internet connectivity. As of July 20, 2016, the current version of the Rosetta Mini application is 3.73.
The current recommended BOINC program version is 7.6.22.
Standard
Hypertext Transfer Protocol
(HTTP) (
port
80) is used for communication between the user's BOINC client and the Rosetta@home servers at the University of Washington;
HTTPS
(port 443) is used during password exchange. Remote and local control of the BOINC client use port 31416 and port 1043, which might need to be specifically unblocked if they are behind a
firewall
.
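As a rough illustration of the port requirements described above, the following Python sketch checks whether a TCP port accepts connections. The port numbers come from the text; the `BOINC_PORTS` grouping and the `port_is_open` helper are hypothetical names introduced here and are not part of BOINC.

```python
import socket

# Ports named in the text above. The dictionary is only an illustrative
# grouping (an assumption of this sketch), not a BOINC data structure.
BOINC_PORTS = {
    "http": 80,                       # client <-> project server traffic
    "https": 443,                     # password exchange
    "client_control": (31416, 1043),  # remote and local control of the client
}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, filtered by a firewall, timed out, or host unreachable.
        return False
```

For example, `port_is_open("127.0.0.1", 31416)` would report whether something on the local machine is accepting connections on the control port. A `False` result does not by itself distinguish a firewall block from a client that is simply not running.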
Workunits containing data on individual proteins are distributed from servers located in the Baker lab at the
University of Washington
to volunteers' computers, which then calculate a structure prediction for the assigned protein. To avoid duplicate structure predictions on a given protein, each workunit is initialized with a
random seed
number. This gives each prediction a unique trajectory of descent along the protein's
energy landscape. Protein structure predictions from Rosetta@home are approximations of a
global minimum
in a given protein's energy landscape. That global minimum represents the most energetically favorable conformation of the protein, i.e., its
native state
.
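The role of the random seed can be sketched with a toy example. The code below is not Rosetta's algorithm: it assumes a made-up one-dimensional energy function (`toy_energy`) and uses a plain Metropolis Monte Carlo search as a stand-in. It shows how each seed yields its own reproducible trajectory and how the lowest energy found across many seeds serves as an approximation of the global minimum.

```python
import math
import random


def toy_energy(x: float) -> float:
    """Hypothetical rugged 1-D landscape with several local minima."""
    return 0.1 * x * x + math.sin(3.0 * x)


def run_trajectory(seed: int, steps: int = 2000, temperature: float = 0.5):
    """Run one Metropolis Monte Carlo search; return (best_x, best_energy)."""
    rng = random.Random(seed)  # the seed fully determines this trajectory
    x = rng.uniform(-10.0, 10.0)
    energy = toy_energy(x)
    best_x, best_energy = x, energy
    for _ in range(steps):
        candidate = x + rng.gauss(0.0, 0.5)
        candidate_energy = toy_energy(candidate)
        delta = candidate_energy - energy
        # Accept downhill moves always; uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            x, energy = candidate, candidate_energy
            if energy < best_energy:
                best_x, best_energy = x, energy
    return best_x, best_energy


# One "workunit" per seed; keep the lowest-energy result over all of them.
results = [run_trajectory(seed) for seed in range(20)]
lowest = min(results, key=lambda r: r[1])
```

Rerunning with the same seed reproduces the same result, while different seeds explore different parts of the landscape, which is why distributing many seeds avoids duplicated work.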
A primary feature of the Rosetta@home
graphical user interface
(GUI) is a
screensaver
which shows a current
workunit's progress during the simulated
protein folding
process. In the upper left of the current screensaver, the target protein is shown adopting different shapes (conformations) in its search for the lowest energy structure. Depicted immediately to the right is the structure of the most recently accepted model. On the upper right the lowest energy conformation of the current decoy is shown; below that is the true, or native, structure of the protein if it has already been determined. Three graphs are included in the screensaver. Near the middle, a graph for the accepted model's
thermodynamic free energy
is displayed, which fluctuates as the accepted model changes. A graph of the accepted model's
root-mean-square deviation (RMSD), which measures how structurally similar the accepted model is to the native model, is shown far right. On the right of the accepted energy graph and below the RMSD graph, the results from these two functions are used to produce an energy vs. RMSD plot as the model is progressively refined.
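The RMSD measure mentioned above can be illustrated with a short Python sketch. It assumes the two structures are given as equal-length lists of (x, y, z) coordinates that have already been superimposed; no structural alignment is attempted, and the function name and sample coordinates are illustrative only.

```python
import math


def rmsd(model, native):
    """Root-mean-square deviation between two lists of (x, y, z) points.

    Assumes both structures are already superimposed and list the same
    atoms in the same order.
    """
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (mx - nx) ** 2 + (my - ny) ** 2 + (mz - nz) ** 2
        for (mx, my, mz), (nx, ny, nz) in zip(model, native)
    )
    return math.sqrt(squared / len(model))


# Toy coordinates: every atom of the model is shifted by 1 unit along z.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
model = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
print(rmsd(model, native))  # prints 1.0
```

A value of zero means the model coincides with the native structure, so a lower RMSD indicates a closer match.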
Like all BOINC projects, Rosetta@home runs in the background of the user's computer, using idle computer power, either at or before logging into an account on the host
operating system
. The program frees resources from the CPU as they are needed by other applications so that normal computer use is unaffected. Many program settings can be specified via user account preferences, including: the maximum percentage of CPU resources the program can use (to control power consumption or heat production from a computer running at sustained capacity), the times of day during which the program can run, and many more.
Rosetta, the software that runs on the Rosetta@home network, was rewritten in
C++ to make development easier than it was with the original version, which was written in
Fortran. This new version is
object-oriented
, and was released on February 8, 2008.
Development of the Rosetta code is done by Rosetta Commons.
The software is freely licensed to the academic community and available to pharmaceutical companies for a fee.
Project significance
With the proliferation of
genome sequencing projects, scientists can infer the amino acid sequence, or
primary structure
, of many proteins that carry out functions within the cell. To better understand a protein's function and aid in
rational drug design, scientists need to know the protein's three-dimensional
tertiary structure
.

Protein 3D structures are currently determined experimentally via
X-ray crystallography
or
nuclear magnetic resonance
(NMR) spectroscopy. The process is slow (it can take weeks or even months to figure out how to crystallize a protein for the first time) and costly (around US$100,000 per protein). Unfortunately, the rate at which new sequences are discovered far exceeds the rate of structure determination – out of more than 7,400,000 protein sequences available in the
National Center for Biotechnology Information
(NCBI) nonredundant (nr) protein database, fewer than 52,000 proteins' 3D structures have been solved and deposited in the
Protein Data Bank
, the main repository for structural information on proteins. One of the main goals of Rosetta@home is to predict protein structures with the same accuracy as existing methods, but in a way that requires significantly less time and money. Rosetta@home also develops methods to determine the structure and docking of
membrane protein
s (e.g.,
G protein–coupled receptor
s (GPCRs)), which are exceptionally difficult to analyze with traditional techniques like X-ray crystallography and NMR spectroscopy, yet represent the majority of targets for modern drugs.
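The gap between known sequences and solved structures quoted earlier in this section can be made concrete with a little arithmetic. The sketch below uses only the two figures from the text. Because one is a lower bound and the other an upper bound, the result is an upper bound on the solved fraction.

```python
sequences = 7_400_000  # "more than" this many sequences in the NCBI nr database
structures = 52_000    # "fewer than" this many structures deposited in the PDB

upper_bound = structures / sequences
print(f"{upper_bound:.2%}")  # prints 0.70%
```

On these figures, fewer than about one sequence in 140 had an experimentally determined structure.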
Progress in protein structure prediction is evaluated in the biennial
Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiment, in which researchers from around the world attempt to derive a protein's structure from the protein's amino acid sequence. High scoring groups in this sometimes competitive experiment are considered the ''de facto'' standard-bearers for what is the state of the art in protein structure prediction. Rosetta, the program on which Rosetta@home is based, has been used since CASP5 in 2002. In the 2004 CASP6 experiment, Rosetta made history by being the first to produce a close to atomic-level resolution, ''ab initio
protein structure prediction
'' in its submitted model for CASP target T0281.
''Ab initio'' modeling is considered an especially difficult category of protein structure prediction, as it does not use information from
structural homology and must rely on information from
sequence homology
and modeling physical interactions within the protein. Rosetta@home has been used in CASP since 2006, where it was among the top predictors in every category of structure prediction in CASP7.
These high quality predictions were enabled by the computing power made available by Rosetta@home volunteers.
Increasing computing power allows Rosetta@home to sample more regions of
conformation space (the possible shapes a protein can assume), which, according to
Levinthal's paradox
, is predicted to
increase exponentially with protein length.
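The exponential growth of conformation space can be illustrated with a simple count. The sketch assumes, purely for illustration, that each residue can adopt a small fixed number of states independently of the others. The default of three states per residue is an assumption of this example and not a figure from the project.

```python
import math


def conformation_count(residues: int, states_per_residue: int = 3) -> int:
    """Number of conformations if every residue independently picks one state."""
    return states_per_residue ** residues


for n in (10, 50, 100):
    count = conformation_count(n)
    # Report the order of magnitude, since the raw numbers are unwieldy.
    print(f"{n} residues: about 10^{math.log10(count):.1f} conformations")
```

Under this assumption, adding a single residue triples the number of conformations, so exhaustive enumeration quickly becomes infeasible and additional computing power buys only a broader sample.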
Rosetta@home is also used in
protein–protein docking
prediction, which determines the structure of multiple
complexed proteins, or
quaternary structure
. This type of
protein interaction affects many cellular functions, including antigen–antibody and enzyme–inhibitor binding and cellular import and export. Determining these interactions is critical for
drug design
. Rosetta is used in the
Critical Assessment of Prediction of Interactions (CAPRI) experiment, which evaluates the state of the protein docking field similar to how CASP gauges progress in protein structure prediction. The computing power made available by Rosetta@home's project volunteers has been cited as a major factor in Rosetta's performance in CAPRI, where its docking predictions have been among the most accurate and complete.
In early 2008, Rosetta was used to computationally design a protein with a function never before observed in nature.
This was inspired in part by the retraction of a high-profile paper from 2004 which originally described the computational design of a protein with improved enzymatic activity relative to its natural form. The 2008
research paper from David Baker's group describing how the protein was made, which cited Rosetta@home for the computing resources it made available, represented an important
proof of concept
for this protein design method.
This type of protein design could have future applications in drug discovery,
green chemistry
, and
bioremediation
.
Disease-related research
In addition to basic research in predicting protein structure, docking and design, Rosetta@home is also used in immediate disease-related research.
Numerous minor research projects are described in David Baker's Rosetta@home journal. As of February 2014, information on recent publications and a short description of the work were being posted on the forum. The forum thread has not been used since 2016, and research news can now be found in the general news section of the project.
Alzheimer's disease
A component of the Rosetta software suite, RosettaDesign, was used to accurately predict which regions of amyloidogenic proteins were most likely to make
amyloid
-like fibrils. By taking hexapeptides (six amino acid-long fragments) of a protein of interest and selecting the lowest energy match to a structure similar to that of a known fibril forming hexapeptide, RosettaDesign was able to identify peptides twice as likely to form fibrils as are random proteins. Rosetta@home was used in the same study to predict structures for
amyloid beta
, a fibril-forming protein that has been postulated to cause
Alzheimer's disease. Preliminary but as yet unpublished results have been produced on Rosetta-designed proteins that may prevent fibrils from forming, although it is unknown whether they can prevent the disease.
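The hexapeptide scan described above amounts to sliding a six-residue window along a sequence and ranking the fragments by a score. The sketch below shows only that windowing and selection step. The sequence is a made-up string, and `placeholder_score` is a meaningless stand-in for the energy evaluation that RosettaDesign would perform, not a real energy function.

```python
def hexapeptides(sequence: str, length: int = 6):
    """Return every contiguous fragment of the given length, in order."""
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


def lowest_scoring_fragment(sequence: str, score):
    """Pick the fragment with the lowest value of a caller-supplied score."""
    return min(hexapeptides(sequence), key=score)


def placeholder_score(fragment: str) -> int:
    """Hypothetical stand-in for an energy function; has no physical meaning."""
    return sum(ord(residue) for residue in fragment)


toy_sequence = "ACDEFGHIKL"  # hypothetical 10-residue sequence, not a real protein
print(hexapeptides(toy_sequence))
# prints ['ACDEFG', 'CDEFGH', 'DEFGHI', 'EFGHIK', 'FGHIKL']
print(lowest_scoring_fragment(toy_sequence, placeholder_score))  # prints ACDEFG
```

A sequence of n residues yields n - 5 hexapeptides, so the number of fragments to score grows only linearly with protein length.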
Anthrax
Another component of Rosetta, RosettaDock,
was used in conjunction with experimental methods to model interactions between three proteins—lethal factor (LF), edema factor (EF) and protective antigen (PA)—that make up
anthrax toxin
. The computer model accurately predicted docking between LF and PA, helping to establish which
domains of the respective proteins are involved in the LF–PA complex. This insight was eventually used in research resulting in improved anthrax vaccines.
Herpes simplex virus 1
RosettaDock was used to model docking between an
antibody
(
immunoglobulin G
) and a surface protein expressed by the cold sore virus,
herpes simplex virus 1 (HSV-1) which serves to degrade the antiviral antibody. The protein complex predicted by RosettaDock closely agreed with the especially difficult-to-obtain experimental models, leading researchers to conclude that the docking method has potential to address some of the problems that X-ray crystallography has with modelling protein–protein interfaces.
HIV
As part of research funded by a $19.4 million grant by the
Bill & Melinda Gates Foundation
, Rosetta@home has been used in designing multiple possible vaccines for human immunodeficiency virus (
HIV
).
Malaria
In research involved with the
Grand Challenges in Global Health initiative, Rosetta has been used to computationally design novel
homing endonuclease
proteins, which could eradicate ''
Anopheles gambiae
'' or otherwise render the mosquito unable to transmit
malaria
. Being able to model and alter protein–DNA interactions specifically, like those of homing endonucleases, gives computational protein design methods like Rosetta an important role in
gene therapy
(which includes possible cancer treatments).
COVID-19
The Rosetta molecular modelling suite was used to accurately predict the atomic-scale structure of the
SARS-CoV-2
spike protein weeks before it could be measured in the lab. On June 26, 2020, the project announced that it had succeeded in creating antiviral proteins that neutralize SARS-CoV-2 virions in the lab and that these experimental antiviral drugs were being optimized for animal trials.
In a follow-up, a paper describing 10 SARS-CoV-2 miniprotein inhibitors was published in ''
Science
'' on September 9, 2020. Two of these inhibitors, LCB1 and LCB3, are several times more potent than the best
monoclonal antibodies
being developed against SARS-CoV-2, on both a molar and a mass basis. In addition, the research suggests that these inhibitors retain their activity at elevated temperatures and are 20-fold smaller than an antibody, and thus have 20-fold more potential neutralizing sites, increasing the potential efficacy of a locally administered drug. The small size and high stability of the inhibitors are expected to make them suitable for a gel formulation that can be applied nasally, or for a powder administered directly to the respiratory system. The researchers planned to develop these inhibitors into therapeutics and prophylactics in the following months.
As of July 2021, these antiviral candidates were forecast to begin clinical trials in early 2022 and had received funding from the
Bill & Melinda Gates Foundation
for preclinical and early clinical trials.
In animal testing trials, these antiviral candidates were effective against variants of concern including Alpha, Beta and Gamma.
Rosetta@home was used to help screen the over 2 million SARS-CoV-2 Spike-binding proteins that were computationally designed, and thus, contributed to this research.
According to the Rosetta@home team at the Institute for Protein Design, Rosetta@home volunteers contributed to the development of antiviral drug candidates
and to a protein nanoparticle vaccine. The IVX-411 vaccine is already in a Phase 1 clinical trial run by
Icosavax, while the same vaccine, licensed to another manufacturer under the name GBP510, has been approved in South Korea for a Phase III trial run by
SK Bioscience.
The candidate antivirals are also progressing towards Phase 1 clinical trials.
Cancer
Rosetta@home researchers have designed an
IL-2 receptor
agonist called Neoleukin-2/15 that does not interact with the alpha subunit of the receptor. Such immune signaling molecules are useful in cancer treatment. While natural IL-2 suffers from toxicity due to an interaction with the alpha subunit, the designed protein is much safer, at least in animal models.
Rosetta@home contributed to "forward folding experiments" that helped validate the designs.
In a September 2020 feature in the ''
New Yorker
'', David Baker stated that Neoleukin-2/15 would begin human clinical trials "later this year". Neoleukin-2/15 is being developed by
Neoleukin, a spin-off company from the Baker lab.
In December 2020, Neoleukin announced it would be submitting an
Investigational New Drug application with the
Food and Drug Administration
in order to begin a Phase 1 clinical trial of NL-201, which is a further development of Neoleukin-2/15. A similar application was submitted in Australia, and Neoleukin hoped to enrol up to 120 participants in the Phase 1 clinical trial. The Phase 1 human clinical trial began on May 5, 2021.
Development history and branches
Originally introduced by the Baker laboratory in 1998 as an ''ab initio'' approach to structure prediction, Rosetta has since branched into several development streams and distinct services. The Rosetta platform derives its name from the
Rosetta Stone
, as it attempts to decipher the structural "meaning" of proteins' amino acid sequences. More than seven years after Rosetta's first appearance, the Rosetta@home project was released (i.e., announced as no longer
beta
) on October 6, 2005.
Many of the graduate students and other researchers involved in Rosetta's initial development have since moved to other universities and research institutions, and subsequently enhanced different parts of the Rosetta project.
RosettaDesign

RosettaDesign, a computing approach to protein design based on Rosetta, began in 2000 with a study in redesigning the folding pathway of
Protein G. In 2002 RosettaDesign was used to design
Top7, a 93-amino acid long
α/β protein that had an overall
fold never before recorded in nature. This new conformation was predicted by Rosetta to within 1.2
Å RMSD
of the structure determined by
X-ray crystallography
, representing an unusually accurate structure prediction. Rosetta and RosettaDesign earned widespread recognition by being the first to design and accurately predict the structure of a novel protein of such length: the 2002 paper describing the dual approach prompted two positive letters in the journal ''
Science
'' and has been cited by more than 240 other scientific articles. The visible product of that research,
Top7, was featured as the RCSB PDB's 'Molecule of the Month' in October 2006; a
superposition of the respective cores (residues 60–79) of its predicted and X-ray crystal structures is featured in the Rosetta@home logo.
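The 1.2 Å figure quoted for Top7 is a root-mean-square deviation over matched atom positions. As a minimal sketch (not Rosetta's own code), assuming the two structures have already been superimposed and list their atoms in the same order, the calculation can be written in Python as follows; the function name `rmsd` and the toy coordinates are illustrative assumptions only:

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points.

    Assumes the structures are already superimposed and atom i in coords_a
    corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy example: every atom is displaced by exactly 1 unit along z.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
observed = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
toy_rmsd = rmsd(predicted, observed)
```

In practice the superposition step (finding the rotation and translation that minimize this value) is performed first; it is omitted here to keep the sketch short.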
Brian Kuhlman, a former postdoctoral associate in
David Baker's lab and now an associate professor at the
University of North Carolina, Chapel Hill
, offers RosettaDesign as an online service.
RosettaDock
RosettaDock was added to the Rosetta software suite during the first
CAPRI
experiment in 2002 as the Baker laboratory's
algorithm
for
protein–protein docking
prediction.
In that experiment, RosettaDock made a high-accuracy prediction for the docking between
streptococcal pyogenic exotoxin A and a
T cell-receptor β-chain, and a medium-accuracy prediction for a complex between
porcine
α-amylase and a
camelid
antibody
. While the RosettaDock method made only two acceptably accurate predictions out of a possible seven, this was enough to rank it seventh out of nineteen prediction methods in the first CAPRI assessment.
Development of RosettaDock diverged into two branches for subsequent CAPRI rounds as Jeffrey Gray, who laid the groundwork for RosettaDock while at the
University of Washington
, continued working on the method in his new position at
Johns Hopkins University
. Members of the Baker laboratory further developed RosettaDock in Gray's absence. The two versions differed slightly in side-chain modeling, decoy selection and other areas.
Despite these differences, both the Baker and Gray methods performed well in the second CAPRI assessment, placing fifth and seventh respectively out of 30 predictor groups. Jeffrey Gray's RosettaDock server is available as a free docking prediction service for non-commercial use.
In October 2006, RosettaDock was integrated into Rosetta@home. The method began with a fast, crude docking phase that used only the
protein backbone. This was followed by a slow full-atom refinement phase in which the orientation of the two interacting proteins relative to each other, and side-chain interactions at the protein–protein interface, were simultaneously optimized to find the lowest energy conformation. The vastly increased computing power afforded by the Rosetta@home network, combined with revised ''fold-tree'' representations for backbone flexibility and
loop modeling, made RosettaDock sixth out of 63 prediction groups in the third CAPRI assessment.
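The two-phase idea described above (a cheap, coarse search followed by a slower refinement toward the lowest energy) can be sketched in Python. This is only an illustration under loud assumptions: the two-number "pose", the quadratic `toy_energy` function, and the helper names `coarse_search` and `refine` are all hypothetical stand-ins, not RosettaDock's representation, score function, or code.

```python
import math
import random


def toy_energy(pose):
    """Hypothetical stand-in for a docking score; the minimum is at (1, -2)."""
    x, y = pose
    return (x - 1.0) ** 2 + (y + 2.0) ** 2


def coarse_search(energy, n_samples, rng):
    """Phase 1: sample many random poses cheaply and keep the best one."""
    best = None
    for _ in range(n_samples):
        pose = (rng.uniform(-10.0, 10.0), rng.uniform(-10.0, 10.0))
        if best is None or energy(pose) < energy(best):
            best = pose
    return best


def refine(energy, start, n_steps, step_size, temperature, rng):
    """Phase 2: Metropolis Monte Carlo with small moves, tracking the best pose seen."""
    current = best = start
    for _ in range(n_steps):
        trial = (current[0] + rng.gauss(0.0, step_size),
                 current[1] + rng.gauss(0.0, step_size))
        delta = energy(trial) - energy(current)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = trial
            if energy(current) < energy(best):
                best = current
    return best


rng = random.Random(0)  # fixed seed so the run is reproducible
coarse_pose = coarse_search(toy_energy, 200, rng)
refined_pose = refine(toy_energy, coarse_pose, 2000, 0.1, 0.05, rng)
```

Because `refine` starts from the coarse result and only ever replaces its best pose with a lower-energy one, the refined pose can never score worse than the coarse pose it was given.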
Robetta
The Robetta (Rosetta Beta) server is an automated protein structure prediction service offered by the Baker laboratory for non-commercial ''ab initio'' and comparative modeling. It has participated as an automated prediction server in the biennial
CASP
experiments since CASP5 in 2002, performing among the best in the automated server prediction category. Robetta has since competed in CASP6 and 7, where it did better than average among both automated server and human predictor groups.
It also participates in the
CAMEO3D continuous evaluation.
In modeling protein structure as of CASP6, Robetta first searches for structural homologs using
BLAST
,
PSI-BLAST
, and
3D-Jury
, then parses the target sequence into its individual
domains, or independently folding units of proteins, by matching the sequence to structural families in the
Pfam database. Domains with structural homologs then follow a "template-based model" (i.e.,
homology modeling
) protocol. Here, the Baker laboratory's in-house alignment program, K*sync, produces a group of sequence homologs, and each of these is modeled by the Rosetta ''de novo'' method to produce a decoy (possible structure). The final structure prediction is selected by taking the
lowest energy model as determined by a low-resolution Rosetta energy function. For domains that have no detected structural homologs, a ''de novo'' protocol is followed in which the lowest energy model from a set of generated decoys is selected as the final prediction. These domain predictions are then connected together to investigate inter-domain, tertiary-level interactions within the protein. Finally, side-chain contributions are modeled using a protocol for
Monte Carlo
conformational search.
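The selection step described above, in which the lowest-energy decoy for each domain becomes that domain's prediction, can be illustrated with a short Python sketch. The `Decoy` class, the function names, and the energy values are hypothetical and chosen only for illustration; they are not Robetta's data structures or real scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decoy:
    """A candidate structure with its (hypothetical) energy score."""
    label: str
    energy: float


def pick_lowest_energy(decoys):
    """Return the decoy with the lowest energy from a non-empty collection."""
    if not decoys:
        raise ValueError("no decoys to choose from")
    return min(decoys, key=lambda decoy: decoy.energy)


def predict_domains(decoys_by_domain):
    """Choose one decoy per domain; assembling domains together is out of scope here."""
    return {domain: pick_lowest_energy(decoys)
            for domain, decoys in decoys_by_domain.items()}


# Made-up energies for two domains of one target.
example = {
    "domain_1": [Decoy("a", -120.5), Decoy("b", -131.2), Decoy("c", -98.0)],
    "domain_2": [Decoy("d", -75.3), Decoy("e", -74.9)],
}
final = predict_domains(example)
```

The later steps the text mentions, connecting the domain predictions and modeling side chains, would operate on the decoys chosen here.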
In CASP8, Robetta was augmented to use Rosetta's high-resolution all-atom refinement method, the absence of which was cited as the main cause for Robetta being less accurate than the Rosetta@home network in CASP7.
In CASP11, GREMLIN, a method that predicts the
protein contact map from the co-evolution of residues in related proteins, was added, allowing for more ''de novo'' fold successes.
Foldit
On May 9, 2008, after Rosetta@home users suggested an interactive version of the
volunteer computing
program, the Baker lab publicly released
Foldit
, an online protein structure prediction game based on the Rosetta platform. Foldit had over 59,000 registered users. The game gives users a set of controls (for example, shake, wiggle, rebuild) to manipulate the
backbone and amino acid
side chain
s of the target protein into more energetically favorable conformations. Users can work on solutions individually as ''soloists'' or collectively as ''evolvers'', accruing points under either category as they improve their structure predictions.
Comparison to similar volunteer computing projects
Several volunteer computing projects have study areas similar to those of Rosetta@home but differ in their research approach:
Folding@home
Of all the major volunteer computing projects involved in protein research,
Folding@home
is the only one not using the
BOINC
platform. Both Rosetta@home and Folding@home study protein misfolding diseases such as
Alzheimer's disease, but Folding@home concentrates on them to a far greater degree. Folding@home almost exclusively uses all-atom
molecular dynamics
models to understand how and why proteins fold (or potentially misfold, and subsequently aggregate to cause diseases).
In other words, Folding@home's strength is modeling the process of protein folding, while Rosetta@home's strength is computing protein design and predicting protein structure and docking.
Some of Rosetta@home's results are used as the basis for some Folding@home projects. Rosetta provides the most likely structure, but it is not certain whether the molecule actually takes that form or whether that form is viable. Folding@home can then be used to verify Rosetta@home's results and can provide additional atomic-level information and details of how the molecule changes shape.
The two projects also differ significantly in their computing power and host diversity. Averaging about 6,650 tera
FLOPS
from a host base of
central processing unit
s (CPUs),
graphics processing unit
s (GPUs), and (formerly)
PS3s, Folding@home has nearly 108 times more computing power than Rosetta@home.
World Community Grid
Both Phase I and Phase II of the
Human Proteome Folding Project
(HPF), a subproject of
World Community Grid, have used the Rosetta program to make structural and functional annotations of various
genomes
. Although he now uses it to create databases for biologists,
Richard Bonneau, head scientist of the Human Proteome Folding Project, was active in the original development of Rosetta at David Baker's laboratory while obtaining his PhD. More information on the relationship between HPF1, HPF2 and Rosetta@home can be found on Richard Bonneau's website.
Predictor@home
Like Rosetta@home,
Predictor@home
specialized in protein structure prediction. While Rosetta@home uses the Rosetta program for its structure prediction, Predictor@home used the dTASSER methodology. In 2009, Predictor@home shut down.
Other protein-related volunteer computing projects on
BOINC
include
QMC@home
,
Docking@home
,
POEM@home,
SIMAP, and
TANPAKU
. RALPH@home, the Rosetta@home
alpha project, which tests new application versions, work units, and updates before they move on to Rosetta@home, also runs on BOINC.
Volunteer contributions
Rosetta@home depends on computing power donated by individual project members for its research. About 53,000 users from 150 countries were active members of Rosetta@home, together contributing idle processor time from about 54,800 computers for a combined average performance of over 1.7 Peta
FLOPS
.

Users are granted
BOINC credits as a measure of their contribution. The credit granted for each workunit is the number of
decoys
produced for that workunit multiplied by the average claimed credit for the decoys submitted by all computer hosts for that workunit. This custom system was designed to address significant differences between credit granted to users with the standard BOINC client and an optimized BOINC client, and credit differences between users running Rosetta@home on
Windows
and
Linux
operating systems. The amount of credit granted per second of CPU work is lower for Rosetta@home than for most other BOINC projects.
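The credit rule described above (decoys produced, multiplied by the average claimed credit for decoys across all hosts on that workunit) can be written out as a short Python sketch. It rests on one assumption made here for illustration: that the average is taken per decoy, as total claimed credit divided by total decoys reported. The function names and the sample figures are hypothetical and do not come from the project's server code.

```python
def credit_per_decoy(reports):
    """Average claimed credit per decoy over all hosts for one workunit.

    `reports` is a list of (claimed_credit, decoys_submitted) pairs, one per host.
    Assumption: the average is total claimed credit / total decoys.
    """
    total_claimed = sum(claimed for claimed, _ in reports)
    total_decoys = sum(decoys for _, decoys in reports)
    if total_decoys <= 0:
        raise ValueError("at least one decoy must have been reported")
    return total_claimed / total_decoys


def granted_credit(decoys_produced, reports):
    """Credit granted to one host: its decoy count times the workunit-wide average."""
    return decoys_produced * credit_per_decoy(reports)


# Two hosts each returned 5 decoys but claimed very different credit.
reports = [(10.0, 5), (30.0, 5)]
credit = granted_credit(5, reports)  # both hosts are granted the same amount
```

Under this reading, two hosts that return the same number of decoys for a workunit receive the same credit regardless of what each claimed, which is the kind of client and operating-system discrepancy the custom system was meant to even out.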
Rosetta@home is thirteenth out of over 40 BOINC projects in terms of total credit.
Rosetta@home users who predict protein structures submitted for the CASP experiment are acknowledged in scientific publications regarding their results.
Users who predict the lowest energy structure for a given workunit are featured on the Rosetta@home homepage as ''Predictor of the Day'', along with any team of which they are a member. A ''User of the Day'' is also chosen randomly each day to appear on the homepage, from among users who have made a Rosetta@home profile.
External links
* Baker Lab: Baker Lab website
* David Baker's Rosetta@home journal
* BOINC: includes platform overview, and a guide to install BOINC and attach to Rosetta@home
* BOINCstats – Rosetta@home: detailed contribution statistics
* RALPH@home: website for Rosetta@home alpha testing project
* Rosetta@home video on YouTube: overview of Rosetta@home given by David Baker and lab members
* Rosetta Commons: academic collaborative for developing the Rosetta platform
* Kuhlman lab webpage: home of RosettaDesign
Online Rosetta services
* Rosetta Commons: list of available servers
* Robetta: protein structure prediction server
* ROSIE: docking, design, etc. multifunctional server-set
* RosettaDesign: protein design server
* RosettaBackrub: flexible backbone / protein design server