AlphaFold is an artificial intelligence (AI) program developed by DeepMind, a subsidiary of Alphabet, which performs predictions of protein structure. It is designed using deep learning techniques.
AlphaFold 1 (2018) placed first in the overall rankings of the 13th Critical Assessment of Structure Prediction (CASP) in December 2018. It was particularly successful at predicting the most accurate structures for targets rated as most difficult by the competition organizers, where no existing template structures were available from proteins with partially similar sequences.
AlphaFold 2 (2020) repeated this placement in the CASP14 competition in November 2020. It achieved a level of accuracy much higher than any other entry. For approximately two-thirds of the proteins it scored above 90 on CASP's global distance test (GDT), a test measuring the similarity between a computationally predicted structure and the experimentally determined structure, where 100 represents a complete match. [Robert F. Service, "'The game has changed.' AI triumphs at solving protein structures", ''Science'', 30 November 2020] The inclusion of metagenomic data has improved the quality of the prediction of multiple sequence alignments (MSAs). One of the biggest sources of the training data was the custom-built Big Fantastic Database (BFD) of 65,983,866 protein families, represented as MSAs and hidden Markov models (HMMs), covering 2,204,359,010 protein sequences from reference databases, metagenomes, and metatranscriptomes.
AlphaFold 2's results at CASP14 were described as "astounding" and "transformational". However, some researchers noted that the accuracy was insufficient for a third of its predictions, and that it did not reveal the underlying mechanism or rules of protein folding, so the protein folding problem remains unsolved. [Stephen Curry, "No, DeepMind has not solved protein folding", Reciprocal Space (blog), 2 December 2020]
Despite this, the technical achievement was widely recognized. On 15 July 2021, the AlphaFold 2 paper was published in ''Nature'' as an advance access publication alongside open source software and a searchable database of species proteomes. As of February 2025, the paper had been cited nearly 35,000 times.
AlphaFold 3 was announced on 8 May 2024. It can predict the structure of complexes created by proteins with DNA, RNA, various ligands, and ions. The new prediction method shows a minimum 50% improvement in accuracy for protein interactions with other molecules compared to existing methods. Moreover, for certain key categories of interactions, the prediction accuracy has effectively doubled.
Demis Hassabis and John Jumper of Google DeepMind shared one half of the 2024 Nobel Prize in Chemistry, awarded "for protein structure prediction," while the other half went to David Baker "for computational protein design." Hassabis and Jumper had previously won the Breakthrough Prize in Life Sciences and the Albert Lasker Award for Basic Medical Research in 2023 for their leadership of the AlphaFold project.
Background
Proteins consist of chains of amino acids which spontaneously fold to form the three-dimensional (3-D) structures of the proteins. The 3-D structure is crucial to understanding the biological function of the protein.
Protein structures can be determined experimentally through techniques such as X-ray crystallography, cryo-electron microscopy and nuclear magnetic resonance, which are all expensive and time-consuming. Such experimental efforts have identified the structures of about 170,000 proteins over the last 60 years, while there are over 200 million known proteins across all life forms.
Over the years, researchers have applied numerous computational methods to predict the 3D structures of proteins from their amino acid sequences. In the best-case scenario, the accuracy of such methods approaches that of experimental techniques (NMR) through the use of homology modeling based on molecular evolution. CASP, which was launched in 1994 to challenge the scientific community to produce their best protein structure predictions, found that by 2016 GDT scores of only about 40 out of 100 could be achieved for the most difficult proteins. AlphaFold started competing in the 2018 CASP using an artificial intelligence (AI) deep learning technique.
Algorithm
DeepMind is known to have trained the program on over 170,000 proteins from the Protein Data Bank, a public repository of protein sequences and structures. The program uses a form of attention network, a deep learning technique that focuses on having the AI identify parts of a larger problem, then piece them together to obtain the overall solution. The overall training used between 100 and 200 GPUs.
AlphaFold 1 (2018)
AlphaFold 1 (2018) was built on work developed by various teams in the 2010s. That work examined the large databanks of related DNA sequences now available from many different organisms (most without known 3D structures), looking for changes at different residues (peptides) that appeared to be correlated, even though the residues were not consecutive in the main chain. Such correlations suggest that the residues may be close to each other physically, even though not close in the sequence, allowing a contact map to be estimated. Building on recent work prior to 2018, AlphaFold 1 extended this by estimating a probability distribution for the distances between residues, effectively transforming the contact map into a distance map. It also used more advanced learning methods than earlier approaches to develop the inference.
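The idea of correlated changes between non-consecutive residues can be illustrated with a minimal sketch. It scores every pair of alignment columns by mutual information, a simple textbook stand-in and not the method used by AlphaFold 1 or its predecessors. The toy alignment and the function names are assumptions made for illustration only.

```python
import math
from collections import Counter
from itertools import combinations


def column(msa, i):
    """Return the residues found at position i across all aligned sequences."""
    return [seq[i] for seq in msa]


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        mi += p_ab * math.log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


def coevolution_scores(msa):
    """Score every pair of positions; high scores hint at correlated changes."""
    length = len(msa[0])
    return {
        (i, j): mutual_information(column(msa, i), column(msa, j))
        for i, j in combinations(range(length), 2)
    }


# Hypothetical toy alignment: positions 1 and 3 always change together
# (K pairs with E, R pairs with D), while position 2 varies independently.
toy_msa = [
    "AKLE",
    "AKLE",
    "ARLD",
    "ARLD",
    "AKIE",
    "ARID",
]

scores = coevolution_scores(toy_msa)
top_pair = max(scores, key=scores.get)
print(top_pair, round(scores[top_pair], 3))
```

In this toy case the highest-scoring pair is positions 1 and 3, which are far apart in the sequence but change together. In a real alignment such a pair would be a candidate contact.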
AlphaFold 2 (2020)
The 2020 version of the program (AlphaFold 2, 2020) is significantly different from the original version that won CASP13 in 2018, according to the team at DeepMind. [Jeremy Kahn, "Lessons from DeepMind's breakthrough in protein-folding A.I.", ''Fortune'', 1 December 2020]
AlphaFold 1 used a number of separately trained modules to produce a guide potential, which was then combined with a physics-based energy potential. AlphaFold 2 replaced this with a system of interconnected sub-networks, forming a single, differentiable, end-to-end model based on pattern recognition. This model was trained in an integrated manner. [See block diagram. Also John Jumper ''et al.'' (1 December 2020), AlphaFold 2 presentation, slide 10] After the neural network's prediction converges, a final refinement step applies local physical constraints using energy minimization based on the AMBER force field. This step only slightly adjusts the predicted structure. [John Jumper et al., conference abstract (December 2020)]
A key part of the 2020 system is a pair of modules, believed to be based on a transformer design, which are used to progressively refine a vector of information for each relationship (or "edge" in graph-theory terminology) between an amino acid residue of the protein and another amino acid residue (these relationships are represented by the array shown in green); and between each amino acid position and each of the different sequences in the input sequence alignment
(these relationships are represented by the array shown in red). Internally these refinement transformations contain layers that have the effect of bringing relevant data together and filtering out irrelevant data (the "attention mechanism") for these relationships, in a context-dependent way, learnt from training data. These transformations are iterated, the updated information output by one step becoming the input of the next, with the sharpened residue/residue information feeding into the update of the residue/sequence information, and then the improved residue/sequence information feeding into the update of the residue/residue information. As the iteration progresses, according to one report, the "attention algorithm ... mimics the way a person might assemble a jigsaw puzzle: first connecting pieces in small clumps—in this case clusters of amino acids—and then searching for ways to join the clumps in a larger whole."
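The attention mechanism and the iterated refinement described above can be sketched generically. The following is plain scaled dot-product self-attention with a residual update, applied three times so that the output of one step becomes the input of the next. It is not AlphaFold 2's actual architecture: the array sizes, random weights and variable names are assumptions chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(features, w_q, w_k, w_v):
    """Scaled dot-product self-attention over the rows of `features`."""
    q, k, v = features @ w_q, features @ w_k, features @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
n_items, dim = 6, 8  # hypothetical sizes: 6 items, 8 features each
features = rng.normal(size=(n_items, dim))
w_q, w_k, w_v = (
    rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3)
)

for _ in range(3):
    update, weights = self_attention(features, w_q, w_k, w_v)
    # Residual update: the refined features are the input to the next step.
    features = features + update

print(weights.round(2))
```

Each row of `weights` shows how strongly one item draws on every other item, which is the sense in which attention brings relevant data together and filters out the rest. A trained network would learn the projection matrices instead of sampling them at random.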
The output of these iterations then informs the final structure prediction module, which also uses transformers, and is itself then iterated. In an example presented by DeepMind, the structure prediction module achieved a correct topology for the target protein on its first iteration, scored as having a GDT_TS of 78, but with a large number (90%) of stereochemical violations, i.e. unphysical bond angles or lengths. With subsequent iterations the number of stereochemical violations fell. By the third iteration the GDT_TS of the prediction was approaching 90, and by the eighth iteration the number of stereochemical violations was approaching zero. [John Jumper ''et al.'' (1 December 2020), AlphaFold 2 presentation, slides 12 to 20]
The training data was originally restricted to single peptide chains. However, the October 2021 update, named AlphaFold-Multimer, included protein complexes in its training data. DeepMind stated this update succeeded about 70% of the time at accurately predicting protein-protein interactions.
AlphaFold 3 (2024)
Announced on 8 May 2024, AlphaFold 3 was co-developed by Google DeepMind and Isomorphic Labs, both subsidiaries of Alphabet. AlphaFold 3 is not limited to single-chain proteins, as it can also predict the structures of protein complexes with DNA, RNA, post-translational modifications and selected ligands and ions.
AlphaFold 3 introduces the "Pairformer," a deep learning architecture inspired by the transformer, which is considered similar to, but simpler than, the Evoformer used in AlphaFold 2. The Pairformer module's initial predictions are refined by a diffusion model. This model begins with a cloud of atoms and iteratively refines their positions, guided by the Pairformer's output, to generate a 3D representation of the molecular structure.
The AlphaFold server was created to provide free access to AlphaFold 3 for non-commercial research. As of May 2025, the AlphaFold 3 research paper has been directly cited more than 4000 times.
Competitions
CASP13
In December 2018, DeepMind's AlphaFold placed first in the overall rankings of the 13th Critical Assessment of Techniques for Protein Structure Prediction (CASP).
The program was particularly successful at predicting the most accurate structure for targets rated as the most difficult by the competition organisers, where no existing template structures were available from proteins with a partially similar sequence. AlphaFold gave the best prediction for 25 out of 43 protein targets in this class, achieving a median score of 58.9 on CASP's global distance test (GDT), ahead of 52.5 and 52.4 by the two next best-placed teams, who were also using deep learning to estimate contact distances. Overall, across all targets, AlphaFold 1 achieved a GDT score of 68.5.
In January 2020, implementations and illustrative code of AlphaFold 1 were released as open source on GitHub. However, as stated in the "Read Me" file on that website: "This code can't be used to predict structure of an arbitrary protein sequence. It can be used to predict structure only on the CASP13 dataset (links below). The feature generation code is tightly coupled to our internal infrastructure as well as external tools, hence we are unable to open-source it." Therefore, in essence, the code deposited is not suitable for general use but only for the CASP13 proteins. As of 5 March 2021, the company had not announced plans to make its code publicly available.
CASP14
In November 2020, DeepMind's new version, AlphaFold 2, won CASP14. Overall, AlphaFold 2 made the best prediction for 88 out of the 97 targets.
On the competition's preferred global distance test (GDT) measure of accuracy, the program achieved a median score of 92.4 (out of 100), meaning that more than half of its predictions were scored at better than 92.4% for having their atoms in more-or-less the right place, a level of accuracy reported to be comparable to experimental techniques like X-ray crystallography. In 2018 AlphaFold 1 had only reached this level of accuracy in two of all of its predictions. 88% of predictions in the 2020 competition had a GDT_TS score of more than 80. On the group of targets classed as the most difficult, AlphaFold 2 achieved a median score of 87.
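To make the GDT_TS score concrete, the sketch below uses the usual definition: the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of residues whose predicted position lies within that cutoff of the experimental position. It is simplified in one important way. It assumes the two structures are already superimposed, whereas the real GDT procedure searches over superpositions for each cutoff. The coordinates are invented for illustration.

```python
import numpy as np


def gdt_ts(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for two pre-superimposed (N, 3) coordinate arrays.

    Real GDT optimises the superposition per cutoff; this sketch does not.
    """
    predicted = np.asarray(predicted, dtype=float)
    experimental = np.asarray(experimental, dtype=float)
    distances = np.linalg.norm(predicted - experimental, axis=1)
    fractions = [(distances <= cutoff).mean() for cutoff in cutoffs]
    return 100.0 * float(np.mean(fractions))


# Hypothetical alpha-carbon positions (in angstroms) for a 4-residue toy chain.
experimental = np.zeros((4, 3))
predicted = np.array(
    [
        [0.5, 0.0, 0.0],   # within 1 A
        [1.5, 0.0, 0.0],   # within 2 A
        [3.0, 0.0, 0.0],   # within 4 A
        [10.0, 0.0, 0.0],  # outside every cutoff
    ]
)

score = gdt_ts(predicted, experimental)
print(score)  # 56.25
```

Here one, two, three and three of the four residues fall within the respective cutoffs, so the score is the mean of 25%, 50%, 75% and 75%. A perfect prediction scores 100.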
Measured by the root-mean-square deviation (RMSD) of the placement of the alpha-carbon atoms of the protein backbone chain, which tends to be dominated by the performance of the worst-fitted outliers, 88% of AlphaFold 2's predictions had an RMS deviation of less than 4 Å for the set of overlapped C-alpha atoms. 76% of predictions achieved better than 3 Å, and 46% had a C-alpha atom RMS accuracy better than 2 Å, [Mohammed AlQuraishi, "CASP14 scores just came out and they're astounding", Twitter, 30 November 2020] with a median RMS deviation in its predictions of 2.1 Å for a set of overlapped CA atoms. AlphaFold 2 also achieved an accuracy in modelling surface side chains described as "really really extraordinary".
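An RMSD over overlapped C-alpha atoms is typically computed after an optimal rigid-body superposition. A minimal sketch using the standard Kabsch algorithm is given below. The source does not say which superposition procedure the CASP assessors used, so this is an assumption, and the random coordinates are illustrative only.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two (N, 3) point sets after optimal superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Remove translation by centring both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # Kabsch: SVD of the covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_rot = p_c @ rotation.T
    return float(np.sqrt(np.mean(np.sum((p_rot - q_c) ** 2, axis=1))))


rng = np.random.default_rng(1)
reference = rng.normal(scale=5.0, size=(10, 3))  # hypothetical C-alpha atoms

# The same structure rotated 90 degrees about z and shifted: RMSD is ~0.
rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
moved = reference @ rot_z.T + np.array([3.0, -2.0, 7.0])
print(kabsch_rmsd(moved, reference))

# Displacing a single atom by 20 A raises the RMSD of the whole set,
# which is why the measure is dominated by the worst-fitted outliers.
outlier = reference.copy()
outlier[0] += np.array([20.0, 0.0, 0.0])
print(kabsch_rmsd(outlier, reference))
```

Because each deviation is squared before averaging, one badly placed atom contributes far more than many well-placed ones. GDT, by contrast, only counts how many residues fall within fixed cutoffs.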
To further validate AlphaFold 2, the conference organizers approached four leading experimental groups working on structures they found particularly challenging and had been unable to determine. In all four cases the three-dimensional models produced by AlphaFold 2 were sufficiently accurate to determine structures of these proteins by molecular replacement. These included target T1100 (Af1503), a small membrane protein studied by experimentalists for ten years.
Of the three structures that AlphaFold 2 had the least success in predicting, two had been obtained by protein NMR methods, which define protein structure directly in aqueous solution, whereas AlphaFold was mostly trained on protein structures in crystals. The third exists in nature as a multidomain complex consisting of 52 identical copies of the same domain, a situation AlphaFold was not programmed to consider. For all targets with a single domain, excluding only one very large protein and the two structures determined by NMR, AlphaFold 2 achieved a GDT_TS score of over 80.
CASP15
In 2022, DeepMind did not enter CASP15, but most of the entrants used AlphaFold or tools incorporating AlphaFold.
Reception
AlphaFold 2 scoring more than 90 in CASP's global distance test (GDT) is considered a significant achievement in computational biology and great progress towards a decades-old grand challenge of biology. Nobel Prize winner and structural biologist Venki Ramakrishnan called the result "a stunning advance on the protein folding problem", adding that "It has occurred decades before many people in the field would have predicted. It will be exciting to see the many ways in which it will fundamentally change biological research."
Propelled by press releases from CASP and DeepMind, [Artificial intelligence solution to a 50-year-old science challenge could 'revolutionise' medical research (press release), CASP organising committee, 30 November 2020] AlphaFold 2's success received wide media attention. As well as news pieces in the specialist science press, such as ''Nature'', ''Science'', ''MIT Technology Review'', and ''New Scientist'', the story was widely covered by major national newspapers. A frequent theme was that the ability to predict protein structures accurately based on the constituent amino acid sequence is expected to have a wide variety of benefits in the life sciences, including accelerating advanced drug discovery and enabling better understanding of diseases. Some have noted that even a perfect answer to the protein ''prediction'' problem would still leave questions about the protein ''folding'' problem: understanding in detail how the folding process actually occurs in nature (and how proteins can sometimes misfold).
In 2023, Demis Hassabis and John Jumper won the Breakthrough Prize in Life Sciences as well as the Albert Lasker Award for Basic Medical Research for their management of the AlphaFold project. Hassabis and Jumper went on to win the Nobel Prize in Chemistry in 2024 for their work on "protein structure prediction", sharing it with David Baker of the University of Washington.
Source code
Open access to source code of several AlphaFold versions (excluding AlphaFold 3) has been provided by DeepMind after requests from the scientific community. [Demis Hassabis, "Brief update on some exciting progress on #AlphaFold!" (tweet), via Twitter, 18 June 2021] The source code of AlphaFold 3 was made available for non-commercial use to the scientific community upon request in November 2024.
Database of protein models generated by AlphaFold
The AlphaFold Protein Structure Database, a joint project between DeepMind and EMBL-EBI, was launched on July 22, 2021. At launch, the database contained AlphaFold-predicted models for nearly the complete UniProt proteome of humans and 20 model organisms, totaling over 365,000 proteins. The database does not include proteins with fewer than 16 or more than 2700 amino acid residues, but for humans they are available in the whole batch file. The initial goal (as of early 2022) was to expand the database to cover most of the UniRef90 set, which contains over 100 million proteins. As of May 15, 2022, the database contained 992,316 predictions.
In July 2021, UniProt-KB and InterPro were updated to show AlphaFold predictions when available.
On July 28, 2022, the team uploaded to the database the structures of around 200 million proteins from 1 million species, covering nearly every known protein on the planet.
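The length limits quoted above can be expressed as a simple check. The following is a minimal sketch, not part of any AlphaFold tooling: the function names and the toy sequences are hypothetical, and the check considers sequence length only (it does not model the batch-file exception mentioned for human proteins).

```python
MIN_RESIDUES = 16    # shortest sequence covered, per the text above
MAX_RESIDUES = 2700  # longest sequence covered, per the text above


def in_length_range(sequence: str) -> bool:
    """Return True if the sequence length falls inside the covered range."""
    return MIN_RESIDUES <= len(sequence) <= MAX_RESIDUES


def partition_by_length(sequences: dict[str, str]) -> tuple[list[str], list[str]]:
    """Split sequence names into (covered, excluded) by length alone."""
    covered = sorted(n for n, s in sequences.items() if in_length_range(s))
    excluded = sorted(n for n, s in sequences.items() if not in_length_range(s))
    return covered, excluded


# Hypothetical toy input: names and sequences are made up for illustration.
toy = {
    "short_peptide": "ACDEFGHIKL",  # 10 residues, below the lower limit
    "typical": "M" * 350,           # inside the range
    "giant": "M" * 3000,            # above the upper limit
}
covered, excluded = partition_by_length(toy)
print(covered, excluded)
```

Both limits are treated as inclusive here, since the text excludes only proteins with fewer than 16 or more than 2700 residues.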
Limitations
AlphaFold has various limitations:
* AlphaFold DB provides models of individual protein chains (monomers), rather than their biologically relevant complexes.
* Many protein regions are predicted with a low confidence score, including intrinsically disordered protein regions.
* AlphaFold 2 has been validated for predicting the structural effects of mutations with only limited success.
* The model relies, to some extent, on co-evolutionary information from similar proteins. Therefore, it may not perform as well on synthetic proteins or proteins with very low homology to those in the training database.
* The model's ability to predict multiple native conformations of proteins is limited.
* AlphaFold 3 can predict structures of protein complexes with only a very limited set of selected cofactors and co- and post-translational modifications. Between 50% and 70% of the structures of the human proteome are incomplete without covalently attached glycans. AlphaFill, a derived database, adds cofactors to AlphaFold models where appropriate.
* In the algorithm, the residues are moved freely, without any restraints. Therefore, the integrity of the chain is not maintained during modeling. As a result, AlphaFold may produce topologically wrong results, such as structures with an arbitrary number of knots.
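The low-confidence limitation in the list above can be inspected directly in a model file. AlphaFold models in PDB format carry the per-residue confidence score (pLDDT, on a 0 to 100 scale) in the B-factor column. The sketch below is illustrative only: the helper names are hypothetical, the three-residue model is synthetic rather than a real prediction, and the cutoff of 70 is an assumption taken from the commonly quoted confidence bands, not a fixed rule.

```python
def make_ca_line(serial, resname, chain, resnum, plddt):
    """Build a fixed-column PDB ATOM record for a C-alpha atom (synthetic data)."""
    return (f"ATOM  {serial:>5d}  CA  {resname:>3} {chain}{resnum:>4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}")


def residue_plddt(pdb_text):
    """Return {(chain, residue_number): pLDDT} read from C-alpha ATOM records."""
    scores = {}
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name; columns 61-66 hold the B-factor.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resnum = int(line[22:26])
            scores[(chain, resnum)] = float(line[60:66])
    return scores


def low_confidence(scores, threshold=70.0):
    """List residues whose pLDDT is below the (assumed) threshold."""
    return sorted(key for key, value in scores.items() if value < threshold)


# Synthetic three-residue model, for illustration only.
MODEL_TEXT = "\n".join([
    make_ca_line(1, "MET", "A", 1, 95.2),
    make_ca_line(2, "GLY", "A", 2, 68.4),
    make_ca_line(3, "SER", "A", 3, 41.0),
])

scores = residue_plddt(MODEL_TEXT)
print(low_confidence(scores))
```

Reading only the C-alpha records gives one score per residue, which is usually enough to flag regions such as disordered segments before interpreting a model.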
Applications
AlphaFold has been used to predict structures of proteins of SARS-CoV-2, the causative agent of COVID-19. The structures of these proteins had not yet been determined experimentally in early 2020. Results were reviewed by scientists at the Francis Crick Institute in the United Kingdom before being released to the broader research community. The team also confirmed that its prediction was accurate against the experimentally determined SARS-CoV-2 spike protein structure shared in the Protein Data Bank, an international open-access database, before releasing the computationally determined structures of the under-studied protein molecules. The team acknowledged that although these protein structures might not be the subject of ongoing therapeutic research efforts, they would add to the community's understanding of the SARS-CoV-2 virus. Specifically, AlphaFold 2's prediction of the structure of the ''ORF3a'' protein was very similar to the structure determined by researchers at the University of California, Berkeley using cryo-electron microscopy. This protein is believed to help the virus break out of the host cell once it replicates, and to play a role in triggering the inflammatory response to the infection.
Published works
* Andrew W. Senior ''et al.'' (December 2019), "Protein structure prediction using multiple deep neural networks in the 13th Critical Assessment of Protein Structure Prediction (CASP13)", ''Proteins: Structure, Function, Bioinformatics'' 87(12) 1141–1148
* Andrew W. Senior ''et al.'' (15 January 2020), "Improved protein structure prediction using potentials from deep learning", ''Nature'' 577 706–710
* John Jumper ''et al.'' (December 2020), "High Accuracy Protein Structure Prediction Using Deep Learning", in ''Fourteenth Critical Assessment of Techniques for Protein Structure Prediction (Abstract Book)'', pp. 22–24
* John Jumper ''et al.'' (December 2020), "AlphaFold 2". Presentation given at CASP 14.
* Abramson, J., Adler, J., Dunger, J. ''et al.'' (May 2024), "Accurate structure prediction of biomolecular interactions with AlphaFold 3", ''Nature'' 630 493–500
See also
* Folding@home
* IBM Blue Gene
* Foldit
* Rosetta@home
* Human Proteome Folding Project
* AlphaZero
* AlphaGo
* AlphaGeometry
* Predicted Aligned Error
Further reading
* Carlos Outeiral, "CASP14: what Google DeepMind's AlphaFold 2 really achieved, and what it means for protein folding, biology and bioinformatics", Oxford Protein Informatics Group. (3 December)
* Mohammed AlQuraishi, "AlphaFold2 @ CASP14: 'It feels like one's child has left home.'" (blog), 8 December 2020
* Mohammed AlQuraishi, "The AlphaFold2 Method Paper: A Fount of Good Ideas" (blog), 25 July 2021
External links
* AlphaFold-3 web server
* Open access to protein structure predictions for the human proteome and 20 other key organisms at European Bioinformatics Institute (AlphaFold Protein Structure Database)
* CASP 14 website
* AlphaFold: The making of a scientific breakthrough, DeepMind, via YouTube.
* ColabFold version for homooligomeric prediction and complexes