In evolutionary biology and population genetics, the error threshold (or critical mutation rate) is a limit on the number of base pairs a self-replicating molecule may have before mutation will destroy the information in subsequent generations of the molecule. The error threshold is crucial to understanding "Eigen's paradox".
The error threshold is a concept in the origins of life (abiogenesis), in particular of very early life, before the advent of DNA. It is postulated that the first self-replicating molecules might have been small ribozyme-like RNA molecules. These molecules consist of strings of base pairs or "digits", and their order is a code that directs how the molecule interacts with its environment. All replication is subject to mutation error. During the replication process, each digit has a certain probability of being replaced by some other digit, which changes the way the molecule interacts with its environment, and may increase or decrease its fitness, or ability to reproduce, in that environment.
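The per-digit copying error described above can be illustrated with a minimal Python sketch. It assumes a simplified two-letter alphabet (0 and 1) rather than the four bases of RNA, and the names `replicate` and `fraction_intact` are hypothetical helpers introduced only for this illustration.

```python
import random

def replicate(sequence, mu, rng):
    """Copy a binary sequence, flipping each digit independently with probability mu."""
    return [1 - digit if rng.random() < mu else digit for digit in sequence]

def fraction_intact(length, mu, copies=10_000, seed=0):
    """Estimate the fraction of copies of an all-zero sequence that contain no error."""
    rng = random.Random(seed)
    master = [0] * length
    intact = sum(replicate(master, mu, rng) == master for _ in range(copies))
    return intact / copies

if __name__ == "__main__":
    # Longer sequences are copied without error less often at the same per-digit rate.
    for length in (10, 100, 1000):
        print(length, fraction_intact(length, mu=0.01))
```

Under these assumptions the error-free fraction should be close to (1 − μ) raised to the sequence length, which is why longer molecules are harder to copy faithfully.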
Fitness landscape
Manfred Eigen noted in his 1971 paper (Eigen 1971) that this mutation process places a limit on the number of digits a molecule may have. If a molecule exceeds this critical size, the effect of the mutations becomes overwhelming and a runaway mutation process will destroy the information in subsequent generations of the molecule. The error threshold is also controlled by the "fitness landscape" for the molecules. The fitness landscape is characterized by the two concepts of height (= fitness) and distance (= number of mutations). Similar molecules are "close" to each other, and molecules that are fitter than others, and so more likely to reproduce, are "higher" in the landscape.
If a particular sequence and its neighbors have a high fitness, they will form a quasispecies and will be able to support longer sequence lengths than a fit sequence with few fit neighbors, or a less fit neighborhood of sequences. Wilke (Wilke 2005) also noted that the error threshold concept does not apply in portions of the landscape where there are lethal mutations, in which the induced mutation yields zero fitness and prevents the molecule from reproducing.
Eigen's paradox
Eigen's paradox is one of the most intractable puzzles in the study of the origins of life. It is thought that the error threshold concept described above limits the size of self-replicating molecules to perhaps a few hundred digits, yet almost all life on Earth requires much longer molecules to encode its genetic information. This problem is handled in living cells by enzymes that repair mutations, allowing the encoding molecules to reach sizes on the order of millions of base pairs. These large molecules must, of course, encode the very enzymes that repair them, and herein lies Eigen's paradox, first put forth by Manfred Eigen in his 1971 paper (Eigen 1971).
Simply stated, Eigen's paradox amounts to the following:
* Without error correction enzymes, the maximum size of a replicating molecule is about 100 base pairs.
* For a replicating molecule to encode error correction enzymes, it must be substantially larger than 100 bases.
This is a chicken-or-egg kind of paradox, with an even more difficult solution. Which came first, the large genome or the error correction enzymes? A number of solutions to this paradox have been proposed:
* Stochastic corrector model (Szathmáry & Maynard Smith, 1995) - In this proposed solution, a number of primitive molecules of, say, two different types are associated with each other in some way, perhaps by a capsule or "cell wall". If their reproductive success is enhanced by having, say, equal numbers in each cell, and reproduction occurs by division in which the molecules of each type are randomly distributed among the "children", then selection will promote such equal representation in the cells, even though one of the molecules may have a selective advantage over the other.
* Relaxed error threshold (Kun et al., 2005) - Studies of actual ribozymes indicate that the mutation rate can be substantially lower than first expected, on the order of 0.001 per base pair per replication. This may allow sequence lengths on the order of 7,000 to 8,000 base pairs, sufficient to incorporate rudimentary error correction enzymes.
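To get a feel for how the per-digit error rate bounds sequence length, the following Python sketch uses a common simplification: a master sequence whose replication advantage is σ persists only while σ(1 − μ)^L > 1, so L must stay below ln σ / (−ln(1 − μ)), roughly ln σ / μ for small μ. The function name `critical_length` and the values of μ and σ are illustrative assumptions and are not taken from the cited papers; this simple bound is not meant to reproduce the figures quoted above, which come from more detailed analyses.

```python
import math

def critical_length(mu, superiority):
    """Length L at which superiority * (1 - mu)**L drops to 1.

    mu          -- per-digit error probability per replication (0 < mu < 1)
    superiority -- assumed replication advantage of the master sequence (> 1)
    """
    return math.log(superiority) / -math.log1p(-mu)

if __name__ == "__main__":
    # Illustrative values only: a tenfold lower error rate allows
    # roughly tenfold longer sequences for the same superiority.
    for mu in (0.01, 0.001):
        print(mu, round(critical_length(mu, superiority=math.e), 1))
```

Because the bound scales as 1/μ, lowering the error rate is far more effective at lengthening the permissible sequence than raising the master sequence's advantage, which enters only logarithmically.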
A simple mathematical model
Consider a 3-digit molecule [A,B,C] where A, B, and C can take on the values 0 and 1. There are eight such sequences ([000], [001], [010], [011], [100], [101], [110], and [111]). Let's say that the [000] molecule is the most fit; upon each replication it produces an average of ''a'' copies, where ''a'' > 1. This molecule is called the "master sequence". The other seven sequences are less fit; they each produce only 1 copy per replication. The replication of each of the three digits is done with a mutation rate of μ. In other words, at every replication of a digit of a sequence, there is a probability μ that it will be erroneous; 0 will be replaced by 1 or vice versa. Let's ignore double mutations and the death of molecules (the population will grow infinitely), and divide the eight molecules into four classes depending on their Hamming distance from the master sequence:
: Hamming distance 0: [000]
: Hamming distance 1: [001], [010], [100]
: Hamming distance 2: [011], [101], [110]
: Hamming distance 3: [111]
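The grouping by Hamming distance can be reproduced by direct enumeration. The Python sketch below is only an illustration; `hamming` and `classes` are hypothetical helper names, and sequences are represented as tuples of 0s and 1s.

```python
from itertools import product

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def classes(length=3):
    """Group all binary sequences of the given length by distance from the all-zero master."""
    master = (0,) * length
    groups = {d: [] for d in range(length + 1)}
    for seq in product((0, 1), repeat=length):
        groups[hamming(seq, master)].append(seq)
    return groups

if __name__ == "__main__":
    for distance, members in classes(3).items():
        print(distance, len(members), members)
```

For length 3 this yields class sizes of 1, 3, 3, and 1 for distances 0 through 3.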
Note that the number of sequences for distance ''d'' is just the binomial coefficient <math>\tbinom{L}{d}</math> for L=3, and that each sequence can be visualized as the vertex of an L=3 dimensional cube, with each edge of the cube specifying a mutation path in which the change in Hamming distance is either zero or ±1. It can be seen that, for example, one third of the mutations of the [001] molecules will produce [000] molecules, while the other two thirds will produce the class 2 molecules [011] and [101]. We can now write the expression for the child populations <math>n'_i</math> of class ''i'' in terms of the parent populations <math>n_j</math>:
: <math>n'_i = \sum_{j=0}^{3} w_{ij} n_j</math>
where the matrix ''w'', which incorporates natural selection and mutation according to the quasispecies model, is given by:
: <math>\mathbf{w} = \begin{bmatrix} aQ & \mu & 0 & 0 \\ a(1-Q) & 1-3\mu & 2\mu & 0 \\ 0 & 2\mu & 1-3\mu & 3\mu \\ 0 & 0 & \mu & 1-3\mu \end{bmatrix}</math>
where <math>Q = (1-\mu)^L</math> is the probability that an entire molecule will be replicated successfully. The eigenvectors of the ''w'' matrix will yield the equilibrium population numbers for each class. For example, if the mutation rate μ is zero, we will have Q=1, and the equilibrium concentrations will be