The G-value paradox arises from the lack of correlation between the number of protein-coding genes among eukaryotes and their relative biological complexity. The microscopic nematode ''Caenorhabditis elegans'', for example, is composed of only about a thousand cells but has about the same number of genes as a human.
Researchers suggest that resolution of the paradox may lie in mechanisms such as alternative splicing and complex gene regulation that make the genes of humans and other complex eukaryotes relatively more productive.
DNA and biological complexity
The lack of correlation between the morphological complexity of eukaryotes and the amount of genetic information they carry has long puzzled researchers.
The sheer amount of DNA in an organism, measured by the mass of DNA present in the nucleus or the number of constituent nucleotide pairs, varies by several orders of magnitude among eukaryotes and is often unrelated to an organism's size or developmental complexity. One amoeba has 200 times more DNA per cell than humans, and even insects and plants within the same genus can vary dramatically in their quantity of DNA. This C-value paradox troubled genome scientists for many years.
Eventually, researchers recognized that not all DNA contributes directly to the production of proteins and other biological functions.
Susumu Ohno coined the phrase "junk DNA" to describe these nonfunctional swaths of DNA. They include introns, genetic sequences that are removed after transcription into mRNA and thus are not translated into proteins; transposable elements, mobile fragments of DNA, most of which are nonfunctional in humans; and pseudogenes, nonfunctional DNA sequences that originated from functional genes. The share of the human genome that may be considered "junk" remains controversial. Estimates of the functional share range from as low as 8% to as high as 80%, with one researcher arguing that the genome's genetic load imposes a fixed ceiling of 15% on it. (Prokaryotes, which have little "junk" DNA by comparison, exhibit a fairly close relationship between genome size and biological functionality.)
In any case, the assumption was that once the C-value paradox was swept away and the focus shifted to the number of protein-coding genes, the anticipated correlation between genetic information and biological complexity in eukaryotes would emerge.
Unfortunately, the G-value paradox simply picked up where the C-value paradox left off, because the discrepancy persisted when comparisons were narrowed to just protein-coding genes.
G-value paradox
Estimates of the number of coding genes in the human genome reached upwards of 100,000 prior to the Human Genome Project, but have since dwindled to as low as 19,000 following completion of that massive sequencing effort and subsequent refinements.
By comparison, the microscopic water flea ''Daphnia pulex'' has about 31,000 genes; the nematode ''C. elegans'' about 19,700; the fruit fly (''Drosophila melanogaster'') about 14,000; the zebrafish (''Danio rerio'') about 26,000; and the small flowering plant ''Arabidopsis thaliana'' about 27,000. Plants in general tend to have more genes than other eukaryotes.
One explanation is their higher incidence of gene and whole-genome duplication and retention of those additional genes, due in part to their development of a large collection of defensive secondary metabolites.
The apparent disconnect between the number of genes in a species and its biological complexity was dubbed the G-value paradox.
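The mismatch can be made concrete with a small sketch that ranks the species mentioned in this article by their approximate gene counts. The figures are the rounded values quoted above (using the low-end estimate of 19,000 for humans); the names `GENE_COUNTS`, `rank_by_gene_count` and `ratio_to_human` are illustrative only and do not come from any genomics library.

```python
# Approximate protein-coding gene counts as quoted in this article.
# These are rounded estimates, not authoritative annotation figures.
GENE_COUNTS = {
    "Homo sapiens": 19_000,
    "Daphnia pulex": 31_000,
    "Caenorhabditis elegans": 19_700,
    "Drosophila melanogaster": 14_000,
    "Danio rerio": 26_000,
    "Arabidopsis thaliana": 27_000,
}


def rank_by_gene_count(counts):
    """Return (species, count) pairs sorted from most to fewest genes."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


def ratio_to_human(counts, reference="Homo sapiens"):
    """Return each species' gene count divided by the reference count."""
    ref = counts[reference]
    return {species: count / ref for species, count in counts.items()}


if __name__ == "__main__":
    ratios = ratio_to_human(GENE_COUNTS)
    for rank, (species, count) in enumerate(rank_by_gene_count(GENE_COUNTS), start=1):
        print(f"{rank}. {species}: {count:,} genes ({ratios[species]:.2f}x human)")
```

With these figures, humans rank fifth of the six species, behind a water flea, a small flowering plant, a zebrafish and a nematode, which is the ordering the paradox describes.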
While the C-value paradox unraveled with the discovery of massive sequences of noncoding DNA, resolution of the G-value paradox appears to rest on differences in genome productivity. Humans and other complex eukaryotes simply may be able to do more with what they have, genetically speaking.
Among the mechanisms cited for this greater productivity are more sophisticated transcriptional controls, multifunctional proteins, more interaction between protein products, alternative splicing and post-translational modifications that may produce several protein products from the same genetic raw material.
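As a rough illustration of how alternative splicing can multiply the products of a single gene, the toy model below treats some exons as optional ("cassette") exons that are each independently kept or skipped. This is a deliberate simplification: real splicing is regulated, and not every combination occurs in a cell. The function name `enumerate_isoforms` and the exon labels are hypothetical.

```python
from itertools import product


def enumerate_isoforms(exons, optional):
    """Yield every transcript formed by independently keeping or skipping
    each optional exon, preserving the original exon order."""
    optional = set(optional)
    optional_in_order = [exon for exon in exons if exon in optional]
    for choices in product([True, False], repeat=len(optional_in_order)):
        keep = dict(zip(optional_in_order, choices))
        # Constitutive exons are always kept; optional ones follow `keep`.
        yield tuple(exon for exon in exons if keep.get(exon, True))


if __name__ == "__main__":
    gene = ["E1", "E2", "E3", "E4", "E5"]
    for isoform in enumerate_isoforms(gene, optional=["E2", "E4"]):
        print("-".join(isoform))
```

Under this model a gene with k optional exons yields up to 2^k distinct transcripts, so a modest number of genes could in principle encode a much larger repertoire of proteins.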
In addition, thousands of non-coding RNAs that are transcribed from DNA but not translated into protein have emerged as important regulators of gene expression and development in humans and other eukaryotes.
They include short RNA sequences, such as microRNAs (miRNAs), small interfering RNAs (siRNAs) and Piwi-interacting RNAs (piRNAs), and long non-coding RNAs (lncRNAs) that may regulate gene expression at different stages of development. Some researchers suggest that the focus should now shift from the number of genes to gene interactions and the network of genetic regulatory mechanisms that allow genes to support a variety of biological activities.
These transitions have taken analysis of genetic complexity from the C-value to the G-value to what some refer to as the I-value, a measure of the total information contained in a genome.
Defining complexity
One of the challenges in the long debate over the mismatch between genome size and biological complexity has been ambiguity in defining complexity. Is it the number of cell types in an organism, the sophistication of its nervous system or the number of different proteins it produces?
By some definitions, the greater complexity of humans compared to other organisms may be illusory. Even once complexity is defined, some researchers argue complexity in function does not necessarily require the same complexity in process. Evolution is not a paragon of efficiency but travels a crooked path that leads to a more cumbersome genome than is necessary in some species.