In biology, the word gene has two meanings. The Mendelian gene is a basic unit of heredity. The molecular gene is a sequence of nucleotides in DNA that is transcribed to produce a functional RNA. There are two types of molecular genes: protein-coding genes and non-coding genes. During gene expression (the synthesis of RNA or protein from a gene), DNA is first copied into RNA. RNA can be directly functional or be the intermediate template for the synthesis of a protein.

The transmission of genes to an organism's offspring is the basis of the inheritance of phenotypic traits from one generation to the next. These genes make up different DNA sequences, together called a genotype, that is specific to every given individual within the gene pool of the population of a given species. The genotype, along with environmental and developmental factors, ultimately determines the phenotype of the individual.

Most biological traits occur under the combined influence of polygenes (a set of different genes) and gene–environment interactions. Some genetic traits are instantly visible, such as eye color or the number of limbs; others are not, such as blood type, the risk for specific diseases, or the thousands of basic biochemical processes that constitute life. A gene can acquire mutations in its sequence, leading to different variants, known as alleles, in the population. These alleles encode slightly different versions of a gene, which may cause different phenotypical traits. Genes evolve due to natural selection or survival of the fittest and genetic drift of the alleles.
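
Genetic drift, named above as one of the ways genes evolve, is a change in allele frequency caused by chance alone. The sketch below illustrates it with a simple Wright–Fisher-style resampling model, which is a modelling assumption introduced here rather than something the text specifies; the population size, starting frequency, number of generations, and function name are likewise illustrative.

```python
import random
from typing import List, Optional


def simulate_drift(pop_size: int, start_freq: float, generations: int,
                   seed: Optional[int] = None) -> List[float]:
    """Track one allele's frequency under pure random drift.

    Each generation, 2 * pop_size gene copies (a diploid population) are
    drawn at random according to the previous generation's allele frequency.
    No selection or mutation is modelled.
    """
    rng = random.Random(seed)
    copies = 2 * pop_size
    freq = start_freq
    history = [freq]
    for _generation in range(generations):
        # Count how many of the sampled copies carry the allele of interest.
        carriers = sum(1 for _copy in range(copies) if rng.random() < freq)
        freq = carriers / copies
        history.append(freq)
    return history


# ASSUMPTION: illustrative parameters, not values from the text.
trajectory = simulate_drift(pop_size=50, start_freq=0.5, generations=100, seed=1)
print(f"start {trajectory[0]:.2f} -> end {trajectory[-1]:.2f}")
```

Once the frequency reaches 0 or 1 it stays there, since no copies of the other variant remain to be sampled. Different seeds give different trajectories, which is the point of the illustration.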


Definitions

There are many different ways to use the term "gene" based on different aspects of their inheritance, selection, biological function, or molecular structure, but most of these definitions fall into two categories: the Mendelian gene or the molecular gene.

The Mendelian gene is the classical gene of genetics and it refers to any heritable trait. This is the gene described in ''The Selfish Gene''. More thorough discussions of this version of a gene can be found in the articles ''Genetics'' and ''Gene-centered view of evolution''.

The molecular gene definition is more commonly used across biochemistry, molecular biology, and most of genetics: it is the gene described in terms of DNA sequence. There are many different definitions of this gene, some of which are misleading or incorrect.

Very early work in the field that became molecular genetics suggested the concept that one gene makes one protein (originally 'one gene – one enzyme'). However, genes that produce repressor RNAs were proposed in the 1950s, and by the 1960s textbooks were using molecular gene definitions that included genes specifying functional RNA molecules such as ribosomal RNA and tRNA (noncoding genes) as well as protein-coding genes.

This idea of two kinds of genes is still part of the definition of a gene in most textbooks. The important parts of such definitions are: (1) that a gene corresponds to a transcription unit; (2) that genes produce both mRNA and noncoding RNAs; and (3) that regulatory sequences control gene expression but are not part of the gene itself.

However, there is one other important part of the definition, emphasized in Kostas Kampourakis' book ''Making Sense of Genes'': the transcript must be functional. This emphasis on function is essential because there are stretches of DNA that produce non-functional transcripts, and they do not qualify as genes. These include obvious examples such as transcribed pseudogenes as well as less obvious examples such as junk RNA produced as noise due to transcription errors. To qualify as a true gene by this definition, one has to prove that the transcript has a biological function.

Early speculations on the size of a typical gene were based on high-resolution genetic mapping and on the size of proteins and RNA molecules. A length of 1500 base pairs seemed reasonable at the time (1965). This was based on the idea that the gene was the DNA that was directly responsible for production of the functional product. The discovery of introns in the 1970s meant that many eukaryotic genes were much larger than the size of the functional product would imply. Typical mammalian protein-coding genes, for example, are about 62,000 base pairs in length (transcribed region), and since there are about 20,000 of them, they occupy about 35–40% of the mammalian genome (including the human genome).
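
The 35–40% figure for the share of the genome occupied by protein-coding genes follows from simple arithmetic. The sketch below reproduces it in Python. The gene length and gene count are taken from the text, while the haploid genome size of roughly 3.1 billion base pairs is an assumption (an approximate human value) that the passage does not state, and the function name is hypothetical.

```python
# Rough check of the fraction of a mammalian genome occupied by
# protein-coding genes (transcribed regions, introns included).

MEAN_GENE_LENGTH_BP = 62_000      # from the text: typical transcribed region
PROTEIN_CODING_GENES = 20_000     # from the text: approximate gene count
GENOME_SIZE_BP = 3_100_000_000    # ASSUMPTION: approximate haploid human genome


def genome_fraction(mean_length_bp: int, gene_count: int, genome_size_bp: int) -> float:
    """Return the fraction of the genome covered by genes of the given mean length."""
    return (mean_length_bp * gene_count) / genome_size_bp


fraction = genome_fraction(MEAN_GENE_LENGTH_BP, PROTEIN_CODING_GENES, GENOME_SIZE_BP)
print(f"Protein-coding genes span about {fraction:.0%} of the genome")
```

With these inputs the estimate lands at the upper end of the quoted range. The calculation treats every gene as non-overlapping and of average length, so it is only a back-of-the-envelope check.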
Although both protein-coding genes and noncoding genes have been known for more than 50 years, a number of textbooks, websites, and scientific publications still define a gene as a DNA sequence that specifies a protein. In other words, the definition is restricted to protein-coding genes. One example appears in a 2021 article in American Scientist. This restricted definition is so common that it has spawned many recent articles that criticize this "standard definition" and call for a new expanded definition that includes noncoding genes. However, some modern writers still do not acknowledge noncoding genes, although this so-called "new" definition has been recognised for more than half a century.

Although some definitions can be more broadly applicable than others, the fundamental complexity of biology means that no definition of a gene can capture all aspects perfectly. Not all genomes are DNA (e.g. RNA viruses), bacterial operons are multiple protein-coding regions transcribed into single large mRNAs, alternative splicing enables a single genomic region to encode multiple distinct products, and trans-splicing concatenates mRNAs from shorter coding sequences across the genome. Since molecular definitions exclude elements such as introns, promoters, and other regulatory regions, these are instead thought of as "associated" with the gene, affecting its function.

An even broader operational definition is sometimes used to encompass the complexity of these diverse phenomena, where a gene is defined as a union of genomic sequences encoding a coherent set of potentially overlapping functional products. This definition categorizes genes by their functional products (proteins or RNA) rather than their specific DNA loci, with regulatory elements classified as ''gene-associated'' regions.


History


Discovery of discrete inherited units

The existence of discrete inheritable units was first suggested by Gregor Mendel (1822–1884). From 1857 to 1864, in Brno, Austrian Empire (today's Czech Republic), he studied inheritance patterns in 8000 common edible pea plants, tracking distinct traits from parent to offspring. He described these mathematically as 2^n combinations, where n is the number of differing characteristics in the original peas. Although he did not use the term ''gene'', he explained his results in terms of discrete inherited units that give rise to observable physical characteristics. This description prefigured Wilhelm Johannsen's distinction between genotype (the genetic material of an organism) and phenotype (the observable traits of that organism). Mendel was also the first to demonstrate independent assortment, the distinction between dominant and recessive traits, the distinction between a heterozygote and homozygote, and the phenomenon of discontinuous inheritance.

Prior to Mendel's work, the dominant theory of heredity was one of blending inheritance, which suggested that each parent contributed fluids to the fertilization process and that the traits of the parents blended and mixed to produce the offspring. Charles Darwin developed a theory of inheritance he termed pangenesis, from Greek pan ("all, whole") and genesis ("birth") / genos ("origin"). Darwin used the term ''gemmule'' to describe hypothetical particles that would mix during reproduction.

Mendel's work went largely unnoticed after its first publication in 1866, but was rediscovered in the late 19th century by Hugo de Vries, Carl Correns, and Erich von Tschermak, who (claimed to have) reached similar conclusions in their own research. Specifically, in 1889, Hugo de Vries published his book ''Intracellular Pangenesis'' (translated in 1908 from German to English; Open Court Publishing Co., Chicago, 1910), in which he postulated that different characters have individual hereditary carriers and that inheritance of specific traits in organisms comes in particles. De Vries called these units "pangenes" (''Pangens'' in German), after Darwin's 1868 pangenesis theory.

Twenty years later, in 1909, Wilhelm Johannsen introduced the term "gene" (inspired by the ancient Greek γόνος, ''gonos'', meaning offspring and procreation). William Bateson had introduced the term "genetics" in 1906, while Eduard Strasburger, among others, still used the term "pangene" for the fundamental physical and functional unit of heredity.

Johannsen's original wording, from p. 124, reads: ''"Dieses "etwas" in den Gameten bezw. in der Zygote, ... – kurz, was wir eben Gene nennen wollen – bedingt sind."'' (This "something" in the gametes or in the zygote, which has crucial importance for the character of the organism, is usually called by the quite ambiguous term ''Anlagen'' [primordium, from the German word ''Anlage'' for "plan, arrangement; rough sketch"]. Many other terms have been suggested, mostly unfortunately in closer connection with certain hypothetical opinions. The word "pangene", which was introduced by Darwin, is perhaps used most frequently in place of ''Anlagen''. However, the word "pangene" was not well chosen, as it is a compound word containing the roots ''pan'' (the neuter form of Πας all, every) and ''gen'' (from γί-γ(ε)ν-ομαι, to become). Only the meaning of this latter [i.e., ''gen''] comes into consideration here; just the basic idea – [namely,] that a trait in the developing organism can be determined or is influenced by "something" in the gametes – should find expression. No hypothesis about the nature of this "something" should be postulated or supported by it. For that reason it seems simplest to use in isolation the last syllable ''gen'' from Darwin's well-known word, which alone is of interest to us, in order to replace, with it, the poor, ambiguous word ''Anlage''. Thus we will say simply "gene" and "genes" for "pangene" and "pangenes". The word gene is completely free of any hypothesis; it expresses only the established fact that in any case many traits of the organism are determined by specific, separable, and thus independent "conditions", "foundations", "plans" – in short, precisely what we want to call genes.)
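
Mendel's count of 2^n combinations, where n is the number of differing characteristics, can be checked by enumeration. The Python sketch below lists every combination of two-state characteristics. The three characteristics and their forms are illustrative assumptions chosen for this example, not data taken from the text, and the function name is hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

# ASSUMPTION: three illustrative two-state characteristics; the names are
# examples chosen for this sketch and do not come from the text.
TRAITS: Dict[str, Tuple[str, str]] = {
    "seed shape": ("round", "wrinkled"),
    "seed colour": ("yellow", "green"),
    "flower colour": ("purple", "white"),
}


def trait_combinations(traits: Dict[str, Tuple[str, str]]) -> List[Tuple[str, ...]]:
    """Enumerate every combination of one form per characteristic."""
    return list(product(*traits.values()))


combos = trait_combinations(TRAITS)
n = len(TRAITS)
print(f"n = {n}: {len(combos)} combinations (2^n = {2 ** n})")
for combo in combos:
    print(", ".join(combo))
```

Adding a fourth two-state characteristic to the dictionary doubles the count again, which is what the 2^n expression predicts.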


Discovery of DNA

Advances in understanding genes and inheritance continued throughout the 20th century. Deoxyribonucleic acid (DNA) was shown to be the molecular repository of genetic information by experiments in the 1940s to 1950s. The structure of DNA was studied by Rosalind Franklin and Maurice Wilkins using X-ray crystallography, which led James D. Watson and Francis Crick to publish a model of the double-stranded DNA molecule whose paired nucleotide bases indicated a compelling hypothesis for the mechanism of genetic replication.

In the early 1950s the prevailing view was that the genes in a chromosome acted like discrete entities arranged like beads on a string. The experiments of Benzer using mutants defective in the rII region of bacteriophage T4 (1955–1959) showed that individual genes have a simple linear structure and are likely to be equivalent to a linear section of DNA.

Collectively, this body of research established the central dogma of molecular biology, which states that proteins are translated from RNA, which is transcribed from DNA. This dogma has since been shown to have exceptions, such as reverse transcription in retroviruses. The modern study of genetics at the level of DNA is known as molecular genetics.

In 1972, Walter Fiers and his team were the first to determine the sequence of a gene: that of bacteriophage MS2 coat protein. The subsequent development of chain-termination DNA sequencing in 1977 by Frederick Sanger improved the efficiency of sequencing and turned it into a routine laboratory tool. An automated version of the Sanger method was used in early phases of the Human Genome Project.
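
The flow of information stated by the central dogma in this section (DNA transcribed into RNA, RNA translated into protein) can be sketched in a few lines of Python. This is a toy illustration rather than a bioinformatics tool: the input sequence is made up, the codon table is a small excerpt of the standard genetic code covering only the codons used here, and the function names are hypothetical.

```python
# Excerpt of the standard genetic code (RNA codon -> one-letter amino acid).
# ASSUMPTION: only the codons needed for the example below are included.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the usual start codon
    "GCU": "A",  # alanine
    "AAA": "K",  # lysine
    "UGG": "W",  # tryptophan
    "UAA": "*",  # stop
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA matching a DNA coding strand (T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon until a stop codon or the end."""
    protein = []
    usable_length = len(mrna) - len(mrna) % 3  # ignore a trailing partial codon
    for i in range(0, usable_length, 3):
        # Raises KeyError for codons outside the excerpt above.
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCTAAATGGTAA"  # ASSUMPTION: hypothetical coding-strand sequence
mrna = transcribe(dna)
print(mrna)             # AUGGCUAAAUGGUAA
print(translate(mrna))  # MAKW
```

The sketch covers only the protein-coding path. For a noncoding gene the transcript itself would be the functional product, so the `translate` step would not apply.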


Modern synthesis and its successors

The theories developed in the early 20th century to integrate Mendelian genetics with Darwinian evolution are called the modern synthesis, a term introduced by Julian Huxley. This view of evolution was emphasized by George C. Williams' gene-centric view of evolution. He proposed that the Mendelian gene is a unit of natural selection with the definition: "that which segregates and recombines with appreciable frequency." Related ideas emphasizing the centrality of Mendelian genes and the importance of natural selection in evolution were popularized by Richard Dawkins. The development of the neutral theory of evolution in the late 1960s led to the recognition that random genetic drift is a major player in evolution and that neutral theory should be the null hypothesis of molecular evolution. This led to the construction of phylogenetic trees and the development of the molecular clock, which is the basis of all dating techniques using DNA sequences. These techniques are not confined to molecular gene sequences but can be used on all DNA segments in the genome.
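
As a rough illustration of the molecular clock idea, the sketch below estimates a divergence time from the fraction of sites that differ between two aligned sequences, assuming a constant substitution rate. The function names, the two sequences, and the rate value are hypothetical and chosen only for illustration; the distance used is the uncorrected proportion of differing sites, which real dating methods refine considerably.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned sites at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def divergence_time(distance: float, rate_per_site_per_year: float) -> float:
    """Estimate time since two lineages split under a strict clock.

    Both lineages accumulate substitutions independently, so the distance
    between them grows at twice the per-lineage rate: t = d / (2 * r).
    """
    return distance / (2.0 * rate_per_site_per_year)


# Hypothetical aligned sequences and an assumed, purely illustrative rate.
seq_1 = "ACGTACGTAC"
seq_2 = "ACGTTCGTAA"
d = p_distance(seq_1, seq_2)
t = divergence_time(d, rate_per_site_per_year=1e-8)
print(f"p-distance: {d:.2f}")
print(f"estimated divergence time: {t:.3g} years")
```

Because the calculation only compares aligned bases, it applies equally to gene sequences and to any other DNA segment, which matches the point made above.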


Molecular basis


DNA

The vast majority of organisms encode their genes in long strands of DNA (deoxyribonucleic acid). DNA consists of a chain made from four types of nucleotide subunits, each composed of a five-carbon sugar (2-deoxyribose), a phosphate group, and one of the four bases adenine, cytosine, guanine, and thymine.

Two chains of DNA twist around each other to form a DNA double helix with the phosphate–sugar backbone spiralling around the outside, and the bases pointing inward with adenine base pairing to thymine and guanine to cytosine. The specificity of base pairing occurs because adenine and thymine align to form two hydrogen bonds, whereas cytosine and guanine form three hydrogen bonds. The two strands in a double helix must, therefore, be complementary, with their sequence of bases matching such that the adenines of one strand are paired with the thymines of the other strand, and so on.

Due to the chemical composition of the pentose residues, DNA strands have directionality. One end of a DNA polymer contains an exposed hydroxyl group on the deoxyribose; this is known as the 3' end of the molecule. The other end contains an exposed phosphate group; this is the 5' end. The two strands of a double helix run in opposite directions. Nucleic acid synthesis, including DNA replication and transcription, occurs in the 5'→3' direction, because new nucleotides are added via a dehydration reaction that uses the exposed 3' hydroxyl as a nucleophile.

The expression of genes encoded in DNA begins by transcribing the gene into RNA, a second type of nucleic acid that is very similar to DNA, but whose monomers contain the sugar ribose rather than deoxyribose. RNA also contains the base uracil in place of thymine. RNA molecules are less stable than DNA and are typically single-stranded. Genes that encode proteins are composed of a series of three-nucleotide sequences called codons, which serve as the "words" in the genetic "language". The genetic code specifies the correspondence during protein translation between codons and amino acids. The genetic code is nearly the same for all known organisms.
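
The pairing and directionality rules above can be made concrete with a small sketch. It is only an illustration: the names `COMPLEMENT`, `HYDROGEN_BONDS`, `reverse_complement`, and `hydrogen_bond_count`, and the example strand, are assumptions introduced here. Strands are written 5'→3', so the partner strand is obtained by complementing each base and reversing the order.

```python
# Watson-Crick pairing partners for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hydrogen bonds contributed by each base pair: A-T forms two, G-C forms three.
HYDROGEN_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, also written in the 5'->3' direction.

    The two strands run antiparallel, so the complement is read in reverse.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def hydrogen_bond_count(strand: str) -> int:
    """Total hydrogen bonds holding this strand to its complement."""
    return sum(HYDROGEN_BONDS[base] for base in strand)


# Hypothetical six-base strand, written 5'->3'.
strand = "ATGGCC"
partner = reverse_complement(strand)
bonds = hydrogen_bond_count(strand)
print(f"5'-{strand}-3' pairs with 5'-{partner}-3'")
print(f"hydrogen bonds in this duplex: {bonds}")
```

Taking the reverse complement twice returns the original strand, which is a quick check that the two strands really are complementary.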


Chromosomes

The total complement of genes in an organism or cell is known as its genome, which may be stored on one or more chromosomes. A chromosome consists of a single, very long DNA helix on which thousands of genes are encoded. The region of the chromosome at which a particular gene is located is called its locus. Each locus contains one allele of a gene; however, members of a population may have different alleles at the locus, each with a slightly different gene sequence.

The majority of eukaryotic genes are stored on a set of large, linear chromosomes. The chromosomes are packed within the nucleus in complex with storage proteins called histones to form a unit called a nucleosome. DNA packaged and condensed in this way is called chromatin. The manner in which DNA is stored on the histones, as well as chemical modifications of the histone itself, regulate whether a particular region of DNA is accessible for gene expression. In addition to genes, eukaryotic chromosomes contain sequences involved in ensuring that the DNA is copied without degradation of end regions and sorted into daughter cells during cell division: replication origins, telomeres, and the centromere. Replication origins are the sequence regions where DNA replication is initiated to make two copies of the chromosome. Telomeres are long stretches of repetitive sequences that cap the ends of the linear chromosomes and prevent degradation of coding and regulatory regions during DNA replication. The length of the telomeres decreases each time the genome is replicated and has been implicated in the aging process. The centromere is required for binding spindle fibres to separate sister chromatids into daughter cells during cell division.

Prokaryotes (bacteria and archaea) typically store their genomes on a single, large, circular chromosome. Similarly, some eukaryotic organelles contain a remnant circular chromosome with a small number of genes. Prokaryotes sometimes supplement their chromosome with additional small circles of DNA called plasmids, which usually encode only a few genes and are transferable between individuals. For example, the genes for antibiotic resistance are usually encoded on bacterial plasmids and can be passed between individual cells, even those of different species, via horizontal gene transfer.

Whereas the chromosomes of prokaryotes are relatively gene-dense, those of eukaryotes often contain regions of DNA that serve no obvious function. Simple single-celled eukaryotes have relatively small amounts of such DNA, whereas the genomes of complex multicellular organisms, including humans, contain an absolute majority of DNA without an identified function. This DNA has often been referred to as "junk DNA". However, more recent analyses suggest that, although protein-coding DNA makes up barely 2% of the human genome, about 80% of the bases in the genome may be expressed, so the term "junk DNA" may be a misnomer.


Structure and function


Structure

The structure of a protein-coding gene consists of many elements of which the actual protein coding sequence is often only a small part. These include introns and untranslated regions of the mature mRNA. Noncoding genes can also contain introns that are removed during processing to produce the mature functional RNA.

All genes are associated with regulatory sequences that are required for their expression. First, genes require a promoter sequence. The promoter is recognized and bound by transcription factors that recruit and help RNA polymerase bind to the region to initiate transcription. Recognition typically occurs at a consensus sequence such as the TATA box. A gene can have more than one promoter, resulting in messenger RNAs (mRNA) that differ in how far they extend at the 5' end. Highly transcribed genes have "strong" promoter sequences that form strong associations with transcription factors, thereby initiating transcription at a high rate. Other genes have "weak" promoters that form weak associations with transcription factors and initiate transcription less frequently. Eukaryotic promoter regions are much more complex and difficult to identify than prokaryotic promoters.

Additionally, genes can have regulatory regions many kilobases upstream or downstream of the gene that alter expression. These act by binding to transcription factors which then cause the DNA to loop so that the regulatory sequence (and bound transcription factor) come close to the RNA polymerase binding site. For example, enhancers increase transcription by binding an activator protein which then helps to recruit the RNA polymerase to the promoter; conversely, silencers bind repressor proteins and make the DNA less available for RNA polymerase.

The mature messenger RNA produced from protein-coding genes contains untranslated regions at both ends which contain binding sites for ribosomes, RNA-binding proteins, and miRNA, as well as a terminator and start and stop codons. In addition, most eukaryotic open reading frames contain untranslated introns, which are removed, and exons, which are connected together in a process known as RNA splicing. Finally, the ends of gene transcripts are defined by cleavage and polyadenylation (CPA) sites, where newly produced pre-mRNA gets cleaved and a string of ~200 adenosine monophosphates is added at the 3' end. The poly(A) tail protects mature mRNA from degradation and has other functions, affecting translation, localization, and transport of the transcript from the nucleus. Splicing, followed by CPA, generates the final mature mRNA, which encodes the protein or RNA product. Many noncoding genes in eukaryotes have different transcription termination mechanisms and they do not have poly(A) tails.

Many prokaryotic genes are organized into operons, with multiple protein-coding sequences that are transcribed as a unit. The genes in an operon are transcribed as a continuous messenger RNA, referred to as a polycistronic mRNA. The term cistron in this context is equivalent to gene. The transcription of an operon's mRNA is often controlled by a repressor that can occur in an active or inactive state depending on the presence of specific metabolites. When active, the repressor binds to a DNA sequence at the beginning of the operon, called the operator region, and represses transcription of the operon; when the repressor is inactive, transcription of the operon can occur (see e.g. Lac operon). The products of operon genes typically have related functions and are involved in the same regulatory network.
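
Returning to the eukaryotic processing steps described above, the following sketch treats splicing and polyadenylation as simple string operations. It is a toy model under stated assumptions: the sequences, the exon coordinates, and the function names `splice` and `polyadenylate` are hypothetical, and real splice-site and cleavage-site recognition is far more involved.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments in order, dropping the introns between them.

    Exon coordinates are 0-based and end-exclusive, like Python slices.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


def polyadenylate(transcript: str, cleavage_site: int, tail_length: int = 200) -> str:
    """Cleave the transcript at the given site and append a poly(A) tail.

    The default tail length follows the ~200 adenosines mentioned in the text.
    """
    return transcript[:cleavage_site] + "A" * tail_length


# Hypothetical pre-mRNA built from two exons separated by one intron.
exon_1 = "GGAUGGCC"
intron = "GUAAGUUUUCAG"
exon_2 = "AAAUGGUAAUUCC"
pre_mrna = exon_1 + intron + exon_2

exon_coordinates = [
    (0, len(exon_1)),
    (len(exon_1) + len(intron), len(pre_mrna)),
]

spliced = splice(pre_mrna, exon_coordinates)
mature = polyadenylate(spliced, cleavage_site=len(spliced))
print(f"pre-mRNA length: {len(pre_mrna)}")
print(f"spliced length:  {len(spliced)}")
print(f"mature length:   {len(mature)}")
```

Passing a different subset of exon coordinates to `splice` would give a different mature transcript from the same pre-mRNA, which is the basic idea behind alternative splicing.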


Complexity

Though many genes have simple structures, as with much of biology, others can be quite complex or represent unusual edge cases. Eukaryotic genes often have introns that are much larger than their exons, and those introns can even have other genes nested inside them. Associated enhancers may be many kilobases away, or even on entirely different chromosomes, operating via physical contact between two chromosomes. A single gene can encode multiple different functional products by alternative splicing, and conversely a gene may be split across chromosomes, with the resulting transcripts concatenated back together into a functional sequence by trans-splicing. It is also possible for overlapping genes to share some of their DNA sequence, either on opposite strands or the same strand (in a different reading frame, or even the same reading frame).
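
To see how one stretch of DNA can be read in several ways, the sketch below lists the codons in all six reading frames: three offsets on the given strand and three on its reverse complement. The function names and the example sequence are assumptions made for illustration only.

```python
# Translation table mapping each DNA base to its pairing partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, written 5'->3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def reading_frames(dna: str) -> dict[str, list[str]]:
    """Split a sequence into codons for all six reading frames.

    Frames "+1" to "+3" read the given strand at offsets 0 to 2;
    frames "-1" to "-3" do the same on the reverse complement.
    Trailing bases that do not fill a whole codon are dropped.
    """
    frames = {}
    for sign, strand in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            usable_end = len(strand) - (len(strand) - offset) % 3
            frames[f"{sign}{offset + 1}"] = [
                strand[i:i + 3] for i in range(offset, usable_end, 3)
            ]
    return frames


# Hypothetical nine-base sequence.
frames = reading_frames("ATGGCCTAA")
for name, codon_list in frames.items():
    print(name, codon_list)
```

Two overlapping genes on the same strand but in different frames would correspond to two of the "+" entries; genes on opposite strands would correspond to a "+" and a "-" entry.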


Gene expression

In all organisms, two steps are required to read the information encoded in a gene's DNA and produce the protein it specifies. First, the gene's DNA is ''transcribed'' to messenger RNA (mRNA). Second, that mRNA is ''translated'' to protein. RNA-coding genes must still go through the first step, but are not translated into protein. The process of producing a biologically functional molecule of either RNA or protein is called gene expression, and the resulting molecule is called a gene product.


Genetic code

The nucleotide sequence of a gene's DNA specifies the amino acid sequence of a protein through the genetic code. Sets of three nucleotides, known as codons, each correspond to a specific amino acid. The principle that three sequential bases of DNA code for each amino acid was demonstrated in 1961 using frameshift mutations in the rIIB gene of bacteriophage T4 (see Crick, Brenner et al. experiment). Additionally, a "start codon" and three "stop codons" indicate the beginning and end of the protein coding region. There are 64 possible codons (four possible nucleotides at each of three positions, hence 4^3 = 64) and only 20 standard amino acids; hence the code is redundant and multiple codons can specify the same amino acid. The correspondence between codons and amino acids is nearly universal among all known living organisms.
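The codon count can be checked by direct enumeration. The following is a minimal sketch in Python, not part of the original article; the variable names are illustrative, and the three stop codons listed are those of the standard genetic code.

```python
from itertools import product

NUCLEOTIDES = "ACGU"  # RNA alphabet; DNA uses T in place of U

# Every ordered triplet of nucleotides is one codon.
codons = ["".join(triplet) for triplet in product(NUCLEOTIDES, repeat=3)]

# Stop codons of the standard genetic code (an assumption of this sketch;
# a few organisms and organelles use slightly different assignments).
STOP_CODONS = {"UAA", "UAG", "UGA"}
sense_codons = [codon for codon in codons if codon not in STOP_CODONS]

print(len(codons))        # total number of codons, 4 ** 3
print(len(sense_codons))  # codons available to specify amino acids
```

Because the sense codons outnumber the 20 standard amino acids, several codons must map to the same amino acid, which is the redundancy described above.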


Transcription

Transcription produces a single-stranded RNA molecule known as messenger RNA, whose nucleotide sequence is complementary to the DNA from which it was transcribed. The mRNA acts as an intermediate between the DNA gene and its final protein product. The gene's DNA is used as a template to generate a complementary mRNA. The mRNA matches the sequence of the gene's DNA coding strand (with uracil in place of thymine) because it is synthesized as the complement of the template strand. Transcription is performed by an enzyme called an RNA polymerase, which reads the template strand in the 3' to 5' direction and synthesizes the RNA from 5' to 3'. To initiate transcription, the polymerase first recognizes and binds a promoter region of the gene. Thus, a major mechanism of gene regulation is the blocking or sequestering of the promoter region, either by tight binding of repressor molecules that physically block the polymerase or by organizing the DNA so that the promoter region is not accessible. In prokaryotes, transcription occurs in the cytoplasm; for very long transcripts, translation may begin at the 5' end of the RNA while the 3' end is still being transcribed. In eukaryotes, transcription occurs in the nucleus, where the cell's DNA is stored. The RNA molecule produced by the polymerase is known as the primary transcript and undergoes post-transcriptional modifications before being exported to the cytoplasm for translation. One of the modifications performed is the splicing of introns, which are sequences in the transcribed region that do not encode a protein. Alternative splicing mechanisms can result in mature transcripts from the same gene having different sequences and thus coding for different proteins. This is a major form of regulation in eukaryotic cells and also occurs in some prokaryotes.
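The relationship between template strand, coding strand and mRNA can be illustrated with a short Python sketch. It is a simplified illustration only: the function name and the example sequence are hypothetical, and promoters, RNA processing and splicing are ignored.

```python
# Base-pairing rules for copying a DNA template into RNA.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA, written 5'->3', for a template strand written 3'->5'.

    The polymerase reads the template 3'->5' and builds RNA 5'->3', so
    pairing each template base in the order given yields the transcript
    in its conventional orientation.
    """
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in template_3_to_5)


template = "TACGGCATT"       # hypothetical template strand, 3'->5'
coding = "ATGCCGTAA"         # its complementary coding strand, 5'->3'
mrna = transcribe(template)  # "AUGCCGUAA"

# The transcript equals the coding strand with U substituted for T.
print(mrna, mrna.replace("U", "T") == coding)
```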


Translation

Translation is the process by which a mature mRNA molecule is used as a template for synthesizing a new protein. Translation is carried out by ribosomes, large complexes of RNA and protein responsible for carrying out the chemical reactions to add new amino acids to a growing polypeptide chain by the formation of peptide bonds. The genetic code is read three nucleotides at a time, in units called codons, via interactions with specialized RNA molecules called transfer RNA (tRNA). Each tRNA has three unpaired bases known as the anticodon that are complementary to the codon it reads on the mRNA. The tRNA is also covalently attached to the amino acid specified by the complementary codon. When the tRNA binds to its complementary codon in an mRNA strand, the ribosome attaches its amino acid cargo to the new polypeptide chain, which is synthesized from amino terminus to carboxyl terminus. During and after synthesis, most new proteins must fold to their active three-dimensional structure before they can carry out their cellular functions.
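Reading an mRNA codon by codon can be sketched as follows. This Python example is a toy model rather than a description of the ribosome: the function name and the input sequence are hypothetical, and the lookup table deliberately contains only a handful of entries from the standard genetic code, so codons outside it raise a `KeyError`.

```python
# A small excerpt of the standard genetic code; None marks a stop codon.
PARTIAL_CODON_TABLE = {
    "AUG": "Met",  # also serves as the start codon
    "UUU": "Phe",
    "CCG": "Pro",
    "GGC": "Gly",
    "UAA": None,
    "UAG": None,
    "UGA": None,
}


def translate(mrna: str) -> list[str]:
    """Translate from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, nothing is translated
    peptide = []
    # Step through the message three nucleotides at a time.
    for i in range(start, len(mrna) - 2, 3):
        residue = PARTIAL_CODON_TABLE[mrna[i:i + 3]]
        if residue is None:  # stop codon reached
            break
        peptide.append(residue)  # chain grows from N- to C-terminus
    return peptide


print(translate("GGAUGCCGUUUGGCUAAUU"))
```

The list is built in the order the codons are read, mirroring synthesis from the amino terminus to the carboxyl terminus.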


Regulation

Genes are regulated so that they are expressed only when the product is needed, since expression draws on limited resources. A cell regulates its gene expression depending on its external environment (e.g. available nutrients, temperature and other stresses), its internal environment (e.g. cell division cycle, metabolism, infection status), and its specific role if in a multicellular organism. Gene expression can be regulated at any step: from transcriptional initiation, to RNA processing, to post-translational modification of the protein. The regulation of lactose metabolism genes in ''E. coli'' (''lac'' operon) was the first such mechanism to be described, in 1961.


RNA genes

A typical protein-coding gene is first copied into RNA as an intermediate in the manufacture of the final protein product. In other cases, the RNA molecules are the actual functional products, as in the synthesis of ribosomal RNA and transfer RNA. Some RNAs known as ribozymes are capable of enzymatic function, while others such as microRNAs and riboswitches have regulatory roles. The DNA sequences from which such RNAs are transcribed are known as non-coding RNA genes. Some viruses store their entire genomes in the form of RNA, and contain no DNA at all. Because they use RNA to store genes, their cellular hosts may synthesize their proteins as soon as they are infected and without the delay in waiting for transcription. On the other hand, RNA retroviruses, such as HIV, require the reverse transcription of their genome from RNA into DNA before their proteins can be synthesized.


Inheritance

Organisms inherit their genes from their parents. Asexual organisms simply inherit a complete copy of their parent's genome. Sexual organisms have two copies of each chromosome because they inherit one complete set from each parent.


Mendelian inheritance

According to Mendelian inheritance, variations in an organism's phenotype (observable physical and behavioral characteristics) are due in part to variations in its genotype (particular set of genes). Each gene specifies a particular trait, with different sequences of a gene (alleles) giving rise to different phenotypes. Most eukaryotic organisms (such as the pea plants Mendel worked on) have two alleles for each trait, one inherited from each parent. Alleles at a locus may be dominant or recessive; dominant alleles give rise to their corresponding phenotypes when paired with any other allele for the same trait, whereas recessive alleles give rise to their corresponding phenotype only when paired with another copy of the same allele. If the genotypes of the organisms are known, it is possible to determine which alleles are dominant and which are recessive. For example, if the allele specifying tall stems in pea plants is dominant over the allele specifying short stems, then pea plants that inherit one tall allele from one parent and one short allele from the other parent will also have tall stems. Mendel's work demonstrated that alleles assort independently in the production of gametes, or germ cells, ensuring variation in the next generation. Although Mendelian inheritance remains a good model for many traits determined by single genes (including a number of well-known genetic disorders), it does not include the physical processes of DNA replication and cell division.
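The tall/short pea example can be worked through as a Punnett square. The sketch below is a minimal Python illustration under stated assumptions: the allele symbols `T` (tall, dominant) and `t` (short, recessive) and both function names are hypothetical, and a single gene with complete dominance is assumed.

```python
from collections import Counter
from itertools import product


def cross(parent_1: str, parent_2: str) -> Counter:
    """Count offspring genotypes for a single-gene cross (Punnett square).

    Each parent is a two-letter genotype such as "Tt"; every pairing of
    one allele from each parent is treated as equally likely.
    """
    offspring = Counter()
    for allele_1, allele_2 in product(parent_1, parent_2):
        # Sorting puts the upper-case (dominant) symbol first, so "tT"
        # and "Tt" are counted as the same genotype.
        offspring["".join(sorted(allele_1 + allele_2))] += 1
    return offspring


def phenotype(genotype: str, dominant: str = "T") -> str:
    """A plant is tall if it carries at least one dominant allele."""
    return "tall" if dominant in genotype else "short"


genotypes = cross("Tt", "Tt")
phenotypes = Counter()
for genotype, count in genotypes.items():
    phenotypes[phenotype(genotype)] += count

print(dict(genotypes))   # genotype counts out of four pairings
print(dict(phenotypes))  # phenotype counts out of four pairings
```

Crossing two heterozygous plants gives genotypes in a 1:2:1 ratio and phenotypes in a 3:1 ratio, since the heterozygotes show the dominant tall phenotype.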


DNA replication and cell division

The growth, development, and reproduction of organisms rely on cell division, the process by which a single cell divides into two usually identical daughter cells. This requires first making a duplicate copy of every gene in the genome in a process called DNA replication. The copies are made by specialized enzymes known as DNA polymerases, which "read" one strand of the double-helical DNA, known as the template strand, and synthesize a new complementary strand. Because the DNA double helix is held together by base pairing, the sequence of one strand completely specifies the sequence of its complement; hence only one strand needs to be read by the enzyme to produce a faithful copy. The process of DNA replication is semiconservative; that is, the copy of the genome inherited by each daughter cell contains one original and one newly synthesized strand of DNA. The rate of DNA replication in living cells was first measured as the rate of phage T4 DNA elongation in phage-infected ''E. coli'' and found to be impressively rapid. During the period of exponential DNA increase at 37 °C, the rate of elongation was 749 nucleotides per second. After DNA replication, the cell must physically separate the two genome copies and divide into two distinct membrane-bound cells. In prokaryotes (bacteria and archaea) this usually occurs via a relatively simple process called binary fission, in which each circular genome attaches to the cell membrane and is separated into the daughter cells as the membrane invaginates to split the cytoplasm into two membrane-bound portions. Binary fission is extremely fast compared to the rates of cell division in eukaryotes. Eukaryotic cell division is a more complex process known as the cell cycle; DNA replication occurs during a phase of this cycle known as S phase, whereas the process of segregating chromosomes and splitting the cytoplasm occurs during M phase.
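Semiconservative replication follows directly from base pairing, as the following Python sketch suggests. It is an idealized model with hypothetical names: each strand is a string written 5'->3', a duplex is a pair of such strings, and the enzymology (primers, leading and lagging strands, proofreading) is left out.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, also written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(strand))


def replicate(duplex: tuple[str, str]) -> tuple[tuple[str, str], tuple[str, str]]:
    """Copy a duplex semiconservatively.

    Each daughter duplex keeps one parental strand and pairs it with a
    newly synthesized complement.
    """
    strand_a, strand_b = duplex
    daughter_1 = (strand_a, reverse_complement(strand_a))  # old a + new
    daughter_2 = (reverse_complement(strand_b), strand_b)  # new + old b
    return daughter_1, daughter_2


parent = ("ATGCCG", "CGGCAT")  # hypothetical duplex, both strands 5'->3'
daughter_1, daughter_2 = replicate(parent)
print(daughter_1 == parent, daughter_2 == parent)
```

Because either strand fully determines its partner, both daughter duplexes carry the same sequence as the parent even though each contains only one of the original strands.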


Molecular inheritance

The duplication and transmission of genetic material from one generation of cells to the next is the basis for molecular inheritance and the link between the classical and molecular pictures of genes. Organisms inherit the characteristics of their parents because the cells of the offspring contain copies of the genes in their parents' cells. In asexually reproducing organisms, the offspring will be a genetic copy or clone of the parent organism. In sexually reproducing organisms, a specialized form of cell division called meiosis produces cells called gametes or germ cells that are haploid, or contain only one copy of each gene. The gametes produced by females are called eggs or ova, and those produced by males are called sperm. Two gametes fuse to form a diploid fertilized egg, a single cell that has two sets of genes, with one copy of each gene from the mother and one from the father. During the process of meiotic cell division, an event called genetic recombination or ''crossing-over'' can sometimes occur, in which a length of DNA on one chromatid is swapped with a length of DNA on the corresponding homologous non-sister chromatid. This can result in reassortment of otherwise linked alleles. The Mendelian principle of independent assortment asserts that each of a parent's two genes for each trait will sort independently into gametes; which allele an organism inherits for one trait is unrelated to which allele it inherits for another trait. This is in fact only true for genes that do not reside on the same chromosome or are located very far from one another on the same chromosome. The closer two genes lie on the same chromosome, the more closely they will be associated in gametes and the more often they will appear together (known as
genetic linkage Genetic linkage is the tendency of Nucleic acid sequence, DNA sequences that are close together on a chromosome to be inherited together during the meiosis phase of sexual reproduction. Two Genetic marker, genetic markers that are physically near ...
). Genes that are very close are essentially never separated because it is extremely unlikely that a crossover point will occur between them.
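
The contrast between independent assortment and linkage can be illustrated with a small simulation. The following is a minimal sketch, not a model of real meiosis: it assumes a parent heterozygous at two loci (alleles A/a and B/b, with A and B on one homolog), and treats the chance of a crossover between the loci as a single number, here called `recomb_fraction`. The function names and allele labels are hypothetical, chosen only for this illustration.

```python
import random


def make_gamete(parent, recomb_fraction, rng):
    """Return one gamete as (allele at locus 1, allele at locus 2).

    parent is a pair of homologous chromosomes, each a 2-tuple of alleles.
    recomb_fraction is the assumed probability that a crossover falls
    between the two loci during meiosis.
    """
    first, second = parent
    # Either homolog is equally likely to contribute the first locus.
    if rng.random() < 0.5:
        first, second = second, first
    if rng.random() < recomb_fraction:
        # Crossover: locus 1 comes from one homolog, locus 2 from the other.
        return (first[0], second[1])
    return first


def recombinant_fraction(recomb_fraction, n=100_000, seed=0):
    """Estimate the share of gametes carrying a non-parental allele combination."""
    rng = random.Random(seed)
    parent = (("A", "B"), ("a", "b"))
    parental = {("A", "B"), ("a", "b")}
    recombinants = sum(
        make_gamete(parent, recomb_fraction, rng) not in parental
        for _ in range(n)
    )
    return recombinants / n


if __name__ == "__main__":
    # 0.5 corresponds to independent assortment; small values to tight linkage.
    for r in (0.5, 0.1, 0.001):
        print(r, recombinant_fraction(r))
```

With `recomb_fraction` set to 0.5 the four allele combinations appear about equally often, which is what independent assortment predicts. As the value approaches zero, nearly all gametes carry one of the two parental combinations, matching the statement that very close genes are essentially never separated.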


Genome

The genome is the total genetic material of an organism and includes both the genes and non-coding sequences. Eukaryotic genes can be annotated using FINDER.


Number of genes

Genome size, and the number of genes a genome encodes, vary widely between organisms. The smallest genomes occur in viruses and viroids (which act as a single non-coding RNA gene). Conversely, plants can have extremely large genomes, with rice containing more than 46,000 protein-coding genes. The total number of protein-coding genes (the Earth's proteome) is estimated to be 5 million sequences.

Although the number of base-pairs of DNA in the human genome has been known since the 1950s, the estimated number of genes has changed over time as definitions of genes, and methods of detecting them, have been refined. Initial theoretical predictions of the number of human genes in the 1960s and 1970s were based on mutation load estimates and the numbers of mRNAs, and these estimates tended to be about 30,000 protein-coding genes. During the 1990s there were rough estimates of up to 100,000 genes, and early data on detection of mRNAs (expressed sequence tags) suggested more than the traditional value of 30,000 genes that had been reported in the textbooks during the 1980s. The initial draft sequences of the human genome confirmed the earlier predictions of about 30,000 protein-coding genes; however, that estimate has fallen to about 19,000 with the ongoing GENCODE annotation project. The number of noncoding genes is not known with certainty, but the latest estimates from Ensembl suggest 26,000 noncoding genes.


Essential genes

Essential genes are the set of genes thought to be critical for an organism's survival. This definition assumes the abundant availability of all relevant nutrients and the absence of environmental stress. Only a small portion of an organism's genes are essential. In bacteria, an estimated 250–400 genes are essential for ''Escherichia coli'' and ''Bacillus subtilis'', which is less than 10% of their genes. Half of these genes are orthologs in both organisms and are largely involved in protein synthesis. In the budding yeast ''Saccharomyces cerevisiae'' the number of essential genes is slightly higher, at 1000 genes (~20% of its genes). Although the number is more difficult to measure in higher eukaryotes, mice and humans are estimated to have around 2000 essential genes (~10% of their genes). The synthetic organism ''Syn 3'' has a minimal genome of 473 essential and quasi-essential genes (the latter necessary for fast growth), although 149 have unknown function.

Essential genes include housekeeping genes (critical for basic cell functions) as well as genes that are expressed at different times in the organism's development or life cycle. Housekeeping genes are used as experimental controls when analysing gene expression, since they are constitutively expressed at a relatively constant level.
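
The counts and percentages quoted above can be cross-checked with simple arithmetic. The sketch below assumes the rounded figures in this section are taken at face value; the helper name `implied_total_genes` is hypothetical, and the totals it returns are only what those rounded figures imply, not independent gene counts.

```python
def implied_total_genes(essential_count, essential_fraction):
    """Total gene count implied by an essential-gene count and its share of all genes."""
    return essential_count / essential_fraction


# Figures as quoted in the text (rounded, so results are approximate).
yeast_total = implied_total_genes(1000, 0.20)   # budding yeast: 1000 genes, ~20%
mammal_total = implied_total_genes(2000, 0.10)  # mouse/human: 2000 genes, ~10%

# Share of the 473 Syn 3 genes whose function is unknown.
syn3_unknown_share = 149 / 473

if __name__ == "__main__":
    print(round(yeast_total), round(mammal_total), round(syn3_unknown_share, 3))
```

Read this way, the yeast and mammalian figures imply totals of roughly 5,000 and 20,000 genes respectively, and a little under a third of the genes in the minimal ''Syn 3'' genome have no known function.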


Genetic and genomic nomenclature

The HUGO Gene Nomenclature Committee (HGNC), a committee of the Human Genome Organisation, has established gene nomenclature for each known human gene in the form of an approved gene name and symbol (a short-form abbreviation), both of which can be accessed through a database maintained by HGNC. Symbols are chosen to be unique, and each gene has only one symbol (although approved symbols sometimes change). Symbols are preferably kept consistent with other members of a gene family and with homologs in other species, particularly the mouse due to its role as a common model organism.


Genetic engineering

Genetic engineering is the modification of an organism's genome through biotechnology. Since the 1970s, a variety of techniques have been developed to specifically add, remove and edit genes in an organism. Recently developed genome engineering techniques use engineered nuclease enzymes to create targeted DNA repair in a chromosome to either disrupt or edit a gene when the break is repaired. The related term synthetic biology is sometimes used to refer to extensive genetic engineering of an organism.

Genetic engineering is now a routine research tool with model organisms. For example, genes are easily added to bacteria, and lineages of knockout mice with a specific gene's function disrupted are used to investigate that gene's function. Many organisms have been genetically modified for applications in agriculture, industrial biotechnology, and medicine. For multicellular organisms, typically the embryo is engineered, which then grows into the adult genetically modified organism. However, the genomes of cells in an adult organism can be edited using gene therapy techniques to treat genetic diseases.



External links


Comparative Toxicogenomics Database

DNA From The Beginning – a primer on genes and DNA

Gene – a searchable database of genes

''Genes'' – an Open Access journal

IDconverter – converts gene IDs between public databases

iHOP – Information Hyperlinked over Proteins

TranscriptomeBrowser – Gene expression profile analysis

The Protein Naming Utility, a database to identify and correct deficient gene names

IMPC (International Mouse Phenotyping Consortium) – Encyclopedia of mammalian gene function

Global Genes Project – Leading non-profit organization supporting people living with genetic diseases

Encode threads explorer – ''Nature''

Characterization of intergenic regions and gene definition – ''Nature''