Genomic Alteration

Genomics is an interdisciplinary field of molecular biology focusing on the structure, function, evolution, mapping, and editing of genomes. A genome is an organism's complete set of DNA, including all of its genes as well as its hierarchical, three-dimensional structural configuration. In contrast to genetics, which refers to the study of ''individual'' genes and their roles in inheritance, genomics aims at the collective characterization and quantification of ''all'' of an organism's genes, their interrelations and influence on the organism. Genes may direct the production of proteins with the assistance of enzymes and messenger molecules. In turn, proteins make up body structures such as organs and tissues as well as control chemical reactions and carry signals between cells. Genomics also involves the sequencing and analysis of genomes through the use of high-throughput DNA sequencing and bioinformatics to assemble and analyze the function and structure of entire genomes. Advances in genomics have triggered a revolution in discovery-based research and systems biology to facilitate understanding of even the most complex biological systems such as the brain. The field also includes studies of intragenomic (within the genome) phenomena such as epistasis (effect of one gene on another), pleiotropy (one gene affecting more than one trait), heterosis (hybrid vigour), and other interactions between loci and alleles within the genome.


History


Etymology

From the Greek ΓΕΝ ''gen'', "gene" (gamma, epsilon, nu, epsilon) meaning "become, create, creation, birth", and subsequent variants: genealogy, genesis, genetics, genic, genomere, genotype, genus etc. While the word ''genome'' (from the German ''Genom'', attributed to Hans Winkler) was in use in English as early as 1926, the term ''genomics'' was coined by Tom Roderick, a geneticist at the Jackson Laboratory (Bar Harbor, Maine), over beers with James E. Womack, Tom Shows and Stephen O’Brien at a meeting held in Maryland on the mapping of the human genome in 1986. It was used first as the name for a new journal and then for a whole new scientific discipline.


Early sequencing efforts

Following Rosalind Franklin's confirmation of the helical structure of DNA, James D. Watson and Francis Crick's publication of the structure of DNA in 1953 and Fred Sanger's publication of the amino acid sequence of insulin in 1955, nucleic acid sequencing became a major target of early molecular biologists. In 1964, Robert W. Holley and colleagues published the first nucleic acid sequence ever determined, the ribonucleotide sequence of alanine transfer RNA. Extending this work, Marshall Nirenberg and Philip Leder revealed the triplet nature of the genetic code and were able to determine the sequences of 54 out of 64 codons in their experiments. In 1972, Walter Fiers and his team at the Laboratory of Molecular Biology of the University of Ghent (Ghent, Belgium) were the first to determine the sequence of a gene: the gene for bacteriophage MS2 coat protein. Fiers' group expanded on their MS2 coat protein work, determining the complete nucleotide sequence of bacteriophage MS2 RNA (whose genome encodes just four genes in 3569 base pairs [bp]) and of simian virus 40 in 1976 and 1978, respectively.


DNA-sequencing technology developed

In addition to his seminal work on the amino acid sequence of insulin, Frederick Sanger and his colleagues played a key role in the development of DNA sequencing techniques that enabled the establishment of comprehensive genome sequencing projects. In 1975, he and Alan Coulson published a sequencing procedure using DNA polymerase with radiolabelled nucleotides that he called the ''Plus and Minus technique''. This involved two closely related methods that generated short oligonucleotides with defined 3' termini. These could be fractionated by electrophoresis on a polyacrylamide gel (called polyacrylamide gel electrophoresis) and visualised using autoradiography. The procedure could sequence up to 80 nucleotides in one go and was a big improvement, but was still very laborious. Nevertheless, in 1977 his group was able to sequence most of the 5,386 nucleotides of the single-stranded bacteriophage φX174, completing the first fully sequenced DNA-based genome. The refinement of the ''Plus and Minus'' method resulted in the chain-termination, or Sanger method (see below), which formed the basis of the techniques of DNA sequencing, genome mapping, data storage, and bioinformatic analysis most widely used in the following quarter-century of research. In the same year Walter Gilbert and Allan Maxam of Harvard University independently developed the Maxam-Gilbert method (also known as the ''chemical method'') of DNA sequencing, which involves the preferential cleavage of DNA at known bases and is less efficient. For their groundbreaking work in the sequencing of nucleic acids, Gilbert and Sanger shared one half of the 1980 Nobel Prize in Chemistry; the other half went to Paul Berg (recombinant DNA).


Complete genomes

The advent of these technologies resulted in a rapid intensification in the scope and speed of completion of genome sequencing projects. The first complete genome sequence of a eukaryotic organelle, the human mitochondrion (16,568 bp, about 16.6 kb [kilobase]), was reported in 1981, and the first chloroplast genomes followed in 1986. In 1992, the first eukaryotic chromosome, chromosome III of brewer's yeast ''Saccharomyces cerevisiae'' (315 kb), was sequenced. The first free-living organism to have its genome sequenced was ''Haemophilus influenzae'' (1.8 Mb [megabase]) in 1995. The following year a consortium of researchers from laboratories across North America, Europe, and Japan announced the completion of the first complete genome sequence of a eukaryote, ''S. cerevisiae'' (12.1 Mb), and since then genomes have continued being sequenced at an exponentially growing pace. Complete sequences are available for 2,719 viruses, 1,115 archaea and bacteria, and 36 eukaryotes, of which about half are fungi.

Most of the microorganisms whose genomes have been completely sequenced are problematic pathogens, such as ''Haemophilus influenzae'', which has resulted in a pronounced bias in their phylogenetic distribution compared to the breadth of microbial diversity. Of the other sequenced species, most were chosen because they were well-studied model organisms or promised to become good models. Yeast (''Saccharomyces cerevisiae'') has long been an important model organism for the eukaryotic cell, while the fruit fly ''Drosophila melanogaster'' has been a very important tool (notably in early pre-molecular genetics). The worm ''Caenorhabditis elegans'' is an often used simple model for multicellular organisms. The zebrafish ''Brachydanio rerio'' is used for many developmental studies on the molecular level, and the plant ''Arabidopsis thaliana'' is a model organism for flowering plants. The Japanese pufferfish (''Takifugu rubripes'') and the spotted green pufferfish (''Tetraodon nigroviridis'') are interesting because of their small and compact genomes, which contain very little noncoding DNA compared to most species. The mammals dog (''Canis familiaris''), brown rat (''Rattus norvegicus''), mouse (''Mus musculus''), and chimpanzee (''Pan troglodytes'') are all important model animals in medical research.

A rough draft of the human genome was completed by the Human Genome Project in early 2001, creating much fanfare. This project, completed in 2003, sequenced the entire genome for one specific person, and by 2007 this sequence was declared "finished" (less than one error in 20,000 bases and all chromosomes assembled). In the years since then, the genomes of many other individuals have been sequenced, partly under the auspices of the 1000 Genomes Project, which announced the sequencing of 1,092 genomes in October 2012. Completion of this project was made possible by the development of dramatically more efficient sequencing technologies and required the commitment of significant bioinformatics resources from a large international collaboration. The continued analysis of human genomic data has profound political and social repercussions for human societies.


The "omics" revolution

The English-language neologism ''omics'' informally refers to a field of study in biology ending in ''-omics'', such as genomics, proteomics or metabolomics. The related suffix ''-ome'' is used to address the objects of study of such fields, such as the genome, proteome, or metabolome (lipidome) respectively. The suffix ''-ome'' as used in molecular biology refers to a ''totality'' of some sort; similarly ''omics'' has come to refer generally to the study of large, comprehensive biological data sets. While the growth in the use of the term has led some scientists (Jonathan Eisen, among others) to claim that it has been oversold, it reflects the change in orientation towards the quantitative analysis of the complete or near-complete assortment of all the constituents of a system. In the study of
symbioses


Genome analysis

After an organism has been selected, genome projects involve three components: the sequencing of DNA, the assembly of that sequence to create a representation of the original chromosome, and the annotation and analysis of that representation.
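
The three components named above can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration: the function names `fragment`, `overlap`, `assemble` and `annotate` are hypothetical, the "sequencing" step simply cuts fixed-length reads at a regular step instead of sampling randomly, the assembly step is a naive greedy overlap merge that real assemblers do not use in this form, and GC content stands in for annotation, which in practice involves gene prediction and much more.

```python
from itertools import permutations


def fragment(genome, read_len=8, step=4):
    """Stage 1, toy 'sequencing': cut fixed-length reads that overlap their neighbours."""
    if len(genome) <= read_len:
        return [genome]
    reads = [genome[i:i + read_len]
             for i in range(0, len(genome) - read_len + 1, step)]
    if (len(genome) - read_len) % step != 0:
        reads.append(genome[-read_len:])  # make sure the tail is covered
    return reads


def overlap(a, b, min_len=4):
    """Length of the longest suffix of `a` that is also a prefix of `b` (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def assemble(reads, min_len=4):
    """Stage 2, toy assembly: repeatedly merge the two contigs with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_n, best_a, best_b = 0, None, None
        for a, b in permutations(contigs, 2):
            n = overlap(a, b, min_len)
            if n > best_n:
                best_n, best_a, best_b = n, a, b
        if best_n == 0:
            break  # nothing left to merge; several contigs remain
        contigs.remove(best_a)
        contigs.remove(best_b)
        contigs.append(best_a + best_b[best_n:])
    return contigs


def annotate(sequence):
    """Stage 3, stand-in for annotation: report length and GC content."""
    gc = sum(base in "GC" for base in sequence)
    return {"length": len(sequence), "gc_content": gc / len(sequence)}


if __name__ == "__main__":
    genome = "ATGCGTACCTGAAGTCTTAGGCAT"  # made-up sequence with no repeated 4-mers
    for contig in assemble(fragment(genome)):
        print(contig, annotate(contig))
```

The greedy merge only reconstructs the input reliably when the sequence has no repeats as long as the minimum overlap, which is why the example sequence was chosen to contain no repeated 4-mers; repeats are one of the main reasons real genome assembly is hard.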


Sequencing

Historically, sequencing was done in ''sequencing centers'', centralized facilities (ranging from large independent institutions such as the Joint Genome Institute, which sequences dozens of terabases a year, to local molecular biology core facilities) which contain research laboratories with the costly instrumentation and technical support necessary. As sequencing technology continues to improve, however, a new generation of effective fast-turnaround benchtop sequencers has come within reach of the average academic laboratory. On the whole, genome sequencing approaches fall into two broad categories, ''shotgun'' and ''high-throughput'' (or ''next-generation'') sequencing.


Shotgun sequencing

Shotgun sequencing is a sequencing method designed for analysis of DNA sequences longer than 1000 base pairs, up to and including entire chromosomes. It is named by analogy with the rapidly expanding, quasi-random firing pattern of a shotgun. Since gel electrophoresis sequencing can only be used for fairly short sequences (100 to 1000 base pairs), longer DNA sequences must be broken into random small segments which are then sequenced to obtain ''reads''. Multiple overlapping reads for the target DNA are obtained by performing several rounds of this fragmentation and sequencing. Computer programs then use the overlapping ends of different reads to assemble them into a continuous sequence. Shotgun sequencing is a random sampling process, requiring over-sampling to ensure a given nucleotide is represented in the reconstructed sequence; the average number of reads by which a genome is over-sampled is referred to as coverage.

For much of its history, the technology underlying shotgun sequencing was the classical chain-termination method or 'Sanger method', which is based on the selective incorporation of chain-terminating dideoxynucleotides by DNA polymerase during in vitro DNA replication. Recently, shotgun sequencing has been supplanted by high-throughput sequencing methods, especially for large-scale, automated genome analyses. However, the Sanger method remains in wide use, primarily for smaller-scale projects and for obtaining especially long contiguous DNA sequence reads (>500 nucleotides). Chain-termination methods require a single-stranded DNA template, a DNA primer, a DNA polymerase, normal deoxynucleoside triphosphates (dNTPs), and modified nucleotides (dideoxyNTPs) that terminate DNA strand elongation. These chain-terminating nucleotides lack a 3'-OH group required for the formation of a phosphodiester bond between two nucleotides, causing DNA polymerase to cease extension of DNA when a ddNTP is incorporated. The ddNTPs may be radioactively or fluorescently labelled for detection in DNA sequencers. Typically, these machines can sequence up to 96 DNA samples in a single batch (run) in up to 48 runs a day.
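The notion of coverage described above can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the estimate of uncovered bases assumes idealised, uniformly random read placement (a Poisson approximation), which real libraries only approximate.

```python
import math


def expected_coverage(num_reads, read_length, genome_length):
    """Average number of reads covering each base: (reads x read length) / genome length."""
    return num_reads * read_length / genome_length


def fraction_uncovered(coverage):
    """Expected fraction of bases hit by no read at all.

    Assumes reads start at uniformly random positions, so the number of
    reads over a base is approximately Poisson-distributed.
    """
    return math.exp(-coverage)


def reads_needed(target_coverage, read_length, genome_length):
    """Smallest number of reads that reaches the target average coverage."""
    return math.ceil(target_coverage * genome_length / read_length)


if __name__ == "__main__":
    c = expected_coverage(num_reads=100_000, read_length=500, genome_length=5_000_000)
    print(f"coverage: {c:.1f}x, expected uncovered fraction: {fraction_uncovered(c):.2e}")
```

Under these assumptions, doubling the number of reads doubles the coverage but shrinks the uncovered fraction exponentially, which is why modest over-sampling already represents nearly every nucleotide.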


High-throughput sequencing

The high demand for low-cost sequencing has driven the development of high-throughput sequencing technologies that parallelize the sequencing process, producing thousands or millions of sequences at once. High-throughput sequencing is intended to lower the cost of DNA sequencing beyond what is possible with standard dye-terminator methods. In ultra-high-throughput sequencing, as many as 500,000 sequencing-by-synthesis operations may be run in parallel. The Illumina dye sequencing method is based on reversible dye-terminators and was developed in 1996 at the Geneva Biomedical Research Institute by Pascal Mayer and Laurent Farinelli. In this method, DNA molecules and primers are first attached on a slide and amplified with polymerase so that local clonal colonies, initially coined "DNA colonies", are formed. To determine the sequence, four types of reversible terminator bases (RT-bases) are added and non-incorporated nucleotides are washed away. Unlike pyrosequencing, the DNA chains are extended one nucleotide at a time and image acquisition can be performed at a delayed moment, allowing for very large arrays of DNA colonies to be captured by sequential images taken from a single camera. Decoupling the enzymatic reaction and the image capture allows for optimal throughput and theoretically unlimited sequencing capacity; with an optimal configuration, the ultimate throughput of the instrument depends only on the A/D conversion rate of the camera. The camera takes images of the fluorescently labeled nucleotides, then the dye along with the terminal 3' blocker is chemically removed from the DNA, allowing the next cycle.

An alternative approach, ion semiconductor sequencing, is based on standard DNA replication chemistry. This technology measures the release of a hydrogen ion each time a base is incorporated. A microwell containing template DNA is flooded with a single type of nucleotide; if the nucleotide is complementary to the template strand, it will be incorporated and a hydrogen ion will be released. This release triggers an ISFET ion sensor. If a homopolymer is present in the template sequence, multiple nucleotides will be incorporated in a single flood cycle, and the detected electrical signal will be proportionally higher.
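The flood-cycle logic of ion semiconductor sequencing can be sketched as a toy simulation. This is an idealised model rather than any instrument's actual base-calling procedure: the function names and the flow order "TACG" are assumptions chosen for illustration, the input is the strand being synthesized (the complement of the template), and the signal is taken to be exactly the number of bases incorporated, with no noise.

```python
def flow_signals(synthesized, flow_order="TACG"):
    """Return one (nucleotide, signal) pair per flood cycle.

    `synthesized` is the strand being built. In each cycle a single
    nucleotide type is offered; the idealised signal is the number of
    consecutive bases incorporated, so a homopolymer run of length n
    yields a signal of n and a non-matching flow yields 0.
    """
    unknown = set(synthesized) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not present in flow order: {sorted(unknown)}")
    signals = []
    pos = 0
    cycle = 0
    while pos < len(synthesized):
        base = flow_order[cycle % len(flow_order)]
        run = 0
        while pos < len(synthesized) and synthesized[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
        cycle += 1
    return signals


def call_bases(signals):
    """Decode the flow signals back into a sequence."""
    return "".join(base * count for base, count in signals)


if __name__ == "__main__":
    flows = flow_signals("TTAGG")
    print(flows)              # the C flow incorporates nothing
    print(call_bases(flows))
```

In practice the signal is an analogue voltage, so long homopolymers are harder to call than this exact-integer model suggests.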


Assembly

Sequence assembly refers to aligning and merging fragments of a much longer DNA sequence in order to reconstruct the original sequence. This is needed as current DNA sequencing technology cannot read whole genomes as a continuous sequence, but rather reads small pieces of between 20 and 1000 bases, depending on the technology used. Third-generation sequencing technologies such as PacBio or Oxford Nanopore routinely generate sequencing reads 10–100 kb in length; however, they have a high error rate of approximately 1 percent. Typically the short fragments, called reads, result from shotgun sequencing of genomic DNA or of gene transcripts (ESTs).
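The idea of aligning and merging overlapping fragments can be illustrated with a deliberately naive greedy merger. This is a sketch under strong assumptions (error-free reads, a single strand, no repeats, no read wholly contained in another); the function names are hypothetical, and production assemblers use far more elaborate methods.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when the best overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing left to merge; remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    print(greedy_assemble(["ATGCGT", "CGTTAC", "TACGGA"]))
```

The all-pairs comparison is quadratic in the number of reads, which is one reason real projects rely on indexed or graph-based approaches.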


Assembly approaches

Assembly can be broadly categorized into two approaches: ''de novo'' assembly, for genomes which are not similar to any sequenced in the past, and comparative assembly, which uses the existing sequence of a closely related organism as a reference during assembly. Relative to comparative assembly, ''de novo'' assembly is computationally difficult (NP-hard), making it less favourable for short-read NGS technologies. Within the ''de novo'' assembly paradigm there are two primary strategies for assembly: Eulerian path strategies and overlap-layout-consensus (OLC) strategies. OLC strategies ultimately try to create a Hamiltonian path through an overlap graph, which is an NP-hard problem. Eulerian path strategies are computationally more tractable because they try to find an Eulerian path through a de Bruijn graph.
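The Eulerian path strategy can be sketched in a few lines. The example below is a minimal illustration, not a real assembler: the function names are hypothetical, reads are assumed error-free and from one strand, each distinct k-mer is kept once (real tools track multiplicities), and the graph is assumed to contain an Eulerian path. Nodes are (k-1)-mers, each k-mer contributes one edge, and the path is found with Hierholzer's algorithm in time linear in the number of edges.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph: one edge prefix -> suffix per distinct k-mer."""
    graph = defaultdict(list)
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer not in seen:
                seen.add(kmer)
                graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm: visit every edge exactly once."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    # Start where out-degree exceeds in-degree by one, else at any node.
    start = next((u for u in graph if len(graph[u]) - in_deg[u] == 1),
                 next(iter(graph)))
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def spell(path):
    """Turn a path of (k-1)-mers back into a sequence."""
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    g = de_bruijn(["ATGCGT", "GCGTAC", "GTACC"], k=4)
    print(spell(eulerian_path(g)))
```

When repeats longer than k-1 occur, several Eulerian paths may exist and the reconstruction is no longer unique, which this sketch does not attempt to resolve.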


Finishing

Finished genomes are defined as having a single contiguous sequence with no ambiguities representing each replicon.


Annotation

The DNA sequence assembly alone is of little value without additional analysis. Genome annotation is the process of attaching biological information to sequences, and consists of three main steps:
# identifying portions of the genome that do not code for proteins
# identifying elements on the genome, a process called gene prediction, and
# attaching biological information to these elements.
Automatic annotation tools try to perform these steps ''in silico'', as opposed to manual annotation (a.k.a. curation), which involves human expertise and potential experimental verification. Ideally, these approaches co-exist and complement each other in the same annotation pipeline (also see below). Traditionally, the basic level of annotation is using BLAST for finding similarities, and then annotating genomes based on homologues. More recently, additional information is added to the annotation platform. The additional information allows manual annotators to deconvolute discrepancies between genes that are given the same annotation. Some databases use genome context information, similarity scores, experimental data, and integrations of other resources to provide genome annotations through their Subsystems approach. Other databases (e.g. Ensembl) rely on both curated data sources and a range of software tools in their automated genome annotation pipeline. ''Structural annotation'' consists of the identification of genomic elements, primarily ORFs and their localisation, or gene structure. ''Functional annotation'' consists of attaching biological information to genomic elements.
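As a rough illustration of structural annotation, the sketch below scans a sequence for open reading frames. It is a simplification rather than a gene predictor: the function name and the `min_codons` threshold are hypothetical, only the three forward-strand frames are scanned, and the standard start codon ATG and stop codons TAA, TAG and TGA are assumed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end, orf) tuples for forward-strand ORFs.

    An ORF here runs from the first ATG in a frame to the next in-frame
    stop codon (inclusive). `min_codons` counts codons before the stop.
    Coordinates are 0-based, end-exclusive.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None
    return orfs


if __name__ == "__main__":
    for start, end, orf in find_orfs("CCATGAAATGGTAGCC"):
        print(start, end, orf)
```

A fuller treatment would also scan the reverse complement and, for eukaryotes, account for introns, which is where dedicated gene prediction software comes in.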


Sequencing pipelines and databases

The need for reproducibility and efficient management of the large amount of data associated with genome projects means that computational pipelines have important applications in genomics.


Research areas


Functional genomics

Functional genomics is a field of molecular biology that attempts to make use of the vast wealth of data produced by genomic projects (such as genome sequencing projects) to describe gene (and protein) functions and interactions. Functional genomics focuses on the dynamic aspects such as gene transcription, translation, and protein–protein interactions, as opposed to the static aspects of the genomic information such as DNA sequence or structures. Functional genomics attempts to answer questions about the function of DNA at the levels of genes, RNA transcripts, and protein products. A key characteristic of functional genomics studies is their genome-wide approach to these questions, generally involving high-throughput methods rather than a more traditional "gene-by-gene" approach. A major branch of genomics is still concerned with sequencing the genomes of various organisms, but the knowledge of full genomes has created the possibility for the field of functional genomics, mainly concerned with patterns of gene expression during various conditions. The most important tools here are microarrays and bioinformatics.


Structural genomics

Structural genomics seeks to describe the 3-dimensional structure of every protein encoded by a given genome. This genome-based approach allows for a high-throughput method of structure determination by a combination of experimental and modeling approaches. The principal difference between structural genomics and traditional structural prediction is that structural genomics attempts to determine the structure of every protein encoded by the genome, rather than focusing on one particular protein. With full-genome sequences available, structure prediction can be done more quickly through a combination of experimental and modeling approaches, especially because the availability of large numbers of sequenced genomes and previously solved protein structures allows scientists to model protein structure on the structures of previously solved homologs. Structural genomics involves taking a large number of approaches to structure determination, including experimental methods using genomic sequences, modeling-based approaches based on sequence or structural homology to a protein of known structure, and approaches based on chemical and physical principles for a protein with no homology to any known structure. As opposed to traditional structural biology, the determination of a protein structure through a structural genomics effort often (but not always) comes before anything is known regarding the protein function. This raises new challenges in structural bioinformatics, i.e. determining protein function from its 3D structure.


Epigenomics

Epigenomics is the study of the complete set of epigenetic modifications on the genetic material of a cell, known as the epigenome. Epigenetic modifications are reversible modifications on a cell's DNA or histones that affect gene expression without altering the DNA sequence (Russell 2010 p. 475). Two of the best-characterized epigenetic modifications are DNA methylation and histone modification. Epigenetic modifications play an important role in gene expression and regulation, and are involved in numerous cellular processes such as differentiation, development, and tumorigenesis. The study of epigenetics on a global level has been made possible only recently through the adaptation of genomic high-throughput assays.


Metagenomics

Metagenomics is the study of ''metagenomes'', genetic material recovered directly from environmental samples. The broad field may also be referred to as environmental genomics, ecogenomics or community genomics. While traditional microbiology and microbial genome sequencing rely upon cultivated clonal cultures, early environmental gene sequencing cloned specific genes (often the 16S rRNA gene) to produce a profile of diversity in a natural sample. Such work revealed that the vast majority of microbial biodiversity had been missed by cultivation-based methods. Recent studies use "shotgun" Sanger sequencing or massively parallel pyrosequencing to get largely unbiased samples of all genes from all the members of the sampled communities. Because of its power to reveal the previously hidden diversity of microscopic life, metagenomics offers a powerful lens for viewing the microbial world that has the potential to revolutionize understanding of the entire living world.


Model systems


Viruses and bacteriophages

Bacteriophages have played and continue to play a key role in bacterial genetics and molecular biology. Historically, they were used to define gene structure and gene regulation. The first genome to be sequenced was also that of a bacteriophage. However, bacteriophage research did not lead the genomics revolution, which is clearly dominated by bacterial genomics. Only very recently has the study of bacteriophage genomes become prominent, thereby enabling researchers to understand the mechanisms underlying phage evolution. Bacteriophage genome sequences can be obtained through direct sequencing of isolated bacteriophages, but can also be derived as part of microbial genomes. Analysis of bacterial genomes has shown that a substantial amount of microbial DNA consists of prophage sequences and prophage-like elements. A detailed database mining of these sequences offers insights into the role of prophages in shaping the bacterial genome. Overall, this method verified many known bacteriophage groups, making this a useful tool for predicting the relationships of prophages from bacterial genomes.


Cyanobacteria

At present there are 24 cyanobacteria for which a total genome sequence is available. 15 of these cyanobacteria come from the marine environment. These are six ''Prochlorococcus'' strains, seven marine ''Synechococcus'' strains, ''Trichodesmium erythraeum'' IMS101 and ''Crocosphaera watsonii'' WH8501. Several studies have demonstrated how these sequences could be used very successfully to infer important ecological and physiological characteristics of marine cyanobacteria. Many more genome projects are currently in progress, among them further ''Prochlorococcus'' and marine ''Synechococcus'' isolates, ''Acaryochloris'' and ''Prochloron'', the N2-fixing filamentous cyanobacteria ''Nodularia spumigena'', ''Lyngbya aestuarii'' and ''Lyngbya majuscula'', as well as bacteriophages infecting marine cyanobacteria. Thus, the growing body of genome information can also be tapped in a more general way to address global problems by applying a comparative approach. Some new and exciting examples of progress in this field are the identification of genes for regulatory RNAs, insights into the evolutionary origin of photosynthesis, and estimation of the contribution of horizontal gene transfer to the genomes that have been analyzed.


Applications

Genomics has provided applications in many fields, including medicine, biotechnology, anthropology and other social sciences.


Genomic medicine

Next-generation genomic technologies allow clinicians and biomedical researchers to drastically increase the amount of genomic data collected on large study populations. When combined with new informatics approaches that integrate many kinds of data with genomic data in disease research, this allows researchers to better understand the genetic bases of drug response and disease. Early efforts to apply the genome to medicine included those by a Stanford team led by Euan Ashley, which developed the first tools for the medical interpretation of a human genome. The Genomes2People research program at Brigham and Women's Hospital, Broad Institute and Harvard Medical School was established in 2012 to conduct empirical research in translating genomics into health. Brigham and Women's Hospital opened a Preventive Genomics Clinic in August 2019, with Massachusetts General Hospital following a month later. The ''All of Us'' research program aims to collect genome sequence data from 1 million participants to become a critical component of the precision medicine research platform, and the ''UK Biobank'' initiative has studied more than 500,000 individuals with deep genomic and phenotypic data.


Synthetic biology and bioengineering

The growth of genomic knowledge has enabled increasingly sophisticated applications of synthetic biology. In 2010 researchers at the J. Craig Venter Institute announced the creation of a partially synthetic species of bacterium, ''Mycoplasma laboratorium'', derived from the genome of ''Mycoplasma genitalium''.


Population and conservation genomics

Population genomics has developed as a popular field of research, where genomic sequencing methods are used to conduct large-scale comparisons of DNA sequences among populations, beyond the limits of genetic markers such as short-range PCR products or microsatellites traditionally used in population genetics. Population genomics studies genome-wide effects to improve our understanding of microevolution so that we may learn the phylogenetic history and demography of a population. Population genomic methods are used in many different fields, including evolutionary biology, ecology, biogeography, conservation biology and fisheries management. Similarly, landscape genomics has developed from landscape genetics to use genomic methods to identify relationships between patterns of environmental and genetic variation. Conservationists can use the information gathered by genomic sequencing to better evaluate genetic factors key to species conservation, such as the genetic diversity of a population or whether an individual is heterozygous for a recessive inherited genetic disorder. By using genomic data to evaluate the effects of evolutionary processes and to detect patterns in variation throughout a given population, conservationists can formulate plans to aid a given species with fewer variables left unknown than standard genetic approaches leave unaddressed.
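As a rough illustration of what comparing DNA sequences within a population can look like in practice, the sketch below computes nucleotide diversity, one common summary of genetic diversity: the average proportion of sites that differ between pairs of aligned sequences. The function name `nucleotide_diversity` and the two toy populations are hypothetical and chosen only for this example; real analyses work on genome-scale alignments or variant calls and must handle gaps, missing data and sampling corrections, none of which is attempted here.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average proportion of differing sites over all pairs of aligned sequences.

    Illustrative helper (not from any library): assumes equal-length,
    gap-free sequences that are already aligned.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same non-zero length")

    pairs = list(combinations(seqs, 2))
    # Count mismatched positions for every pair of sequences.
    total_differences = sum(
        sum(a != b for a, b in zip(x, y)) for x, y in pairs
    )
    return total_differences / (len(pairs) * length)


# Toy data (made up): population_a carries two variable sites,
# population_b has no variation at all.
population_a = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
population_b = ["ACGTACGT", "ACGTACGT", "ACGTACGT"]

pi_a = nucleotide_diversity(population_a)
pi_b = nucleotide_diversity(population_b)
```

Under these assumptions, a population with a higher value retains more sequence variation than one with a value near zero, which is the kind of contrast a conservation assessment might examine across many loci.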


See also

* Hi-C (genomic analysis technique)
* Cognitive genomics
* Computational genomics
* Epigenomics
* Functional genomics
* GeneCalling, an mRNA profiling technology
* Genomics of domestication
* Genetics in fiction
* Glycomics
* Immunomics
* Metagenomics
* Pathogenomics
* Personal genomics
* Proteomics
* Transcriptomics
* Venomics
* Psychogenomics
* Whole genome sequencing
* Thomas Roderick


External links


* Annual Review of Genomics and Human Genetics
* BMC Genomics: A BMC journal on Genomics
* Genomics journal
* Genomics.org: An openfree genomics portal.
* NHGRI: US government's genome institute
* JCVI Comprehensive Microbial Resource
* KoreaGenome.org: The first Korean Genome published and the sequence is available freely.
* GenomicsNetwork: Looks at the development and use of the science and technologies of genomics.
* Institute for Genome Sciences: Genomics research.
* MIT OpenCourseWare HST.512 Genomic Medicine: A free, self-study course in genomic medicine. Resources include audio lectures and selected lecture notes.
* ENCODE threads explorer: Machine learning approaches to genomics. ''Nature''.
* Global map of genomics laboratories
* Genomics: Scitable by nature education
* Learn All About Genetics Online