HOME

TheInfoList



OR:

Genome projects are
scientific Science is a systematic endeavor that builds and organizes knowledge in the form of testable explanations and predictions about the universe. Science may be as old as the human species, and some of the earliest archeological evidence for ...
endeavours that ultimately aim to determine the complete
genome In the fields of molecular biology and genetics, a genome is all the genetic information of an organism. It consists of nucleotide sequences of DNA (or RNA in RNA viruses). The nuclear genome includes protein-coding genes and non-coding ...
sequence of an
organism In biology, an organism () is any living system that functions as an individual entity. All organisms are composed of cells ( cell theory). Organisms are classified by taxonomy into groups such as multicellular animals, plants, and fu ...
(be it an
animal Animals are multicellular, eukaryotic organisms in the biological kingdom Animalia. With few exceptions, animals consume organic material, breathe oxygen, are able to move, can reproduce sexually, and go through an ontogenetic stage ...
, a
plant Plants are predominantly photosynthetic eukaryotes of the kingdom Plantae. Historically, the plant kingdom encompassed all living things that were not animals, and included algae and fungi; however, all current definitions of Plantae excl ...
, a
fungus A fungus ( : fungi or funguses) is any member of the group of eukaryotic organisms that includes microorganisms such as yeasts and molds, as well as the more familiar mushrooms. These organisms are classified as a kingdom, separately fr ...
, a
bacterium Bacteria (; singular: bacterium) are ubiquitous, mostly free-living organisms often consisting of one biological cell. They constitute a large domain of prokaryotic microorganisms. Typically a few micrometres in length, bacteria were am ...
, an
archaea Archaea ( ; singular archaeon ) is a domain of single-celled organisms. These microorganisms lack cell nuclei and are therefore prokaryotes. Archaea were initially classified as bacteria, receiving the name archaebacteria (in the Archaeba ...
n, a
protist A protist () is any eukaryotic organism (that is, an organism whose cells contain a cell nucleus) that is not an animal, plant, or fungus. While it is likely that protists share a common ancestor (the last eukaryotic common ancestor), the e ...
or a
virus A virus is a submicroscopic infectious agent that replicates only inside the living cells of an organism. Viruses infect all life forms, from animals and plants to microorganisms, including bacteria and archaea. Since Dmitri Ivanovsk ...
) and to annotate protein-coding
gene In biology, the word gene (from , ; "...Wilhelm Johannsen coined the word gene to describe the Mendelian units of heredity..." meaning ''generation'' or ''birth'' or ''gender'') can have several different meanings. The Mendelian gene is a b ...
s and other important genome-encoded features. The genome sequence of an organism includes the collective DNA sequences of each
chromosome A chromosome is a long DNA molecule with part or all of the genetic material of an organism. In most chromosomes the very long thin DNA fibers are coated with packaging proteins; in eukaryotic cells the most important of these proteins ar ...
in the organism. For a
bacterium Bacteria (; singular: bacterium) are ubiquitous, mostly free-living organisms often consisting of one biological cell. They constitute a large domain of prokaryotic microorganisms. Typically a few micrometres in length, bacteria were am ...
containing a single chromosome, a genome project will aim to map the sequence of that chromosome. For the human species, whose genome includes 22 pairs of
autosome An autosome is any chromosome that is not a sex chromosome. The members of an autosome pair in a diploid cell have the same morphology, unlike those in allosomal (sex chromosome) pairs, which may have different structures. The DNA in autosomes ...
s and 2 sex chromosomes, a complete genome sequence will involve 46 separate chromosome sequences. The
Human Genome Project The Human Genome Project (HGP) was an international scientific research project with the goal of determining the base pairs that make up human DNA, and of identifying, mapping and sequencing all of the genes of the human genome from both ...
is a well known example of a genome project.


Genome assembly

Genome assembly refers to the process of taking a large number of short
DNA sequence DNA sequencing is the process of determining the nucleic acid sequence – the order of nucleotides in DNA. It includes any method or technology that is used to determine the order of the four bases: adenine, guanine, cytosine, and thymine. T ...
s and reassembling them to create a representation of the original
chromosome A chromosome is a long DNA molecule with part or all of the genetic material of an organism. In most chromosomes the very long thin DNA fibers are coated with packaging proteins; in eukaryotic cells the most important of these proteins ar ...
s from which the DNA originated. In a
shotgun sequencing In genetics, shotgun sequencing is a method used for sequencing random DNA strands. It is named by analogy with the rapidly expanding, quasi-random shot grouping of a shotgun. The chain-termination method of DNA sequencing ("Sanger sequencing ...
project, all the DNA from a source (usually a single
organism In biology, an organism () is any living system that functions as an individual entity. All organisms are composed of cells ( cell theory). Organisms are classified by taxonomy into groups such as multicellular animals, plants, and fu ...
, anything from a
bacterium Bacteria (; singular: bacterium) are ubiquitous, mostly free-living organisms often consisting of one biological cell. They constitute a large domain of prokaryotic microorganisms. Typically a few micrometres in length, bacteria were am ...
to a
mammal Mammals () are a group of vertebrate animals constituting the class Mammalia (), characterized by the presence of mammary glands which in females produce milk for feeding (nursing) their young, a neocortex (a region of the brain), fur ...
) is first fractured into millions of small pieces. These pieces are then "read" by automated sequencing machines. A genome assembly
algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing ...
works by taking all the pieces and aligning them to one another, and detecting all places where two of the short sequences, or ''reads'', overlap. These overlapping reads can be merged, and the process continues. Genome assembly is a very difficult
computational Computation is any type of arithmetic or non-arithmetic calculation that follows a well-defined model (e.g., an algorithm). Mechanical or electronic devices (or, historically, people) that perform computations are known as ''computers''. An espe ...
problem, made more difficult because many genomes contain large numbers of identical sequences, known as
repeats A rerun or repeat is a rebroadcast of an episode of a radio or television program. There are two types of reruns – those that occur during a hiatus, and those that occur when a program is syndicated. Variations In the United Kingdom, the wor ...
. These repeats can be thousands of nucleotides long, and occur different locations, especially in the large genomes of
plant Plants are predominantly photosynthetic eukaryotes of the kingdom Plantae. Historically, the plant kingdom encompassed all living things that were not animals, and included algae and fungi; however, all current definitions of Plantae excl ...
s and
animal Animals are multicellular, eukaryotic organisms in the biological kingdom Animalia. With few exceptions, animals consume organic material, breathe oxygen, are able to move, can reproduce sexually, and go through an ontogenetic stage ...
s. The resulting (draft) genome sequence is produced by combining the information sequenced contigs and then employing linking information to create scaffolds. Scaffolds are positioned along the
physical map A map is a symbolic depiction emphasizing relationships between elements of some space, such as objects Object may refer to: General meanings * Object (philosophy), a thing, being, or concept ** Object (abstract), an object which does ...
of the chromosomes creating a "golden path".


Assembly software

Originally, most large-scale DNA sequencing centers developed their own software for assembling the sequences that they produced. However, this has changed as the software has grown more complex and as the number of sequencing centers has increased. An example of such assembler ''Short Oligonucleotide Analysis Package'' developed by BGI for de novo assembly of human-sized genomes, alignment, SNP detection, resequencing, indel finding, and structural variation analysis.


Genome annotation

Since the 1980s,
molecular biology Molecular biology is the branch of biology that seeks to understand the molecular basis of biological activity in and between cells, including biomolecular synthesis, modification, mechanisms, and interactions. The study of chemical and phys ...
and
bioinformatics Bioinformatics () is an interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex. As an interdisciplinary field of science, bioinformatics combi ...
have created the need for DNA annotation. DNA annotation or genome annotation is the process of identifying attaching biological information to sequences , and particularly in identifying the locations of genes and determining what those genes do.


Time of completion

When
sequencing In genetics and biochemistry, sequencing means to determine the primary structure (sometimes incorrectly called the primary sequence) of an unbranched biopolymer. Sequencing results in a symbolic linear depiction known as a sequence which suc ...
a genome, there are usually regions that are difficult to sequence (often regions with highly repetitive DNA). Thus, 'completed' genome sequences are rarely ever complete, and terms such as 'working draft' or 'essentially complete' have been used to more accurately describe the status of such genome projects. Even when every
base pair A base pair (bp) is a fundamental unit of double-stranded nucleic acids consisting of two nucleobases bound to each other by hydrogen bonds. They form the building blocks of the DNA double helix and contribute to the folded structure of both D ...
of a genome sequence has been determined, there are still likely to be errors present because DNA sequencing is not a completely accurate process. It could also be argued that a complete genome project should include the sequences of
mitochondria A mitochondrion (; ) is an organelle found in the cells of most Eukaryotes, such as animals, plants and fungi. Mitochondria have a double membrane structure and use aerobic respiration to generate adenosine triphosphate (ATP), which is used ...
and (for plants)
chloroplasts A chloroplast () is a type of membrane-bound organelle known as a plastid that conducts photosynthesis mostly in plant and algal cells. The photosynthetic pigment chlorophyll captures the energy from sunlight, converts it, and stores it in ...
as these
organelles In cell biology, an organelle is a specialized subunit, usually within a cell, that has a specific function. The name ''organelle'' comes from the idea that these structures are parts of cells, as organs are to the body, hence ''organelle,'' the ...
have their own genomes. It is often reported that the goal of sequencing a genome is to obtain information about the complete set of
genes In biology, the word gene (from , ; "...Wilhelm Johannsen coined the word gene to describe the Mendelian units of heredity..." meaning ''generation'' or ''birth'' or ''gender'') can have several different meanings. The Mendelian gene is a ba ...
in that particular genome sequence. The proportion of a genome that encodes for genes may be very small (particularly in
eukaryotes Eukaryotes () are organisms whose cells have a nucleus. All animals, plants, fungi, and many unicellular organisms, are Eukaryotes. They belong to the group of organisms Eukaryota or Eukarya, which is one of the three domains of life. Bacter ...
such as humans, where
coding DNA The coding region of a gene, also known as the coding sequence (CDS), is the portion of a gene's DNA or RNA that codes for protein. Studying the length, composition, regulation, splicing, structures, and functions of coding regions compared to n ...
may only account for a few percent of the entire sequence). However, it is not always possible (or desirable) to only sequence the
coding region The coding region of a gene, also known as the coding sequence (CDS), is the portion of a gene's DNA or RNA that codes for protein. Studying the length, composition, regulation, splicing, structures, and functions of coding regions compared to n ...
s separately. Also, as scientists understand more about the role of this noncoding DNA (often referred to as
junk DNA Non-coding DNA (ncDNA) sequences are components of an organism's DNA that do not encode protein sequences. Some non-coding DNA is transcribed into functional non-coding RNA molecules (e.g. transfer RNA, microRNA, piRNA, ribosomal RNA, and regula ...
), it will become more important to have a complete genome sequence as a background to understanding the genetics and biology of any given organism. In many ways genome projects do not confine themselves to only determining a DNA sequence of an organism. Such projects may also include
gene prediction In computational biology, gene prediction or gene finding refers to the process of identifying the regions of genomic DNA that encode genes. This includes protein-coding genes as well as RNA genes, but may also include prediction of other functio ...
to find out where the genes are in a genome, and what those genes do. There may also be related projects to sequence ESTs or
mRNA In molecular biology, messenger ribonucleic acid (mRNA) is a single-stranded molecule of RNA that corresponds to the genetic sequence of a gene, and is read by a ribosome in the process of synthesizing a protein. mRNA is created during the ...
s to help find out where the genes actually are.


Historical and technological perspectives

Historically, when sequencing eukaryotic genomes (such as the worm ''
Caenorhabditis elegans ''Caenorhabditis elegans'' () is a free-living transparent nematode about 1 mm in length that lives in temperate soil environments. It is the type species of its genus. The name is a blend of the Greek ''caeno-'' (recent), ''rhabditis'' (r ...
'') it was common to first map the genome to provide a series of landmarks across the genome. Rather than sequence a chromosome in one go, it would be sequenced piece by piece (with the prior knowledge of approximately where that piece is located on the larger chromosome). Changes in technology and in particular improvements to the processing power of computers, means that genomes can now be ' shotgun sequenced' in one go (there are caveats to this approach though when compared to the traditional approach). Improvements in
DNA sequencing DNA sequencing is the process of determining the nucleic acid sequence – the order of nucleotides in DNA. It includes any method or technology that is used to determine the order of the four bases: adenine, guanine, cytosine, and thymine. T ...
technology has meant that the cost of sequencing a new genome sequence has steadily fallen (in terms of cost per
base pair A base pair (bp) is a fundamental unit of double-stranded nucleic acids consisting of two nucleobases bound to each other by hydrogen bonds. They form the building blocks of the DNA double helix and contribute to the folded structure of both D ...
) and newer technology has also meant that genomes can be sequenced far more quickly. When research agencies decide what new genomes to sequence, the emphasis has been on species which are either high importance as
model organism A model organism (often shortened to model) is a non-human species that is extensively studied to understand particular biological phenomena, with the expectation that discoveries made in the model organism will provide insight into the workin ...
or have a relevance to human health (e.g. pathogenic
bacteria Bacteria (; singular: bacterium) are ubiquitous, mostly free-living organisms often consisting of one biological cell. They constitute a large domain of prokaryotic microorganisms. Typically a few micrometres in length, bacteria were am ...
or vectors of disease such as
mosquito Mosquitoes (or mosquitos) are members of a group of almost 3,600 species of small flies within the family Culicidae (from the Latin ''culex'' meaning " gnat"). The word "mosquito" (formed by ''mosca'' and diminutive ''-ito'') is Spanish for "li ...
s) or species which have commercial importance (e.g. livestock and crop plants). Secondary emphasis is placed on species whose genomes will help answer important questions in
molecular evolution Molecular evolution is the process of change in the sequence composition of cellular molecules such as DNA, RNA, and proteins across generations. The field of molecular evolution uses principles of evolutionary biology and population genet ...
(e.g. the
common chimpanzee The chimpanzee (''Pan troglodytes''), also known as simply the chimp, is a species of great ape native to the forest and savannah of tropical Africa. It has four confirmed subspecies and a fifth proposed subspecies. When its close relative t ...
). In the future, it is likely that it will become even cheaper and quicker to sequence a genome. This will allow for complete genome sequences to be determined from many different individuals of the same species. For humans, this will allow us to better understand aspects of human genetic diversity.


Examples

Many organisms have genome projects that have either been completed or will be completed shortly, including: *
Human Humans (''Homo sapiens'') are the most abundant and widespread species of primate, characterized by bipedalism and exceptional cognitive skills due to a large and complex brain. This has enabled the development of advanced tools, cultu ...
s, ''Homo sapiens''; see
Human genome project The Human Genome Project (HGP) was an international scientific research project with the goal of determining the base pairs that make up human DNA, and of identifying, mapping and sequencing all of the genes of the human genome from both ...
* Humans, ''Homo sapiens''; see The Human Genome Project–Write * Palaeo-Eskimo, an ancient-human *
Neanderthal Neanderthals (, also ''Homo neanderthalensis'' and erroneously ''Homo sapiens neanderthalensis''), also written as Neandertals, are an Extinction, extinct species or subspecies of archaic humans who lived in Eurasia until about 40,000 years ag ...
, ''Homo sapiens neanderthalensis'' (partial); see
Neanderthal Genome Project The Neanderthal genome project is an effort of a group of scientists to sequence the Neanderthal genome, founded in July 2006. It was initiated by 454 Life Sciences, a biotechnology company based in Branford, Connecticut in the United States and ...
*
Common chimpanzee The chimpanzee (''Pan troglodytes''), also known as simply the chimp, is a species of great ape native to the forest and savannah of tropical Africa. It has four confirmed subspecies and a fifth proposed subspecies. When its close relative t ...
''Pan troglodytes''; see
Chimpanzee Genome Project The Chimpanzee Genome Project was an effort to determine the DNA sequence of the chimpanzee genome. Sequencing began in 2005 and by 2013 twenty-four individual chimpanzees had been sequenced. This project was folded into the Great Ape Genome Pro ...
*
Wooly mammoth Wool is the textile fibre obtained from sheep and other mammals, especially goats, rabbits, and camelids. The term may also refer to inorganic materials, such as mineral wool and glass wool, that have properties similar to animal wool. As an ...
, ''Mammuthus primigenius'' * Domestic cow, ''Bos taurus'' *
Bovine genome The genome of a female Hereford cattle, Hereford cow was published in 2009. It was sequenced by the Bovine Genome Sequencing and Analysis Consortium, a team of researchers led by the National Institutes of Health and the U.S. Department of Agricult ...
*
Honey Bee Genome Sequencing Consortium The Honey Bee Genome Sequencing Consortium is an international collaborative group of genomics scientists, scientific organisations and universities trying to decipher the genome sequences of the honey bee (''Apis mellifera''). It was formed in 200 ...
* Horse genome * Human microbiome project * International Grape Genome Program *
International HapMap Project The International HapMap Project was an organization that aimed to develop a haplotype map (HapMap) of the human genome, to describe the common patterns of human genetic variation. HapMap is used to find genetic variants affecting health, disease ...
* Tomato 150+ genome resequencing project * 100,000 Genomes Project * 100K Pathogen Genome Project * International Mouse Phenotyping Consortium IMPC * Knockout Mouse Phenotyping Project KOMP2 * Giant Sequoia, ''Sequoiadendron giganteum''


See also

*
Joint Genome Institute The U.S. Department of Energy (DOE) Joint Genome Institute (JGI), first located in Walnut Creek then Berkeley, California, was created in 1997 to unite the expertise and resources in genome mapping, DNA sequencing, technology development, and i ...
* Illumina, private company involved in genome sequencing * Knome, private company offering genome analysis & sequencing *
Model organism A model organism (often shortened to model) is a non-human species that is extensively studied to understand particular biological phenomena, with the expectation that discoveries made in the model organism will provide insight into the workin ...
*
National Center for Biotechnology Information The National Center for Biotechnology Information (NCBI) is part of the United States National Library of Medicine (NLM), a branch of the National Institutes of Health (NIH). It is approved and funded by the government of the United States. Th ...


References


External links


GOLD:Genomes OnLine DatabaseGenome Project DatabaseThe Protein Naming UtilitySUPERFAMILYEchinoBase
An Echinoderm genomic database, (previous SpBase, a sea urchin genome database)
NRCPB

Global Invertebrate Genomics Alliance (GIGA)

Wellcome Sanger Institute

Wellcome Genome Campus
{{DEFAULTSORT:Genome Project