Genome projects are scientific endeavours that ultimately aim to determine the complete genome sequence of an organism (be it an animal, a plant, a fungus, a bacterium, an archaean, a protist or a virus) and to annotate protein-coding genes and other important genome-encoded features. The genome sequence of an organism includes the collective DNA sequences of each chromosome in the organism. For a bacterium containing a single chromosome, a genome project will aim to map the sequence of that chromosome. For the human species, whose genome includes 22 pairs of autosomes and 2 sex chromosomes, a complete genome sequence will involve 46 separate chromosome sequences. The Human Genome Project is a well-known example of a genome project.


Genome assembly

Genome assembly refers to the process of taking a large number of short DNA sequences and reassembling them to create a representation of the original chromosomes from which the DNA originated. In a shotgun sequencing project, all the DNA from a source (usually a single organism, anything from a bacterium to a mammal) is first fractured into millions of small pieces. These pieces are then "read" by automated sequencing machines. A genome assembly algorithm works by taking all the pieces, aligning them to one another, and detecting all places where two of the short sequences, or ''reads'', overlap. These overlapping reads can be merged, and the process continues.

Genome assembly is a very difficult computational problem, made more difficult because many genomes contain large numbers of identical sequences, known as repeats. These repeats can be thousands of nucleotides long and occur at different locations, especially in the large genomes of plants and animals, so a read that falls inside a repeat may match several places equally well. The resulting (draft) genome sequence is produced by combining the sequenced contigs and then employing linking information to create scaffolds. Scaffolds are positioned along the physical map of the chromosomes, creating a "golden path".
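The overlap-and-merge idea described above can be illustrated with a small sketch. This is a toy greedy strategy under simplifying assumptions (error-free reads, forward strand only, no repeats, no read fully contained in another), not the method of any particular assembler; the function names `overlap_length` and `greedy_assemble` and the `min_overlap` parameter are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if the longest such overlap is shorter than `min_overlap`.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the longest overlap.

    Stops when no remaining pair overlaps by at least `min_overlap` bases;
    the strings that survive play the role of contigs.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_size, best_pair = 0, None
        for left, right in permutations(contigs, 2):
            size = overlap_length(left, right, min_overlap)
            if size > best_size:
                best_size, best_pair = size, (left, right)
        if best_pair is None:
            break  # nothing overlaps enough: leave the contigs separate
        left, right = best_pair
        contigs.remove(left)
        contigs.remove(right)
        # Keep `left` whole and append only the non-overlapping tail of `right`.
        contigs.append(left + right[best_size:])
    return contigs


if __name__ == "__main__":
    # Three made-up reads covering the made-up sequence ATGGCGTACCTGA.
    print(greedy_assemble(["TACCTGA", "ATGGCGT", "GCGTACC"]))
    # -> ['ATGGCGTACCTGA']
```

With repeats, several pairs can share the same best overlap, and a greedy choice like this one may join reads that come from different copies of the repeat. That is one reason practical assemblers rely on additional linking information rather than overlaps alone.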


Assembly software

Originally, most large-scale DNA sequencing centers developed their own software for assembling the sequences that they produced. However, this has changed as the software has grown more complex and as the number of sequencing centers has increased. An example of such an assembler is the ''Short Oligonucleotide Analysis Package'', developed by BGI for de novo assembly of human-sized genomes, alignment, SNP detection, resequencing, indel finding, and structural variation analysis.


Genome annotation

Since the 1980s, molecular biology and bioinformatics have created the need for DNA annotation. DNA annotation or genome annotation is the process of attaching biological information to sequences, and particularly of identifying the locations of genes and determining what those genes do.
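As a rough illustration of what "identifying the locations of genes" can mean computationally, the sketch below scans a DNA string for open reading frames (ORFs): stretches that begin with the start codon ATG and run in frame to the first stop codon. It is a deliberately naive stand-in for real gene prediction, which also has to deal with the reverse strand, introns, and statistical evidence. The function name `find_orfs` and the `min_codons` threshold are hypothetical, and the stop codons are those of the standard genetic code.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def find_orfs(sequence: str, min_codons: int = 3) -> list[tuple[int, int, str]]:
    """Return (start, end, orf) for ORFs on the forward strand.

    `start` is 0-based and `end` is exclusive; the returned ORF string
    includes the stop codon. `min_codons` counts codons from ATG up to,
    but not including, the stop codon.
    """
    seq = sequence.upper()
    orfs = []
    for frame in range(3):
        pos = frame
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] != "ATG":
                pos += 3
                continue
            # Walk codon by codon until a stop codon or the end of the sequence.
            end = pos + 3
            while end + 3 <= len(seq) and seq[end:end + 3] not in STOP_CODONS:
                end += 3
            if end + 3 > len(seq):
                break  # no in-frame stop codon before the sequence ends
            if (end - pos) // 3 >= min_codons:
                orfs.append((pos, end + 3, seq[pos:end + 3]))
            pos = end + 3  # resume scanning after the stop codon
    return sorted(orfs)


if __name__ == "__main__":
    # Made-up sequence with one ORF: ATG AAA TTT GGG TAA
    print(find_orfs("CCATGAAATTTGGGTAACC"))
    # -> [(2, 17, 'ATGAAATTTGGGTAA')]
```

The coordinates returned here are the kind of positional information an annotation attaches to a sequence. Determining what the gene does is a separate step that this sketch does not attempt.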


Time of completion

When sequencing a genome, there are usually regions that are difficult to sequence (often regions with highly repetitive DNA). Thus, 'completed' genome sequences are rarely ever complete, and terms such as 'working draft' or 'essentially complete' have been used to more accurately describe the status of such genome projects. Even when every base pair of a genome sequence has been determined, there are still likely to be errors present because DNA sequencing is not a completely accurate process. It could also be argued that a complete genome project should include the sequences of mitochondria and (for plants) chloroplasts, as these organelles have their own genomes.

It is often reported that the goal of sequencing a genome is to obtain information about the complete set of genes in that particular genome sequence. The proportion of a genome that encodes genes may be very small (particularly in eukaryotes such as humans, where coding DNA may only account for a few percent of the entire sequence). However, it is not always possible (or desirable) to sequence only the coding regions. Also, as scientists understand more about the role of this noncoding DNA (often referred to as junk DNA), it will become more important to have a complete genome sequence as a background to understanding the genetics and biology of any given organism.

In many ways genome projects do not confine themselves to determining only the DNA sequence of an organism. Such projects may also include gene prediction to find out where the genes are in a genome, and what those genes do. There may also be related projects to sequence ESTs or mRNAs to help find out where the genes actually are.


Historical and technological perspectives

Historically, when sequencing eukaryotic genomes (such as the worm ''Caenorhabditis elegans'') it was common to first map the genome to provide a series of landmarks across the genome. Rather than sequence a chromosome in one go, it would be sequenced piece by piece (with prior knowledge of approximately where each piece is located on the larger chromosome). Changes in technology, and in particular improvements to the processing power of computers, mean that genomes can now be 'shotgun sequenced' in one go (though there are caveats to this approach when compared to the traditional approach). Improvements in DNA sequencing technology have meant that the cost of sequencing a new genome has steadily fallen (in terms of cost per base pair), and newer technology has also meant that genomes can be sequenced far more quickly.

When research agencies decide what new genomes to sequence, the emphasis has been on species which are of high importance as model organisms, have relevance to human health (e.g. pathogenic bacteria or vectors of disease such as mosquitos), or have commercial importance (e.g. livestock and crop plants). Secondary emphasis is placed on species whose genomes will help answer important questions in molecular evolution (e.g. the common chimpanzee).

In the future, it is likely that it will become even cheaper and quicker to sequence a genome. This will allow for complete genome sequences to be determined from many different individuals of the same species. For humans, this will allow us to better understand aspects of human genetic diversity.


Examples

Many organisms have genome projects that have either been completed or will be completed shortly, including:
* Humans, ''Homo sapiens''; see Human Genome Project
* Humans, ''Homo sapiens''; see The Human Genome Project–Write
* Palaeo-Eskimo, an ancient human
* Neanderthal, ''Homo sapiens neanderthalensis'' (partial); see Neanderthal Genome Project
* Common chimpanzee, ''Pan troglodytes''; see Chimpanzee Genome Project
* Woolly mammoth, ''Mammuthus primigenius''
* Domestic cow, ''Bos taurus''
* Bovine genome
* Honey Bee Genome Sequencing Consortium
* Horse genome
* Human Microbiome Project
* International Grape Genome Program
* International HapMap Project
* Tomato 150+ genome resequencing project
* 100,000 Genomes Project
* 100K Pathogen Genome Project
* International Mouse Phenotyping Consortium (IMPC)
* Knockout Mouse Phenotyping Project (KOMP2)
* Giant sequoia, ''Sequoiadendron giganteum''


See also

* Joint Genome Institute
* Illumina, private company involved in genome sequencing
* Knome, private company offering genome analysis & sequencing
* Model organism
* National Center for Biotechnology Information


External links


* GOLD: Genomes OnLine Database
* Genome Project Database
* The Protein Naming Utility
* SUPERFAMILY
* EchinoBase, an echinoderm genomic database (previously SpBase, a sea urchin genome database)
* NRCPB
* Global Invertebrate Genomics Alliance (GIGA)
* Wellcome Sanger Institute
* Wellcome Genome Campus