HOME

TheInfoList



OR:

The 1000 Genomes Project (1KGP), taken place from January 2008 to 2015, was an international research effort to establish the most detailed catalogue of human genetic variation at the time. Scientists planned to
sequence In mathematics, a sequence is an enumerated collection of objects in which repetitions are allowed and order matters. Like a set, it contains members (also called ''elements'', or ''terms''). The number of elements (possibly infinite) is cal ...
the
genome A genome is all the genetic information of an organism. It consists of nucleotide sequences of DNA (or RNA in RNA viruses). The nuclear genome includes protein-coding genes and non-coding genes, other functional regions of the genome such as ...
s of at least one thousand anonymous healthy participants from a number of different ethnic groups within the following three years, using advancements in newly developed technologies. In 2010, the project finished its pilot phase, which was described in detail in a publication in the journal ''
Nature Nature is an inherent character or constitution, particularly of the Ecosphere (planetary), ecosphere or the universe as a whole. In this general sense nature refers to the Scientific law, laws, elements and phenomenon, phenomena of the physic ...
''. In 2012, the sequencing of 1092 genomes was announced in a ''Nature'' publication. In 2015, two papers in ''Nature'' reported results and the completion of the project and opportunities for future research. Many rare variations, restricted to closely related groups, were identified, and eight structural-variation classes were analyzed. The project united multidisciplinary research teams from institutes around the world, including
China China, officially the People's Republic of China (PRC), is a country in East Asia. With population of China, a population exceeding 1.4 billion, it is the list of countries by population (United Nations), second-most populous country after ...
,
Italy Italy, officially the Italian Republic, is a country in Southern Europe, Southern and Western Europe, Western Europe. It consists of Italian Peninsula, a peninsula that extends into the Mediterranean Sea, with the Alps on its northern land b ...
,
Japan Japan is an island country in East Asia. Located in the Pacific Ocean off the northeast coast of the Asia, Asian mainland, it is bordered on the west by the Sea of Japan and extends from the Sea of Okhotsk in the north to the East China Sea ...
,
Kenya Kenya, officially the Republic of Kenya, is a country located in East Africa. With an estimated population of more than 52.4 million as of mid-2024, Kenya is the 27th-most-populous country in the world and the 7th most populous in Africa. ...
,
Nigeria Nigeria, officially the Federal Republic of Nigeria, is a country in West Africa. It is situated between the Sahel to the north and the Gulf of Guinea in the Atlantic Ocean to the south. It covers an area of . With Demographics of Nigeria, ...
,
Peru Peru, officially the Republic of Peru, is a country in western South America. It is bordered in the north by Ecuador and Colombia, in the east by Brazil, in the southeast by Bolivia, in the south by Chile, and in the south and west by the Pac ...
, the
United Kingdom The United Kingdom of Great Britain and Northern Ireland, commonly known as the United Kingdom (UK) or Britain, is a country in Northwestern Europe, off the coast of European mainland, the continental mainland. It comprises England, Scotlan ...
, and the
United States The United States of America (USA), also known as the United States (U.S.) or America, is a country primarily located in North America. It is a federal republic of 50 U.S. state, states and a federal capital district, Washington, D.C. The 48 ...
contributing to the sequence dataset and to a refined human genome map freely accessible through public databases to the scientific community and the general public alike. The International Genome Sample Resource was created to host and expand on the data set after the project's end. __TOC__


Background

Since the completion of the
Human Genome Project The Human Genome Project (HGP) was an international scientific research project with the goal of determining the base pairs that make up human DNA, and of identifying, mapping and sequencing all of the genes of the human genome from both a ...
advances in human
population genetics Population genetics is a subfield of genetics that deals with genetic differences within and among populations, and is a part of evolutionary biology. Studies in this branch of biology examine such phenomena as Adaptation (biology), adaptation, s ...
and
comparative genomics Comparative genomics is a branch of biological research that examines genome sequences across a spectrum of species, spanning from humans and mice to a diverse array of organisms from bacteria to chimpanzees. This large-scale holistic approach c ...
enabled further insight into genetic diversity. The understanding about structural variations (insertions/deletions (
indel Indel (insertion-deletion) is a molecular biology term for an insertion or deletion of bases in the genome of an organism. Indels ≥ 50 bases in length are classified as structural variants. In coding regions of the genome, unless the lengt ...
s),
copy number variations Copy number variation (CNV) is a phenomenon in which sections of the genome are repeated and the number of repeats in the genome varies between individuals. Copy number variation is a type of structural variation: specifically, it is a type of G ...
(CNV), retroelements),
single-nucleotide polymorphism In genetics and bioinformatics, a single-nucleotide polymorphism (SNP ; plural SNPs ) is a germline substitution of a single nucleotide at a specific position in the genome. Although certain definitions require the substitution to be present in a ...
s (SNPs), and
natural selection Natural selection is the differential survival and reproduction of individuals due to differences in phenotype. It is a key mechanism of evolution, the change in the Heredity, heritable traits characteristic of a population over generation ...
were being solidified.JC Long, Human Genetic Variation: The mechanisms and results of microevolution, American Anthropological Association (2004) The diversity of Human genetic variation such as that
Indels Indel (insertion-deletion) is a molecular biology term for an insertion or deletion of bases in the genome of an organism. Indels ≥ 50 bases in length are classified as structural variants. In coding regions of the genome, unless the lengt ...
were being uncovered and investigating human genomic variations


Natural selection

It also aimed to provide evidence that can be used to explore the impact of
natural selection Natural selection is the differential survival and reproduction of individuals due to differences in phenotype. It is a key mechanism of evolution, the change in the Heredity, heritable traits characteristic of a population over generation ...
on population differences. Patterns of DNA polymorphisms can be used to reliably detect signatures of selection and may help to identify genes that might underlie variation in disease resistance or drug metabolism.EE Harris et al., The molecular signature of selection underlying human adaptations, Yearbook of Physical Anthropology 49: 89-130 (2006) Such insights could improve understanding of phenotypic variations,
genetic disorder A genetic disorder is a health problem caused by one or more abnormalities in the genome. It can be caused by a mutation in a single gene (monogenic) or multiple genes (polygenic) or by a chromosome abnormality. Although polygenic disorders ...
s and
Mendelian inheritance Mendelian inheritance (also known as Mendelism) is a type of biological inheritance following the principles originally proposed by Gregor Mendel in 1865 and 1866, re-discovered in 1900 by Hugo de Vries and Carl Correns, and later popularize ...
and their effects on survival and/or reproduction of different human populations.


Project description


Goals

The 1000 Genomes Project was designed to bridge the gap of knowledge between rare genetic variants that have a severe effect predominantly on simple traits (e.g.
cystic fibrosis Cystic fibrosis (CF) is a genetic disorder inherited in an autosomal recessive manner that impairs the normal clearance of Sputum, mucus from the lungs, which facilitates the colonization and infection of the lungs by bacteria, notably ''Staphy ...
, Huntington disease) and common genetic variants have a mild effect and are implicated in complex traits (e.g.
cognition Cognition is the "mental action or process of acquiring knowledge and understanding through thought, experience, and the senses". It encompasses all aspects of intellectual functions and processes such as: perception, attention, thought, ...
,
diabetes Diabetes mellitus, commonly known as diabetes, is a group of common endocrine diseases characterized by sustained high blood sugar levels. Diabetes is due to either the pancreas not producing enough of the hormone insulin, or the cells of th ...
,
heart disease Cardiovascular disease (CVD) is any disease involving the heart or blood vessels. CVDs constitute a class of diseases that includes: coronary artery diseases (e.g. angina pectoris, angina, myocardial infarction, heart attack), heart failure, ...
).G Spencer, International Consortium Announces the 1000 Genomes Project, EMBARGOED (2008) http://www.nih.gov/news/health/jan2008/nhgri-22.htm The primary goal of this project was to create a complete and detailed catalogue of human genetic variations, which can be used for association studies relating genetic variation to disease. The consortium aimed to discover >95 % of the variants (e.g. SNPs, CNVs, indels) with minor allele frequencies as low as 1% across the genome and 0.1-0.5% in gene regions, as well as to estimate the population frequencies,
haplotype A haplotype (haploid genotype) is a group of alleles in an organism that are inherited together from a single parent. Many organisms contain genetic material (DNA) which is inherited from two parents. Normally these organisms have their DNA orga ...
backgrounds and
linkage disequilibrium Linkage disequilibrium, often abbreviated to LD, is a term in population genetics referring to the association of genes, usually linked genes, in a population. It has become an important tool in medical genetics and other fields In defining LD, it ...
patterns of variant alleles.Meeting Report: A Workshop to Plan a Deep Catalog of Human Genetic Variation, (2007) http://www.1000genomes.org/sites/1000genomes.org/files/docs/1000Genomes-MeetingReport.pdf Secondary goals included the support of better SNP and probe selection for genotyping platforms in future studies and the improvement of the human reference sequence. The completed database was expected be a useful tool for studying regions under selection, variation in multiple populations and understanding the underlying processes of mutation and recombination.


Outline

The
human genome The human genome is a complete set of nucleic acid sequences for humans, encoded as the DNA within each of the 23 distinct chromosomes in the cell nucleus. A small DNA molecule is found within individual Mitochondrial DNA, mitochondria. These ar ...
consists of approximately 3 billion DNA base pairs and is estimated to carry around 20,000
protein Proteins are large biomolecules and macromolecules that comprise one or more long chains of amino acid residue (biochemistry), residues. Proteins perform a vast array of functions within organisms, including Enzyme catalysis, catalysing metab ...
coding
genes In biology, the word gene has two meanings. The Mendelian gene is a basic unit of heredity. The molecular gene is a sequence of nucleotides in DNA that is transcribed to produce a functional RNA. There are two types of molecular genes: protei ...
. In designing the study the consortium needed to address several critical issues regarding the project metrics such as technology challenges, data quality standards and sequence coverage. Over the course of the next three years, scientists at the
Sanger Institute The Wellcome Sanger Institute, previously known as The Sanger Centre and Wellcome Trust Sanger Institute, is a non-profit organisation, non-profit British genomics and genetics research institute, primarily funded by the Wellcome Trust. It is l ...
, BGI Shenzhen and the
National Human Genome Research Institute The National Human Genome Research Institute (NHGRI) is an institute of the National Institutes of Health, located in Bethesda, Maryland. NHGRI began as the Office of Human Genome Research in The Office of the Director in 1988. This Office transi ...
’s Large-Scale Sequencing Network planned to sequence a minimum of 1,000 human genomes. Due to the large amount of sequence data that was required, recruiting additional participants was maintained. Almost 10 billion bases were to be sequenced per day over a period of the two year production phase, equating to more than two human genomes every 24 hours. The intended sequence dataset was to comprise 6 trillion DNA bases, 60-fold more sequence data than what has been published in
DNA Deoxyribonucleic acid (; DNA) is a polymer composed of two polynucleotide chains that coil around each other to form a double helix. The polymer carries genetic instructions for the development, functioning, growth and reproduction of al ...
databases at the time. To determine the final design of the full project three pilot studies were to be carried out within the first year of the project. The first pilot intends to genotype 180 people of 3 major geographic groups at low coverage (2×). For the second pilot study, the genomes of two nuclear families (both parents and an adult child) are going to be sequenced with deep coverage (20× per genome). The third pilot study involves sequencing the coding regions (
exon An exon is any part of a gene that will form a part of the final mature RNA produced by that gene after introns have been removed by RNA splicing. The term ''exon'' refers to both the DNA sequence within a gene and to the corresponding sequence ...
s) of 1,000 genes in 1,000 people with deep coverage (20×). It was estimated that the project would likely cost more than $500 million if standard DNA sequencing technologies were used. Several newer technologies (e.g. Solexa, 454,
SOLiD Solid is a state of matter where molecules are closely packed and can not slide past each other. Solids resist compression, expansion, or external forces that would alter its shape, with the degree to which they are resisted dependent upon the ...
) were to be applied, lowering the expected costs to between $30 million and $50 million. The major support was provided by the
Wellcome Trust Sanger Institute The Wellcome Sanger Institute, previously known as The Sanger Centre and Wellcome Trust Sanger Institute, is a non-profit organisation, non-profit British genomics and genetics research institute, primarily funded by the Wellcome Trust. It is l ...
in Hinxton, England; the
Beijing Genomics Institute BGI Group, formerly Beijing Genomics Institute, is a Chinese genomics company with headquarters in Yantian, Shenzhen, Yantian, Shenzhen. The company was originally formed in 1999 as a genetics research center to participate in the Human Genome Pro ...
, Shenzhen (BGI Shenzhen), China; and the NHGRI, part of the National Institutes of Health (NIH). In keeping with Fort Lauderdale principles all genome sequence data (including variant calls) is freely available as the project progresses and can be downloaded via ftp from the 1000 genomes project webpage.1000 genomes project data webpage
/ref>


Human genome samples

Based on the overall goals for the project, the samples will be chosen to provide power in populations where association studies for common diseases are being carried out. Furthermore, the samples do not need to have medical or phenotype information since the proposed catalogue will be a basic resource on human variation. For the pilot studies human genome samples from the HapMap collection will be sequenced. It will be useful to focus on samples that have additional data available (such as
ENCODE The Encyclopedia of DNA Elements (ENCODE) is a public research project which aims "to build a comprehensive parts list of functional elements in the human genome." ENCODE also supports further biomedical research by "generating community resourc ...
sequence, genome-wide genotypes, fosmid-end sequence, structural variation assays, and
gene expression Gene expression is the process (including its Regulation of gene expression, regulation) by which information from a gene is used in the synthesis of a functional gene product that enables it to produce end products, proteins or non-coding RNA, ...
) to be able to compare the results with those from other projects. Complying with extensive ethical procedures, the 1000 Genomes Project will then use samples from volunteer donors. The following populations will be included in the study: Yoruba in
Ibadan Ibadan (, ; ) is the Capital city, capital and most populous city of Oyo State, in Nigeria. It is the List of Nigerian cities by population, third-largest city by population in Nigeria after Lagos and Kano (city), Kano, with a total populatio ...
(YRI),
Nigeria Nigeria, officially the Federal Republic of Nigeria, is a country in West Africa. It is situated between the Sahel to the north and the Gulf of Guinea in the Atlantic Ocean to the south. It covers an area of . With Demographics of Nigeria, ...
; Japanese in
Tokyo Tokyo, officially the Tokyo Metropolis, is the capital of Japan, capital and List of cities in Japan, most populous city in Japan. With a population of over 14 million in the city proper in 2023, it is List of largest cities, one of the most ...
(JPT); Chinese in
Beijing Beijing, Chinese postal romanization, previously romanized as Peking, is the capital city of China. With more than 22 million residents, it is the world's List of national capitals by population, most populous national capital city as well as ...
(CHB);
Utah Utah is a landlocked state in the Mountain states, Mountain West subregion of the Western United States. It is one of the Four Corners states, sharing a border with Arizona, Colorado, and New Mexico. It also borders Wyoming to the northea ...
residents with ancestry from northern and western
Europe Europe is a continent located entirely in the Northern Hemisphere and mostly in the Eastern Hemisphere. It is bordered by the Arctic Ocean to the north, the Atlantic Ocean to the west, the Mediterranean Sea to the south, and Asia to the east ...
(CEU);
Luhya Luhya or Abaluyia may refer to: * Luhya people * Luhya language {{disambig Language and nationality disambiguation pages ...
in Webuye,
Kenya Kenya, officially the Republic of Kenya, is a country located in East Africa. With an estimated population of more than 52.4 million as of mid-2024, Kenya is the 27th-most-populous country in the world and the 7th most populous in Africa. ...
(LWK); Maasai in Kinyawa, Kenya (MKK); Toscani in
Italy Italy, officially the Italian Republic, is a country in Southern Europe, Southern and Western Europe, Western Europe. It consists of Italian Peninsula, a peninsula that extends into the Mediterranean Sea, with the Alps on its northern land b ...
(TSI); Peruvians in
Lima Lima ( ; ), founded in 1535 as the Ciudad de los Reyes (, Spanish for "City of Biblical Magi, Kings"), is the capital and largest city of Peru. It is located in the valleys of the Chillón River, Chillón, Rímac River, Rímac and Lurín Rive ...
,
Peru Peru, officially the Republic of Peru, is a country in western South America. It is bordered in the north by Ecuador and Colombia, in the east by Brazil, in the southeast by Bolivia, in the south by Chile, and in the south and west by the Pac ...
(PEL); Gujarati Indians in
Houston Houston ( ) is the List of cities in Texas by population, most populous city in the U.S. state of Texas and in the Southern United States. Located in Southeast Texas near Galveston Bay and the Gulf of Mexico, it is the county seat, seat of ...
(GIH); Chinese in metropolitan
Denver Denver ( ) is a List of municipalities in Colorado#Consolidated city and county, consolidated city and county, the List of capitals in the United States, capital and List of municipalities in Colorado, most populous city of the U.S. state of ...
(CHD); people of Mexican ancestry in
Los Angeles Los Angeles, often referred to by its initials L.A., is the List of municipalities in California, most populous city in the U.S. state of California, and the commercial, Financial District, Los Angeles, financial, and Culture of Los Angeles, ...
(MXL); and people of African ancestry in the southwestern
United States The United States of America (USA), also known as the United States (U.S.) or America, is a country primarily located in North America. It is a federal republic of 50 U.S. state, states and a federal capital district, Washington, D.C. The 48 ...
(ASW). * Population that was collected in diaspora


Community meeting

Data generated by the 1000 Genomes Project is widely used by the genetics community, making the first 1000 Genomes Project one of the most cited papers in biology.C. King (2012) The Hottest Research of 2011. ''Science Watch'' http://archive.sciencewatch.com/newsletter/2012/201203/hottest_research_2012/ To support this user community, the project held a community analysis meeting in July 2012 that included talks highlighting key project discoveries, their impact on population genetics and human disease studies, and summaries of other large-scale sequencing studies.1000 Genomes Project Community Analysis Meeting http://1000gconference.sph.umich.edu/


Project findings


Pilot phase

The pilot phase consisted of three projects: * low-coverage whole-genome sequencing of 179 individuals from 4 populations * high-coverage sequencing of 2 trios (mother-father-child) * exon-targeted sequencing of 697 individuals from 7 populations It was found that on average, each person carries around 250–300 loss-of-function variants in annotated genes and 50-100 variants previously implicated in inherited disorders. Based on the two trios, it is estimated that the rate of de novo germline mutation is approximately 10−8 per base per generation.


See also

*
Human Genome Project The Human Genome Project (HGP) was an international scientific research project with the goal of determining the base pairs that make up human DNA, and of identifying, mapping and sequencing all of the genes of the human genome from both a ...
* HapMap Project * Personal genomics * Population groups in biomedicine * 1000 Plant Genomes Project * List of biological databases


References


External links


1000 Genomes
- A Deep Catalog of Human Genetic Variation - official web page
International HapMap Project
- official web page
Human Genome Project Information
{{DEFAULTSORT:1000 Genomes Project, The Human genome projects Population genetics organizations Single-nucleotide polymorphisms Genome projects Genomics Bioinformatics