Paleogenomics is a field of science based on the reconstruction and analysis of genomic information in extinct species. Improved methods for the extraction of ancient DNA (aDNA) from museum artifacts, ice cores, archeological or paleontological sites, and next-generation sequencing technologies have spurred this field. It is now possible to detect genetic drift, ancient population migration and interrelationships, the evolutionary history of extinct plant, animal and ''Homo'' species, and identification of phenotypic features across geographic regions. Scientists can also use paleogenomics to compare ancient ancestors against modern-day humans.
The rising importance of paleogenomics is evident from the fact that the 2022 Nobel Prize in Physiology or Medicine was awarded to the Swedish geneticist Svante Pääbo (born 1955), who worked on paleogenomics.
Background
Initially, aDNA sequencing involved cloning small fragments into bacteria, which proceeded with low efficiency due to the oxidative damage the aDNA suffered over millennia.
aDNA is difficult to analyze because it is readily degraded by nucleases; specific environments and postmortem conditions improve isolation and analysis. Extraction and contamination-control protocols were necessary for reliable analyses. With the development of the polymerase chain reaction (PCR) in 1983, scientists could study DNA samples up to approximately 100,000 years old, a limit set by the relatively short isolated fragments. Through advances in isolation, amplification, sequencing, and data reconstruction, older and older samples have become analyzable. Over the past 30 years, high-copy-number mitochondrial DNA was able to answer many questions; the advent of next-generation sequencing (NGS) techniques prompted far more. Moreover, this technological revolution allowed the transition from paleogenetics to paleogenomics.
Sequencing methods
Challenges and techniques
PCR, second-generation sequencing (NGS), and various library methods are available for sequencing aDNA, besides many bioinformatics tools. When dealing with each of these methods it is important to consider that aDNA can be altered post-mortem.
Specific alterations arise from:
* Base mutational patterns in sequence data (C->T mutation)
* Crosslinks
* Cytosine deamination (increased towards read termini)
* Depurination
* Genome fragmentation
Specific patterns and onset of these alterations help scientists to estimate the sample's age.
Formerly, scientists diagnosed post-mortem damage using enzymatic reactions or gas chromatography associated with mass spectrometry; in more recent years scientists began to detect it by exploiting mutational sequence data. This strategy makes it possible to identify an excess of C->T mutations following treatment with uracil DNA glycosylase. Nowadays, high-throughput sequencing (HTS) is used to identify depurination (a process that drives post-mortem DNA fragmentation; younger samples present more adenine than guanine), single-strand breaks in the DNA double helix, and abasic sites (created by C->T mutation).
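As an illustration of how mutational sequence data can expose cytosine deamination, the following minimal sketch tabulates the C->T mismatch frequency at each position from the start of a read. It assumes that reads have already been aligned to a reference without gaps and are supplied as plain (reference, read) string pairs; the function name and the toy data are hypothetical and are not taken from any particular tool.

```python
from collections import defaultdict


def c_to_t_by_position(pairs, max_pos=10):
    """Return the C->T mismatch frequency for the first max_pos read positions.

    pairs: iterable of (reference, read) strings of equal length, aligned
    without gaps (a simplifying assumption of this sketch).
    """
    c_total = defaultdict(int)  # reference C observed at position i
    c_to_t = defaultdict(int)   # of those, the read shows T
    for ref, read in pairs:
        for i, (r, q) in enumerate(zip(ref[:max_pos], read[:max_pos])):
            if r == "C":
                c_total[i] += 1
                if q == "T":
                    c_to_t[i] += 1
    # Only positions where at least one reference C was seen are reported.
    return {i: c_to_t[i] / c_total[i] for i in sorted(c_total)}


# Hypothetical toy alignments: mismatches cluster at the read start.
toy_pairs = [
    ("CACGT", "TACGT"),
    ("CCAGT", "TCAGT"),
    ("CACCT", "CACCT"),
    ("GCACT", "GTACT"),
]
frequencies = c_to_t_by_position(toy_pairs)
print(frequencies)
```

A frequency that is highest at the first positions and falls away toward the read interior is the kind of terminal enrichment listed above for cytosine deamination.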
A single fragment of aDNA can be sequenced in its full length with HTS. These data yield a fragment-length distribution, a size decay curve that enables a direct quantitative comparison of fragmentation across specimens from different locations and environmental conditions. From the decay curve it is possible to obtain the median fragment length of the aDNA in a sample. This length reflects the level of fragmentation after death, which generally increases with depositional temperature.
[Orlando L., Gilbert MT., Willerslev E. (2015). Reconstructing ancient genomes and epigenomes. Nat. Rev. Genet. 16(7):395-408.]
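The size decay curve can be summarised numerically. The sketch below computes the median fragment length and fits a decay constant to the fragment-length counts by log-linear least squares. It assumes that the declining part of the distribution is roughly exponential, which the text above does not state explicitly; the function name, the `min_len` cut-off and the toy data are hypothetical.

```python
import math
from collections import Counter
from statistics import median


def decay_constant(lengths, min_len=0):
    """Fit count(L) ~ exp(-lam * L) and return lam (per base).

    Only fragments with length >= min_len are used, so that the fit can be
    restricted to the declining part of the distribution.
    """
    counts = Counter(length for length in lengths if length >= min_len)
    xs = sorted(counts)
    if len(xs) < 2:
        raise ValueError("need at least two distinct fragment lengths")
    ys = [math.log(counts[x]) for x in xs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # the slope of log(count) against length is -lam


# Hypothetical fragment lengths: counts halve every 10 bases.
toy_lengths = [30] * 800 + [40] * 400 + [50] * 200 + [60] * 100
print("median length:", median(toy_lengths))
print("decay constant:", decay_constant(toy_lengths))
```

Under this assumption a larger decay constant means more heavily fragmented DNA, so the value can be compared across specimens in the same way as the median length.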
Libraries
Two different types of library can be prepared for aDNA sequencing using PCR for genome amplification:
* Double-stranded aDNA library (dsDNA library)
* Single-stranded aDNA library (ssDNA library)
The first one is created using the blunt-end approach. This technique uses two different adaptors, which ligate randomly to the ends of the fragment so that it can then be amplified. A fragment that does not carry both adaptors cannot be amplified, which is a source of error. To reduce this error, Illumina T/A ligation was introduced: this method adds an A tail to the DNA sample to facilitate the ligation of T-tailed adaptors. This method optimizes the amplification of the aDNA.
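The loss caused by random blunt-end ligation can be illustrated with a small enumeration. The sketch assumes that each fragment end receives either adaptor independently and with equal probability, which is an idealisation not stated in the text; the adaptor labels are hypothetical.

```python
from itertools import product


def amplifiable_fraction(adaptors=("adaptor_1", "adaptor_2")):
    """Fraction of fragments that carry two different adaptors.

    Assumes each end is ligated independently and each adaptor is equally
    likely (an idealised model for illustration only).
    """
    combinations = list(product(adaptors, repeat=2))  # (left end, right end)
    usable = [pair for pair in combinations if pair[0] != pair[1]]
    return len(usable) / len(combinations)


print(amplifiable_fraction())
```

Under these assumptions only half of the ligated fragments carry both adaptors, which is the error source that the T/A ligation step is meant to reduce.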
To obtain ssDNA libraries, DNA is first denatured with heat. The obtained ssDNA is then ligated to two adaptors in order to generate the complementary strand, and finally PCR is applied.
aDNA Enrichment
As aDNA extracts may contain DNA from bacteria or other microorganisms, the process requires enrichment. In order to separate the endogenous and exogenous fractions, various methods are employed:
* Damaged template enrichment: used when constructing an ssDNA library because this method targets DNA damage. When Bst polymerase fills the nick, the sample is treated with uracil DNA glycosylase and endonuclease VIII. These enzymes attack the abasic site. The undamaged DNA remains attached to streptavidin-coated paramagnetic beads and can be separated from the sample. This method is specific for samples from late Pleistocene Neanderthals.
* Extension-free target enrichment in solution: this method is based on target-probe hybridization. It requires DNA denaturation, after which overlapping tiled probes are hybridized along the target regions. Then PCR is used for DNA amplification, and finally the DNA is linked to a biotinylated adaptor. It is useful for samples of archaic hominin ancestry.
* Solid-phase target enrichment: in this method, microarray and real-time PCR methods are used in parallel with shotgun sequencing screening.
* Whole-genome enrichment: used for sequencing the entire genome of single individuals. Whole-genome In-Solution Capture (WISC) is used. This method starts with the preparation of a genome-wide RNA probe library from a species with a genome that is closely related to the target genome in the DNA sample.
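One way to judge whether an enrichment step has separated the endogenous from the exogenous fraction is to compare the share of reads that map to the target genome before and after enrichment. The sketch below assumes that mapped reads are an acceptable proxy for endogenous DNA, which is a common convention rather than something stated above; the function names and read counts are hypothetical.

```python
def endogenous_fraction(mapped_reads, total_reads):
    """Share of sequenced reads that map to the target reference genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads / total_reads


def fold_enrichment(fraction_before, fraction_after):
    """Ratio of the endogenous fraction after enrichment to that before."""
    if fraction_before <= 0:
        raise ValueError("fraction_before must be positive")
    return fraction_after / fraction_before


# Hypothetical read counts for one library, before and after capture.
before = endogenous_fraction(mapped_reads=10, total_reads=1000)
after = endogenous_fraction(mapped_reads=200, total_reads=1000)
print(before, after, fold_enrichment(before, after))
```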
Diversification of present-day non-African populations and anatomically modern humans
By now many studies in different fields have led to the conclusion that the present-day non-African population is the result of the diversification, into several different lineages, of an ancestral, well-structured metapopulation which was the protagonist of an out-of-Africa expansion, in which it carried a subset of African genetic heritage. In this context, the analysis of ancient DNA was fundamental to test already formulated hypotheses and to provide new insights. First, it has made it possible to narrow the timing and the structure of this diversification phenomenon by providing the calibration of the autosomal and mitochondrial mutation rate.
[Skoglund P. and Mathieson I. (2018). Ancient genomics of modern humans: the first decade. Annu. Rev. Genom. Hum. Genet. 19:1, 381-404.]
Admixture analysis has demonstrated that at least two independent gene flow events have occurred between ancestors of modern humans and archaic humans, such as Neanderthal and Denisovan populations, supporting the “leaky replacement” model of Eurasian human population history. According to all these data, the divergence of the non-African human lineages occurred around 45,000 – 55,000 BP.
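The text does not say which admixture test was used. One widely used statistic for detecting gene flow of this kind is Patterson's D (the ABBA-BABA test), sketched below for biallelic sites in four groups ordered as (P1, P2, P3, outgroup). The function name and the toy sites are hypothetical, and the significance testing that real analyses require (for example a block jackknife) is omitted.

```python
def d_statistic(sites):
    """Return D = (ABBA - BABA) / (ABBA + BABA).

    sites: iterable of (p1, p2, p3, outgroup) alleles, one per site, assumed
    biallelic. The outgroup allele is treated as ancestral ("A") and the other
    allele as derived ("B").
    """
    abba = baba = 0
    for p1, p2, p3, outgroup in sites:
        if p3 == outgroup:
            continue  # P3 must carry the derived allele
        if p1 == outgroup and p2 == p3:
            abba += 1  # derived allele shared by P2 and P3
        elif p2 == outgroup and p1 == p3:
            baba += 1  # derived allele shared by P1 and P3
    if abba + baba == 0:
        raise ValueError("no informative sites")
    return (abba - baba) / (abba + baba)


# Hypothetical sites: three ABBA, one BABA, one uninformative.
toy_sites = [
    ("A", "G", "G", "A"),
    ("A", "G", "G", "A"),
    ("A", "G", "G", "A"),
    ("G", "A", "G", "A"),
    ("A", "A", "G", "A"),
]
print(d_statistic(toy_sites))
```

With P3 set to an archaic genome, a D value significantly above zero indicates that P2 shares more derived alleles with the archaic group than P1 does, which is the pattern expected under gene flow into P2.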
Besides that, in many cases ancient DNA has made it possible to track the historical processes that led, over time, to the current population genetic structure, which would have been difficult to do by relying only on the analysis of present-day genomes. Among these still unresolved questions, some of the most studied are the identity of the first inhabitants of the Americas, the peopling of Europe and the origin of agriculture in Europe.
[Lan T. and Lindqvist C. (2018). Paleogenomics: Genome-Scale Analysis of Ancient DNA and Population and Evolutionary Genomic Inferences. In: Population Genomics, Springer, Cham. pp 1-38.]
Phenotypic variation in humans
Analysis of ancient DNA makes it possible to study mutations affecting phenotypic traits following changes in environment and human behavior. Migration to new habitats, new dietary shifts (following the transition to agriculture) and the building of large communities exposed humans to new conditions that ultimately resulted in biological adaptation.
Skin colour
Migration of humans out of Africa to higher latitudes involved less exposure to sunlight. Since UVA and UVB rays are crucial for the synthesis of vitamin D, which regulates calcium absorption and thus is essential for bone health, living at higher latitudes would mean a substantial reduction in vitamin D synthesis. This put a new selective pressure on the skin colour trait, favouring lighter skin colour at higher latitudes.
The two most important genes involved in skin pigmentation are SLC24A5 and SLC45A2. Nowadays the “light skin” alleles of these genes are fixed in Europe, but they reached a relatively high frequency only fairly recently (about 5000 years ago).
Such a slow depigmentation process suggests that ancient Europeans could have faced the downsides of low vitamin D production, such as musculoskeletal and cardiovascular conditions. Another hypothesis is that pre-agricultural Europeans could have met their vitamin D requirements through their diet (since meat and fish contain some vitamin D).
[Marciniak S., Perry G. H. (2017). Harnessing ancient genomes to study the history of human adaptation. Nature Reviews Genetics 18, 659–674.]
Adaptation to agricultural diet
One of the major examples of adaptation following the switch to an agricultural diet is the persistence of production of the lactase enzyme in adulthood. This enzyme is essential to digest the lactose present in milk and dairy products, and its absence leads to diarrhea following the consumption of these products.
Lactase persistence is determined predominantly by a single-base mutation in the MCM6 gene, and ancient DNA data show that this mutation became common only within the past 5000 years, thousands of years after the beginning of dairying practices.
Thus, even in the case of lactase persistence there is a long delay between the onset of a new habit and the spread of the adaptive allele, so milk consumption may have been restricted to children or to lactose-reduced products.
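The timing of such a rise in allele frequency can be read from dated ancient genotypes. The following sketch groups samples into time bins and computes the frequency of the derived allele in each bin. It assumes diploid genotype calls coded as 0, 1 or 2 copies of the derived allele, whereas low-coverage ancient data are often analysed with other call types; the function name, bin width and sample values are hypothetical and do not represent real measurements.

```python
from collections import defaultdict


def allele_frequency_by_bin(samples, bin_size=2000):
    """Derived-allele frequency per time bin.

    samples: iterable of (age_bp, derived_copies) with derived_copies in
    {0, 1, 2} for a diploid individual. Bins are labelled by their lower
    bound in years before present.
    """
    derived = defaultdict(int)
    total = defaultdict(int)
    for age_bp, derived_copies in samples:
        bin_start = (age_bp // bin_size) * bin_size
        derived[bin_start] += derived_copies
        total[bin_start] += 2  # two chromosomes per individual
    # Oldest bin first, so the trajectory reads forward in time.
    return {b: derived[b] / total[b] for b in sorted(total, reverse=True)}


# Hypothetical individuals: (age in years BP, copies of the derived allele).
toy_samples = [
    (7500, 0), (7000, 0), (6200, 0),
    (4500, 1), (4100, 0),
    (1500, 2), (900, 1),
]
print(allele_frequency_by_bin(toy_samples))
```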
Another example of a mutation positively selected by the switch to agriculture is the number of AMY1 gene copies. AMY1 encodes the starch-digesting enzyme amylase present in saliva, and modern humans have a higher number of gene copies compared to chimpanzees.
The immune system
The human immune system has undergone intense selection through the millennia, adapting to different pathogen landscapes. Several environmental and cultural changes have imposed selective pressure on different immune-associated genes. Migrations, for example, exposed humans to new habitats carrying new pathogens or pathogen vectors (e.g. mosquitoes). The switch to agriculture also involved exposure to different pathogens and health conditions, due both to the increased population density and to living close to livestock.
However, it is difficult to directly correlate particular ancient genome changes with improved resistance to particular pathogens, given the vastness and complexity of the human immune system.
Besides directly studying changes in the human immune system, it is also possible to study the ancient genomes of pathogens, such as those causing tuberculosis, leprosy, plague, smallpox or malaria. For example, researchers have discovered that all strains of ''Yersinia pestis'' from before 3600 years ago lacked the ''ymt'' gene, which is essential for the pathogen to survive in the intestine of fleas.
This suggests that in the ancient past plague may have been less virulent compared to more recent ''Y. pestis'' outbreaks.
Plants and animals
The genomes of many non-hominin vertebrates (ancient mammoth, polar bear, dog and horse) have been reconstructed through aDNA recovery from fossils and samples preserved at low temperature or high altitude. Mammoth studies are the most frequent, owing to the abundance of soft tissue and hair preserved in permafrost, and are used to identify relationships with more recent elephants and demographic changes. Polar bear studies are performed to identify the impact of climate change on evolution and biodiversity. Dog and horse studies give insights into domestication. In plants, aDNA has been isolated from seeds, pollen and wood. A correlation has been identified between ancient and extant barley. Another application was the detection of the domestication and adaptation process of maize, which includes genes for drought tolerance and sugar content.
Challenges and future perspectives
The analysis of ancient genomes of anatomically modern humans has, in recent years, completely revolutionized our way of studying population migrations, transformation and evolution. Nevertheless, much still remains unknown. The first and most obvious problem with this kind of approach, which will be partially overcome by the continuous improvement of ancient DNA extraction techniques, is the difficulty of recovering well-preserved ancient genomes. This challenge is particularly acute in Africa and Asia, where temperatures are higher than in colder regions of the world. Further, Africa is, among all the continents, the one that harbors the most genetic diversity.
Besides DNA degradation, exogenous contamination also limits paleogenomic sequencing and assembly processes.
As we do not possess ancient DNA from the time and the region inhabited by the original ancestors of the present-day non-African population, we still know little about their structure and location. The second and more important challenge is the recovery of DNA from early modern humans (100,000 – 200,000 BP). These data, together with a larger number of archaic genomes to analyze and with knowledge of the timing and distribution of archaic genetic admixture, will allow scientists to reconstruct the history of our species more easily. In fact, collecting more data about our genetic history will allow us to track human evolution not only in terms of migrations and natural selection, but also in terms of culture. In the next decade, paleogenomics research is going to focus mainly on three topics: the fine-scale definition of past human interactions through denser sampling; the understanding of how these interactions contributed to the agricultural transition, through analysis of DNA from understudied regions; and, finally, the quantification of the contribution of natural selection to present-day phenotypes. To interpret all these data, geneticists will be required to cooperate with historians, as they have already done with anthropologists and archaeologists.
Bioethics
Bioethics in paleogenomics concerns ethical questions that arise in the study of ancient human remains, due to the complex relationships among scientists, governments and indigenous populations. In addition, paleogenomic studies have the potential to harm community or individual histories and identities, as well as to reveal compromising information about descendants. For these reasons, these kinds of studies remain a sensitive subject.
Paleogenomic studies can have negative consequences mainly because of the discrepancies between the articulation of ethical principles and actual practice. In fact, ancestors’ remains are usually considered legally and scientifically as “artifacts” rather than “human subjects”, which is used to justify questionable behavior and a lack of engagement with communities. Testing of ancestral remains is therefore used in disputes, treaty claims, repatriation, or other legal cases.
The acknowledgement of the importance and sensitivity of this subject is leading towards ethical commitments and guidance applicable to different contexts, in order to preserve the dignity of ancestral remains and avoid ethical issues.
[Jessica Bardill, Alyssa C. Bader, Nanibaa' A. Garrison, Deborah A. Bolnick, Jennifer A. Raff, Alexa Walker, Ripan S. Malhi, and the Summer Internship for INdigenous peoples in Genomics (SING) Consortium. Advancing the ethics of paleogenomics: Ancestral remains should not be regarded as "artifacts" but as human relatives who deserve respect.]
Finally, another pioneering area of interest is the so-called “de-extinction” project, which aims at the resurrection of extinct species, such as the mammoth. This project, which appears to be possible thanks to CRISPR/Cas9 technology, is, however, strongly connected to many ethical issues.