Pangenome

In the fields of molecular biology and genetics, a pan-genome (pangenome or supragenome) is the entire set of genes from all strains within a clade. More generally, it is the union of all the genomes of a clade. The pan-genome can be broken down into a "core pangenome" that contains genes present in all individuals, a "shell pangenome" that contains genes present in two or more strains but not all, and a "cloud pangenome" that contains genes found in only a single strain. Some authors also refer to the cloud genome as the "accessory genome", containing 'dispensable' genes present in a subset of the strains and strain-specific genes. The use of the term 'dispensable' has been questioned, at least in plant genomes, as accessory genes play "an important role in genome evolution and in the complex interplay between the genome and the environment". The field of study of pangenomes is called pangenomics.

The genetic repertoire of a bacterial species is much larger than the gene content of an individual strain. Some species have open (or extensive) pangenomes, while others have closed pangenomes. For species with a closed pan-genome, very few genes are added per sequenced genome (after sequencing many strains), and the size of the full pangenome can be theoretically predicted. Species with an open pangenome have enough genes added per additional sequenced genome that the size of the full pangenome cannot be predicted. Population size and niche versatility have been suggested as the most influential factors in determining pan-genome size.

Pangenomes were originally constructed for species of bacteria and archaea, but more recently eukaryotic pan-genomes have been developed, particularly for plant species. Plant studies have shown that pan-genome dynamics are linked to transposable elements. The significance of the pan-genome arises in an evolutionary context, especially with relevance to metagenomics, but the concept is also used in a broader genomics context. An open access book reviewing the pangenome concept and its implications, edited by Tettelin and Medini, was published in the spring of 2020.


Etymology

The term 'pangenome' was defined with its current meaning by Tettelin et al. in 2005; 'pan' derives from the Greek word παν, meaning 'whole' or 'everything', while 'genome' is the commonly used term for an organism's complete genetic material. Tettelin et al. applied the term specifically to bacteria, whose pangenome "includes a core genome containing genes present in all strains and a dispensable genome composed of genes absent from one or more strains and genes that are unique to each strain."


Parts of the pangenome


Core

The core is the part of the pangenome that is shared by every genome in the tested set. Some authors divide the core pangenome into the hard core, comprising families of homologous genes with at least one copy in every genome (100% of genomes), and the soft core or extended core, comprising families present above a certain threshold (for example, 90% of genomes). In a study of the pangenomes of ''Bacillus cereus'' and ''Staphylococcus aureus'', some of whose strains were isolated from the International Space Station, the thresholds used for segmenting the pangenomes were as follows: "Cloud", "Shell", and "Core" corresponded to gene families present in <10%, 10–95%, and >95% of the genomes, respectively.

The size of the core genome and its proportion of the pangenome depend on several factors, especially the phylogenetic similarity of the genomes considered. For example, the core of two identical genomes would also be the complete pangenome. The core of a genus can never be larger than the core genome of a species it contains, and is usually smaller. Genes that belong to the core genome are often related to housekeeping functions and the primary metabolism of the lineage; nevertheless, the core genome can also contain genes that differentiate the species from other species of the genus, i.e. genes that may be related to pathogenicity or niche adaptation.


Shell

The shell is the part of the pangenome shared by the majority of the genomes in a pangenome. There is no universally accepted threshold to define the shell genome; some authors consider a gene family part of the shell pangenome if it is shared by more than 50% of the genomes in the pangenome. A family can become part of the shell through several evolutionary dynamics: for example, by gene loss in a lineage where it was previously part of the core genome, as is the case for enzymes of the tryptophan operon in ''Actinomyces'', or by gain and fixation of a gene family that was previously part of the dispensable genome, as is the case for the ''trpF'' gene in several ''Corynebacterium'' species.


Cloud

The cloud genome consists of those gene families shared by a minimal subset of the genomes in the pangenome; it includes singletons, i.e. genes present in only one of the genomes. It is also known as the peripheral genome or accessory genome. Gene families in this category are often related to ecological adaptation.
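The partition into core, shell, and cloud can be illustrated with a short sketch. It assumes each genome has already been reduced to a set of gene-family identifiers (the clustering step is not shown), and it uses the <10%, 10–95%, and >95% cut-offs quoted in the Core section as defaults. The function name, strain names, and gene-family labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def partition_pangenome(genomes, cloud_max=0.10, core_min=0.95):
    """Split gene families into core / shell / cloud by genome frequency.

    genomes: dict mapping genome name -> set of gene-family identifiers.
    A family is 'core' if present in more than core_min of the genomes,
    'cloud' if present in fewer than cloud_max, and 'shell' otherwise.
    """
    n = len(genomes)
    # Count in how many genomes each family occurs.
    counts = Counter(fam for families in genomes.values() for fam in families)
    parts = {"core": set(), "shell": set(), "cloud": set()}
    for fam, c in counts.items():
        freq = c / n
        if freq > core_min:
            parts["core"].add(fam)
        elif freq < cloud_max:
            parts["cloud"].add(fam)
        else:
            parts["shell"].add(fam)
    return parts


# Toy input: four hypothetical strains with made-up family labels.
toy = {
    "strain_A": {"dnaA", "gyrB", "trpF", "tetM"},
    "strain_B": {"dnaA", "gyrB", "trpF"},
    "strain_C": {"dnaA", "gyrB", "vanA"},
    "strain_D": {"dnaA", "gyrB"},
}

# The pangenome is the union of all families seen in any genome.
pangenome = set().union(*toy.values())

# With only four genomes a singleton already sits at 25%, so the toy call
# widens the cloud cut-off; real analyses use many more genomes.
parts = partition_pangenome(toy, cloud_max=0.30)
```

Because the thresholds are not standardized, they are kept as parameters, so the same function can reproduce a stricter definition (for example `core_min` just below 1.0 for a hard core).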


Classification

The pan-genome can be somewhat arbitrarily classified as open or closed based on the \alpha value of Heaps' law:

N = k n^{-\alpha}

* N: number of gene families (here, the new families contributed when the n-th genome is added).
* n: number of genomes.
* k: constant of proportionality.
* \alpha: exponent fitted to the curve of the number of gene families against each new genome.

If \alpha \le 1, the pangenome is considered open. If \alpha > 1, the pangenome is considered closed. Pangenome software can usually calculate the parameters of Heaps' law that best describe the behavior of the data.
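As a rough illustration of how the exponent might be estimated, the sketch below fits a power law by ordinary least squares on log-transformed data. It assumes the law has the form N = k n^{-\alpha}, with N read as the number of new gene families found when the n-th genome is added; that reading is an assumption made here so that the \alpha \le 1 and \alpha > 1 thresholds are meaningful. The function names and the synthetic data are hypothetical, and real tools may fit the curve differently (for example over many random genome orderings).

```python
import math


def fit_power_law(new_families, first_n=2):
    """Fit N = k * n**(-alpha) by least squares on log N = log k - alpha * log n.

    new_families[i] is the number of new gene families observed when genome
    number first_n + i is added. Zero counts are skipped, since log(0) is
    undefined.
    """
    pts = [(math.log(n), math.log(N))
           for n, N in enumerate(new_families, start=first_n) if N > 0]
    if len(pts) < 2:
        raise ValueError("need at least two non-zero observations")
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    alpha = -slope                       # slope of the log-log line is -alpha
    k = math.exp(mean_y - slope * mean_x)
    return k, alpha


def classify(alpha):
    """Apply the open/closed rule stated in the text."""
    return "open" if alpha <= 1 else "closed"


# Synthetic curves generated from known parameters, not real data.
slow_decay = [500 * n ** -0.6 for n in range(2, 40)]
fast_decay = [500 * n ** -1.8 for n in range(2, 40)]

k_slow, alpha_slow = fit_power_law(slow_decay)
k_fast, alpha_fast = fit_power_law(fast_decay)
```

Under this reading the threshold has a simple interpretation: the total number of families is the sum of the per-genome increments, and a sum of terms proportional to n^{-\alpha} keeps growing without bound when \alpha \le 1 but converges to a finite value when \alpha > 1.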


Open pangenome

An open pangenome occurs when the number of new gene families in a taxonomic lineage keeps increasing, without appearing to approach an asymptote, regardless of how many new genomes are added to the pangenome. ''Escherichia coli'' is an example of a species with an open pangenome. Any single ''E. coli'' genome contains in the range of 4000–5000 genes, while the pangenome estimated for this species from approximately 2000 genomes comprises 89,000 different gene families. The pangenome of the domain Bacteria is also considered to be open.


Closed pangenome

A closed pangenome occurs in a lineage when only a few gene families are added as new genomes are incorporated into the pangenome analysis, and the total number of gene families in the pangenome appears to approach an asymptote. It is believed that parasitic species and species that are specialists in some ecological niche tend to have closed pangenomes. ''Staphylococcus lugdunensis'' is an example of a commensal bacterium with a closed pan-genome.


History


Pangenome

The original pangenome concept was developed by Tettelin et al. when they analyzed the genomes of eight isolates of ''Streptococcus agalactiae'', where they described a core genome shared by all isolates, accounting for approximately 80% of any single genome, plus a dispensable genome consisting of partially shared and strain-specific genes. Extrapolation suggested that the gene reservoir in the ''S. agalactiae'' pan-genome is vast and that new unique genes would continue to be identified even after sequencing hundreds of genomes. The pangenome comprises the entirety of the genes discovered in the sequenced genomes of a given microbial species, and it can change when new genomes are sequenced and incorporated into the analysis.

The pangenome of a genomic lineage accounts for the intra-lineage variability in gene content. Pangenomes evolve through gene duplication, gene gain and loss dynamics, and interaction of the genome with mobile elements, all of which are shaped by selection and drift. Some studies suggest that prokaryote pangenomes are the result of adaptive, not neutral, evolution that confers on species the ability to migrate to new niches.


Supergenome

The supergenome can be thought of as the real pangenome size if all genomes from a species were sequenced. It is defined as all genes accessible for being gained by a certain species. It cannot be calculated directly, but its size can be estimated from the pangenome size calculated from the available genome data. Estimating the size of the cloud genome can be difficult because it depends on the occurrence of rare genes and genomes. In 2011, genomic fluidity was proposed as a measure to categorize the gene-level similarity among groups of sequenced isolates. In some lineages the supergenome appears ''infinite'', as is the case for the domain Bacteria.
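The text does not give a formula for genomic fluidity. The sketch below assumes the pairwise form commonly associated with the 2011 proposal: for every pair of genomes, the number of gene families found in only one of the two is divided by the total number of families in both, and the ratios are averaged over all pairs. The function name and the toy gene-family labels are hypothetical, and the clustering of genes into families is taken as given.

```python
from itertools import combinations


def genomic_fluidity(genomes):
    """Average, over all genome pairs, of (unique families) / (total families).

    genomes: dict mapping genome name -> non-empty set of gene-family ids.
    Returns a value between 0 (identical gene content) and 1 (nothing shared).
    """
    names = list(genomes)
    if len(names) < 2:
        raise ValueError("need at least two genomes")
    ratios = []
    for a, b in combinations(names, 2):
        ga, gb = genomes[a], genomes[b]
        unique = len(ga - gb) + len(gb - ga)   # families seen in only one of the pair
        total = len(ga) + len(gb)              # families in both genomes, counted per genome
        ratios.append(unique / total)
    return sum(ratios) / len(ratios)


# Toy input with made-up family labels.
toy = {
    "A": {"f1", "f2", "f3", "f4"},
    "B": {"f1", "f2", "f3", "f5"},
    "C": {"f1", "f2", "f6", "f7"},
}
phi = genomic_fluidity(toy)
```

Unlike the size of the cloud genome, a pairwise average of this kind does not depend strongly on rare genes in a few genomes, which is one reason such a measure is attractive when sampling is sparse.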


Metapangenome

'Metapangenome' has been defined as the outcome of the analysis of pangenomes in conjunction with the environment, where the abundance and prevalence of gene clusters and genomes are recovered through shotgun metagenomes. The combination of metagenomes with pangenomes, also referred to as "metapangenomics", reveals the population-level results of habitat-specific filtering of the pangenomic gene pool.

Other authors consider that metapangenomics expands the concept of the pangenome by incorporating gene sequences obtained from uncultivated microorganisms by a metagenomics approach. A metapangenome comprises both sequences from metagenome-assembled genomes (MAGs) and from genomes obtained from cultivated microorganisms. Metapangenomics has been applied to assess the diversity of a community, microbial niche adaptation, microbial evolution, functional activities, and interaction networks of the community. The Anvi'o platform provides a workflow that integrates the analysis and visualization of metapangenomes by generating pangenomes and studying them in conjunction with metagenomes.


Examples


Prokaryote pangenome

In 2018, 87% of the available whole-genome sequences were bacterial, fueling researchers' interest in calculating prokaryote pangenomes at different taxonomic levels. In 2015, a pangenome of 44 strains of ''Streptococcus pneumoniae'' showed few new genes discovered with each new genome sequenced (see figure). In fact, the predicted number of new genes dropped to zero when the number of genomes exceeded 50 (note, however, that this is not a pattern found in all species). This would mean that ''S. pneumoniae'' has a 'closed pangenome'. The main source of new genes in ''S. pneumoniae'' was ''Streptococcus mitis'', from which genes were transferred horizontally. The pan-genome size of ''S. pneumoniae'' increased logarithmically with the number of strains and linearly with the number of polymorphic sites of the sampled genomes, suggesting that acquired genes accumulate proportionately to the age of clones.

Another example of a prokaryote pan-genome is that of ''Prochlorococcus'': its core genome set is much smaller than the pangenome, which is used by different ecotypes of ''Prochlorococcus''. Open pan-genomes have been observed in environmental isolates such as ''Alcaligenes'' sp. and ''Serratia'' sp., showing a sympatric lifestyle. Nevertheless, an open pangenome is not exclusive to free-living microorganisms: a 2015 study on ''Prevotella'' bacteria isolated from humans compared the gene repertoires of its species derived from different body sites. It also reported an open pan-genome showing a vast diversity of the gene pool.

Archaea have also been the subject of some pangenome studies. The Halobacteria pangenome shows the following numbers of gene families in its subsets: core (300) and variable components (softcore: 998, cloud: 36531, shell: 11784).


Eukaryote pangenome

Eukaryotic organisms such as fungi, animals and plants have also shown evidence of pangenomes. In four fungal species whose pangenomes have been studied, between 80 and 90% of gene models were found to be core genes. The remaining accessory genes were mainly involved in pathogenesis and antimicrobial resistance.

In animals, the human pangenome is being studied. In 2010, a study estimated that a complete human pan-genome would contain ~19–40 megabases of novel sequence not present in the extant reference human genome. The Human Pangenome consortium aims to capture the diversity of the human genome. In 2023, a draft human pangenome reference was published. It is based on 47 diploid genomes from persons of varied ethnicity. Plans are underway for an improved reference capturing still more biodiversity from a still wider sample.

Among plants, there are examples of pangenome studies in model species, both diploid and polyploid, and a growing list of crops. Pangenomes have shown promise as a tool in plant breeding by accounting for structural variants and SNPs in non-reference genomes, which helps to address the problem of missing heritability that persists in genome-wide association studies. An emerging plant-based concept is that of the pan-NLRome, which is the repertoire of nucleotide-binding leucine-rich repeat (NLR) proteins, intracellular immune receptors that recognize pathogen proteins and confer disease resistance.


Virus pangenome

Viruses do not necessarily have genes that are extensively shared across clades, as the 16S gene is in bacteria, and therefore the core genome of viruses as a whole is empty. Nevertheless, several studies have calculated the pangenomes of some viral lineages. The core genome of six species of pandoraviruses comprises 352 gene families, only 4.7% of the pangenome, resulting in an open pangenome.


Data structures

The number of sequenced genomes is continuously growing, and "simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets". Pan-genome graphs are an emerging data structure designed to represent pangenomes and to map reads to them efficiently. They have been reviewed by Eizenga et al.
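To make the idea of a pan-genome graph more concrete, the following is a minimal sketch of a sequence graph in which nodes hold stretches of DNA, edges record which nodes may follow one another, and each genome is stored as a path through the nodes. It is an illustration under simplifying assumptions (a single strand, no node orientation, no read mapping) and does not reproduce the format or API of any particular tool; the class name, method names, node identifiers, and sequences are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PangenomeGraph:
    segments: dict = field(default_factory=dict)  # node id -> DNA string
    edges: set = field(default_factory=set)       # (from_id, to_id) pairs
    paths: dict = field(default_factory=dict)     # genome name -> list of node ids

    def add_path(self, name, pieces):
        """Register a genome as a walk over (node_id, sequence) pieces.

        Nodes are created on first use and reused afterwards, so sequence
        shared between genomes is stored only once.
        """
        walk = []
        for node_id, seq in pieces:
            existing = self.segments.setdefault(node_id, seq)
            if existing != seq:
                raise ValueError(f"node {node_id} already holds a different sequence")
            if walk:
                self.edges.add((walk[-1], node_id))
            walk.append(node_id)
        self.paths[name] = walk

    def sequence(self, name):
        """Spell out a genome by concatenating the nodes on its path."""
        return "".join(self.segments[n] for n in self.paths[name])

    def shared_nodes(self):
        """Nodes visited by every path (a graph analogue of the core)."""
        node_sets = [set(p) for p in self.paths.values()]
        return set.intersection(*node_sets) if node_sets else set()


# Three toy genomes: a reference, a single-base variant, and a deletion.
g = PangenomeGraph()
g.add_path("ref", [("s1", "ACGT"), ("s2", "G"), ("s4", "TTCA")])
g.add_path("snp", [("s1", "ACGT"), ("s3", "A"), ("s4", "TTCA")])
g.add_path("del", [("s1", "ACGT"), ("s4", "TTCA")])
```

Storing genomes as paths over shared nodes means that variation appears as alternative routes between the same flanking nodes, rather than as differences from a single linear reference.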


Software tools

As interest in pangenomes has increased, several software tools have been developed to help analyze this kind of data. The first step of a pangenomic analysis is the homogenization of genome annotation: the same software, such as GeneMark or RAST, should be used to annotate all the genomes involved. In 2015, a group reviewed the different kinds of analyses and tools available to researchers. There are seven kinds of software developed to analyze pangenomes: tools that cluster homologous genes; identify SNPs; plot pangenomic profiles; build phylogenetic relationships of orthologous genes/families of strains/isolates; perform function-based searching; carry out annotation and/or curation; and provide visualization.

The two most cited software tools for pangenomic analysis at the end of 2014 were Panseq and the pan-genomes analysis pipeline (PGAP). Other options include BPGA (a pan-genome analysis pipeline for prokaryotic genomes), GET_HOMOLOGUES, Roary, and PanDelos. In 2015, a review focused on prokaryote pangenomes and another on plant pan-genomes were published. Among the first software packages designed for plant pangenomes were PanTools and GET_HOMOLOGUES-EST. In 2018 panX was released, an interactive web tool that allows inspection of the evolutionary history of gene families. panX can display an alignment of genomes, a phylogenetic tree, a mapping of mutations, and inference about gain and loss of the family on the core-genome phylogeny. In 2019 OrthoVenn 2.0 allowed comparative visualization of families of homologous genes in Venn diagrams for up to 12 genomes. In 2023 BRIDGEcereal was developed to survey and graph indel-based haplotypes from a pan-genome through a gene model ID.

In 2020 Anvi'o was available as a multiomics platform that contains pangenomic and metapangenomic analyses as well as visualization workflows. In Anvi'o, genomes are displayed in concentric circles and each radius represents a gene family, allowing for comparison of more than 100 genomes in its interactive visualization. Also in 2020, a computational comparison of tools for extracting gene-based pangenomic contents (such as GET_HOMOLOGUES, PanDelos, Roary, and others) was released. Tools were compared from a methodological perspective, analyzing the causes that lead a given methodology to outperform other tools. The analysis took into account different bacterial populations, which were synthetically generated by changing evolutionary parameters. The results show that the performance of each tool depends on the composition of the input genomes. Again in 2020, several tools (PPanGGOLiN, Panaroo) introduced a graphical representation of pangenomes showing the contiguity of genes. Other software tools for pangenomics include Prodigal, Prokka, PanVis, PanTools, Pangenome Graph Builder (PGGB), PanX, Pagoo, and pgr-tk.


See also

* Metagenomics
* Pathogenomics
* Quasispecies
* Human Pangenome Reference
* Pan-genome graph construction

