
A gene family is a set of several similar genes, formed by duplication of a single original gene, and generally with similar biochemical functions. One such family comprises the genes for human hemoglobin subunits; the ten genes are in two clusters on different chromosomes, called the α-globin and β-globin loci. These two gene clusters are thought to have arisen as a result of a precursor gene being duplicated approximately 500 million years ago.

Genes are categorized into families based on shared nucleotide or protein sequences. Phylogenetic techniques can be used as a more rigorous test. The positions of exons within the coding sequence can be used to infer common ancestry. Knowing the sequence of the protein encoded by a gene allows researchers to apply methods that find similarities among protein sequences, which provide more information than similarities or differences among DNA sequences. If the genes of a gene family encode proteins, the term ''protein family'' is often used in an analogous manner to ''gene family''.

The expansion or contraction of gene families along a specific lineage can be due to chance, or can be the result of natural selection. Distinguishing between these two cases is often difficult in practice. Recent work uses a combination of statistical models and algorithmic techniques to detect gene families that are under the effect of natural selection.

The HUGO Gene Nomenclature Committee (HGNC) creates nomenclature schemes using a "stem" (or "root") symbol for members of a gene family (by homology ''or'' function), with a hierarchical numbering system to distinguish the individual members. For example, for the peroxiredoxin family, ''PRDX'' is the root symbol, and the family members are ''PRDX1'', ''PRDX2'', ''PRDX3'', ''PRDX4'', ''PRDX5'', and ''PRDX6''.
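The root-plus-number scheme can be illustrated with a small sketch that groups symbols by their root. It assumes the simplest case, a root made of capital letters followed by a numeric suffix; real HGNC symbols are more varied, and the function name `group_by_root` is purely illustrative.

```python
import re
from collections import defaultdict

# Assumption: a symbol is a root of capital letters followed by a number.
# Real HGNC symbols can be more complex; this covers only the simple case.
SYMBOL_PATTERN = re.compile(r"^([A-Z]+)(\d+)$")


def group_by_root(symbols):
    """Group gene symbols by root symbol, ordering members by their number."""
    families = defaultdict(list)
    for symbol in symbols:
        match = SYMBOL_PATTERN.match(symbol)
        if match is None:
            continue  # skip symbols that do not fit the simple pattern
        root, number = match.group(1), int(match.group(2))
        families[root].append((number, symbol))
    return {
        root: [symbol for _, symbol in sorted(members)]
        for root, members in families.items()
    }


symbols = ["PRDX3", "PRDX1", "PRDX6", "PRDX2", "PRDX5", "PRDX4"]
print(group_by_root(symbols))
```

Grouping by stem alone says nothing about homology, since a stem may also reflect shared function.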


Basic structure

One level of genome organization is the grouping of genes into several gene families. Gene families are groups of related genes that share a common ancestor. Members of gene families may be paralogs or orthologs. Gene paralogs are genes with similar sequences from within the same species, while gene orthologs are genes with similar sequences in different species. Gene families are highly variable in size, sequence diversity, and arrangement. Depending on the diversity and functions of the genes within the family, families can be classified as multigene families or superfamilies.

Multigene families typically consist of members with similar sequences and functions, though a high degree of divergence (at the sequence and/or functional level) does not lead to the removal of a gene from a gene family. Individual genes in the family may be arranged close together on the same chromosome or dispersed throughout the genome on different chromosomes. Due to the similarity of their sequences and their overlapping functions, individual genes in the family often share regulatory control elements. In some instances, gene members have identical (or nearly identical) sequences. Such families allow massive amounts of gene product to be expressed in a short time as needed. Other families allow similar but specific products to be expressed in different cell types or at different stages of an organism's development.

Superfamilies are much larger than single multigene families. Superfamilies contain up to hundreds of genes, including multiple multigene families as well as single, individual gene members. The large number of members allows superfamilies to be widely dispersed, with some genes clustered and some spread far apart. The genes are diverse in sequence and function, displaying various levels of expression and separate regulatory controls.

Some gene families also contain pseudogenes, sequences of DNA that closely resemble established gene sequences but are non-functional. Different types of pseudogenes exist. Non-processed pseudogenes are genes that acquired mutations over time and became non-functional. Processed pseudogenes are genes that have lost their function after being moved around the genome by retrotransposition. Pseudogenes that have become isolated from the gene family they originated in are referred to as ''orphans''.
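The species-based distinction between paralogs and orthologs described above can be sketched as a pairwise labelling of family members. This follows the simplified definition given in this section (same species versus different species) and does not attempt to reconstruct duplication or speciation events; the gene and species names are hypothetical.

```python
from itertools import combinations


def classify_pairs(members):
    """Label each pair of family members as 'paralog' or 'ortholog'.

    members: list of (gene_name, species) tuples from one gene family.
    Same species -> paralog; different species -> ortholog (simplified rule).
    """
    labels = {}
    for (gene_a, species_a), (gene_b, species_b) in combinations(members, 2):
        same_species = species_a == species_b
        labels[(gene_a, gene_b)] = "paralog" if same_species else "ortholog"
    return labels


# Hypothetical family: two copies in species X, one copy in species Y.
family = [("geneA_X", "species_X"), ("geneB_X", "species_X"), ("geneA_Y", "species_Y")]
for pair, label in classify_pairs(family).items():
    print(pair, label)
```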


Formation

Gene families arose from multiple duplications of an ancestral gene, followed by mutation and divergence. Duplications can occur within a lineage (e.g., humans might have two copies of a gene that is found only once in chimpanzees) or can be the result of speciation. For example, a single gene in the ancestor of humans and chimpanzees now occurs in both species and can be thought of as having been 'duplicated' via speciation. As a result of duplication by speciation, a gene family might include 15 genes, one copy in each of 15 different species.


Duplication

In the formation of gene families, four levels of duplication exist: 1) exon duplication and shuffling, 2) entire gene duplication, 3) multigene family duplication, and 4) whole genome duplication. Exon duplication and shuffling give rise to variation and new genes. Genes are then duplicated to form multigene families, which duplicate to form superfamilies spanning multiple chromosomes. Whole genome duplication doubles the number of copies of every gene and gene family. Whole genome duplication, or polyploidization, can be either autopolyploidization or allopolyploidization. Autopolyploidization is the duplication of the same genome, and allopolyploidization is the duplication of two closely related genomes or hybridized genomes from different species.

Duplication occurs primarily through unequal crossing over events in meiosis of germ cells. (1,2) When two chromosomes misalign, crossing over (the exchange of gene alleles) results in one chromosome expanding, or increasing in gene number, and the other contracting, or decreasing in gene number. The expansion of a gene cluster is the duplication of genes that leads to larger gene families.
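As a minimal sketch of unequal crossing over, the example below models each chromosome as a list of gene labels and exchanges the tails at two different breakpoints. The list representation, the function name, and the breakpoint parameters are assumptions made for illustration, not a model of the molecular mechanism.

```python
def unequal_crossover(chrom_a, chrom_b, break_a, break_b):
    """Exchange the tails of two chromosomes at (possibly different) breakpoints.

    When break_a != break_b the chromosomes are misaligned, so one
    recombinant gains gene copies and the other loses them.
    """
    recombinant_1 = chrom_a[:break_a] + chrom_b[break_b:]
    recombinant_2 = chrom_b[:break_b] + chrom_a[break_a:]
    return recombinant_1, recombinant_2


cluster = ["g1", "g2", "g3"]
# Misalignment by one gene: break after g2 on one copy, after g1 on the other.
expanded, contracted = unequal_crossover(cluster, cluster, 2, 1)
print(expanded)    # ['g1', 'g2', 'g2', 'g3']
print(contracted)  # ['g1', 'g3']
```

The total number of gene copies across the two recombinants is unchanged; the copies are only redistributed.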


Relocation

Gene members of a multigene family, or multigene families within superfamilies, exist on different chromosomes due to relocation of those genes after duplication of the ancestral gene. Transposable elements play a role in the movement of genes. Transposable elements are recognized by inverted repeats at their 5' and 3' ends. When two transposable elements are close enough in the same region on a chromosome, they can form a composite transposon. The protein transposase recognizes the outermost inverted repeats, cutting the DNA segment. Any genes between the two transposable elements are relocated as the composite transposon jumps to a new area of the genome.

Reverse transcription is another method of gene movement. An mRNA transcript of a gene is reverse transcribed, or copied, back into DNA. This new DNA copy of the mRNA is integrated into another part of the genome, resulting in gene family members being dispersed.

A special type of multigene family is implicated in the movement of gene families and gene family members. LINE (Long INterspersed Elements) and SINE (Short INterspersed Elements) families are highly repetitive DNA sequences spread throughout the genome. The LINEs contain a sequence that encodes a reverse transcriptase protein. This protein aids in copying the RNA transcripts of LINEs and SINEs back into DNA, and integrates them into different areas of the genome. This self-perpetuates the growth of LINE and SINE families. Due to the highly repetitive nature of these elements, LINEs and SINEs that lie close together also trigger unequal crossing over events, which result in single-gene duplications and the formation of gene families.
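The relocation of genes by a composite transposon can be sketched with chromosomes represented as lists. This is a deliberately simplified cut-and-paste model; the `"TE"` marker, the function name, and the insertion index are assumptions for illustration only.

```python
def relocate_composite(source, target, insert_at, te_marker="TE"):
    """Move the segment bounded by the first two transposable elements.

    The two flanking elements and every gene between them are cut from
    `source` and inserted into `target` at index `insert_at`.
    """
    te_positions = [i for i, item in enumerate(source) if item == te_marker]
    if len(te_positions) < 2:
        raise ValueError("two transposable elements are needed to form a composite transposon")
    start, end = te_positions[0], te_positions[1]
    segment = source[start:end + 1]  # both elements plus the genes between them
    new_source = source[:start] + source[end + 1:]
    new_target = target[:insert_at] + segment + target[insert_at:]
    return new_source, new_target


chromosome_1 = ["geneA", "TE", "geneB", "geneC", "TE", "geneD"]
chromosome_2 = ["geneX", "geneY"]
print(relocate_composite(chromosome_1, chromosome_2, insert_at=1))
```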


Divergence

Non-synonymous mutations, which result in the substitution of amino acids, increase in frequency in duplicate gene copies. Duplication gives rise to multiple copies of the same gene, providing a level of redundancy in which mutations are tolerated. As long as one copy of the gene remains functional, the other copies can acquire mutations without being severely detrimental to the organism. Mutations allow duplicate genes to acquire new or different functions.
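To make the distinction concrete, the sketch below classifies a single-codon change using the standard genetic code. The function name and the three labels are illustrative choices; changes that create a stop codon are reported separately as nonsense mutations.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (first base varies slowest, third base fastest); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_before, codon_after):
    """Classify a codon change by its effect on the encoded amino acid."""
    before = CODON_TABLE[codon_before]
    after = CODON_TABLE[codon_after]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"  # a sense codon became a stop codon
    return "non-synonymous"


print(classify_substitution("GAA", "GAG"))  # Glu -> Glu
print(classify_substitution("GAG", "GTG"))  # Glu -> Val
print(classify_substitution("TGG", "TGA"))  # Trp -> stop
```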


Concerted evolution

Some multigene families are extremely homogeneous, with individual gene members sharing identical or almost identical sequences. The process by which gene families maintain high homogeneity is concerted evolution. Concerted evolution occurs through repeated cycles of unequal crossing over events and repeated cycles of gene transfer and conversion. Unequal crossing over leads to the expansion and contraction of gene families. Gene families have an optimal size range that natural selection acts towards. Contraction deletes divergent gene copies and keeps gene families from becoming too large. Expansion replaces lost gene copies and prevents gene families from becoming too small. Repeated cycles of gene transfer and conversion make gene family members increasingly similar. In the process of gene transfer, allelic gene conversion is biased. The spread of a mutant allele through a gene family towards homogeneity is the same process as the spread of an advantageous allele through a population towards fixation. Gene conversion also aids in creating genetic variation in some cases.
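The homogenizing effect of repeated gene conversion can be sketched with a toy simulation in which one family member's sequence variant is repeatedly copied over another's. This is an assumption-laden illustration: donors and recipients are chosen uniformly at random (unbiased conversion, unlike the biased case described above), and unequal crossing over is not modelled.

```python
import random


def gene_conversion_step(family, rng):
    """Copy the variant of a randomly chosen donor onto a different recipient."""
    donor, recipient = rng.sample(range(len(family)), 2)
    family[recipient] = family[donor]


def homogenize(family, steps, seed=0):
    """Apply repeated conversion steps; track the number of distinct variants."""
    rng = random.Random(seed)
    family = list(family)  # work on a copy
    distinct_counts = [len(set(family))]
    for _ in range(steps):
        gene_conversion_step(family, rng)
        distinct_counts.append(len(set(family)))
    return family, distinct_counts


initial = ["v1", "v2", "v3", "v4", "v5"]
final, counts = homogenize(initial, steps=50)
print(final)
print(counts)
```

Because each step only overwrites one variant with another that already exists, the number of distinct variants can never increase in this model.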


Evolution

Gene families, part of a hierarchy of information storage in a genome, play a large role in the evolution and diversity of multicellular organisms. Gene families are large units of information and genetic variability. Over evolutionary time, gene families have expanded and contracted, with genes within a family duplicating and diversifying into new genes, and genes being lost. An entire gene family may also be lost, or gained through de novo gene birth, through divergence so extensive that a gene is considered part of a new family, or through horizontal gene transfer.

When the number of genes per genome remains relatively constant, this implies that genes are gained and lost at roughly the same rate. There are some patterns in which genes are more likely to be lost and which are more likely to duplicate and diversify into multiple copies. An adaptive expansion of a single gene into many initially identical copies occurs when natural selection would favour additional gene copies. This is the case when an environmental stressor acts on a species. Gene amplification is more common in bacteria and is a reversible process. Contraction of gene families commonly results from the accumulation of loss-of-function mutations. A nonsense mutation, which prematurely halts translation of the gene product, becomes fixed in the population, leading to the loss of genes. This process occurs when changes in the environment render a gene redundant.
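The link between gain and loss rates and gene family size can be illustrated with a simple linear birth-death model, in which each gene copy is duplicated at a constant rate and lost at a constant rate. This model is an assumption introduced here for illustration; under it, the expected family size after time t is the initial size multiplied by exp((gain rate - loss rate) * t).

```python
import math


def expected_family_size(initial_size, gain_rate, loss_rate, time):
    """Expected number of genes under a linear birth-death model.

    gain_rate and loss_rate are per-gene rates per unit time
    (hypothetical parameters, not estimates from real data).
    """
    return initial_size * math.exp((gain_rate - loss_rate) * time)


# Equal gain and loss rates keep the expected size constant.
print(expected_family_size(10, gain_rate=0.002, loss_rate=0.002, time=100))
# A higher gain rate expands the family; a higher loss rate contracts it.
print(expected_family_size(10, gain_rate=0.003, loss_rate=0.002, time=100))
print(expected_family_size(10, gain_rate=0.002, loss_rate=0.003, time=100))
```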


Functional family

In addition to classification by evolution (structural gene family), the HGNC also defines "gene families" by function in its stem nomenclature. As a result, a stem can also refer to genes that have the same function, often as part of the same protein complex. For example, ''BRCA1'' and ''BRCA2'' are unrelated genes that are both named for their role in breast cancer, and ''RPS2'' and ''RPS3'' are unrelated ribosomal proteins found in the same small subunit.

The HGNC also maintains a "gene group" (formerly "gene family") classification. A gene can be a member of multiple groups, and all groups form a hierarchy. As with the stem classification, both structural and functional groups exist.


See also

* List of gene families
* Protein family

