
A protein isoform, or "protein variant",
is a member of a set of highly similar
proteins
that originate from a single
gene
and are the result of genetic differences. While many perform the same or similar biological roles, some isoforms have unique functions. A set of protein isoforms may be formed from
alternative splicing, variable
promoter usage, or other
post-transcriptional modifications of a single gene;
post-translational modifications are generally not considered. (For those, see
Proteoforms.) Through
RNA splicing
mechanisms,
mRNA
has the ability to select different protein-coding segments (
exons
) of a gene, or even different parts of exons from RNA to form different mRNA sequences. Each unique sequence produces a specific form of a protein.
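The exon-selection idea above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the exon labels (`E1` to `E4`) and the function name `splice_variants` are hypothetical, only whole-exon skipping is modelled, and every optional exon is treated as freely skippable, whereas real splicing is tightly regulated and most combinations are never observed.

```python
from itertools import combinations


def splice_variants(exons, constitutive):
    """Enumerate exon-skipping variants of a toy gene.

    exons: ordered list of exon labels in the pre-mRNA (hypothetical names).
    constitutive: set of exons that are always retained.
    Returns a list of tuples, each one possible mature mRNA exon order.
    """
    optional = [e for e in exons if e not in constitutive]
    variants = []
    # Skip 0, 1, 2, ... of the optional exons, preserving exon order.
    for k in range(len(optional) + 1):
        for skipped in combinations(optional, k):
            variants.append(tuple(e for e in exons if e not in skipped))
    return variants


if __name__ == "__main__":
    for variant in splice_variants(["E1", "E2", "E3", "E4"], {"E1", "E4"}):
        print("-".join(variant))
```

With two optional exons the sketch yields four distinct transcripts, each of which would encode a different protein sequence if translated.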
The discovery of isoforms could explain the discrepancy between the small number of protein coding regions of genes revealed by the
Human Genome Project
and the large diversity of proteins seen in an organism: different proteins encoded by the same gene could increase the diversity of the
proteome. Isoforms at the RNA level are readily characterized by
cDNA
transcript studies. Many human genes possess confirmed
alternative splicing
isoforms. It has been estimated that ~100,000 expressed sequence tags (
ESTs) can be identified in humans.
Isoforms at the protein level can manifest in the deletion of whole domains or shorter loops, usually located on the surface of the protein.
Definition
A single gene can produce multiple proteins that differ in both structure and composition;
this process is regulated by
alternative splicing
of mRNA, though it is not clear to what extent such a process affects the diversity of the human proteome, as the abundance of mRNA transcript isoforms does not necessarily correlate with the abundance of protein isoforms. Three-dimensional protein structure comparisons can be used to help determine which, if any, isoforms represent functional protein products, and the structure of most isoforms in the human proteome has been predicted by
AlphaFold and publicly released at
isoform.io. The specificity of translated isoforms is derived from the protein's structure/function, as well as the cell type and developmental stage during which they are produced.
Determining specificity becomes more complicated when a protein has multiple subunits and each subunit has multiple isoforms.
For example, the
5' AMP-activated protein kinase (AMPK), an enzyme that performs different roles in human cells, has three subunits:
* α, catalytic domain, has two isoforms, α1 and α2, which are encoded by PRKAA1 and PRKAA2
* β, regulatory domain, has two isoforms, β1 and β2, which are encoded by PRKAB1 and PRKAB2
* γ, regulatory domain, has three isoforms, γ1, γ2, and γ3, which are encoded by PRKAG1, PRKAG2, and PRKAG3
In human skeletal muscle, the preferred form is α2β2γ1.
But in the human liver, the most abundant form is α1β2γ1.
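To see why multiple subunits complicate specificity, the number of possible AMPK complexes can be counted directly. The sketch below assumes, for illustration only, that each complex contains exactly one α, one β, and one γ isoform and that every combination can form in principle; the names `AMPK_SUBUNIT_ISOFORMS` and `heterotrimer_combinations` are hypothetical. Which combinations actually occur in a given tissue is an empirical question.

```python
from itertools import product

# Isoforms per subunit, as listed in the text above.
AMPK_SUBUNIT_ISOFORMS = {
    "alpha": ["α1", "α2"],
    "beta": ["β1", "β2"],
    "gamma": ["γ1", "γ2", "γ3"],
}


def heterotrimer_combinations(subunits):
    """Return every complex formed by picking one isoform per subunit."""
    return ["".join(combo) for combo in product(*subunits.values())]


if __name__ == "__main__":
    complexes = heterotrimer_combinations(AMPK_SUBUNIT_ISOFORMS)
    print(len(complexes))  # 2 * 2 * 3 = 12
    print(complexes)
```

Under these assumptions there are 12 candidate complexes, of which the skeletal muscle form (α2β2γ1) and the liver form (α1β2γ1) are two.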
Mechanism

The primary mechanisms that produce protein isoforms are alternative splicing and variable promoter usage, though modifications due to genetic changes, such as
mutations
and
polymorphisms are sometimes also considered distinct isoforms.
Alternative splicing is the main
post-transcriptional modification
process that produces mRNA transcript isoforms, and is a major molecular mechanism that may contribute to protein diversity.
The spliceosome, a large ribonucleoprotein, is the molecular machine inside the nucleus responsible for RNA cleavage and ligation, removing non-protein-coding segments (introns).
Because splicing is a process that occurs between
transcription and
translation, its primary effects have mainly been studied through
genomics
techniques—for example,
microarray
analyses and
RNA sequencing
have been used to identify alternatively spliced transcripts and measure their abundances.
Transcript abundance is often used as a proxy for the abundance of protein isoforms, though
proteomics
experiments using gel electrophoresis and mass spectrometry have demonstrated that the correlation between transcript and protein counts is often low, and that one protein isoform is usually dominant.
One 2015 study states that the cause of this discrepancy likely occurs after translation, though the mechanism is essentially unknown. Consequently, although alternative splicing has been implicated as an important link between variation and disease, there is no conclusive evidence that it acts primarily by producing novel protein isoforms.
Alternative splicing generally describes a tightly regulated process in which alternative transcripts are intentionally generated by the splicing machinery. However, such transcripts are also produced by splicing errors in a process called "noisy splicing," and are also potentially translated into protein isoforms. Although ~95% of multi-exonic genes are thought to be alternatively spliced, one study on noisy splicing observed that most of the different low-abundance transcripts are noise, and predicts that most alternative transcript and protein isoforms present in a cell are not functionally relevant.
Other transcriptional and post-transcriptional regulatory steps can also produce different protein isoforms. Variable promoter usage occurs when the transcriptional machinery of a cell (RNA polymerase, transcription factors, and other enzymes) begins transcription at different promoters (a promoter being the region of DNA near a gene that serves as an initial binding site), resulting in slightly modified transcripts and protein isoforms.
Characteristics
Generally, one protein isoform is labeled as the canonical sequence based on criteria such as its prevalence and similarity to
orthologous—or functionally analogous—sequences in other species. Isoforms are assumed to have similar functional properties, as most have similar sequences, and share some to most exons with the canonical sequence. However, some isoforms show much greater divergence (for example, through
trans-splicing
), and can share few to no exons with the canonical sequence. In addition, they can have different biological effects (in an extreme case, one isoform can promote cell survival while another promotes cell death), or can have similar basic functions but differ in their sub-cellular localization. A 2016 study, however, functionally characterized all the isoforms of 1,492 genes and determined that most isoforms behave as "functional alloforms." The authors concluded that isoforms behave like distinct proteins after observing that the functions of most isoforms did not overlap. Because the study was conducted on cells in vitro, it is not known whether the isoforms in the expressed human proteome share these characteristics. Additionally, because the function of each isoform must generally be determined separately, most identified and predicted isoforms still have unknown functions.
Related concepts
Glycoform
A glycoform is an isoform of a protein that differs only with respect to the number or type of attached glycans.
Glycoproteins
often consist of a number of different glycoforms, with alterations in the attached
saccharide
or
oligosaccharide. These modifications may result from differences in
biosynthesis
during the process of
glycosylation, or due to the action of
glycosidases or
glycosyltransferases. Glycoforms may be detected through detailed chemical analysis of separated glycoforms, but are more conveniently detected through differential reaction with
lectins, as in
lectin affinity chromatography and
lectin
affinity electrophoresis. Typical examples of glycoproteins consisting of glycoforms are the
blood proteins, such as orosomucoid, antitrypsin, and
haptoglobin. An unusual glycoform variation is seen in
neuronal cell adhesion molecule (NCAM), involving
polysialic acids (PSA).
Examples
* G-actin: despite its conserved nature, it has a varying number of isoforms (at least six in mammals).
* Creatine kinase, the presence of which in the blood can be used as an aid in the diagnosis of myocardial infarction, exists in three isoforms.
* Hyaluronan synthase, the enzyme responsible for the production of hyaluronan, has three isoforms in mammalian cells.
* UDP-glucuronosyltransferase, an enzyme superfamily responsible for the detoxification pathway of many drugs, environmental pollutants, and toxic endogenous compounds, has 16 known isoforms encoded in the human genome.
* G6PDA: the normal ratio of active isoforms in cells of any tissue is 1:1 with G6PDG. The same ratio is found in hyperplasia, whereas only one of these isoforms is found during neoplasia. [Pathoma, Fundamentals of Pathology]
* Monoamine oxidase, a family of enzymes that catalyze the oxidation of monoamines, exists in two isoforms, MAO-A and MAO-B.
See also
* Gene isoform