ORF1ab

ORF1ab (also ORF1a/b) refers collectively to two open reading frames (ORFs), ORF1a and ORF1b, that are conserved in the genomes of nidoviruses, a group of viruses that includes coronaviruses. The genes express large polyproteins that undergo proteolysis to form several nonstructural proteins with various functions in the viral life cycle, including proteases and the components of the replicase-transcriptase complex (RTC). Together the two ORFs are sometimes referred to as the replicase gene. They are related by a programmed ribosomal frameshift that allows the ribosome to continue translating past the stop codon at the end of ORF1a, in a -1 reading frame. The resulting polyproteins are known as pp1a and pp1ab.
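
The effect of a -1 frameshift on the translated product can be illustrated with a toy example. The sketch below is not a model of any real viral genome: the RNA fragment is made up, the function names are hypothetical, and the heptanucleotide `UUUAAAC` is assumed as the slippery site (it is the motif commonly reported for coronaviruses, but it is not given in the text above). The downstream RNA structure that stimulates frameshifting is not modeled at all; the code only shows how stepping back one nucleotide lets translation bypass the in-frame stop codon.

```python
from itertools import product

# Standard genetic code, built from the conventional UCAG ordering.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumed slippery site (see lead-in); illustrative only.
SLIPPERY = "UUUAAAC"

# Made-up fragment: start codon, slippery site, in-frame stop, extra 3' sequence.
TOY_RNA = "AUGGCUUUAAACGGGUUUGCGGUGUAAGUGCAUAA"


def translate(rna, start=0):
    """Translate codon by codon from `start` until a stop codon or the end."""
    protein = []
    for i in range(start, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":  # stop codon terminates translation
            break
        protein.append(aa)
    return "".join(protein)


def translate_with_frameshift(rna):
    """Translate in frame 0 through the slippery site, then continue in frame -1."""
    site = rna.find(SLIPPERY)
    if site == -1:
        raise ValueError("no slippery site found")
    shift_at = site + len(SLIPPERY)            # first nucleotide after the site
    upstream = translate(rna[:shift_at])       # frame 0 up to the end of the site
    downstream = translate(rna, shift_at - 1)  # step back one nucleotide (-1)
    return upstream + downstream


short_product = translate(TOY_RNA)                 # analogous to pp1a
long_product = translate_with_frameshift(TOY_RNA)  # analogous to pp1ab
print(short_product, long_product)
```

Both products share the same N-terminal residues and diverge after the slippery site, which mirrors the relationship between pp1a and pp1ab.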


Expression

ORF1a is the first open reading frame at the 5' end of the genome. Together, ORF1a and ORF1b occupy about two thirds of the genome, with the remaining third at the 3' end encoding the structural proteins and accessory proteins. ORF1ab is translated from a 5' capped RNA by cap-dependent translation. Nidoviruses have a complex system of discontinuous subgenomic RNA production to enable expression of genes in their relatively large RNA genomes (typically 27-32 kb for coronaviruses), but ORF1ab is translated directly from the genomic RNA. ORF1ab sequences have been observed in noncanonical subgenomic RNAs, though their functional significance is unclear.

A programmed ribosomal frameshift allows translation to continue past the stop codon that terminates ORF1a in a -1 reading frame, producing the longer polyprotein pp1ab. The frameshift occurs at a slippery sequence that is followed by a pseudoknot RNA secondary structure. Frameshifting efficiency has been measured at 20-50% for murine coronavirus and 45-70% for SARS-CoV-2, yielding a stoichiometry of roughly 1.5 to 2 times as much pp1a as pp1ab protein expressed.
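
How a frameshifting efficiency maps onto relative protein amounts depends on what is being counted. The sketch below uses a deliberately simple model, which is an assumption and not part of the measurements described above: every ribosome that translates ORF1a either terminates at its stop codon or frameshifts with probability `efficiency`, and protein turnover is ignored. The function names are illustrative.

```python
def pp1a_per_pp1ab(efficiency):
    """Terminated pp1a chains made per full-length pp1ab chain."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return (1 - efficiency) / efficiency


def orf1a_per_orf1b_products(efficiency):
    """ORF1a-encoded segments (present in both pp1a and pp1ab)
    made per ORF1b-encoded segment (present only in pp1ab)."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return 1 / efficiency


# Compare the two ways of counting across a range of efficiencies.
for f in (0.20, 0.45, 0.50, 0.70):
    print(f"{f:.2f}  pp1a:pp1ab = {pp1a_per_pp1ab(f):.2f}  "
          f"ORF1a:ORF1b products = {orf1a_per_orf1b_products(f):.2f}")
```

The two functions differ by exactly one, since every pp1ab chain also carries the ORF1a-encoded segments, so a reported ratio is only meaningful alongside a statement of which quantity was compared.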


Processing

The polyproteins pp1a and pp1ab contain about 13 to 17 nonstructural proteins. They undergo autoproteolysis to release the nonstructural proteins through the action of internal cysteine protease domains. In coronaviruses, there are a total of 16 nonstructural proteins: pp1a contains nsp1-11, and pp1ab contains nsp1-10 and nsp12-16. Proteolytic processing is performed by two proteases. The papain-like protease domain, located in the multidomain protein nsp3, cleaves up to nsp4, and the 3CL protease (also known as the main protease, nsp5) performs the remaining cleavages, from nsp5 through the polyprotein C-terminus. Proteins nsp12-16, the C-terminal components of the pp1ab polyprotein, contain the core enzymatic activities necessary for viral replication. After proteolytic processing, several of the nonstructural proteins assemble into a large protein complex known as the replicase-transcriptase complex (RTC), which performs genome replication and transcription.
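
The coronavirus layout described above can be written down as a small data structure. This is a sketch under one stated assumption: "cleaves up to nsp4" is read as meaning that the papain-like protease handles every junction up to and including the nsp3/nsp4 boundary, with the 3CL protease handling the rest. The names `PP1A`, `PP1AB`, and `cleavage_map` are illustrative.

```python
# nsp content of each coronavirus polyprotein, as listed in the text.
PP1A = [f"nsp{i}" for i in range(1, 12)]                                          # nsp1-11
PP1AB = [f"nsp{i}" for i in range(1, 11)] + [f"nsp{i}" for i in range(12, 17)]    # nsp1-10, nsp12-16


def cleavage_map(polyprotein):
    """List each junction between neighbouring nsps with the protease
    assumed to cut it (see the assumption in the lead-in)."""
    sites = []
    for left, right in zip(polyprotein, polyprotein[1:]):
        # Junctions whose downstream partner is nsp4 or earlier go to PLpro.
        protease = "PLpro" if int(right[3:]) <= 4 else "3CLpro"
        sites.append((left, right, protease))
    return sites


for left, right, protease in cleavage_map(PP1AB):
    print(f"{left}/{right}: {protease}")

# Proteins found in only one of the two polyproteins.
only_pp1a = sorted(set(PP1A) - set(PP1AB))
only_pp1ab = sorted(set(PP1AB) - set(PP1A), key=lambda n: int(n[3:]))
print(only_pp1a, only_pp1ab)
```

Under this reading, nsp11 is unique to pp1a and nsp12-16 are unique to pp1ab, and the 3CL protease accounts for most of the junctions in either polyprotein.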


Components


Core replicase domains

A set of five conserved "core replicase" protein domains is present in all nidovirus lineages (arteriviruses, mesoniviruses, roniviruses, and coronaviruses): from ORF1a, the main protease flanked on either side by transmembrane domains; and from ORF1b, a nucleotidyltransferase domain known as NiRAN, RNA-dependent RNA polymerase (RdRp), a zinc-binding domain, and a helicase. (This is sometimes considered seven domains, counting the transmembrane regions separately.) In addition, an endoribonuclease domain is found in all nidoviruses that infect vertebrate hosts. Arteriviruses, which have smaller genomes than the other nidovirus lineages, lack methyltransferases as well as a proofreading exoribonuclease, a domain that is conserved in nidoviruses with larger genomes. This proofreading functionality is thought to be required for sufficient fidelity to replicate large RNA genomes, but may also play additional roles in some viruses.


Coronaviruses

In coronaviruses, pp1a and pp1ab together contain sixteen nonstructural proteins, nsp1 through nsp16.


Evolution

The structure and organization of the genome, including ORF1a, ORF1b, and the frameshift separating them, are conserved among nidoviruses. Some "non-canonical" nidovirus genome structures have been described, mainly involving gene fusions. The largest known nidovirus, planarian secretory cell nidovirus (PSCNV), with a 41 kb genome, has a non-canonical genome structure in which ORF1a, ORF1b, and the downstream ORFs encoding structural proteins are fused and expressed as a single large ORF encoding a polyprotein of over 13,000 amino acids. In these non-canonical genomes, other frameshift locations or stop codon readthrough may be used to regulate the stoichiometry of viral proteins.

Nidoviruses vary widely in genome size, from arteriviruses with typically 12-15 kb genomes to coronaviruses at 27-32 kb. Their evolutionary history has been of research interest for understanding how very large RNA genomes are replicated despite the relatively low-fidelity replication mechanism of the viral RNA-dependent RNA polymerase (RdRp). The larger nidovirus genomes (above around 20 kb) encode a proofreading exoribonuclease (nsp14 in coronaviruses) thought to be required for replication fidelity.

Among coronaviruses, ORF1ab is more highly conserved than the 3' ORFs encoding structural proteins. Throughout the COVID-19 pandemic, the genome of SARS-CoV-2 has been sequenced many times, resulting in the identification of thousands of distinct variants. In a World Health Organization analysis from July 2020, ORF1ab was the most frequently mutated gene, followed by the S gene encoding the spike protein. The most commonly mutated protein within ORF1ab was papain-like protease (nsp3), and the single most commonly observed missense mutation was in RNA-dependent RNA polymerase. Some PCR tests that detect COVID-19 analyze the specimen for the ORF1ab gene, among others.

