Coronavirus Nucleocapsid Protein

The nucleocapsid (N) protein is a protein that packages the positive-sense RNA genome of coronaviruses to form ribonucleoprotein structures enclosed within the viral capsid. The N protein is the most highly expressed of the four major coronavirus structural proteins. In addition to its interactions with RNA, N forms protein-protein interactions with the coronavirus membrane protein (M) during the process of viral assembly. N also has additional functions in manipulating the cell cycle of the host cell. The N protein is highly immunogenic, and antibodies to N are found in patients recovered from SARS and COVID-19.


History

COVID-19 was first identified in January 2020. A patient in the state of Washington was given a diagnosis of coronavirus infection on 20 January. A group of scientists based at the Centers for Disease Control and Prevention in Atlanta, Georgia isolated the virus from nasopharyngeal and oropharyngeal swabs and were able to characterize its genomic sequence, replication properties and cell culture tropism. They made the virus available to the wider scientific community shortly thereafter "by depositing it into two virus reagent repositories".


Structure

The N protein is composed of two main protein domains connected by an intrinsically disordered region (IDR) known as the linker region, with additional disordered segments at each terminus. A third small domain at the C-terminal tail appears to have an ordered alpha helical secondary structure and may be involved in the formation of higher-order oligomeric assemblies. In SARS-CoV, the causative agent of SARS, the N protein is 422 amino acid residues long and in SARS-CoV-2, the causative agent of COVID-19, it is 419 residues long. Both the N-terminal and C-terminal domains are capable of binding RNA. The C-terminal domain forms a dimer that is likely to be the native functional state. Parts of the IDR, particularly a conserved sequence motif rich in serine and arginine residues (the SR-rich region), may also be implicated in dimer formation, though reports on this vary. Although higher-order oligomers formed through the C-terminal domain have been observed crystallographically, it is unclear if these structures have a physiological role. The C-terminal dimer has been structurally characterized by X-ray crystallography for several coronaviruses and has a highly conserved structure. The N-terminal domain (sometimes known as the RNA-binding domain, though other parts of the protein also interact with RNA) has also been crystallized and has been studied by nuclear magnetic resonance spectroscopy in the presence of RNA.
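
As an illustration of what "rich in serine and arginine" means in sequence terms, the following minimal Python sketch scans a protein sequence with a sliding window and reports the window with the highest combined S/R fraction. The function names, the 20-residue window and the toy sequence are assumptions made for this example; the toy sequence is invented and is not a real N protein sequence, and the sketch is not a published method for delimiting the SR-rich region.

```python
def sr_fraction_windows(seq, window=20):
    """Return (start, fraction) pairs for every window of the sequence.

    start is 1-indexed; fraction is the share of residues in the window
    that are serine (S) or arginine (R).
    """
    seq = seq.upper()
    results = []
    for i in range(len(seq) - window + 1):
        chunk = seq[i:i + window]
        fraction = sum(1 for aa in chunk if aa in "SR") / window
        results.append((i + 1, fraction))
    return results


def richest_window(seq, window=20):
    """Return the (start, fraction) pair with the highest S/R fraction.

    If several windows tie, the first one is returned.
    """
    windows = sr_fraction_windows(seq, window)
    if not windows:
        raise ValueError("sequence is shorter than the window")
    return max(windows, key=lambda pair: pair[1])


# Hypothetical toy sequence (NOT a real N protein): an S/R-rich stretch
# flanked by two segments that contain no serine or arginine.
TOY_SEQ = (
    "MKTAYIAKQLGDEVLKAGFT"
    "SRSSSRSRGNSRSSSRSRSS"
    "PEELIKTAVDGLYQKHMAEL"
)

start, fraction = richest_window(TOY_SEQ, window=20)
print(f"Most S/R-rich window starts at residue {start} (fraction {fraction:.2f})")
```

Applied to a real sequence, the window length and any threshold for calling a stretch "SR-rich" would need to be chosen deliberately; the values here are placeholders.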


Post-translational modifications

The N protein is post-translationally modified by phosphorylation at sites located in the IDR, particularly in the SR-rich region. The SARS-CoV-2 N protein is arginine methylated by protein arginine methyltransferase 1 (PRMT1) at residues R95 and R177. Treatment with a type I PRMT inhibitor (MS023), or substitution of R95 or R177 with lysine, inhibited interaction of the N protein with the 5'-UTR of SARS-CoV-2 genomic RNA, a property required for viral packaging (doi: 10.1016/j.jbc.2021.100821, PMID 34029587). In several coronaviruses, ADP-ribosylation of the N protein has also been reported. The SARS-CoV N protein has been observed to be SUMOylated, and the N proteins of several coronaviruses including SARS-CoV-2 have been observed to be proteolytically cleaved; the functional significance of these modifications is unclear.
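
The arginine-to-lysine substitutions mentioned above are conventionally written as R95K and R177K (original residue, 1-indexed position, new residue). The short Python sketch below shows how such a notation maps onto a sequence string. The helper names and the placeholder sequence are assumptions for illustration only; the placeholder is a poly-alanine string with arginines inserted at positions 95 and 177, not the real N protein sequence.

```python
def substitute(seq, mutation):
    """Apply one point substitution written like 'R95K' to a sequence.

    The first letter is the expected original residue, the digits are the
    1-indexed position, and the last letter is the replacement residue.
    """
    expected, new = mutation[0], mutation[-1]
    position = int(mutation[1:-1])
    if not 1 <= position <= len(seq):
        raise ValueError(f"position {position} is outside the sequence")
    found = seq[position - 1]
    if found != expected:
        raise ValueError(f"expected {expected} at {position}, found {found}")
    return seq[:position - 1] + new + seq[position:]


def apply_mutations(seq, mutations):
    """Apply several point substitutions in turn and return the new sequence."""
    for mutation in mutations:
        seq = substitute(seq, mutation)
    return seq


# Hypothetical placeholder (NOT the real N protein sequence): poly-alanine
# with arginine at positions 95 and 177 so that the notation can be checked.
PLACEHOLDER = "A" * 94 + "R" + "A" * 81 + "R" + "A" * 10

mutant = apply_mutations(PLACEHOLDER, ["R95K", "R177K"])
print(mutant[94], mutant[176])
```

Checking the expected residue before substituting guards against off-by-one errors in position numbering, which are easy to make when switching between 0-indexed strings and 1-indexed residue numbers.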


Expression and localization

Of the four major structural proteins, the N protein is the most highly expressed in host cells. Like the other structural proteins, the gene encoding the N protein is located toward the 3' end of the genome. N protein is localized primarily to the cytoplasm. In many coronaviruses, a population of N protein is localized to the nucleolus, which is thought to be associated with its effects on the cell cycle.


Function


Genome packaging and viral assembly

The N protein binds to RNA to form ribonucleoprotein (RNP) structures for packaging the genome into the viral capsid. The RNP particles formed are roughly spherical and are organized in flexible helical structures inside the virus. Formation of RNPs is thought to involve allosteric interactions between RNA and multiple RNA-binding regions of the protein. Dimerization of N is important for assembly of RNPs. Encapsidation of the genome occurs through interactions between N and M. N is essential for viral assembly. N also serves as a chaperone protein for the formation of RNA structure in the genomic RNA.


Genomic and subgenomic RNA synthesis

Synthesis of genomic RNA appears to involve participation by the N protein. N is physically colocalized with the viral RNA-dependent RNA polymerase early in the replication cycle and forms interactions with non-structural protein 3, a component of the replicase-transcriptase complex. Although N appears to facilitate efficient replication of genomic RNA, it is not required for RNA transcription in all coronaviruses. In at least one coronavirus, transmissible gastroenteritis virus (TGEV), N is involved in template switching in the production of subgenomic mRNAs, a process that is a distinctive feature of viruses in the order Nidovirales.


Cell cycle effects

Coronaviruses manipulate the cell cycle of the host cell through various mechanisms. In several coronaviruses, including SARS-CoV, the N protein has been reported to cause cell cycle arrest in S phase through interactions with cyclin-CDK. In SARS-CoV, a cyclin box-binding region in the N protein can serve as a cyclin-CDK phosphorylation substrate. Trafficking of N to the nucleolus may also play a role in cell cycle effects. More broadly, N may be involved in reduction of host cell protein translation activity.


Immune system effects

The N protein is involved in viral pathogenesis via its effects on components of the immune system. In SARS-CoV, MERS-CoV, and SARS-CoV-2, N has been reported to suppress interferon responses.


Evolution and conservation

The sequences and structures of N proteins from different coronaviruses, particularly the C-terminal domains, appear to be well conserved. Similarities between the structure and topology of the N proteins of coronaviruses and arteriviruses suggest a common evolutionary origin and support the classification of these two groups in the common order Nidovirales. Examination of SARS-CoV-2 sequences collected during the COVID-19 pandemic found that missense mutations were most common in the central linker region of the protein, suggesting this relatively unstructured region is more tolerant of mutations than the structured domains. A separate study of SARS-CoV-2 sequences identified at least one site in the N protein under positive selection. Because the N protein is well conserved, does not appear to recombine frequently, and produces a strong T-cell response, it has been studied as a potential target for coronavirus vaccines. The vaccine candidate UB-612 is one such experimental vaccine that targets the N protein, along with other viral proteins, to attempt to induce broad immunity.

