SSHHPS

SSHHPS is an acronym for short stretches of homologous host pathogen sequences. The acronym was coined by Legler in a 2019 publication. Legler used BLAST to search for host protein substrates of the nsP2 protease of Venezuelan equine encephalitis virus (VEEV) and of the protease from Zika virus. These viruses are Group IV (+)ssRNA viruses. Short sequences of ~20–25 amino acids from the viral polyprotein, each containing the scissile bond, were used to search the human proteome. Many of the sequence alignments were spurious, while some matched well with the residues surrounding the scissile bond. When all host proteins known to be cut by viral proteases were consolidated into a table, it became clear that the targets were not random. Most were related to innate immunity, while others appeared to be related to viral pathogenesis and the virus-induced phenotype. Some hits were related to both. The list of experimentally confirmed host targets of Group IV viral proteases included key proteins involved in innate immunity, e.g. MAVS, RIG-I, STING, TRIF, and TRIM14. One of the first host proteins shown to be cut by a viral protease, in 1984, was histone H3, which is cleaved by foot-and-mouth disease virus. The histone tails are strategic targets of the viral proteases: their cleavage can shut down host cell transcription and many of the effects of interferon. Viral proteases recognize sequence motifs. The subsite tolerances of the protease can vary, leading to the recognition of many sequences. The protease is a complement to many peptides.
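The query-building step described above, taking a short window of the viral polyprotein that spans the scissile bond, can be sketched in Python. This is a minimal illustration rather than the published method: the function name `cleavage_window`, the `flank` parameter, and the toy sequence are all assumptions introduced here, and the window size is only chosen to fall in the ~20–25 residue range mentioned in the text.

```python
def cleavage_window(polyprotein: str, p1_index: int, flank: int = 12) -> str:
    """Return the residues surrounding a scissile bond.

    p1_index is the 0-based position of the P1 residue, i.e. the residue
    immediately N-terminal to the scissile bond. The window contains up to
    `flank` residues on each side of the bond and is clipped at the ends
    of the sequence.
    """
    if not 0 <= p1_index < len(polyprotein) - 1:
        raise ValueError("P1 must be followed by at least one residue (P1')")
    bond = p1_index + 1  # the bond lies between index p1_index and p1_index + 1
    start = max(0, bond - flank)
    end = min(len(polyprotein), bond + flank)
    return polyprotein[start:end]


# Toy sequence (NOT a real viral polyprotein), used only to show the slicing.
toy = "ACDEFGHIKLMNPQRSTVWY" * 3
query = cleavage_window(toy, p1_index=29, flank=12)
print(len(query), query)
```

The resulting string is the kind of short query that would then be submitted to a protein search against the host proteome.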


Silencing

Silencing can occur at the level of DNA, RNA, or protein. The third mechanism, silencing at the protein level, involves proteases and their protein targets. SSHHPS cleavage is a type of target-specific co- or post-translational silencing. The sequences are found at the viral protease cleavage sites and correspond to specific proteins in the host. A figure illustrating the three levels of silencing can be found in Morazzani, et al.


Predictions


SARS-CoV-2

Using PHI-BLAST and a sequence pattern (e.g. L KG), a shorter list of host targets could be obtained; however, the searches still produced hundreds of host targets. To sort and rank-order them, Legler used clustering. When 'percent positives' was plotted against 'alignment length' from the PHI-BLAST output file, the cleavable proteins were found to cluster toward the right of the graph. The hit lists could then be sorted by alignment length and percent positives to produce a rank-ordered list, with the most likely substrates at the top and the less likely substrates at the bottom. This, together with experimental data, became the basis for the first ''sequence-to-symptom'' software for viruses. An example of the software output is available online.

After sorting the hits, Legler found that those at the top of the list had similarities to the virus-induced phenotype. For the SARS-CoV-2 papain-like protease (PLpro), cardiac myosins (MYH6, MYH7) were the strongest predicted hits; MYOM1, POT1, VWF, PROS1, HER4, and FOXP3 were also predicted, and the sequences were shown to be cleavable. A group at UCSF showed the cleavage of myofibrils in cardiomyocytes after infection with SARS-CoV-2. Fragments of the sarcomere are still visible, showing that the cleavage of the myofibrils occurs post-translationally and after the assembly of the myofibril. The viral proteases have also been suspected in COVID coagulopathy. The PLpro of SARS-CoV-2 was able to cut sequences in PROS1 and VWF.
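The rank-ordering step described above can be sketched as a simple sort. This is an illustrative sketch, not the published software: the `Hit` class, the `rank_hits` function, the protein names, and the numeric values are all made up, and the choice to sort first by alignment length and then by percent positives is an assumption, since the text does not state the order of the two keys.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hit:
    protein: str              # host protein identifier (hypothetical)
    alignment_length: int     # number of aligned residues
    percent_positives: float  # percent of aligned positions scored as positives


def rank_hits(hits: List[Hit]) -> List[Hit]:
    """Sort hits so that longer, higher-scoring alignments come first."""
    return sorted(
        hits,
        key=lambda h: (h.alignment_length, h.percent_positives),
        reverse=True,
    )


# Made-up values for illustration only; not taken from any search output.
hits = [
    Hit("protein_A", 12, 75.0),
    Hit("protein_B", 22, 90.9),
    Hit("protein_C", 22, 81.8),
    Hit("protein_D", 8, 100.0),
]
ranked = [h.protein for h in rank_hits(hits)]
print(ranked)
```

Under this assumed ordering, a short alignment with a perfect score (`protein_D`) still ranks below longer alignments, which mirrors the observation that cleavable proteins sit toward the right of the plot.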


Zika virus

Zika virus has been associated with microcephaly and anencephaly. Using the sorting and graphical method described above, hits related to these phenotypes emerged, such as GIT1, FOXG1, and SFRP1. GIT1 knockout mice develop microcephaly. Mice and rats have not been shown to develop microcephaly after infection with Zika virus (ZIKV). However, Goodfellow, et al. showed that chickens can develop microcephaly when infected with ZIKV. SFRP1 is a predicted host protein substrate of the Zika viral protease, and the sequence at the predicted cleavage site is identical in humans and chickens, two species that both develop microcephaly after infection with Zika virus. SFRP1 is part of the Wnt signaling pathway. The loss of function of more than one protein may be needed to produce the virus-induced phenotype.


HKU5

The SSHHPS for Pipistrellus bat coronavirus HKU5 (Bat-CoV HKU5) have been predicted and are available online. Analysis of the PLpro SSHHPS in HKU5 identified hits related to neurodevelopmental disorders, epilepsy, seizures, respiratory effects, lung inflammation, spinocerebellar ataxia, microphthalmia, ocular abnormalities, IBS, anhidrosis, hydrocephalus, hearing loss, elevated hemoglobin and hematocrit, skeletal dysplasia, microcephaly, and nephrotic syndrome, among others. ADGRA2 was among the predictions.


Experimental confirmation

In 1996, Blom, et al. created a neural network to predict the host targets of picornaviral proteases. One of the predicted hits was dystrophin (''DMD''). Badorff, et al. confirmed that dystrophin could be cleaved by the enteroviral 2A protease. Lim, et al. went one step further and generated a transgenic mouse (the "uncleavable mouse" experiment). The knock-in mice carried a mutation in the predicted 2A protease cleavage site of dystrophin, so that it could not be cut by the viral protease. When the viral protease was expressed in cardiomyocytes, the cleavage-resistant dystrophin inhibited the cardiomyopathy induced by the viral protease. This experiment brought the idea full circle: cleavage by the viral protease is related to the virus-induced phenotype (here, cardiomyopathy). Moreover, the experiment indicated that the clinical presentation could be predicted directly from the viral genome sequence. While Blom's predictions were accurate and could be confirmed by others, a common hit was never found across family or genus.


Conservation - a common hit among neuroinvasive viruses

Using Python, the PHI-BLAST searches and UniProt descriptions could be combined and automated, and the search could be repeated several times. Running the searches for 9 neuroinvasive viruses, Legler found that if the viruses were clustered by a common virus-induced phenotype (e.g. neuroinvasiveness), a common hit emerged. One protein common to all 9 hit lists was the orphan G-protein coupled receptor ADGRA2 (also known as GPR124). When ADGRA2 is knocked out in mouse models of ischemia and glioblastoma, blood-brain barrier (BBB) disruption is observed. The cleavage sites for the viral proteases of all 9 neuroinvasive viruses were found in this one protein; in some cases the cleavage site was predicted to be on the outside of the cell, in others in the cytoplasm. Interestingly, the software did not predict a specific cleavage site sequence or a particular type of protease (e.g. serine, cysteine, aspartyl) but rather a general pathway and common target. A strategy to enter the brain may have been preserved during viral evolution.
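The search for a hit common to several viruses amounts to intersecting their hit lists. The sketch below shows that idea under stated assumptions: the function name `common_hits`, the virus labels, and the `HOST_*` entries are placeholders invented for illustration, and only ADGRA2 is taken from the text. It does not reproduce the original scripts or any real search output.

```python
from typing import Dict, Iterable, Set


def common_hits(hit_lists: Dict[str, Iterable[str]]) -> Set[str]:
    """Return the host proteins that appear in every virus's hit list."""
    sets = [set(hits) for hits in hit_lists.values()]
    if not sets:
        return set()
    return set.intersection(*sets)


# Placeholder data: virus labels and HOST_* names are hypothetical.
hit_lists = {
    "virus_1": ["ADGRA2", "HOST_A", "HOST_B"],
    "virus_2": ["HOST_C", "ADGRA2"],
    "virus_3": ["ADGRA2", "HOST_A"],
}
shared = common_hits(hit_lists)
print(shared)
```

Grouping the input by phenotype before intersecting, as the text describes for neuroinvasive viruses, is what allows a shared target to emerge even when the individual cleavage-site sequences differ.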


Origin

RNA viruses are known to acquire host sequences. In some cases whole enzymes have been acquired by viral genomes; the papain-like protease is a good example. Host genomes serve as the largest source of foreign genetic material. Using the RNA sequence of a viral protease cleavage site for SARS-CoV-2 and the bat genome, sequence matches can be found. In Group IV viruses, the protein sequences of the SSHHPS match across the virus, host, and reservoir, while the RNA sequences match sequences in the reservoir species, suggesting that they were acquired.


Timing of cleavages

Host protein cleavages can occur between 2 and 8 h post-infection. Many of the targets of the viral proteases are involved in generating the host's innate immune responses, and the cleavage of these host proteins is thought to be a mechanism of interferon (IFN) antagonism. Not all viral proteases produce stable cleavage products in cells. In many cases only a smear or a transient cleavage product appears and then disappears as the cell identifies it as 'damaged' and degrades it, making it difficult to trap. Co-translational cleavage can also occur (e.g. the SARS-CoV-2 PLpro is anchored to the ER membrane). Antibodies may or may not recognize the cut products that persist. The viral proteases can also be cytotoxic. Protease inhibitors can be used to stabilize the cut product in cell lysates. Cells should be treated with protease inhibitors and frozen 1 to 8 h post-infection without trypsin. Early time points should be taken and the MOI varied. N-termini can also be acetylated and C-termini amidated to prevent amino- and carboxypeptidases from destroying the cut proteins.


Location of symptom information in viral genomes

The sequences associated with the virus-induced phenotypes for other viruses may be hidden in transcription factors, endonuclease cleavage sites, phosphorylation sites, etc. For Group IV (+)ssRNA viruses, the information can be found in the protease cleavage sites (the SSHHPS). For Group VI (+)ssRNA retroviruses, the information may be in the protease cleavage sites and elsewhere.


Conservation

SSHHPS must show evidence of sequence homology between host and pathogen and of a host-pathogen interaction. The sequence in the viral genome may not be identical to the host DNA, but a short stretch of the protein sequence may match at the predicted protease cleavage site. If a protein found in one species shares a common evolutionary origin with a protein in another species, it is considered a "homologue" of that protein; that is, both are derived from the same ancestral gene in a common ancestor. SSHHPS appear to be acquired rather than the products of accumulated random mutations. David Baltimore proposed a copy-choice mechanism for RNA recombination in RNA viruses, in which the viral RNA-dependent RNA polymerase switches templates during negative-strand synthesis. Host genomes serve as the largest source of foreign genetic material for viruses. RNA has secondary structure, and pauses in replication may occur. Whether certain RNA-binding proteins or enzymes in the reservoir species (e.g. bats) affect or promote RNA recombination is still unclear.

