The K Homology (KH) domain is a protein domain that was first identified in the human heterogeneous nuclear ribonucleoprotein (hnRNP) K. An evolutionarily conserved sequence of around 70 amino acids, the KH domain is present in a wide variety of nucleic acid-binding proteins. The KH domain binds RNA and can function in RNA recognition.
KH domains are found in multiple copies in several proteins, where they can function cooperatively or independently. For example, in the AU-rich element RNA-binding protein KSRP, which has four KH domains, KH domains 3 and 4 behave as independent binding modules that interact with different regions of the AU-rich RNA targets.
The solution structures of the first KH domain of FMR1 and of the C-terminal KH domain of hnRNP K, determined by nuclear magnetic resonance (NMR), revealed a beta-alpha-alpha-beta-beta-alpha structure.
Autoantibodies to NOVA1, a KH domain protein, cause paraneoplastic opsoclonus ataxia. The KH domain is found at the N-terminus of the ribosomal protein S3. This domain is unusual in that it has a different fold compared to the normal KH domain.
Nucleic acid binding
KH domains bind to either RNA or single-stranded DNA. The nucleic acid is bound in an extended conformation across one side of the domain. The binding occurs in a cleft formed between alpha helix 1, alpha helix 2, the GXXG loop (which contains a highly conserved sequence motif) and the variable loop.
The binding cleft is hydrophobic in nature, with a variety of additional protein-specific interactions that stabilise the complex. Valverde and colleagues note that "Nucleic acid base-to-protein aromatic side chain stacking interactions which are prevalent in other types of single stranded nucleic acid binding motifs, are notably absent in KH domain nucleic acid recognition".
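As a rough illustration of the GXXG loop motif mentioned above, the following Python sketch scans a protein sequence in one-letter code for glycine, any two residues, glycine. The function name `find_gxxg` and the toy sequence are hypothetical and chosen only for this example. Because a four-residue pattern like this is expected to occur by chance in many proteins, a match on its own should not be read as evidence of a KH domain.

```python
import re

# G-X-X-G: glycine, any two letters, glycine (one-letter amino acid codes).
# The pattern sits inside a lookahead so that overlapping hits are all reported.
GXXG = re.compile(r"(?=(G[A-Z]{2}G))")


def find_gxxg(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched tetrapeptide) for each GXXG hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in GXXG.finditer(seq)]


if __name__ == "__main__":
    # Made-up toy sequence for demonstration; not taken from a real KH domain.
    toy = "MKLIGKGGSVIKELRG"
    for position, hit in find_gxxg(toy):
        print(position, hit)
```

In practice, KH domains are usually annotated with profile-based domain databases rather than a bare pattern search; the sketch only shows what the motif notation means.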
Structural groups
Structurally, there are two different types of KH domain, identified by Grishin and called type I and type II.
The type I domains are mainly found in eukaryotic proteins, while the type II domains are predominantly found in prokaryotes. While both types share a minimal consensus sequence motif, they have different structural folds. The type I KH domains have a three-stranded beta-sheet in which all three strands are anti-parallel. In the type II domain, two of the three beta strands are in a parallel orientation. While type I domains are usually found in multiple copies within proteins, type II domains are typically found in a single copy per protein.
Human proteins containing this domain
AKAP1;
ANKHD1;
ANKRD17;
ASCC1;
BICC1;
DDX43;
DDX53;
DPPA5;
FMR1;
FUBP1;
FUBP3;
FXR1;
FXR2;
GLD1;
HDLBP;
HNRPK;
IGF2BP1;
IGF2BP2;
IGF2BP3;
KHDRBS1;
KHDRBS2;
KHDRBS3;
KHSRP;
KRR1;
MEX3A;
MEX3B;
MEX3C;
MEX3D;
NOVA1;
NOVA2;
PCBP1;
PCBP2;
PCBP3;
PCBP4;
PNO1;
PNPT1;
QKI;
SF1;
TDRKH;
References
This article incorporates text from the public domain Pfam and InterPro: IPR004088.