Chromosome 18 open reading frame 63 is a protein which in humans is encoded by the C18orf63 gene. This protein is not yet well understood by the scientific community. Research has suggested that C18orf63 could be a potential biomarker for early stage pancreatic cancer and breast cancer.

Gene

This gene is located at band 22, sub-band 3, on the long arm of chromosome 18 (18q22.3). It is composed of 5065 base pairs spanning from 74,315,875 to 74,359,187 bp on chromosome 18. The gene has a total of 14 exons. C18orf63 is also known by the alias DKFZP78G0119. No isoforms exist for this gene.

[Figure: GEO expression profile for C18orf63]

Expression

C18orf63 has high expression in the testis. The gene shows low expression in the kidneys, liver, lung, and pelvis. There is no phenotype associated with this gene.

Promoter

The promoter region for C18orf63 is 1163 bp long, starting at 74,314,813 bp and ending at 74,315,975 bp. The promoter ID is GXP_4417391. The presence of multiple Y-box binding transcription factor and SRY transcription factor binding sites suggests that C18orf63 may be involved in male sex determination.
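The promoter length quoted above follows from its coordinates if both endpoints are counted. The short sketch below checks that arithmetic and also computes how far the promoter reaches into the gene interval given in the Gene section. It assumes closed, 1-based coordinates on the same strand and assembly; the function names are illustrative only.

```python
def interval_length(start, end):
    """Length in bp of a closed interval [start, end] (both endpoints counted)."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


def overlap_length(a, b):
    """Number of positions shared by two closed intervals given as (start, end)."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


# Coordinates as stated in the article.
promoter = (74_314_813, 74_315_975)
gene = (74_315_875, 74_359_187)

print(interval_length(*promoter))      # 1163
print(overlap_length(promoter, gene))  # 101
```

Under these assumptions the promoter interval overlaps the first 101 bp of the annotated gene interval.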

Protein

[Figure: amino acid composition, normal vs C18orf63]

The C18orf63 protein is composed of 685 amino acids and has a molecular weight of 77230.50 Da, with a predicted isoelectric point of 9.83. No isoforms exist for this protein. This protein is rich in glutamine, isoleucine, lysine, and serine when compared to the average protein, but is poor in aspartic acid and glycine.
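A composition comparison of this kind can be sketched as follows. This is a minimal illustration, not the method used for the article's figure: the toy sequence and the reference percentages are hypothetical placeholders, and a real analysis would use the full 685-residue sequence and a published background composition.

```python
from collections import Counter


def amino_acid_composition(sequence):
    """Return the percentage of each residue in a one-letter protein sequence."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence)
    return {residue: 100.0 * n / len(sequence) for residue, n in counts.items()}


def enrichment(sequence, reference):
    """Observed / reference percentage for every residue listed in `reference`.

    Values above 1 mean the residue is over-represented relative to the
    reference; values below 1 mean it is under-represented.
    """
    observed = amino_acid_composition(sequence)
    return {r: observed.get(r, 0.0) / ref for r, ref in reference.items() if ref > 0}


# Hypothetical inputs, for illustration only.
toy_sequence = "MKKQSIQKSL"
toy_reference = {"K": 6.0, "G": 7.0}  # placeholder background percentages

print(amino_acid_composition(toy_sequence)["K"])  # 30.0
print(enrichment(toy_sequence, toy_reference))
```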

Structure

In the predicted secondary structure for this protein there are a number of beta turns, beta strands and alpha helices. For C18orf63, 48.6% of the protein is expected to form alpha helices and 28.6% of the structure is expected to be composed of beta strands.
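To put these percentages in terms of residues, the sketch below converts them to approximate counts using the 685-residue length stated in the Protein section. The rounding to whole residues is an assumption made for illustration; the prediction tool's own per-residue output is not given in the article.

```python
PROTEIN_LENGTH = 685  # amino acids, as stated in the article


def residues_from_percent(percent, length=PROTEIN_LENGTH):
    """Approximate number of residues corresponding to a percentage of the protein."""
    return round(length * percent / 100)


helix_percent = 48.6
strand_percent = 28.6
other_percent = round(100 - helix_percent - strand_percent, 1)

print(residues_from_percent(helix_percent))   # 333
print(residues_from_percent(strand_percent))  # 196
print(other_percent)                          # 22.8
print(residues_from_percent(other_percent))   # 156
```

Roughly 333 residues would be helical and 196 in strands, leaving about 156 residues (22.8%) for turns and other regions.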

Domains and Motifs

The protein contains one domain of unknown function, DUF4709, spanning from the 7th amino acid to the 280th amino acid. Motifs that are predicted to exist include an N-terminal motif, an RxxL motif, and a KEN conserving motif, which all signal for protein degradation. Other predicted motifs are a Wxxx motif, which facilitates entrance of PTS1 cargo proteins into the organellar lumen, and an RVxPx motif, which allows protein transport from the trans-Golgi network to the plasma membrane of the cilia. There is also a bipartite nuclear localization signal at the end of the protein sequence. There is no trans-membrane domain present, indicating that C18orf63 is not a trans-membrane protein.
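Short linear motifs such as those named above are often located with simple pattern matching. The sketch below reads each motif name literally, with "x" standing for any residue, and reports 1-based positions. The regular expressions and the toy sequence are assumptions for illustration; dedicated motif predictors use richer context than these patterns, and the tool used for the article's predictions is not stated.

```python
import re

# Literal readings of the motif names; "x" becomes "." (any residue).
MOTIF_PATTERNS = {
    "RxxL": "R..L",
    "KEN": "KEN",
    "RVxPx": "RV.P.",
}


def find_motifs(sequence, patterns=MOTIF_PATTERNS):
    """Return {motif name: [(1-based start, matched text), ...]}.

    A lookahead wrapper is used so that overlapping occurrences are all found.
    """
    sequence = sequence.upper()
    hits = {}
    for name, pattern in patterns.items():
        lookahead = re.compile("(?=(" + pattern + "))")
        hits[name] = [(m.start() + 1, m.group(1)) for m in lookahead.finditer(sequence)]
    return hits


toy_sequence = "MRAALKENRVAPG"  # hypothetical, not the C18orf63 sequence
print(find_motifs(toy_sequence))
```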

Post-Translational Modifications

Post-translational modifications the protein is predicted to undergo include SUMOylation, PKC and CK2 phosphorylation, ''N''-glycosylation, amidation, and cleavage. There are six PKC phosphorylation sites, two CK2 phosphorylation sites, two SUMOylation sites, and two ''N''-glycosylation sites. There are no signal peptides present in this sequence.
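Site counts like these are commonly obtained by scanning the sequence for consensus patterns. The sketch below uses the PROSITE consensus patterns for PKC phosphorylation ([ST]-x-[RK]), CK2 phosphorylation ([ST]-x(2)-[DE]), and ''N''-glycosylation (N-{P}-[ST]-{P}), rewritten as regular expressions. Whether the article's predictions came from these patterns is an assumption, and the toy sequence is hypothetical. Such short patterns match frequently by chance, so the counts indicate candidate sites only.

```python
import re

# PROSITE consensus patterns expressed as regular expressions.
# Zero-width lookaheads let overlapping sites be counted separately.
SITE_PATTERNS = {
    "PKC phosphorylation": r"(?=[ST].[RK])",
    "CK2 phosphorylation": r"(?=[ST]..[DE])",
    "N-glycosylation": r"(?=N[^P][ST][^P])",
}


def count_sites(sequence, patterns=SITE_PATTERNS):
    """Count candidate modification sites of each type in a protein sequence."""
    sequence = sequence.upper()
    return {
        name: sum(1 for _ in re.finditer(pattern, sequence))
        for name, pattern in patterns.items()
    }


toy_sequence = "MNGSAKTLLE"  # hypothetical, not the C18orf63 sequence
print(count_sites(toy_sequence))
```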

Subcellular Location

Due to the nuclear localization signal at the end of the protein sequence, C18orf63 is predicted to be nuclear. C18orf63 has also been predicted to be targeted to the mitochondria in addition to the nucleus.

Homology

Orthologs

Orthologs have been found in most eukaryotes, with the exception of the class ''Amphibia''. No human paralogs exist for C18orf63. The most distant homolog detectable is ''Mizuhopecten yessoensis'', sharing 37% identity with the human protein sequence. The domain of unknown function was the only homologous domain present in the protein sequence; it was found to be highly conserved in all orthologs. The table below shows some examples of various orthologs for this protein.

[Figure: rate of evolution of C18orf63]
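A percent identity figure such as the 37% above is computed from a pairwise alignment. The sketch below shows one common convention: identical, non-gap columns divided by the full alignment length. Other conventions use a different denominator (for example the shorter sequence length), and the article does not say which was used, so this is an assumption. The aligned strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Counts columns where both sequences carry the same residue (gaps never
    count as matches) and divides by the total number of alignment columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment must not be empty")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)
    return 100.0 * matches / len(aligned_a)


# Hypothetical aligned fragments, for illustration only.
print(percent_identity("ACDE", "ACDF"))  # 75.0
print(round(percent_identity("MKT-LS", "MRTALS"), 1))  # 66.7
```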

Rate of Evolution

C18orf63 is a moderately slowly evolving protein. It evolves faster than cytochrome c but slower than beta-globin.

Interacting proteins

Transcription factors of interest predicted to bind to the regulatory sequence include p53 tumor suppressors, SRY testis determining factors, Y-box binding transcription factors, and glucocorticoid responsive elements. The JUN protein was found to interact with C18orf63 through anti-bait co-immunoprecipitation. The JUN protein binds to the USP28 promoter in colorectal cancer cells and is involved in the activation of these cancer cells.

Clinical significance

Mutations

A variety of missense mutations occur in the human population for this protein. In the regulatory sequence, mutations occur at two transcription factor binding sites. The binding sites affected are those for glucocorticoid responsive elements and E2F-myc cell cycle regulators. Eleven common mutations affect the protein sequence itself. None of these mutations affect the post-translational modifications the protein is predicted to undergo.

Disease association

C18orf63 has been associated with personality disorder, obesity, and type 2 diabetes through a genome-wide association study. Research has not yet shown whether C18orf63 plays a direct role in any of these diseases.
