Chromosome 1 open reading frame 112 (C1orf112) is a protein that in humans is encoded by the C1orf112 gene, located at position 1q24.2. The gene encodes seventeen mRNA variants, fifteen of which produce functional proteins. C1orf112 has a precursor molecular weight of 96.6 kDa and an isoelectric point of 5.62. It has been experimentally determined to localize to the mitochondria, although it does not contain a mitochondrial targeting sequence.

Gene

The gene spans 192,073 base pairs and contains 29 exons. C1orf112 is located at position 1q24.2 and shares antisense coding regions with C1orf156 and SCYL3.

Protein

There are currently eight experimentally determined RefSeq isoforms. C1orf112 contains a domain of unknown function, DUF4487.

Composition

Compositional analysis through SAPS indicated much less glycine and much more leucine than expected relative to other human protein sequences. This characteristic is conserved across primate orthologs. A mixed charge cluster was found in Isoform X1 from position 747 to 805, indicating that this segment may be aqueous and tightly bound. This mixed charge cluster is only partially conserved across orthologs.
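The kind of residue-frequency comparison described above can be illustrated with a minimal Python sketch. It only counts residue fractions in a single sequence; it does not reproduce the statistical tests that SAPS performs, and the peptide used here is a hypothetical toy sequence, not C1orf112.

```python
from collections import Counter


def residue_fractions(sequence: str) -> dict:
    """Return the fraction of each amino acid (one-letter code) in a sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {residue: n / len(seq) for residue, n in counts.items()}


# Hypothetical 20-residue peptide for illustration only (NOT the C1orf112 sequence).
toy_sequence = "MLLKALLEGLSLLDKLLQAL"
fractions = residue_fractions(toy_sequence)
glycine_fraction = fractions.get("G", 0.0)
leucine_fraction = fractions.get("L", 0.0)
print(f"Gly: {glycine_fraction:.2f}, Leu: {leucine_fraction:.2f}")
```

Comparing such fractions against a background distribution for human proteins is what would flag a sequence as glycine-poor or leucine-rich; the background values themselves are not given in this article.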

Transcripts

Ensembl lists nine transcripts, or splice variants, for C1orf112.

Subcellular localization

Antibody immunocytochemistry and immunofluorescent staining of human cell line A-431 indicate that C1orf112 is localized to the mitochondria.

Regulation

Gene level regulation

Expression

Although tissue-level expression is ubiquitous, C1orf112 is expressed most highly in the testes, lymph node, bone marrow, and cerebellum, based on samples from 97 individuals across 27 different tissues.

In situ hybridization of the human transcriptome indicates expression is highest in the atrioventricular node, followed by the testis, testis germ cells, and testis interstitial tissue.

Transcript level regulation

Transcription factor assessment indicates many potential TATA-binding protein and CCAAT-enhancer-binding protein sites, along with sites for transcription factors associated with the thymus, kidneys, and cardiac tissue.

Protein level regulation

There are two ubiquitination sites on C1orf112, at lysine 73 and at position 783 of isoform X1. Downstream of the reading frame, there are three polyadenylation signals. In addition, there is an N6-acetyllysine site at position 747 and a phosphoserine site at serine 23. C1orf112 has been found experimentally to interact with ATG1, an aldosterone secretion whose overexpression characterizes certain forms of breast cancer.

Post-translational modification predictions include O-glycosyl-oligosaccharide-glycoprotein N-acetylglucosaminyltransferase III, sumoylation, and sumoylation interaction sites.

Interacting proteins

C1orf112 is predicted to interact with a diverse range of proteins, including multiple mitosis-associated proteins. C1orf112 is also predicted to interact with FIGNL1, a protein involved in DNA double-strand break repair via homologous recombination. Experimental findings indicate C1orf112 interacts with NUF2, a spindle-pole body protein that plays a critical role in nuclear division, and TTK, a protein kinase capable of phosphorylating serine, threonine, and tyrosine.

Homology/evolution

Paralogs

There are no known paralogs of C1orf112.

Orthologs

C1orf112 is highly conserved in ''Pan troglodytes'', ''Rhinopithecus bieti'', ''Castor canadensis'', ''Miniopterus natalensis'', and select primates, with percent identity relative to ''Homo sapiens'' C1orf112 greater than 90%. Orthologs with the greatest date of divergence (date of speciation) from human C1orf112 include ''Trichosporon asahii'', ''Amphimedon queenslandica'', and a placozoan, indicating that C1orf112 has been preserved over evolutionary time.

Date of divergence was calculated using TimeTree. The E value indicates the number of "hits" one can expect to see by chance when searching the NCBI database, with a low E value indicating a significant result. Percent identity is the percentage of characters that align identically to ''Homo sapiens'' C1orf112 Isoform X1, while percent similarity is the degree of resemblance when the two sequences are aligned with one another.
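The percent identity measure defined above can be sketched in a few lines of Python. This is a minimal illustration that assumes the two sequences are already aligned (gaps written as `-`) and that gapped columns are excluded from the denominator; alignment tools differ on that convention, and the sequences below are hypothetical toy strings, not C1orf112 orthologs.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned, ungapped columns that hold identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * identical / compared


# Hypothetical pre-aligned toy sequences (assumption, for illustration only).
toy_identity = percent_identity("MKT-LLEA", "MKSALLDA")
print(f"{toy_identity:.1f}% identity")
```

Percent similarity would additionally count conservative substitutions as matches, which requires a substitution matrix and is not shown here.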

Protein structure

Secondary and tertiary structure

C1orf112 secondary structure is predicted to be predominantly alpha helical, with less than 5% of the protein composed of beta sheets. Ligand binding sites are predicted by I-TASSER from positions 377 to 530 in Isoform X1. A leucine zipper motif is present in Isoform X1 from positions 831 to 852, as predicted by MyHits.
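A leucine zipper scan of this kind can be approximated with a regular expression. The sketch below assumes the classic heptad-repeat pattern L-x(6)-L-x(6)-L-x(6)-L, which spans 22 residues, the same length as the reported 831 to 852 window; whether MyHits applied exactly this pattern is an assumption, and the sequence scanned here is a hypothetical toy string, not C1orf112.

```python
import re

# Four leucines separated by six arbitrary residues each (22 residues total).
# The lookahead lets overlapping candidate windows all be reported.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(sequence: str) -> list:
    """Return (start, end, match) tuples using 1-based inclusive positions."""
    hits = []
    for m in LEUCINE_ZIPPER.finditer(sequence.upper()):
        start = m.start(1) + 1  # convert 0-based index to 1-based position
        end = m.end(1)          # exclusive 0-based end equals inclusive 1-based end
        hits.append((start, end, m.group(1)))
    return hits


# Hypothetical toy sequence with one heptad-spaced leucine run (NOT C1orf112).
toy = "MK" + "LAAAAAA" * 3 + "L" + "GS"
zipper_hits = find_leucine_zippers(toy)
print(zipper_hits)
```

A pattern match alone is weak evidence: this motif occurs by chance in many proteins, so a hit is usually read together with the predicted helical structure.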

Clinical significance

C1orf112 was one of many genes found to be co-expressed with cancer-associated genes, and the knockdown of this gene in a HeLa cell line suppressed growth.